All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] 32/64 bit llseek hashes
@ 2011-07-27 11:02 Bernd Schubert
       [not found] ` <20110727110148.204979.49551.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  2011-07-27 11:02 ` [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH Bernd Schubert
  0 siblings, 2 replies; 13+ messages in thread
From: Bernd Schubert @ 2011-07-27 11:02 UTC (permalink / raw)
  To: linux-nfs, linux-ext4; +Cc: linux-fsdevel, yong.fan, adilger, tytso

With the ext3/ext4 directory index implementation hashes are used to specify
offsets for llseek(). For compatibility with NFSv2 and 32-bit user space
on 64-bit systems (kernel space) ext3/ext4 currently only return 32-bit 
hashes and therefore the probability of hash collisions for larger directories
is rather high. As recently reported on the NFS mailing list that theoretical
problem also happens on real systems:
http://comments.gmane.org/gmane.linux.nfs/40863

The following series adds two new open flags to tell ext4
to to 32-bit or 64-bit hash values for llseek() calls.
These flags are then used by NFS to use 32-bit (NFSv2) or 64-bit
offsets (hashes in case of ext3/ext4) for readdir and seekdir.
User space does not need to specify these flags, but usually the check
for is_32bit_api() should be sufficient.

---

Bernd Schubert (2):
      Remove check for a 32-bit cookie in nfsd4_readdir()
      nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH

Fan Yong (1):
      Return 32/64-bit dir name hash according to usage type


 fs/ext4/dir.c               |  160 ++++++++++++++++++++++++++++++++++---------
 fs/fcntl.c                  |    5 +
 fs/nfsd/nfs4proc.c          |    2 -
 fs/nfsd/vfs.c               |    6 ++
 include/asm-generic/fcntl.h |    9 ++
 5 files changed, 145 insertions(+), 37 deletions(-)

-- 
Bernd Schubert

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] Return 32/64-bit dir name hash according to usage type
  2011-07-27 11:02 [PATCH 0/3] 32/64 bit llseek hashes Bernd Schubert
@ 2011-07-27 11:02     ` Bernd Schubert
  2011-07-27 11:02 ` [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH Bernd Schubert
  1 sibling, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2011-07-27 11:02 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-ext4-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	yong.fan-KloliPT79xf2eFz/2MeuCQ, adilger-KloliPT79xf2eFz/2MeuCQ,
	tytso-3s7WtUTddSA

From: Fan Yong <yong.fan-KloliPT79xf2eFz/2MeuCQ@public.gmane.org>

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir().  However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext4 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions.  This still needs
integration on the NFS side.

Signed-off-by: Fan Yong <yong.fan-KloliPT79xf2eFz/2MeuCQ@public.gmane.org>
Signed-off-by: Andreas Dilger <adilger-KloliPT79xf2eFz/2MeuCQ@public.gmane.org>
Signed-off-by: Bernd Schubert <bernd.schubert-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>
---
 fs/ext4/dir.c               |  160 ++++++++++++++++++++++++++++++++++---------
 fs/fcntl.c                  |    5 +
 include/asm-generic/fcntl.h |    9 ++
 3 files changed, 138 insertions(+), 36 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 164c560..8258240 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -32,24 +32,8 @@ static unsigned char ext4_filetype_table[] = {
 	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
 };
 
-static int ext4_readdir(struct file *, void *, filldir_t);
 static int ext4_dx_readdir(struct file *filp,
 			   void *dirent, filldir_t filldir);
-static int ext4_release_dir(struct inode *inode,
-				struct file *filp);
-
-const struct file_operations ext4_dir_operations = {
-	.llseek		= ext4_llseek,
-	.read		= generic_read_dir,
-	.readdir	= ext4_readdir,		/* we take BKL. needed?*/
-	.unlocked_ioctl = ext4_ioctl,
-#ifdef CONFIG_COMPAT
-	.compat_ioctl	= ext4_compat_ioctl,
-#endif
-	.fsync		= ext4_sync_file,
-	.release	= ext4_release_dir,
-};
-
 
 static unsigned char get_dtype(struct super_block *sb, int filetype)
 {
@@ -254,22 +238,117 @@ out:
 	return ret;
 }
 
+static inline int is_32bit_api(void)
+{
+#ifdef HAVE_IS_COMPAT_TASK
+	return is_compat_task();
+#else
+	return (BITS_PER_LONG == 32);
+#endif
+}
+
 /*
  * These functions convert from the major/minor hash to an f_pos
  * value.
  *
- * Currently we only use major hash numer.  This is unfortunate, but
- * on 32-bit machines, the same VFS interface is used for lseek and
- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
- * lseek/telldir/seekdir will blow out spectacularly, and from within
- * the ext2 low-level routine, we don't know if we're being called by
- * a 64-bit version of the system call or the 32-bit version of the
- * system call.  Worse yet, NFSv2 only allows for a 32-bit readdir
- * cookie.  Sigh.
+ * Upper layer should specify O_32BITHASH or O_64BITHASH explicitly.
+ * On the other hand, we allow ext4 to be mounted directly on both 32-bit
+ * and 64-bit nodes, under such case, neither O_32BITHASH nor O_64BITHASH
+ * is specified.
  */
-#define hash2pos(major, minor)	(major >> 1)
-#define pos2maj_hash(pos)	((pos << 1) & 0xffffffff)
-#define pos2min_hash(pos)	(0)
+static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
+{
+	if ((filp->f_flags & O_32BITHASH) ||
+	    (!(filp->f_flags & O_64BITHASH) && is_32bit_api()))
+		return major >> 1;
+	else
+		return ((__u64)(major >> 1) << 32) | (__u64)minor;
+}
+
+static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
+{
+	if ((filp->f_flags & O_32BITHASH) ||
+	    (!(filp->f_flags & O_64BITHASH) && is_32bit_api()))
+		return (pos << 1) & 0xffffffff;
+	else
+		return ((pos >> 32) << 1) & 0xffffffff;
+}
+
+static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
+{
+	if ((filp->f_flags & O_32BITHASH) ||
+	    (!(filp->f_flags & O_64BITHASH) && is_32bit_api()))
+		return 0;
+	else
+		return pos & 0xffffffff;
+}
+
+/*
+ * ext4_dir_llseek() based on generic_file_llseek() to handle both
+ * non-htree and htree directories, where the "offset" is in terms
+ * of the filename hash value instead of the byte offset.
+ *
+ * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
+ *       will be invalid once the directory was converted into a dx directory
+ */
+loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+	struct inode *inode = file->f_mapping->host;
+	loff_t ret = -EINVAL;
+	int is_dx_dir = ext4_test_inode_flag(inode, EXT4_INODE_INDEX);
+
+	mutex_lock(&inode->i_mutex);
+
+	switch (origin) {
+	case SEEK_END:
+		if (unlikely(offset > 0))
+			goto out_err; /* not supported for directories */
+		if (is_dx_dir)
+			offset = EXT4_HTREE_EOF;
+		else
+			offset = inode->i_size;
+		break;
+	case SEEK_CUR:
+		/*
+		 * Here we special-case the lseek(fd, 0, SEEK_CUR)
+		 * position-querying operation.  Avoid rewriting the "same"
+		 * f_pos value back to the file because a concurrent read(),
+		 * write() or lseek() might have altered it
+		 */
+		if (offset == 0) {
+			offset = file->f_pos;
+			goto out_ok;
+		}
+
+		/* NOTE: relative offsets with dx directories might not work
+		 *       as expected, as it is difficult to figure out the
+		 *       correct offset between dx hashes */
+		offset += file->f_pos;
+		break;
+	}
+
+	if (unlikely(offset < 0))
+		goto out_err;
+
+	if (!is_dx_dir) {
+		if (offset > inode->i_sb->s_maxbytes)
+			goto out_err;
+	} else if (offset > EXT4_HTREE_EOF)
+		goto out_err;
+
+	/* Special lock needed here? */
+	if (offset != file->f_pos) {
+		file->f_pos = offset;
+		file->f_version = 0;
+	}
+
+out_ok:
+	ret = offset;
+out_err:
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
 
 /*
  * This structure holds the nodes of the red-black tree used to store
@@ -330,15 +409,16 @@ static void free_rb_tree_fname(struct rb_root *root)
 }
 
 
-static struct dir_private_info *ext4_htree_create_dir_info(loff_t pos)
+static struct dir_private_info *ext4_htree_create_dir_info(struct file *filp,
+							   loff_t pos)
 {
 	struct dir_private_info *p;
 
 	p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
 	if (!p)
 		return NULL;
-	p->curr_hash = pos2maj_hash(pos);
-	p->curr_minor_hash = pos2min_hash(pos);
+	p->curr_hash = pos2maj_hash(filp, pos);
+	p->curr_minor_hash = pos2min_hash(filp, pos);
 	return p;
 }
 
@@ -429,7 +509,7 @@ static int call_filldir(struct file *filp, void *dirent,
 		       "null fname?!?\n");
 		return 0;
 	}
-	curr_pos = hash2pos(fname->hash, fname->minor_hash);
+	curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
 	while (fname) {
 		error = filldir(dirent, fname->name,
 				fname->name_len, curr_pos,
@@ -454,7 +534,7 @@ static int ext4_dx_readdir(struct file *filp,
 	int	ret;
 
 	if (!info) {
-		info = ext4_htree_create_dir_info(filp->f_pos);
+		info = ext4_htree_create_dir_info(filp, filp->f_pos);
 		if (!info)
 			return -ENOMEM;
 		filp->private_data = info;
@@ -468,8 +548,8 @@ static int ext4_dx_readdir(struct file *filp,
 		free_rb_tree_fname(&info->root);
 		info->curr_node = NULL;
 		info->extra_fname = NULL;
-		info->curr_hash = pos2maj_hash(filp->f_pos);
-		info->curr_minor_hash = pos2min_hash(filp->f_pos);
+		info->curr_hash = pos2maj_hash(filp, filp->f_pos);
+		info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
 	}
 
 	/*
@@ -540,3 +620,15 @@ static int ext4_release_dir(struct inode *inode, struct file *filp)
 
 	return 0;
 }
+
+const struct file_operations ext4_dir_operations = {
+	.llseek		= ext4_dir_llseek,
+	.read		= generic_read_dir,
+	.readdir	= ext4_readdir,
+	.unlocked_ioctl = ext4_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= ext4_compat_ioctl,
+#endif
+	.fsync		= ext4_sync_file,
+	.release	= ext4_release_dir,
+};
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 22764c7..cb6ef40 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -835,14 +835,15 @@ static int __init fcntl_init(void)
 	 * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
 	 * is defined as O_NONBLOCK on some platforms and not on others.
 	 */
-	BUILD_BUG_ON(19 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
+	BUILD_BUG_ON(21 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
 		O_RDONLY	| O_WRONLY	| O_RDWR	|
 		O_CREAT		| O_EXCL	| O_NOCTTY	|
 		O_TRUNC		| O_APPEND	| /* O_NONBLOCK	| */
 		__O_SYNC	| O_DSYNC	| FASYNC	|
 		O_DIRECT	| O_LARGEFILE	| O_DIRECTORY	|
 		O_NOFOLLOW	| O_NOATIME	| O_CLOEXEC	|
-		__FMODE_EXEC	| O_PATH
+		__FMODE_EXEC	| O_PATH        | O_32BITHASH   |
+		O_64BITHASH
 		));
 
 	fasync_cache = kmem_cache_create("fasync_cache",
diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index 84793c7..a40e1d5 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -84,10 +84,19 @@
 #define O_PATH		010000000
 #endif
 
+
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
 #endif
 
+#ifndef O_32BITHASH
+#define O_32BITHASH	020000000	/* use 64 bit hash from dir llseek() */
+#endif
+#ifndef O_64BITHASH
+#define O_64BITHASH	040000000	/* use 32 bit hash from dir llseek() */
+#endif
+
+
 #define F_DUPFD		0	/* dup */
 #define F_GETFD		1	/* get close_on_exec */
 #define F_SETFD		2	/* set/clear close_on_exec */

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 1/3] Return 32/64-bit dir name hash according to usage type
@ 2011-07-27 11:02     ` Bernd Schubert
  0 siblings, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2011-07-27 11:02 UTC (permalink / raw)
  To: linux-nfs, linux-ext4; +Cc: linux-fsdevel, yong.fan, adilger, tytso

From: Fan Yong <yong.fan@whamcloud.com>

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir().  However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext4 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions.  This still needs
integration on the NFS side.

Signed-off-by: Fan Yong <yong.fan@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
 fs/ext4/dir.c               |  160 ++++++++++++++++++++++++++++++++++---------
 fs/fcntl.c                  |    5 +
 include/asm-generic/fcntl.h |    9 ++
 3 files changed, 138 insertions(+), 36 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 164c560..8258240 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -32,24 +32,8 @@ static unsigned char ext4_filetype_table[] = {
 	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
 };
 
-static int ext4_readdir(struct file *, void *, filldir_t);
 static int ext4_dx_readdir(struct file *filp,
 			   void *dirent, filldir_t filldir);
-static int ext4_release_dir(struct inode *inode,
-				struct file *filp);
-
-const struct file_operations ext4_dir_operations = {
-	.llseek		= ext4_llseek,
-	.read		= generic_read_dir,
-	.readdir	= ext4_readdir,		/* we take BKL. needed?*/
-	.unlocked_ioctl = ext4_ioctl,
-#ifdef CONFIG_COMPAT
-	.compat_ioctl	= ext4_compat_ioctl,
-#endif
-	.fsync		= ext4_sync_file,
-	.release	= ext4_release_dir,
-};
-
 
 static unsigned char get_dtype(struct super_block *sb, int filetype)
 {
@@ -254,22 +238,117 @@ out:
 	return ret;
 }
 
+static inline int is_32bit_api(void)
+{
+#ifdef HAVE_IS_COMPAT_TASK
+	return is_compat_task();
+#else
+	return (BITS_PER_LONG == 32);
+#endif
+}
+
 /*
  * These functions convert from the major/minor hash to an f_pos
  * value.
  *
- * Currently we only use major hash numer.  This is unfortunate, but
- * on 32-bit machines, the same VFS interface is used for lseek and
- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
- * lseek/telldir/seekdir will blow out spectacularly, and from within
- * the ext2 low-level routine, we don't know if we're being called by
- * a 64-bit version of the system call or the 32-bit version of the
- * system call.  Worse yet, NFSv2 only allows for a 32-bit readdir
- * cookie.  Sigh.
+ * Upper layer should specify O_32BITHASH or O_64BITHASH explicitly.
+ * On the other hand, we allow ext4 to be mounted directly on both 32-bit
+ * and 64-bit nodes, under such case, neither O_32BITHASH nor O_64BITHASH
+ * is specified.
  */
-#define hash2pos(major, minor)	(major >> 1)
-#define pos2maj_hash(pos)	((pos << 1) & 0xffffffff)
-#define pos2min_hash(pos)	(0)
+static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
+{
+	if ((filp->f_flags & O_32BITHASH) ||
+	    (!(filp->f_flags & O_64BITHASH) && is_32bit_api()))
+		return major >> 1;
+	else
+		return ((__u64)(major >> 1) << 32) | (__u64)minor;
+}
+
+static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
+{
+	if ((filp->f_flags & O_32BITHASH) ||
+	    (!(filp->f_flags & O_64BITHASH) && is_32bit_api()))
+		return (pos << 1) & 0xffffffff;
+	else
+		return ((pos >> 32) << 1) & 0xffffffff;
+}
+
+static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
+{
+	if ((filp->f_flags & O_32BITHASH) ||
+	    (!(filp->f_flags & O_64BITHASH) && is_32bit_api()))
+		return 0;
+	else
+		return pos & 0xffffffff;
+}
+
+/*
+ * ext4_dir_llseek() based on generic_file_llseek() to handle both
+ * non-htree and htree directories, where the "offset" is in terms
+ * of the filename hash value instead of the byte offset.
+ *
+ * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
+ *       will be invalid once the directory was converted into a dx directory
+ */
+loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+	struct inode *inode = file->f_mapping->host;
+	loff_t ret = -EINVAL;
+	int is_dx_dir = ext4_test_inode_flag(inode, EXT4_INODE_INDEX);
+
+	mutex_lock(&inode->i_mutex);
+
+	switch (origin) {
+	case SEEK_END:
+		if (unlikely(offset > 0))
+			goto out_err; /* not supported for directories */
+		if (is_dx_dir)
+			offset = EXT4_HTREE_EOF;
+		else
+			offset = inode->i_size;
+		break;
+	case SEEK_CUR:
+		/*
+		 * Here we special-case the lseek(fd, 0, SEEK_CUR)
+		 * position-querying operation.  Avoid rewriting the "same"
+		 * f_pos value back to the file because a concurrent read(),
+		 * write() or lseek() might have altered it
+		 */
+		if (offset == 0) {
+			offset = file->f_pos;
+			goto out_ok;
+		}
+
+		/* NOTE: relative offsets with dx directories might not work
+		 *       as expected, as it is difficult to figure out the
+		 *       correct offset between dx hashes */
+		offset += file->f_pos;
+		break;
+	}
+
+	if (unlikely(offset < 0))
+		goto out_err;
+
+	if (!is_dx_dir) {
+		if (offset > inode->i_sb->s_maxbytes)
+			goto out_err;
+	} else if (offset > EXT4_HTREE_EOF)
+		goto out_err;
+
+	/* Special lock needed here? */
+	if (offset != file->f_pos) {
+		file->f_pos = offset;
+		file->f_version = 0;
+	}
+
+out_ok:
+	ret = offset;
+out_err:
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
 
 /*
  * This structure holds the nodes of the red-black tree used to store
@@ -330,15 +409,16 @@ static void free_rb_tree_fname(struct rb_root *root)
 }
 
 
-static struct dir_private_info *ext4_htree_create_dir_info(loff_t pos)
+static struct dir_private_info *ext4_htree_create_dir_info(struct file *filp,
+							   loff_t pos)
 {
 	struct dir_private_info *p;
 
 	p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
 	if (!p)
 		return NULL;
-	p->curr_hash = pos2maj_hash(pos);
-	p->curr_minor_hash = pos2min_hash(pos);
+	p->curr_hash = pos2maj_hash(filp, pos);
+	p->curr_minor_hash = pos2min_hash(filp, pos);
 	return p;
 }
 
@@ -429,7 +509,7 @@ static int call_filldir(struct file *filp, void *dirent,
 		       "null fname?!?\n");
 		return 0;
 	}
-	curr_pos = hash2pos(fname->hash, fname->minor_hash);
+	curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
 	while (fname) {
 		error = filldir(dirent, fname->name,
 				fname->name_len, curr_pos,
@@ -454,7 +534,7 @@ static int ext4_dx_readdir(struct file *filp,
 	int	ret;
 
 	if (!info) {
-		info = ext4_htree_create_dir_info(filp->f_pos);
+		info = ext4_htree_create_dir_info(filp, filp->f_pos);
 		if (!info)
 			return -ENOMEM;
 		filp->private_data = info;
@@ -468,8 +548,8 @@ static int ext4_dx_readdir(struct file *filp,
 		free_rb_tree_fname(&info->root);
 		info->curr_node = NULL;
 		info->extra_fname = NULL;
-		info->curr_hash = pos2maj_hash(filp->f_pos);
-		info->curr_minor_hash = pos2min_hash(filp->f_pos);
+		info->curr_hash = pos2maj_hash(filp, filp->f_pos);
+		info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
 	}
 
 	/*
@@ -540,3 +620,15 @@ static int ext4_release_dir(struct inode *inode, struct file *filp)
 
 	return 0;
 }
+
+const struct file_operations ext4_dir_operations = {
+	.llseek		= ext4_dir_llseek,
+	.read		= generic_read_dir,
+	.readdir	= ext4_readdir,
+	.unlocked_ioctl = ext4_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= ext4_compat_ioctl,
+#endif
+	.fsync		= ext4_sync_file,
+	.release	= ext4_release_dir,
+};
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 22764c7..cb6ef40 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -835,14 +835,15 @@ static int __init fcntl_init(void)
 	 * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
 	 * is defined as O_NONBLOCK on some platforms and not on others.
 	 */
-	BUILD_BUG_ON(19 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
+	BUILD_BUG_ON(21 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
 		O_RDONLY	| O_WRONLY	| O_RDWR	|
 		O_CREAT		| O_EXCL	| O_NOCTTY	|
 		O_TRUNC		| O_APPEND	| /* O_NONBLOCK	| */
 		__O_SYNC	| O_DSYNC	| FASYNC	|
 		O_DIRECT	| O_LARGEFILE	| O_DIRECTORY	|
 		O_NOFOLLOW	| O_NOATIME	| O_CLOEXEC	|
-		__FMODE_EXEC	| O_PATH
+		__FMODE_EXEC	| O_PATH        | O_32BITHASH   |
+		O_64BITHASH
 		));
 
 	fasync_cache = kmem_cache_create("fasync_cache",
diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index 84793c7..a40e1d5 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -84,10 +84,19 @@
 #define O_PATH		010000000
 #endif
 
+
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
 #endif
 
+#ifndef O_32BITHASH
+#define O_32BITHASH	020000000	/* use 64 bit hash from dir llseek() */
+#endif
+#ifndef O_64BITHASH
+#define O_64BITHASH	040000000	/* use 32 bit hash from dir llseek() */
+#endif
+
+
 #define F_DUPFD		0	/* dup */
 #define F_GETFD		1	/* get close_on_exec */
 #define F_SETFD		2	/* set/clear close_on_exec */


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] Remove check for a 32-bit cookie in nfsd4_readdir()
  2011-07-27 11:02 [PATCH 0/3] 32/64 bit llseek hashes Bernd Schubert
@ 2011-07-27 11:02     ` Bernd Schubert
  2011-07-27 11:02 ` [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH Bernd Schubert
  1 sibling, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2011-07-27 11:02 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-ext4-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	yong.fan-KloliPT79xf2eFz/2MeuCQ, adilger-KloliPT79xf2eFz/2MeuCQ,
	tytso-3s7WtUTddSA

Fan Yong <yong.fan-KloliPT79xf2eFz/2MeuCQ@public.gmane.org>  noticed simply setting
O_64BITHASH wouldn't work with nfsd v4, as
nfsd4_readdir() checks for a 32 bit cookie. However, according to RFC 3530
cookies have a 64 bit type and it is also defined as u64
'struct nfsd4_readdir'.
So remove the test for >32-bit values.

Signed-off-by: Bernd Schubert <bernd.schubert-mPn0NPGs4xGatNDF+KUbs4QuADTiUCJX@public.gmane.org>
---
 fs/nfsd/nfs4proc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3a6dbd7..f7799d3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -682,7 +682,7 @@ nfsd4_readdir(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	readdir->rd_bmval[1] &= nfsd_suppattrs1(cstate->minorversion);
 	readdir->rd_bmval[2] &= nfsd_suppattrs2(cstate->minorversion);
 
-	if ((cookie > ~(u32)0) || (cookie == 1) || (cookie == 2) ||
+	if ((cookie == 1) || (cookie == 2) ||
 	    (cookie == 0 && memcmp(readdir->rd_verf.data, zeroverf.data, NFS4_VERIFIER_SIZE)))
 		return nfserr_bad_cookie;
 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] Remove check for a 32-bit cookie in nfsd4_readdir()
@ 2011-07-27 11:02     ` Bernd Schubert
  0 siblings, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2011-07-27 11:02 UTC (permalink / raw)
  To: linux-nfs, linux-ext4; +Cc: linux-fsdevel, yong.fan, adilger, tytso

Fan Yong <yong.fan@whamcloud.com>  noticed simply setting
O_64BITHASH wouldn't work with nfsd v4, as
nfsd4_readdir() checks for a 32 bit cookie. However, according to RFC 3530
cookies have a 64 bit type and it is also defined as u64
'struct nfsd4_readdir'.
So remove the test for >32-bit values.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
 fs/nfsd/nfs4proc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3a6dbd7..f7799d3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -682,7 +682,7 @@ nfsd4_readdir(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	readdir->rd_bmval[1] &= nfsd_suppattrs1(cstate->minorversion);
 	readdir->rd_bmval[2] &= nfsd_suppattrs2(cstate->minorversion);
 
-	if ((cookie > ~(u32)0) || (cookie == 1) || (cookie == 2) ||
+	if ((cookie == 1) || (cookie == 2) ||
 	    (cookie == 0 && memcmp(readdir->rd_verf.data, zeroverf.data, NFS4_VERIFIER_SIZE)))
 		return nfserr_bad_cookie;
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH
  2011-07-27 11:02 [PATCH 0/3] 32/64 bit llseek hashes Bernd Schubert
       [not found] ` <20110727110148.204979.49551.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2011-07-27 11:02 ` Bernd Schubert
       [not found]   ` <20110727110259.204979.56782.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: Bernd Schubert @ 2011-07-27 11:02 UTC (permalink / raw)
  To: linux-nfs, linux-ext4; +Cc: linux-fsdevel, yong.fan, adilger, tytso

Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
the NFS version. NFSv2 gets 32-bit hashes only.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
 fs/nfsd/vfs.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fd0acca..d79bbcd 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1994,6 +1994,12 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
 	if (err)
 		goto out;
 
+	/* NFSv2 only supports 32 bit cookies */
+	if (rqstp->rq_vers > 2)
+		file->f_flags &= O_64BITHASH;
+	else
+		file->f_flags &= O_32BITHASH;
+
 	offset = vfs_llseek(file, offset, 0);
 	if (offset < 0) {
 		err = nfserrno((int)offset);


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] Return 32/64-bit dir name hash according to usage type
  2011-07-27 11:02     ` Bernd Schubert
@ 2011-07-27 21:02         ` Christoph Hellwig
  -1 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2011-07-27 21:02 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	yong.fan-KloliPT79xf2eFz/2MeuCQ, adilger-KloliPT79xf2eFz/2MeuCQ,
	tytso-3s7WtUTddSA

Please don't mix fs-specific and generic changes in a single patch.

Also adding an O_ constant for a kernel-internal flag is wrong,
use the FMODE_ namespace for it.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] Return 32/64-bit dir name hash according to usage type
@ 2011-07-27 21:02         ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2011-07-27 21:02 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-nfs, linux-ext4, linux-fsdevel, yong.fan, adilger, tytso

Please don't mix fs-specific and generic changes in a single patch.

Also adding an O_ constant for a kernel-internal flag is wrong,
use the FMODE_ namespace for it.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH
  2011-07-27 11:02 ` [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH Bernd Schubert
@ 2011-07-27 21:03       ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2011-07-27 21:03 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	yong.fan-KloliPT79xf2eFz/2MeuCQ, adilger-KloliPT79xf2eFz/2MeuCQ,
	tytso-3s7WtUTddSA

On Wed, Jul 27, 2011 at 01:02:59PM +0200, Bernd Schubert wrote:
> Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
> the NFS version. NFSv2 gets 32-bit hashes only.

Independent of the O_ vs FMODE thing make sure you pass the correct
flag at open time, instead of racy runtime modifications.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH
@ 2011-07-27 21:03       ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2011-07-27 21:03 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-nfs, linux-ext4, linux-fsdevel, yong.fan, adilger, tytso

On Wed, Jul 27, 2011 at 01:02:59PM +0200, Bernd Schubert wrote:
> Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
> the NFS version. NFSv2 gets 32-bit hashes only.

Independent of the O_ vs FMODE thing make sure you pass the correct
flag at open time, instead of racy runtime modifications.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH
  2011-07-27 21:03       ` Christoph Hellwig
  (?)
@ 2011-07-28  9:19       ` Bernd Schubert
  2011-07-28 11:46         ` Christoph Hellwig
  -1 siblings, 1 reply; 13+ messages in thread
From: Bernd Schubert @ 2011-07-28  9:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nfs, linux-ext4, linux-fsdevel, yong.fan, adilger, tytso

On 07/27/2011 11:03 PM, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 01:02:59PM +0200, Bernd Schubert wrote:
>> Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
>> the NFS version. NFSv2 gets 32-bit hashes only.
>
> Independent of the O_ vs FMODE thing make sure you pass the correct
> flag at open time, instead of racy runtime modifications.
>

Christoph,

before I'm going to work further on the patch sets, I have few questions 
first. Could you please help me with that?

file->f_mode is set in __dentry_open() based on O_ flags.

So if f_mode is supposed to be set directly during the NFS open call we 
would need O_ *and* FMODE flags,

Now I still do not understand why we cannot add a flag *after* the open 
call and only in nfsd_readdir()? I do not see any races there.

nfsd_readdir() gets its 'struct file' from get_empty_filp() and 
__dentry_open(). Now as 'struct file' is allocated by get_empty_filp() 
it also cannot be used by any other thread.

nfsd_readdir() just reads the directory and closes it immedeatily after 
the readdir().

So where is there supposed to be a race?


And lastly, if we are going to set f_mode directly at open time, we have 
the choice to

1) Specify those new O_ flags for all files. While setting a flag is a 
cheap operation, it still wastes CPU cycles for file opens, as files do 
not need that flag.

2) Duplicate nfsd_open() to implement a new function for directories 
only. I think not a good idea either.

3) Rewrite the nfsd_open() function to allow to set flags from calling 
functions. That would mean to update the nfsd code at at least 8 places. 
Do we really want to go that way?


So altogether, updating the patches to replace O_ by FMODE flags is 
easy, but setting that flag in nfsd at open time, would mean a large 
overhead.


Thanks,
Bernd


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH
  2011-07-28  9:19       ` Bernd Schubert
@ 2011-07-28 11:46         ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2011-07-28 11:46 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Christoph Hellwig, linux-nfs, linux-ext4, linux-fsdevel,
	yong.fan, adilger, tytso

On Thu, Jul 28, 2011 at 11:19:07AM +0200, Bernd Schubert wrote:
> before I'm going to work further on the patch sets, I have few
> questions first. Could you please help me with that?
> 
> file->f_mode is set in __dentry_open() based on O_ flags.

Or right after dentry_open, before anyone can possibly grab a reference
to the file.  Just do that in nfsd_open based on an NFSD_MAY_ flag.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] Return 32/64-bit dir name hash according to usage type
  2011-07-27 11:02     ` Bernd Schubert
  (?)
  (?)
@ 2011-07-28 20:04     ` Bernd Schubert
  -1 siblings, 0 replies; 13+ messages in thread
From: Bernd Schubert @ 2011-07-28 20:04 UTC (permalink / raw)
  To: linux-nfs, linux-ext4; +Cc: linux-fsdevel, yong.fan, adilger, tytso

Ooops, something important slipped through the tests and the current 
patch doesn't work at all. Please disregard it for now. Updated patches 
probably tomorrow.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-07-28 20:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-27 11:02 [PATCH 0/3] 32/64 bit llseek hashes Bernd Schubert
     [not found] ` <20110727110148.204979.49551.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2011-07-27 11:02   ` [PATCH 1/3] Return 32/64-bit dir name hash according to usage type Bernd Schubert
2011-07-27 11:02     ` Bernd Schubert
     [not found]     ` <20110727110249.204979.58769.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2011-07-27 21:02       ` Christoph Hellwig
2011-07-27 21:02         ` Christoph Hellwig
2011-07-28 20:04     ` Bernd Schubert
2011-07-27 11:02   ` [PATCH 2/3] Remove check for a 32-bit cookie in nfsd4_readdir() Bernd Schubert
2011-07-27 11:02     ` Bernd Schubert
2011-07-27 11:02 ` [PATCH 3/3] nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH Bernd Schubert
     [not found]   ` <20110727110259.204979.56782.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2011-07-27 21:03     ` Christoph Hellwig
2011-07-27 21:03       ` Christoph Hellwig
2011-07-28  9:19       ` Bernd Schubert
2011-07-28 11:46         ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.