All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] vfs: get_next_ino(), never inum=0 and uniqueness
@ 2014-05-21 18:48 hooanon05g
  2014-05-21 18:48 ` [RFC 1/3] vfs: get_next_ino(), never inum=0 hooanon05g
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: hooanon05g @ 2014-05-21 18:48 UTC (permalink / raw)
  To: linux-fsdevel, adilger, hch, dchinner, viro; +Cc: jlbec, gregkh, hughd

From: "J. R. Okajima" <hooanon05g@gmail.com>

In addition to "inum=0" patch, I added another one to keep the
uniqueness of inum. The type is still "unisigned int."

J. R. Okajima (3):
  vfs: get_next_ino(), never inum=0
  vfs: get_next_ino(), support for the uniqueness
  uniqueness of inode number, configfs, debugfs, procfs, ramfs and
    tmpfs

 arch/powerpc/platforms/cell/spufs/inode.c |    2 +-
 arch/s390/hypfs/inode.c                   |    2 +-
 drivers/infiniband/hw/ipath/ipath_fs.c    |    2 +-
 drivers/infiniband/hw/qib/qib_fs.c        |    2 +-
 drivers/misc/ibmasm/ibmasmfs.c            |    2 +-
 drivers/oprofile/oprofilefs.c             |    2 +-
 drivers/usb/gadget/f_fs.c                 |    2 +-
 drivers/usb/gadget/inode.c                |    2 +-
 fs/autofs4/inode.c                        |    2 +-
 fs/binfmt_misc.c                          |    2 +-
 fs/configfs/inode.c                       |    2 +-
 fs/debugfs/inode.c                        |    2 +-
 fs/efivarfs/inode.c                       |    2 +-
 fs/freevxfs/vxfs_inode.c                  |    2 +-
 fs/fuse/control.c                         |    2 +-
 fs/hugetlbfs/inode.c                      |    4 +-
 fs/inode.c                                |   59 ++++++++++++++++++++++++++++-
 fs/libfs.c                                |    2 +-
 fs/ocfs2/dlmfs/dlmfs.c                    |    4 +-
 fs/pipe.c                                 |    2 +-
 fs/proc/base.c                            |    2 +-
 fs/proc/proc_sysctl.c                     |    2 +-
 fs/pstore/inode.c                         |    2 +-
 fs/ramfs/inode.c                          |    2 +-
 include/linux/fs.h                        |    2 +-
 ipc/mqueue.c                              |    2 +-
 mm/shmem.c                                |    2 +-
 net/socket.c                              |    2 +-
 net/sunrpc/rpc_pipe.c                     |    2 +-
 security/inode.c                          |    2 +-
 30 files changed, 88 insertions(+), 33 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC 1/3] vfs: get_next_ino(), never inum=0
  2014-05-21 18:48 [RFC 0/3] vfs: get_next_ino(), never inum=0 and uniqueness hooanon05g
@ 2014-05-21 18:48 ` hooanon05g
  2014-05-28  4:28   ` Hugh Dickins
  2014-05-21 18:48 ` [RFC 2/3] vfs: get_next_ino(), support for the uniqueness hooanon05g
  2014-05-21 18:49 ` [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs hooanon05g
  2 siblings, 1 reply; 26+ messages in thread
From: hooanon05g @ 2014-05-21 18:48 UTC (permalink / raw)
  To: linux-fsdevel, adilger, hch, dchinner, viro; +Cc: jlbec, gregkh, hughd

From: "J. R. Okajima" <hooanon05g@gmail.com>

It is very rare for get_next_ino() to return zero as a new inode number
since its type is unsigned int, but it can surely happen eventually.

Interestingly, ls(1) and find(1) (actually readdir(3)) don't show a file
whose inum is zero, so people won't be able to find it. This issue may
be harmful especially for tmpfs.

On a very long lived and busy system, users may frequently create files
on tmpfs. And if unluckily he gets inum=0, then he cannot see its
filename. If he remembers its name, he may be able to use or unlink it
by its name since the file surely exists. Otherwise, the file remains on
tmpfs silently. No one can touch it. This behaviour looks like resource
leak.
As a worse case, if a dir gets inum=0 and a user creates several files
under it, then the leaked memory will increase since a user cannot see
the name of all files under the dir whose inum=0, regardless the inum of
the children.

There is another unpleasant effect when get_next_ino() wraps
around. When there is a file whose inum=100 on tmpfs, a new file may get
inum=100, ie. the duplicated inums. I am not sure what will happen when
the duplicated inums exist on tmpfs. If it happens, then some tools
won't work correctly such as backup and exporting via NFS, I am afraid.
Anyway this is not a issue in get_next_ino(). It should be
fixed in mm/shmem.c separatly if it is really necessary.

There are many other get_next_ino() callers other than tmpfs, such as
several drivers, anon_inode, autofs4, freevxfs, procfs, pis, hugetlbfs,
configfs, ramfs, fuse, ocfs2, debugfs, securityfs, cgroup, socket, ipc.
Some of them will not care inum so this issue is harmless for them. But
the others may suffer from inum=0. For example, if procfs gets inum=0
for a task dir (or for one of its children), then several utilities
won't work correctly, including ps(1), lsof(8), etc.

Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 fs/inode.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index f96d2a6..a3e274a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -848,7 +848,11 @@ unsigned int get_next_ino(void)
 	}
 #endif
 
-	*p = ++res;
+	res++;
+	/* never zero */
+	if (unlikely(!res))
+		res++;
+	*p = res;
 	put_cpu_var(last_ino);
 	return res;
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC 2/3] vfs: get_next_ino(), support for the uniqueness
  2014-05-21 18:48 [RFC 0/3] vfs: get_next_ino(), never inum=0 and uniqueness hooanon05g
  2014-05-21 18:48 ` [RFC 1/3] vfs: get_next_ino(), never inum=0 hooanon05g
@ 2014-05-21 18:48 ` hooanon05g
  2014-05-22 11:56   ` Jan Kara
  2014-05-21 18:49 ` [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs hooanon05g
  2 siblings, 1 reply; 26+ messages in thread
From: hooanon05g @ 2014-05-21 18:48 UTC (permalink / raw)
  To: linux-fsdevel, adilger, hch, dchinner, viro; +Cc: jlbec, gregkh, hughd

From: "J. R. Okajima" <hooanon05g@gmail.com>

Add a feature to keep the uniqueness of inum in the superblock.
In this commit, the parameter superblock sets NULL, so the feature is
OFF. It will be ON individually be thier maintainer.

Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 arch/powerpc/platforms/cell/spufs/inode.c |    2 +-
 arch/s390/hypfs/inode.c                   |    2 +-
 drivers/infiniband/hw/ipath/ipath_fs.c    |    2 +-
 drivers/infiniband/hw/qib/qib_fs.c        |    2 +-
 drivers/misc/ibmasm/ibmasmfs.c            |    2 +-
 drivers/oprofile/oprofilefs.c             |    2 +-
 drivers/usb/gadget/f_fs.c                 |    2 +-
 drivers/usb/gadget/inode.c                |    2 +-
 fs/autofs4/inode.c                        |    2 +-
 fs/binfmt_misc.c                          |    2 +-
 fs/configfs/inode.c                       |    2 +-
 fs/debugfs/inode.c                        |    2 +-
 fs/efivarfs/inode.c                       |    2 +-
 fs/freevxfs/vxfs_inode.c                  |    2 +-
 fs/fuse/control.c                         |    2 +-
 fs/hugetlbfs/inode.c                      |    4 +--
 fs/inode.c                                |   55 +++++++++++++++++++++++++++--
 fs/libfs.c                                |    2 +-
 fs/ocfs2/dlmfs/dlmfs.c                    |    4 +--
 fs/pipe.c                                 |    2 +-
 fs/proc/base.c                            |    2 +-
 fs/proc/proc_sysctl.c                     |    2 +-
 fs/pstore/inode.c                         |    2 +-
 fs/ramfs/inode.c                          |    2 +-
 include/linux/fs.h                        |    2 +-
 ipc/mqueue.c                              |    2 +-
 mm/shmem.c                                |    2 +-
 net/socket.c                              |    2 +-
 net/sunrpc/rpc_pipe.c                     |    2 +-
 security/inode.c                          |    2 +-
 30 files changed, 84 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 87ba7cf..9f4c575 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -99,7 +99,7 @@ spufs_new_inode(struct super_block *sb, umode_t mode)
 	if (!inode)
 		goto out;
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mode = mode;
 	inode->i_uid = current_fsuid();
 	inode->i_gid = current_fsgid();
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index c952b98..224b731 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -100,7 +100,7 @@ static struct inode *hypfs_make_inode(struct super_block *sb, umode_t mode)
 
 	if (ret) {
 		struct hypfs_sb_info *hypfs_info = sb->s_fs_info;
-		ret->i_ino = get_next_ino();
+		ret->i_ino = get_next_ino(NULL);
 		ret->i_mode = mode;
 		ret->i_uid = hypfs_info->uid;
 		ret->i_gid = hypfs_info->gid;
diff --git a/drivers/infiniband/hw/ipath/ipath_fs.c b/drivers/infiniband/hw/ipath/ipath_fs.c
index e0c404b..f69eee1 100644
--- a/drivers/infiniband/hw/ipath/ipath_fs.c
+++ b/drivers/infiniband/hw/ipath/ipath_fs.c
@@ -57,7 +57,7 @@ static int ipathfs_mknod(struct inode *dir, struct dentry *dentry,
 		goto bail;
 	}
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mode = mode;
 	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	inode->i_private = data;
diff --git a/drivers/infiniband/hw/qib/qib_fs.c b/drivers/infiniband/hw/qib/qib_fs.c
index cab610c..2eccd83 100644
--- a/drivers/infiniband/hw/qib/qib_fs.c
+++ b/drivers/infiniband/hw/qib/qib_fs.c
@@ -59,7 +59,7 @@ static int qibfs_mknod(struct inode *dir, struct dentry *dentry,
 		goto bail;
 	}
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mode = mode;
 	inode->i_uid = GLOBAL_ROOT_UID;
 	inode->i_gid = GLOBAL_ROOT_GID;
diff --git a/drivers/misc/ibmasm/ibmasmfs.c b/drivers/misc/ibmasm/ibmasmfs.c
index e8b9331..a9a090f 100644
--- a/drivers/misc/ibmasm/ibmasmfs.c
+++ b/drivers/misc/ibmasm/ibmasmfs.c
@@ -142,7 +142,7 @@ static struct inode *ibmasmfs_make_inode(struct super_block *sb, int mode)
 	struct inode *ret = new_inode(sb);
 
 	if (ret) {
-		ret->i_ino = get_next_ino();
+		ret->i_ino = get_next_ino(NULL);
 		ret->i_mode = mode;
 		ret->i_atime = ret->i_mtime = ret->i_ctime = CURRENT_TIME;
 	}
diff --git a/drivers/oprofile/oprofilefs.c b/drivers/oprofile/oprofilefs.c
index 3f49345..3d85783 100644
--- a/drivers/oprofile/oprofilefs.c
+++ b/drivers/oprofile/oprofilefs.c
@@ -28,7 +28,7 @@ static struct inode *oprofilefs_get_inode(struct super_block *sb, int mode)
 	struct inode *inode = new_inode(sb);
 
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode->i_mode = mode;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	}
diff --git a/drivers/usb/gadget/f_fs.c b/drivers/usb/gadget/f_fs.c
index 1e12b3e..088387f 100644
--- a/drivers/usb/gadget/f_fs.c
+++ b/drivers/usb/gadget/f_fs.c
@@ -1096,7 +1096,7 @@ ffs_sb_make_inode(struct super_block *sb, void *data,
 	if (likely(inode)) {
 		struct timespec current_time = CURRENT_TIME;
 
-		inode->i_ino	 = get_next_ino();
+		inode->i_ino	 = get_next_ino(NULL);
 		inode->i_mode    = perms->mode;
 		inode->i_uid     = perms->uid;
 		inode->i_gid     = perms->gid;
diff --git a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
index a925d0c..20f5676 100644
--- a/drivers/usb/gadget/inode.c
+++ b/drivers/usb/gadget/inode.c
@@ -1990,7 +1990,7 @@ gadgetfs_make_inode (struct super_block *sb,
 	struct inode *inode = new_inode (sb);
 
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode->i_mode = mode;
 		inode->i_uid = make_kuid(&init_user_ns, default_uid);
 		inode->i_gid = make_kgid(&init_user_ns, default_gid);
diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
index d7bd395..2a05cae 100644
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -356,7 +356,7 @@ struct inode *autofs4_get_inode(struct super_block *sb, umode_t mode)
 		inode->i_gid = sb->s_root->d_inode->i_gid;
 	}
 	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 
 	if (S_ISDIR(mode)) {
 		set_nlink(inode, 2);
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index b605003..1ac7d3b 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -476,7 +476,7 @@ static struct inode *bm_get_inode(struct super_block *sb, int mode)
 	struct inode * inode = new_inode(sb);
 
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode->i_mode = mode;
 		inode->i_atime = inode->i_mtime = inode->i_ctime =
 			current_fs_time(inode->i_sb);
diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c
index a9d35b0..06034b3 100644
--- a/fs/configfs/inode.c
+++ b/fs/configfs/inode.c
@@ -135,7 +135,7 @@ struct inode *configfs_new_inode(umode_t mode, struct configfs_dirent *sd,
 {
 	struct inode * inode = new_inode(s);
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode->i_mapping->a_ops = &configfs_aops;
 		inode->i_mapping->backing_dev_info = &configfs_backing_dev_info;
 		inode->i_op = &configfs_inode_operations;
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 8c41b52..d29eeb8 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -41,7 +41,7 @@ static struct inode *debugfs_get_inode(struct super_block *sb, umode_t mode, dev
 	struct inode *inode = new_inode(sb);
 
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode->i_mode = mode;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 		switch (mode & S_IFMT) {
diff --git a/fs/efivarfs/inode.c b/fs/efivarfs/inode.c
index 07ab497..3be77f2 100644
--- a/fs/efivarfs/inode.c
+++ b/fs/efivarfs/inode.c
@@ -20,7 +20,7 @@ struct inode *efivarfs_get_inode(struct super_block *sb,
 	struct inode *inode = new_inode(sb);
 
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode->i_mode = mode;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 		switch (mode & S_IFMT) {
diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
index 363e3ae..3976e7f 100644
--- a/fs/freevxfs/vxfs_inode.c
+++ b/fs/freevxfs/vxfs_inode.c
@@ -260,7 +260,7 @@ vxfs_get_fake_inode(struct super_block *sbp, struct vxfs_inode_info *vip)
 	struct inode			*ip = NULL;
 
 	if ((ip = new_inode(sbp))) {
-		ip->i_ino = get_next_ino();
+		ip->i_ino = get_next_ino(NULL);
 		vxfs_iinit(ip, vip);
 		ip->i_mapping->a_ops = &vxfs_aops;
 	}
diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index 205e0d5..5ee65d4 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -216,7 +216,7 @@ static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
 	if (!inode)
 		return NULL;
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mode = mode;
 	inode->i_uid = fc->user_id;
 	inode->i_gid = fc->group_id;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e19d4c0..ca48178 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -453,7 +453,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 	inode = new_inode(sb);
 	if (inode) {
 		struct hugetlbfs_inode_info *info;
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode->i_mode = S_IFDIR | config->mode;
 		inode->i_uid = config->uid;
 		inode->i_gid = config->gid;
@@ -491,7 +491,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 	inode = new_inode(sb);
 	if (inode) {
 		struct hugetlbfs_inode_info *info;
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode_init_owner(inode, dir, mode);
 		lockdep_set_class(&inode->i_mapping->i_mmap_mutex,
 				&hugetlbfs_i_mmap_mutex_key);
diff --git a/fs/inode.c b/fs/inode.c
index a3e274a..1412607 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -834,8 +834,9 @@ repeat:
 #define LAST_INO_BATCH 1024
 static DEFINE_PER_CPU(unsigned int, last_ino);
 
-unsigned int get_next_ino(void)
+static unsigned int do_get_next_ino(struct super_block *sb, int *pwrapped)
 {
+	static int wrapped;
 	unsigned int *p = &get_cpu_var(last_ino);
 	unsigned int res = *p;
 
@@ -850,10 +851,60 @@ unsigned int get_next_ino(void)
 
 	res++;
 	/* never zero */
-	if (unlikely(!res))
+	if (unlikely(!res)) {
 		res++;
+		wrapped = 1;
+	}
 	*p = res;
 	put_cpu_var(last_ino);
+	*pwrapped = wrapped;
+	return res;
+}
+
+static int test_ino_sb(struct super_block *sb, unsigned long ino)
+{
+	int in_use;
+	struct inode *inode, *next;
+
+	in_use = 0;
+	spin_lock(&inode_sb_list_lock);
+	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+		if (inode->i_ino != ino)
+			continue;
+		in_use = 1;
+		break;
+	}
+	spin_unlock(&inode_sb_list_lock);
+
+	return in_use;
+}
+
+unsigned int get_next_ino(struct super_block *sb)
+{
+	unsigned int res, ui;
+	int wrapped, used;
+
+	res = do_get_next_ino(sb, &wrapped);
+	if (!sb || !wrapped)
+		goto out;
+
+	/*
+	 * the inode-numbers wrapped around, but we need to keep its uniqueness.
+	 * test_inode_iunique() doesn't suit for filesystem which doesn't insert
+	 * i_hash in creating its inode such like tmpfs, so we search the
+	 * duplication on sb->s_inodes here.
+	 */
+	for (ui = 0; ui < UINT_MAX; ui++) {
+		used = test_ino_sb(sb, res);
+		if (!used)
+			break;
+		res = do_get_next_ino(sb, &wrapped);
+	}
+	WARN_ONCE(used,
+		  "<%x>: the uniqueness of the inode-number is lost\n",
+		  sb->s_dev);
+
+out:
 	return res;
 }
 EXPORT_SYMBOL(get_next_ino);
diff --git a/fs/libfs.c b/fs/libfs.c
index a184424..1c091ff 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1029,7 +1029,7 @@ struct inode *alloc_anon_inode(struct super_block *s)
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mapping->a_ops = &anon_aops;
 
 	/*
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index 09b7d9d..3b151ca 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -402,7 +402,7 @@ static struct inode *dlmfs_get_root_inode(struct super_block *sb)
 	umode_t mode = S_IFDIR | 0755;
 
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode_init_owner(inode, NULL, mode);
 		inode->i_mapping->backing_dev_info = &dlmfs_backing_dev_info;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
@@ -426,7 +426,7 @@ static struct inode *dlmfs_get_inode(struct inode *parent,
 	if (!inode)
 		return NULL;
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode_init_owner(inode, parent, mode);
 	inode->i_mapping->backing_dev_info = &dlmfs_backing_dev_info;
 	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
diff --git a/fs/pipe.c b/fs/pipe.c
index 034bffa..00ee690 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -740,7 +740,7 @@ static struct inode * get_pipe_inode(void)
 	if (!inode)
 		goto fail_inode;
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 
 	pipe = alloc_pipe_info();
 	if (!pipe)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2d696b0..d43b2be 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1518,7 +1518,7 @@ struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *t
 
 	/* Common stuff */
 	ei = PROC_I(inode);
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
 	inode->i_op = &proc_def_inode_operations;
 
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 7129046..7457bde 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -402,7 +402,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 	if (!inode)
 		goto out;
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 
 	sysctl_head_get(head);
 	ei = PROC_I(inode);
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 192297b..f7d03df 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -207,7 +207,7 @@ static struct inode *pstore_get_inode(struct super_block *sb)
 {
 	struct inode *inode = new_inode(sb);
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	}
 	return inode;
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index d365b1c..0eff96e 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -64,7 +64,7 @@ struct inode *ramfs_get_inode(struct super_block *sb,
 	struct inode * inode = new_inode(sb);
 
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode_init_owner(inode, dir, mode);
 		inode->i_mapping->a_ops = &ramfs_aops;
 		inode->i_mapping->backing_dev_info = &ramfs_backing_dev_info;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8780312..807a55f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2364,7 +2364,7 @@ extern void lockdep_annotate_inode_mutex_key(struct inode *inode);
 static inline void lockdep_annotate_inode_mutex_key(struct inode *inode) { };
 #endif
 extern void unlock_new_inode(struct inode *);
-extern unsigned int get_next_ino(void);
+extern unsigned int get_next_ino(struct super_block *);
 
 extern void __iget(struct inode * inode);
 extern void iget_failed(struct inode *);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 4fcf39a..6e456c9 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -225,7 +225,7 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 	if (!inode)
 		goto err;
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mode = mode;
 	inode->i_uid = current_fsuid();
 	inode->i_gid = current_fsgid();
diff --git a/mm/shmem.c b/mm/shmem.c
index 9f70e02..7dc9f19 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1306,7 +1306,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 
 	inode = new_inode(sb);
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = get_next_ino(NULL);
 		inode_init_owner(inode, dir, mode);
 		inode->i_blocks = 0;
 		inode->i_mapping->backing_dev_info = &shmem_backing_dev_info;
diff --git a/net/socket.c b/net/socket.c
index abf56b2..bb4b3a2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -548,7 +548,7 @@ static struct socket *sock_alloc(void)
 	sock = SOCKET_I(inode);
 
 	kmemcheck_annotate_bitfield(sock, type);
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mode = S_IFSOCK | S_IRWXUGO;
 	inode->i_uid = current_fsuid();
 	inode->i_gid = current_fsgid();
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index b185548..a38610b 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -475,7 +475,7 @@ rpc_get_inode(struct super_block *sb, umode_t mode)
 	struct inode *inode = new_inode(sb);
 	if (!inode)
 		return NULL;
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mode = mode;
 	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	switch (mode & S_IFMT) {
diff --git a/security/inode.c b/security/inode.c
index 43ce6e1..6bd4cdcc 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -120,7 +120,7 @@ struct dentry *securityfs_create_file(const char *name, umode_t mode,
 		goto out1;
 	}
 
-	inode->i_ino = get_next_ino();
+	inode->i_ino = get_next_ino(NULL);
 	inode->i_mode = mode;
 	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	inode->i_private = data;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs
  2014-05-21 18:48 [RFC 0/3] vfs: get_next_ino(), never inum=0 and uniqueness hooanon05g
  2014-05-21 18:48 ` [RFC 1/3] vfs: get_next_ino(), never inum=0 hooanon05g
  2014-05-21 18:48 ` [RFC 2/3] vfs: get_next_ino(), support for the uniqueness hooanon05g
@ 2014-05-21 18:49 ` hooanon05g
  2014-05-22  1:03   ` J. R. Okajima
  2 siblings, 1 reply; 26+ messages in thread
From: hooanon05g @ 2014-05-21 18:49 UTC (permalink / raw)
  To: linux-fsdevel, adilger, hch, dchinner, viro; +Cc: jlbec, gregkh, hughd

From: "J. R. Okajima" <hooanon05g@gmail.com>

Turn the feature ON which was introduced by previous commit.

Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 fs/configfs/inode.c   |    2 +-
 fs/debugfs/inode.c    |    2 +-
 fs/proc/base.c        |    2 +-
 fs/proc/proc_sysctl.c |    2 +-
 fs/ramfs/inode.c      |    2 +-
 mm/shmem.c            |    2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c
index 06034b3..e58413a 100644
--- a/fs/configfs/inode.c
+++ b/fs/configfs/inode.c
@@ -135,7 +135,7 @@ struct inode *configfs_new_inode(umode_t mode, struct configfs_dirent *sd,
 {
 	struct inode * inode = new_inode(s);
 	if (inode) {
-		inode->i_ino = get_next_ino(NULL);
+		inode->i_ino = get_next_ino(s);
 		inode->i_mapping->a_ops = &configfs_aops;
 		inode->i_mapping->backing_dev_info = &configfs_backing_dev_info;
 		inode->i_op = &configfs_inode_operations;
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index d29eeb8..c0a1986 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -41,7 +41,7 @@ static struct inode *debugfs_get_inode(struct super_block *sb, umode_t mode, dev
 	struct inode *inode = new_inode(sb);
 
 	if (inode) {
-		inode->i_ino = get_next_ino(NULL);
+		inode->i_ino = get_next_ino(sb);
 		inode->i_mode = mode;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 		switch (mode & S_IFMT) {
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d43b2be..9929c35 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1518,7 +1518,7 @@ struct inode *proc_pid_make_inode(struct super_block * sb, struct task_struct *t
 
 	/* Common stuff */
 	ei = PROC_I(inode);
-	inode->i_ino = get_next_ino(NULL);
+	inode->i_ino = get_next_ino(sb);
 	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
 	inode->i_op = &proc_def_inode_operations;
 
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 7457bde..2d27e24 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -402,7 +402,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 	if (!inode)
 		goto out;
 
-	inode->i_ino = get_next_ino(NULL);
+	inode->i_ino = get_next_ino(sb);
 
 	sysctl_head_get(head);
 	ei = PROC_I(inode);
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index 0eff96e..eef0528 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -64,7 +64,7 @@ struct inode *ramfs_get_inode(struct super_block *sb,
 	struct inode * inode = new_inode(sb);
 
 	if (inode) {
-		inode->i_ino = get_next_ino(NULL);
+		inode->i_ino = get_next_ino(sb);
 		inode_init_owner(inode, dir, mode);
 		inode->i_mapping->a_ops = &ramfs_aops;
 		inode->i_mapping->backing_dev_info = &ramfs_backing_dev_info;
diff --git a/mm/shmem.c b/mm/shmem.c
index 7dc9f19..56a7f94 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1306,7 +1306,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 
 	inode = new_inode(sb);
 	if (inode) {
-		inode->i_ino = get_next_ino(NULL);
+		inode->i_ino = get_next_ino(sb);
 		inode_init_owner(inode, dir, mode);
 		inode->i_blocks = 0;
 		inode->i_mapping->backing_dev_info = &shmem_backing_dev_info;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs
  2014-05-21 18:49 ` [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs hooanon05g
@ 2014-05-22  1:03   ` J. R. Okajima
  2014-05-22 11:53     ` Jan Kara
  0 siblings, 1 reply; 26+ messages in thread
From: J. R. Okajima @ 2014-05-22  1:03 UTC (permalink / raw)
  To: linux-fsdevel, adilger, hch, dchinner, viro, jlbec, gregkh, hughd


hooanon05g@gmail.com:
> Turn the feature ON which was introduced by previous commit.

It was for ramfs and tmpfs only.
configfs, debugfs and procfs don't need this.


J. R. Okajima

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs
  2014-05-22  1:03   ` J. R. Okajima
@ 2014-05-22 11:53     ` Jan Kara
  2014-05-22 14:58       ` J. R. Okajima
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Kara @ 2014-05-22 11:53 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: linux-fsdevel, adilger, hch, dchinner, viro, jlbec, gregkh, hughd

On Thu 22-05-14 10:03:11, J. R. Okajima wrote:
> 
> hooanon05g@gmail.com:
> > Turn the feature ON which was introduced by previous commit.
> 
> It was for ramfs and tmpfs only.
  Hum, have you observed any real problems with non-unique inode numbers
even for tmpfs? Because e.g. the NFS case you mentioned isn't IMHO right -
tmpfs sets i_generation to current time so even if inode counter wraps,
i_generation will be different and so they will be different inodes for
NFS. And the backup case isn't very convincing either - who would be
backing up tmpfs filesystem ;)?

								Honza

> configfs, debugfs and procfs don't need this.
> 
> 
> J. R. Okajima
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 2/3] vfs: get_next_ino(), support for the uniqueness
  2014-05-21 18:48 ` [RFC 2/3] vfs: get_next_ino(), support for the uniqueness hooanon05g
@ 2014-05-22 11:56   ` Jan Kara
  2014-05-22 15:03     ` J. R. Okajima
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Kara @ 2014-05-22 11:56 UTC (permalink / raw)
  To: hooanon05g
  Cc: linux-fsdevel, adilger, hch, dchinner, viro, jlbec, gregkh, hughd

On Thu 22-05-14 03:48:59, hooanon05g@gmail.com wrote:
> From: "J. R. Okajima" <hooanon05g@gmail.com>
> 
> Add a feature to keep the uniqueness of inum in the superblock.
> In this commit, the parameter superblock sets NULL, so the feature is
> OFF. It will be ON individually be thier maintainer.
> 
> Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
...
> diff --git a/fs/inode.c b/fs/inode.c
> index a3e274a..1412607 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -834,8 +834,9 @@ repeat:
>  #define LAST_INO_BATCH 1024
>  static DEFINE_PER_CPU(unsigned int, last_ino);
>  
> -unsigned int get_next_ino(void)
> +static unsigned int do_get_next_ino(struct super_block *sb, int *pwrapped)
>  {
> +	static int wrapped;
>  	unsigned int *p = &get_cpu_var(last_ino);
>  	unsigned int res = *p;
>  
> @@ -850,10 +851,60 @@ unsigned int get_next_ino(void)
>  
>  	res++;
>  	/* never zero */
> -	if (unlikely(!res))
> +	if (unlikely(!res)) {
>  		res++;
> +		wrapped = 1;
> +	}
>  	*p = res;
>  	put_cpu_var(last_ino);
> +	*pwrapped = wrapped;
> +	return res;
> +}
> +
> +static int test_ino_sb(struct super_block *sb, unsigned long ino)
> +{
> +	int in_use;
> +	struct inode *inode, *next;
> +
> +	in_use = 0;
> +	spin_lock(&inode_sb_list_lock);
> +	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_ino != ino)
> +			continue;
> +		in_use = 1;
> +		break;
> +	}
> +	spin_unlock(&inode_sb_list_lock);
> +
> +	return in_use;
> +}
  I don't think scanning the whole superblock list is really a viable
alternative. That is going to contend a lot on inode_sb_list_lock and burn
a lot of CPU when there is even moderate number of inodes in tmpfs. If we
ever have to really use unique inode numbers for tmpfs (but I'm not
convinced - see my other email), we should probably hash those inodes and
use iunique()...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs
  2014-05-22 11:53     ` Jan Kara
@ 2014-05-22 14:58       ` J. R. Okajima
  2014-05-22 15:09         ` Jan Kara
  0 siblings, 1 reply; 26+ messages in thread
From: J. R. Okajima @ 2014-05-22 14:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, adilger, hch, dchinner, viro, jlbec, gregkh, hughd


Jan Kara:
>   Hum, have you observed any real problems with non-unique inode numbers
> even for tmpfs? Because e.g. the NFS case you mentioned isn't IMHO right -
> tmpfs sets i_generation to current time so even if inode counter wraps,
> i_generation will be different and so they will be different inodes for
> NFS. And the backup case isn't very convincing either - who would be
> backing up tmpfs filesystem ;)?

For NFS, maybe you are right.
I forgot about i_generation. It won't be a problem as you wrote
probably.

For backup case for tmpfs, which I have not confirmed the actual case
either, there several cases.
- some people doesn't want to write flash medias (SSD) frequently.
  they store the changes on tmpfs and then move the files from tmpfs to
  SSD later.
- this is one use-case of a stackable filesystem.

By the way, I personally don't know how effective it is to make SSD to
live longer. I have read a report saying "the life of flash medias are
much longer than we expect" and the reporter said "we tried writing to
flash medias for a long time over and over again, but could not meet the
end of life." But also I have read several reports saying "I met a end
of lifetime of my ssd. All writes are gone, but I can read the old
contents."

The uniqueness of inums may not be important, but the inum should not be
zero. Do you agree about the first patch "never inum=0"?


J. R. Okajima

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 2/3] vfs: get_next_ino(), support for the uniqueness
  2014-05-22 11:56   ` Jan Kara
@ 2014-05-22 15:03     ` J. R. Okajima
  2014-05-22 15:12       ` Jan Kara
  0 siblings, 1 reply; 26+ messages in thread
From: J. R. Okajima @ 2014-05-22 15:03 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, adilger, hch, dchinner, viro, jlbec, gregkh, hughd


Jan Kara:
>   I don't think scanning the whole superblock list is really a viable
> alternative. That is going to contend a lot on inode_sb_list_lock and burn
> a lot of CPU when there is even moderate number of inodes in tmpfs. If we
> ever have to really use unique inode numbers for tmpfs (but I'm not
> convinced - see my other email), we should probably hash those inodes and
> use iunique()...

Actually I tried
static int test_inode_iunique(struct super_block *sb, unsigned long ino)
which is defined fs/inode.c.
But tmpfs doesn't add inode into hash list when creating the inode.


J. R. Okajima

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs
  2014-05-22 14:58       ` J. R. Okajima
@ 2014-05-22 15:09         ` Jan Kara
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Kara @ 2014-05-22 15:09 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Jan Kara, linux-fsdevel, adilger, hch, dchinner, viro, jlbec,
	gregkh, hughd

On Thu 22-05-14 23:58:37, J. R. Okajima wrote:
> 
> Jan Kara:
> >   Hum, have you observed any real problems with non-unique inode numbers
> > even for tmpfs? Because e.g. the NFS case you mentioned isn't IMHO right -
> > tmpfs sets i_generation to current time so even if inode counter wraps,
> > i_generation will be different and so they will be different inodes for
> > NFS. And the backup case isn't very convincing either - who would be
> > backing up tmpfs filesystem ;)?
> 
> For NFS, maybe you are right.
> I forgot about i_generation. It won't be a problem as you wrote
> probably.
> 
> For backup case for tmpfs, which I have not confirmed the actual case
> either, there several cases.
> - some people doesn't want to write flash medias (SSD) frequently.
>   they store the changes on tmpfs and then move the files from tmpfs to
>   SSD later.
> - this is one use-case of a stackable filesystem.
> 
> By the way, I personally don't know how effective it is to make SSD to
> live longer. I have read a report saying "the life of flash medias are
> much longer than we expect" and the reporter said "we tried writing to
> flash medias for a long time over and over again, but could not meet the
> end of life." But also I have read several reports saying "I met a end
> of lifetime of my ssd. All writes are gone, but I can read the old
> contents."
> 
> The uniqueness of inums may not be important, but the inum should not be
> zero. Do you agree about the first patch "never inum=0"?
  Oh, yes, that one is definitely good. Feel free to add my:
Reviewed-by: Jan Kara <jack@suse.cz>

  to that patch.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 2/3] vfs: get_next_ino(), support for the uniqueness
  2014-05-22 15:03     ` J. R. Okajima
@ 2014-05-22 15:12       ` Jan Kara
  2014-05-22 15:14         ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Kara @ 2014-05-22 15:12 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: Jan Kara, linux-fsdevel, adilger, hch, dchinner, viro, jlbec,
	gregkh, hughd

On Fri 23-05-14 00:03:21, J. R. Okajima wrote:
> 
> Jan Kara:
> >   I don't think scanning the whole superblock list is really a viable
> > alternative. That is going to contend a lot on inode_sb_list_lock and burn
> > a lot of CPU when there is even moderate number of inodes in tmpfs. If we
> > ever have to really use unique inode numbers for tmpfs (but I'm not
> > convinced - see my other email), we should probably hash those inodes and
> > use iunique()...
> 
> Actually I tried
> static int test_inode_iunique(struct super_block *sb, unsigned long ino)
> which is defined fs/inode.c.
> But tmpfs doesn't add inode into hash list when creating the inode.
  Yes, that's why I wrote that if we find that uniqueness of inode numbers
in tmpfs is needed, then we should probably just make tmpfs add inodes to
hash list and use iunique(). That would seem like a better solution to me.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 2/3] vfs: get_next_ino(), support for the uniqueness
  2014-05-22 15:12       ` Jan Kara
@ 2014-05-22 15:14         ` Christoph Hellwig
  2014-05-22 16:06           ` Jan Kara
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Christoph Hellwig @ 2014-05-22 15:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: J. R. Okajima, linux-fsdevel, adilger, hch, dchinner, viro,
	jlbec, gregkh, hughd

On Thu, May 22, 2014 at 05:12:53PM +0200, Jan Kara wrote:
>   Yes, that's why I wrote that if we find that uniqueness of inode numbers
> in tmpfs is needed, then we should probably just make tmpfs add inodes to
> hash list and use iunique(). That would seem like a better solution to me.

Any reason not to just use the ida allocator for inode numbers in tmpfs?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 2/3] vfs: get_next_ino(), support for the uniqueness
  2014-05-22 15:14         ` Christoph Hellwig
@ 2014-05-22 16:06           ` Jan Kara
  2014-05-29 15:46           ` [PATCH v2 1/2] tmpfs: manage the inode-number by IDR J. R. Okajima
  2014-06-01 16:18           ` [RFC PATCH v3 0/2] the uniquness of tmpfs inode-number J. R. Okajima
  2 siblings, 0 replies; 26+ messages in thread
From: Jan Kara @ 2014-05-22 16:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, J. R. Okajima, linux-fsdevel, adilger, dchinner, viro,
	jlbec, gregkh, hughd

On Thu 22-05-14 17:14:31, Christoph Hellwig wrote:
> On Thu, May 22, 2014 at 05:12:53PM +0200, Jan Kara wrote:
> >   Yes, that's why I wrote that if we find that uniqueness of inode numbers
> > in tmpfs is needed, then we should probably just make tmpfs add inodes to
> > hash list and use iunique(). That would seem like a better solution to me.
> 
> Any reason not to just use the ida allocator for inode numbers in tmpfs?
  That would be another option, right.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 1/3] vfs: get_next_ino(), never inum=0
  2014-05-21 18:48 ` [RFC 1/3] vfs: get_next_ino(), never inum=0 hooanon05g
@ 2014-05-28  4:28   ` Hugh Dickins
  2014-05-28  5:53     ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Hugh Dickins @ 2014-05-28  4:28 UTC (permalink / raw)
  To: hooanon05g
  Cc: Andrew Morton, Jan Kara, Eric Dumazet, linux-fsdevel, adilger,
	hch, dchinner, viro, jlbec, gregkh

On Thu, 22 May 2014, hooanon05g@gmail.com wrote:
> From: "J. R. Okajima" <hooanon05g@gmail.com>
> 
> It is very rare for get_next_ino() to return zero as a new inode number
> since its type is unsigned int, but it can surely happen eventually.
> 
> Interestingly, ls(1) and find(1) (actually readdir(3)) don't show a file
> whose inum is zero, so people won't be able to find it. This issue may
> be harmful especially for tmpfs.
> 
> On a very long lived and busy system, users may frequently create files
> on tmpfs. And if unluckily he gets inum=0, then he cannot see its
> filename. If he remembers its name, he may be able to use or unlink it
> by its name since the file surely exists. Otherwise, the file remains on
> tmpfs silently. No one can touch it. This behaviour looks like resource
> leak.
> As a worse case, if a dir gets inum=0 and a user creates several files
> under it, then the leaked memory will increase since a user cannot see
> the name of all files under the dir whose inum=0, regardless the inum of
> the children.
> 
> There is another unpleasant effect when get_next_ino() wraps
> around. When there is a file whose inum=100 on tmpfs, a new file may get
> inum=100, ie. the duplicated inums. I am not sure what will happen when
> the duplicated inums exist on tmpfs. If it happens, then some tools
> won't work correctly such as backup and exporting via NFS, I am afraid.
> Anyway this is not a issue in get_next_ino(). It should be
> fixed in mm/shmem.c separatly if it is really necessary.
> 
> There are many other get_next_ino() callers other than tmpfs, such as
> several drivers, anon_inode, autofs4, freevxfs, procfs, pis, hugetlbfs,
> configfs, ramfs, fuse, ocfs2, debugfs, securityfs, cgroup, socket, ipc.
> Some of them will not care inum so this issue is harmless for them. But
> the others may suffer from inum=0. For example, if procfs gets inum=0
> for a task dir (or for one of its children), then several utilities
> won't work correctly, including ps(1), lsof(8), etc.
> 
> Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>

Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  fs/inode.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index f96d2a6..a3e274a 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -848,7 +848,11 @@ unsigned int get_next_ino(void)
>  	}
>  #endif
>  
> -	*p = ++res;
> +	res++;
> +	/* never zero */
> +	if (unlikely(!res))
> +		res++;
> +	*p = res;
>  	put_cpu_var(last_ino);
>  	return res;
>  }
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 1/3] vfs: get_next_ino(), never inum=0
  2014-05-28  4:28   ` Hugh Dickins
@ 2014-05-28  5:53     ` Eric Dumazet
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2014-05-28  5:53 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: hooanon05g, Andrew Morton, Jan Kara, linux-fsdevel, adilger, hch,
	dchinner, viro, jlbec, Greg Kroah-Hartman

On Tue, May 27, 2014 at 9:28 PM, Hugh Dickins <hughd@google.com> wrote:

>> Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Acked-by: Hugh Dickins <hughd@google.com>
>
>> ---
>>  fs/inode.c |    6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index f96d2a6..a3e274a 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -848,7 +848,11 @@ unsigned int get_next_ino(void)
>>       }
>>  #endif
>>
>> -     *p = ++res;
>> +     res++;
>> +     /* never zero */
>> +     if (unlikely(!res))
>> +             res++;
>> +     *p = res;
>>       put_cpu_var(last_ino);
>>       return res;
>>  }

This is not right.

Each cpu 'owns' a window of 1024 consecutive inums.

If res == 0, then 1 does not necessarily belong to this cpu, and you
risk that about 1024 duplicate inums are given.

You need something like (sorry for bad formatting)

diff --git a/fs/inode.c b/fs/inode.c
index f96d2a6f88cc..66fd1189ddaa 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -839,6 +839,8 @@ unsigned int get_next_ino(void)
        unsigned int *p = &get_cpu_var(last_ino);
        unsigned int res = *p;

+start:
+
 #ifdef CONFIG_SMP
        if (unlikely((res & (LAST_INO_BATCH-1)) == 0)) {
                static atomic_t shared_last_ino;
@@ -848,7 +850,10 @@ unsigned int get_next_ino(void)
        }
 #endif

-       *p = ++res;
+       if (unlikely(++res == 0))
+               goto start;
+       *p = res;
+
        put_cpu_var(last_ino);
        return res;
 }

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 1/2] tmpfs: manage the inode-number by IDR
  2014-05-22 15:14         ` Christoph Hellwig
  2014-05-22 16:06           ` Jan Kara
@ 2014-05-29 15:46           ` J. R. Okajima
  2014-05-29 15:46             ` [PATCH v2 2/2] tmpfs: refine a file handle for NFS-exporting J. R. Okajima
  2014-05-31  2:43             ` [PATCH v2 1/2] tmpfs: manage the inode-number by IDR J. R. Okajima
  2014-06-01 16:18           ` [RFC PATCH v3 0/2] the uniquness of tmpfs inode-number J. R. Okajima
  2 siblings, 2 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-05-29 15:46 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

To ensure the uniquness of the inode-number, manage it by IDR.
Also it tries using the lowest unused inode-number, so the value will
usually be smaller.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 include/linux/shmem_fs.h |    2 ++
 mm/shmem.c               |   20 +++++++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 4d1771c..30c75f0 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -24,6 +24,8 @@ struct shmem_inode_info {
 };
 
 struct shmem_sb_info {
+	struct mutex idr_lock;
+	struct idr idr;		    /* manages inode-number */
 	unsigned long max_blocks;   /* How many blocks are allowed */
 	struct percpu_counter used_blocks;  /* How many are allocated */
 	unsigned long max_inodes;   /* How many inodes are allowed */
diff --git a/mm/shmem.c b/mm/shmem.c
index 368f314..440db70 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -569,6 +569,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -584,6 +585,9 @@ static void shmem_evict_inode(struct inode *inode)
 
 	simple_xattrs_free(&info->xattrs);
 	WARN_ON(inode->i_blocks);
+	mutex_lock(&sbinfo->idr_lock);
+	idr_remove(&sbinfo->idr, inode->i_ino);
+	mutex_unlock(&sbinfo->idr_lock);
 	shmem_free_inode(inode->i_sb);
 	clear_inode(inode);
 }
@@ -1315,13 +1319,13 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	struct inode *inode;
 	struct shmem_inode_info *info;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+	long long ino; /* signed to detect an error */
 
 	if (shmem_reserve_inode(sb))
 		return NULL;
 
 	inode = new_inode(sb);
 	if (inode) {
-		inode->i_ino = get_next_ino();
 		inode_init_owner(inode, dir, mode);
 		inode->i_blocks = 0;
 		inode->i_mapping->backing_dev_info = &shmem_backing_dev_info;
@@ -1362,6 +1366,17 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 			mpol_shared_policy_init(&info->policy, NULL);
 			break;
 		}
+
+		/* inum 0 and 1 are unused */
+		mutex_lock(&sbinfo->idr_lock);
+		ino = idr_alloc(&sbinfo->idr, inode, 2, UINT_MAX, GFP_NOFS);
+		if (ino > 0)
+			inode->i_ino = ino;
+		mutex_unlock(&sbinfo->idr_lock);
+		if (unlikely(ino < 0)) {
+			iput(inode);	/* shmem_free_inode() will be called */
+			inode = NULL;
+		}
 	} else
 		shmem_free_inode(sb);
 	return inode;
@@ -2504,6 +2519,7 @@ static void shmem_put_super(struct super_block *sb)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 
+	idr_destroy(&sbinfo->idr);
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
 	kfree(sbinfo);
@@ -2522,6 +2538,8 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
 	if (!sbinfo)
 		return -ENOMEM;
 
+	mutex_init(&sbinfo->idr_lock);
+	idr_init(&sbinfo->idr);
 	sbinfo->mode = S_IRWXUGO | S_ISVTX;
 	sbinfo->uid = current_fsuid();
 	sbinfo->gid = current_fsgid();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/2] tmpfs: refine a file handle for NFS-exporting
  2014-05-29 15:46           ` [PATCH v2 1/2] tmpfs: manage the inode-number by IDR J. R. Okajima
@ 2014-05-29 15:46             ` J. R. Okajima
  2014-05-31  2:43             ` [PATCH v2 1/2] tmpfs: manage the inode-number by IDR J. R. Okajima
  1 sibling, 0 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-05-29 15:46 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

The inode number in tmpfs is always 32bit.
It is unnecessary to put 64bit into NFS handle, and this commit puts
only 32bit.
While it is also possible to replace ilookup5() by idr_find(), I don't
want to confirm and maintain i_state. So let's keep ilookup5().

Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 mm/shmem.c |   37 ++++++++++---------------------------
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 440db70..96bdd2b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1373,7 +1373,9 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		if (ino > 0)
 			inode->i_ino = ino;
 		mutex_unlock(&sbinfo->idr_lock);
-		if (unlikely(ino < 0)) {
+		if (ino > 0)
+			__insert_inode_hash(inode, inode->i_ino);
+		else {
 			iput(inode);	/* shmem_free_inode() will be called */
 			inode = NULL;
 		}
@@ -2281,8 +2283,7 @@ static struct dentry *shmem_get_parent(struct dentry *child)
 static int shmem_match(struct inode *ino, void *vfh)
 {
 	__u32 *fh = vfh;
-	__u64 inum = fh[2];
-	inum = (inum << 32) | fh[1];
+	__u64 inum = fh[1];
 	return ino->i_ino == inum && fh[0] == ino->i_generation;
 }
 
@@ -2293,14 +2294,11 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
 	struct dentry *dentry = NULL;
 	u64 inum;
 
-	if (fh_len < 3)
+	if (fh_len < 2)
 		return NULL;
 
-	inum = fid->raw[2];
-	inum = (inum << 32) | fid->raw[1];
-
-	inode = ilookup5(sb, (unsigned long)(inum + fid->raw[0]),
-			shmem_match, fid->raw);
+	inum = fid->raw[1];
+	inode = ilookup5(sb, inum, shmem_match, fid->raw);
 	if (inode) {
 		dentry = d_find_alias(inode);
 		iput(inode);
@@ -2312,30 +2310,15 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
 static int shmem_encode_fh(struct inode *inode, __u32 *fh, int *len,
 				struct inode *parent)
 {
-	if (*len < 3) {
-		*len = 3;
+	if (*len < 2) {
+		*len = 2;
 		return FILEID_INVALID;
 	}
 
-	if (inode_unhashed(inode)) {
-		/* Unfortunately insert_inode_hash is not idempotent,
-		 * so as we hash inodes here rather than at creation
-		 * time, we need a lock to ensure we only try
-		 * to do it once
-		 */
-		static DEFINE_SPINLOCK(lock);
-		spin_lock(&lock);
-		if (inode_unhashed(inode))
-			__insert_inode_hash(inode,
-					    inode->i_ino + inode->i_generation);
-		spin_unlock(&lock);
-	}
-
 	fh[0] = inode->i_generation;
 	fh[1] = inode->i_ino;
-	fh[2] = ((__u64)inode->i_ino) >> 32;
 
-	*len = 3;
+	*len = 2;
 	return 1;
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] tmpfs: manage the inode-number by IDR
  2014-05-29 15:46           ` [PATCH v2 1/2] tmpfs: manage the inode-number by IDR J. R. Okajima
  2014-05-29 15:46             ` [PATCH v2 2/2] tmpfs: refine a file handle for NFS-exporting J. R. Okajima
@ 2014-05-31  2:43             ` J. R. Okajima
  1 sibling, 0 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-05-31  2:43 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara


"J. R. Okajima":
> To ensure the uniquness of the inode-number, manage it by IDR.
> Also it tries using the lowest unused inode-number, so the value will
> usually be smaller.

I made a mistake about error handling.
Please ignore this patch, sorry.


J. R. Okajima

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH v3 0/2] the uniquness of tmpfs inode-number
  2014-05-22 15:14         ` Christoph Hellwig
  2014-05-22 16:06           ` Jan Kara
  2014-05-29 15:46           ` [PATCH v2 1/2] tmpfs: manage the inode-number by IDR J. R. Okajima
@ 2014-06-01 16:18           ` J. R. Okajima
  2014-06-01 16:18             ` [RFC PATCH v3 1/2] tmpfs: manage the inode-number by IDR, signed int inum J. R. Okajima
  2014-06-01 16:18             ` [RFC PATCH v3 " J. R. Okajima
  2 siblings, 2 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-06-01 16:18 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

Generally the inode-number should be unique since it is an identifier
within that filesystem. Using a system global function
vfs:get_next_ino() may cause the duplicated inode number in tmpfs. If it
happens, then some userspace "inum-aware" tools may not work correctly
such as backup tools. These patches solve this problem.

Changes from v2:
- the type of tmpfs inode-number is "signed int"
- bugfix about calling idr_remove() in an error path.
- bugfix about the error from idr_alloc().
- [2/2] is not changed essentially.

J. R. Okajima (2):
  tmpfs: manage the inode-number by IDR, signed int inum
  tmpfs: refine a file handle for NFS-exporting

 include/linux/shmem_fs.h |    6 ++--
 mm/shmem.c               |   71 +++++++++++++++++++++++++---------------------
 2 files changed, 43 insertions(+), 34 deletions(-)

--
1.7.10.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH v3 1/2] tmpfs: manage the inode-number by IDR, signed int inum
  2014-06-01 16:18           ` [RFC PATCH v3 0/2] the uniquness of tmpfs inode-number J. R. Okajima
@ 2014-06-01 16:18             ` J. R. Okajima
  2014-06-03  9:04               ` Jan Kara
  2014-06-01 16:18             ` [RFC PATCH v3 " J. R. Okajima
  1 sibling, 1 reply; 26+ messages in thread
From: J. R. Okajima @ 2014-06-01 16:18 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

To ensure the uniquness of the inode-number, manage it by IDR.
Also it tries using the lowest unused inode-number, so the value will
usually be smaller.
Another side effect is the type of the inode-number in tmpfs. By using
IDR, it is limited to signed int. But I don't think it a big
problem. INT_MAX is big enough for the number of inodes in a single tmpfs.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 include/linux/shmem_fs.h |    6 ++++--
 mm/shmem.c               |   37 +++++++++++++++++++++++++++++++------
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 4d1771c..4ba8b43 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -24,10 +24,12 @@ struct shmem_inode_info {
 };
 
 struct shmem_sb_info {
+	struct mutex idr_lock;
+	struct idr idr;		    /* manages inode-number */
 	unsigned long max_blocks;   /* How many blocks are allowed */
 	struct percpu_counter used_blocks;  /* How many are allocated */
-	unsigned long max_inodes;   /* How many inodes are allowed */
-	unsigned long free_inodes;  /* How many are left for allocation */
+	int max_inodes;		    /* How many inodes are allowed */
+	int free_inodes;	    /* How many are left for allocation */
 	spinlock_t stat_lock;	    /* Serialize shmem_sb_info changes */
 	kuid_t uid;		    /* Mount uid for root directory */
 	kgid_t gid;		    /* Mount gid for root directory */
diff --git a/mm/shmem.c b/mm/shmem.c
index 368f314..0023d8d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -107,9 +107,13 @@ static unsigned long shmem_default_max_blocks(void)
 	return totalram_pages / 2;
 }
 
-static unsigned long shmem_default_max_inodes(void)
+static int shmem_default_max_inodes(void)
 {
-	return min(totalram_pages - totalhigh_pages, totalram_pages / 2);
+	int i;
+
+	i = min(totalram_pages - totalhigh_pages, totalram_pages / 2);
+	i = min(i, INT_MAX);
+	return i;
 }
 #endif
 
@@ -569,6 +573,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -584,6 +589,11 @@ static void shmem_evict_inode(struct inode *inode)
 
 	simple_xattrs_free(&info->xattrs);
 	WARN_ON(inode->i_blocks);
+	if (inode->i_ino) {
+		mutex_lock(&sbinfo->idr_lock);
+		idr_remove(&sbinfo->idr, inode->i_ino);
+		mutex_unlock(&sbinfo->idr_lock);
+	}
 	shmem_free_inode(inode->i_sb);
 	clear_inode(inode);
 }
@@ -1315,13 +1325,13 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	struct inode *inode;
 	struct shmem_inode_info *info;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+	int ino;
 
 	if (shmem_reserve_inode(sb))
 		return NULL;
 
 	inode = new_inode(sb);
 	if (inode) {
-		inode->i_ino = get_next_ino();
 		inode_init_owner(inode, dir, mode);
 		inode->i_blocks = 0;
 		inode->i_mapping->backing_dev_info = &shmem_backing_dev_info;
@@ -1362,6 +1372,18 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 			mpol_shared_policy_init(&info->policy, NULL);
 			break;
 		}
+
+		/* inum 0 and 1 are unused */
+		mutex_lock(&sbinfo->idr_lock);
+		ino = idr_alloc(&sbinfo->idr, inode, 2, INT_MAX, GFP_NOFS);
+		if (ino > 0) {
+			inode->i_ino = ino;
+			mutex_unlock(&sbinfo->idr_lock);
+		} else {
+			mutex_unlock(&sbinfo->idr_lock);
+			iput(inode);	/* shmem_free_inode() will be called */
+			inode = NULL;
+		}
 	} else
 		shmem_free_inode(sb);
 	return inode;
@@ -2385,7 +2407,7 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 				goto bad_val;
 		} else if (!strcmp(this_char,"nr_inodes")) {
 			sbinfo->max_inodes = memparse(value, &rest);
-			if (*rest)
+			if (*rest || sbinfo->max_inodes < 2)
 				goto bad_val;
 		} else if (!strcmp(this_char,"mode")) {
 			if (remount)
@@ -2438,7 +2460,7 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 	struct shmem_sb_info config = *sbinfo;
-	unsigned long inodes;
+	int inodes;
 	int error = -EINVAL;
 
 	config.mpol = NULL;
@@ -2486,7 +2508,7 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 		seq_printf(seq, ",size=%luk",
 			sbinfo->max_blocks << (PAGE_CACHE_SHIFT - 10));
 	if (sbinfo->max_inodes != shmem_default_max_inodes())
-		seq_printf(seq, ",nr_inodes=%lu", sbinfo->max_inodes);
+		seq_printf(seq, ",nr_inodes=%d", sbinfo->max_inodes);
 	if (sbinfo->mode != (S_IRWXUGO | S_ISVTX))
 		seq_printf(seq, ",mode=%03ho", sbinfo->mode);
 	if (!uid_eq(sbinfo->uid, GLOBAL_ROOT_UID))
@@ -2504,6 +2526,7 @@ static void shmem_put_super(struct super_block *sb)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 
+	idr_destroy(&sbinfo->idr);
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
 	kfree(sbinfo);
@@ -2522,6 +2545,8 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
 	if (!sbinfo)
 		return -ENOMEM;
 
+	mutex_init(&sbinfo->idr_lock);
+	idr_init(&sbinfo->idr);
 	sbinfo->mode = S_IRWXUGO | S_ISVTX;
 	sbinfo->uid = current_fsuid();
 	sbinfo->gid = current_fsgid();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH v3 2/2] tmpfs: refine a file handle for NFS-exporting
  2014-06-01 16:18           ` [RFC PATCH v3 0/2] the uniquness of tmpfs inode-number J. R. Okajima
  2014-06-01 16:18             ` [RFC PATCH v3 1/2] tmpfs: manage the inode-number by IDR, signed int inum J. R. Okajima
@ 2014-06-01 16:18             ` J. R. Okajima
  1 sibling, 0 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-06-01 16:18 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

The inode number in tmpfs is always 32bit.
It is unnecessary to put 64bit into NFS handle, and this commit puts
only 32bit.
While it is also possible to replace ilookup5() by idr_find(), I don't
want to confirm and maintain i_state. So let's keep ilookup5().

Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 mm/shmem.c |   34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 0023d8d..04b399b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1379,6 +1379,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		if (ino > 0) {
 			inode->i_ino = ino;
 			mutex_unlock(&sbinfo->idr_lock);
+			__insert_inode_hash(inode, inode->i_ino);
 		} else {
 			mutex_unlock(&sbinfo->idr_lock);
 			iput(inode);	/* shmem_free_inode() will be called */
@@ -2288,8 +2289,7 @@ static struct dentry *shmem_get_parent(struct dentry *child)
 static int shmem_match(struct inode *ino, void *vfh)
 {
 	__u32 *fh = vfh;
-	__u64 inum = fh[2];
-	inum = (inum << 32) | fh[1];
+	__u64 inum = fh[1];
 	return ino->i_ino == inum && fh[0] == ino->i_generation;
 }
 
@@ -2300,14 +2300,11 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
 	struct dentry *dentry = NULL;
 	u64 inum;
 
-	if (fh_len < 3)
+	if (fh_len < 2)
 		return NULL;
 
-	inum = fid->raw[2];
-	inum = (inum << 32) | fid->raw[1];
-
-	inode = ilookup5(sb, (unsigned long)(inum + fid->raw[0]),
-			shmem_match, fid->raw);
+	inum = fid->raw[1];
+	inode = ilookup5(sb, inum, shmem_match, fid->raw);
 	if (inode) {
 		dentry = d_find_alias(inode);
 		iput(inode);
@@ -2319,30 +2316,15 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
 static int shmem_encode_fh(struct inode *inode, __u32 *fh, int *len,
 				struct inode *parent)
 {
-	if (*len < 3) {
-		*len = 3;
+	if (*len < 2) {
+		*len = 2;
 		return FILEID_INVALID;
 	}
 
-	if (inode_unhashed(inode)) {
-		/* Unfortunately insert_inode_hash is not idempotent,
-		 * so as we hash inodes here rather than at creation
-		 * time, we need a lock to ensure we only try
-		 * to do it once
-		 */
-		static DEFINE_SPINLOCK(lock);
-		spin_lock(&lock);
-		if (inode_unhashed(inode))
-			__insert_inode_hash(inode,
-					    inode->i_ino + inode->i_generation);
-		spin_unlock(&lock);
-	}
-
 	fh[0] = inode->i_generation;
 	fh[1] = inode->i_ino;
-	fh[2] = ((__u64)inode->i_ino) >> 32;
 
-	*len = 3;
+	*len = 2;
 	return 1;
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v3 1/2] tmpfs: manage the inode-number by IDR, signed int inum
  2014-06-01 16:18             ` [RFC PATCH v3 1/2] tmpfs: manage the inode-number by IDR, signed int inum J. R. Okajima
@ 2014-06-03  9:04               ` Jan Kara
  2014-06-03 14:36                 ` J. R. Okajima
  2014-06-05 12:27                 ` [RFC PATCH v4 0/2] tmpfs: manage the inode-number by IDR (performance measure) J. R. Okajima
  0 siblings, 2 replies; 26+ messages in thread
From: Jan Kara @ 2014-06-03  9:04 UTC (permalink / raw)
  To: J. R. Okajima
  Cc: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

On Mon 02-06-14 01:18:56, J. R. Okajima wrote:
> To ensure the uniquness of the inode-number, manage it by IDR.
> Also it tries using the lowest unused inode-number, so the value will
> usually be smaller.
> Another side effect is the type of the inode-number in tmpfs. By using
> IDR, it is limited to signed int. But I don't think it a big
> problem. INT_MAX is big enough for the number of inodes in a single tmpfs.
  What I'm missing with this patch is some quantification of costs this
change has - i.e., it's surely going to cost us some performance. Can you
measure how much? I think measuring creation and deletion of lots of empty
files from 1, 2, 4, 8, .. NR_CPU processes (each process in a separate dir
to avoid contention on i_mutex) in tmpfs would make sense.

One correctness nit below as well.

> diff --git a/mm/shmem.c b/mm/shmem.c
> index 368f314..0023d8d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -107,9 +107,13 @@ static unsigned long shmem_default_max_blocks(void)
>  	return totalram_pages / 2;
>  }
>  
> -static unsigned long shmem_default_max_inodes(void)
> +static int shmem_default_max_inodes(void)
>  {
> -	return min(totalram_pages - totalhigh_pages, totalram_pages / 2);
> +	int i;
> +
> +	i = min(totalram_pages - totalhigh_pages, totalram_pages / 2);
> +	i = min(i, INT_MAX);
  With i being 'int' this doesn't make much sense... I guess it should be
unsigned long.

> +	return i;
>  }
>  #endif

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v3 1/2] tmpfs: manage the inode-number by IDR, signed int inum
  2014-06-03  9:04               ` Jan Kara
@ 2014-06-03 14:36                 ` J. R. Okajima
  2014-06-05 12:27                 ` [RFC PATCH v4 0/2] tmpfs: manage the inode-number by IDR (performance measure) J. R. Okajima
  1 sibling, 0 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-06-03 14:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger


Jan Kara:
>   What I'm missing with this patch is some quantification of costs this
> change has - i.e., it's surely going to cost us some performance. Can you
> measure how much? I think measuring creation and deletion of lots of empty
> files from 1, 2, 4, 8, .. NR_CPU processes (each process in a separate dir
> to avoid contention on i_mutex) in tmpfs would make sense.

Good point.
I will try.


> One correctness nit below as well.

Thank you very much.


J. R. Okajima

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH v4 0/2] tmpfs: manage the inode-number by IDR (performance measure)
  2014-06-03  9:04               ` Jan Kara
  2014-06-03 14:36                 ` J. R. Okajima
@ 2014-06-05 12:27                 ` J. R. Okajima
  2014-06-05 12:27                   ` [RFC PATCH v4 1/2] tmpfs: manage the inode-number by IDR, signed int inum J. R. Okajima
  2014-06-05 12:27                   ` [RFC PATCH v4 2/2] tmpfs: refine a file handle for NFS-exporting J. R. Okajima
  1 sibling, 2 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-06-05 12:27 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

Changes from v3:
- add the effect on performance into the commit log.
- in calculating the default number of inodes, replace int by unsigned
  long. pointed out by Jan Kara <jack@suse.cz>.
- [2/2] "tmpfs: refine a file handle for NFS-exporting" is unchanged.

J. R. Okajima (2):
  tmpfs: manage the inode-number by IDR, signed int inum
  tmpfs: refine a file handle for NFS-exporting

 include/linux/shmem_fs.h |    6 ++--
 mm/shmem.c               |   71 +++++++++++++++++++++++++---------------------
 2 files changed, 43 insertions(+), 34 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH v4 1/2] tmpfs: manage the inode-number by IDR, signed int inum
  2014-06-05 12:27                 ` [RFC PATCH v4 0/2] tmpfs: manage the inode-number by IDR (performance measure) J. R. Okajima
@ 2014-06-05 12:27                   ` J. R. Okajima
  2014-06-05 12:27                   ` [RFC PATCH v4 2/2] tmpfs: refine a file handle for NFS-exporting J. R. Okajima
  1 sibling, 0 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-06-05 12:27 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

To ensure the uniquness of the inode-number, manage it by IDR.
Also it tries using the lowest unused inode-number, so the value will
usually be smaller.
Another side effect is the type of the inode-number in tmpfs. By using
IDR, it is limited to signed int. But I don't think it a big
problem. INT_MAX is big enough for the number of inodes in a single tmpfs.

Comparision on performance:
- test program: see below
- version: 3.15.0-rc7
- before this commit
  1 procs, 1048575/1048575 file, do unlink, 43.023 secs (usr 1.029 + sys 40.981)
  2 procs, 1048574/1048574 file, do unlink, 24.047 secs (usr 1.048 + sys 45.886)
  1 procs, 524286/524286 file, do unlink, 21.476 secs (usr 0.529 + sys 20.441)
  2 procs, 524286/524286 file, do unlink, 12.029 secs (usr 0.554 + sys 22.880)
  1 procs, 32766/32766 file, do unlink, 1.345 secs (usr 0.035 + sys 1.279)
  2 procs, 32766/32766 file, do unlink, 0.753 secs (usr 0.030 + sys 1.439)
- after this commit
  1 procs, 1048575/1048575 file, do unlink, 45.178 secs (usr 1.183 + sys 43.005)
  2 procs, 1048574/1048574 file, do unlink, 25.328 secs (usr 1.126 + sys 48.481)
  1 procs, 524286/524286 file, do unlink, 22.668 secs (usr 0.367 + sys 21.806)
  2 procs, 524286/524286 file, do unlink, 12.639 secs (usr 0.591 + sys 24.137)
  1 procs, 32766/32766 file, do unlink, 1.414 secs (usr 0.028 + sys 1.356)
  2 procs, 32766/32766 file, do unlink, 0.787 secs (usr 0.036 + sys 1.500)

The overhead surely exists, but looks around 5% or less.

Test prorams.
------- tmpfs-idr.sh -------
#!/bin/sh

set -eu

f() # dir [opts]
{
	local dir=$1
	shift
	seq $(getconf _NPROCESSORS_ONLN) |
	while read ncpu
	do
		seq 1 |
		while read do_unlink
		do
			sudo mount -v -t tmpfs $@ tmpfs $dir
			#stat -f $dir
			free_inodes=$(stat -f -c %d $dir)
			/tmp/tmpfs-idr $dir $ncpu $free_inodes $do_unlink
			sudo umount $dir
		done
	done
}

dir=/tmp/tmpfs-$$
mkdir $dir
uname -a
free -m
#f $dir -o size=50%,nr_inodes=$((0x7fffffff))
#f $dir -o size=50%,nr_inodes=$((0x07ffffff))
#f $dir -o size=50%,nr_inodes=$((0x007fffff))
f $dir -o size=50%,nr_inodes=$((0x00100000))
f $dir -o size=50%,nr_inodes=$((0x0007ffff))
f $dir -o size=50%,nr_inodes=$((0x00007fff))

rm -fr $dir
------- tmpfs-idr.c -------
#define _GNU_SOURCE
#include <pthread.h>

#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#ifndef O_PATH
#define O_PATH		010000000
#endif

pthread_barrier_t barrier;
int rootfd, nproc, nfile, do_unlink;

static int argton(char *s)
{
	long l;

	errno = 0;
	l = strtol(s, NULL, 0);
	assert(!((l == LONG_MIN || l == LONG_MAX)
		 && errno));
	assert(l >= 0);

	return l;
}

void *f(void *arg)
{
	int err, dirfd, fd, i;
	char a[16];
	int id = (long)arg;

	snprintf(a, sizeof(a), "%d", id);
	err = mkdirat(rootfd, a, 0755);
	assert(!err);
	dirfd = openat(rootfd, a, O_RDONLY | O_PATH);
	assert(dirfd >= 0);

	err = pthread_barrier_wait(&barrier);
	assert(!err || err == PTHREAD_BARRIER_SERIAL_THREAD);

	for (i = 0; i < nfile; i++) {
		snprintf(a, sizeof(a), "%d", i);
		fd = openat(dirfd, a, O_CREAT | O_WRONLY);
		if (fd >= 0) {
			if (do_unlink)
				unlinkat(dirfd, a, /*flags*/ 0);
			close(fd);
		} else
			break;
	}

	return (void *)(long)i;
}

struct perf {
	struct timespec ts;
	struct rusage ru;
};

void perf(struct perf *perf)
{
	clock_gettime(CLOCK_MONOTONIC, &perf->ts);
	getrusage(RUSAGE_SELF, &perf->ru);
}

void ts_subtract(struct timespec *ans, struct timespec *a, struct timespec *b)
{
	ans->tv_sec = a->tv_sec - b->tv_sec;
	ans->tv_nsec = a->tv_nsec - b->tv_nsec;
	if (ans->tv_nsec < 0) {
		ans->tv_sec--;
		ans->tv_nsec += 1000000000;
	}
}

void tv_subtract(struct timeval *ans, struct timeval *a, struct timeval *b)
{
	ans->tv_sec = a->tv_sec - b->tv_sec;
	ans->tv_usec = a->tv_usec - b->tv_usec;
	if (ans->tv_usec < 0) {
		ans->tv_sec--;
		ans->tv_usec += 1000000;
	}
}

#define MAX_NPROC 16
void run(void)
{
	int err, i, n;
	struct {
		pthread_t th;
		void *p;
	} b[MAX_NPROC];
	struct perf s[3];

	err = pthread_barrier_init(&barrier, NULL, nproc + 1);
	assert(!err);
	for (i = 0; i < nproc; i++) {
		err = pthread_create(&b[i].th, NULL, f, (void *)(long)i);
		assert(!err);
	}

	perf(s + 0);
	err = pthread_barrier_wait(&barrier);
	assert(!err || err == PTHREAD_BARRIER_SERIAL_THREAD);

	for (i = 0; i < nproc; i++)
		pthread_join(b[i].th, &b[i].p);
	perf(s + 1);

	n = 0;
	for (i = 0; i < nproc; i++)
		n += (long)b[i].p;

	ts_subtract(&s[2].ts, &s[1].ts, &s[0].ts);
	tv_subtract(&s[2].ru.ru_utime, &s[1].ru.ru_utime, &s[0].ru.ru_utime);
	tv_subtract(&s[2].ru.ru_stime, &s[1].ru.ru_stime, &s[0].ru.ru_stime);

	printf("%d procs, %d/%d file, %s unlink, %lu.%03ld secs"
	       " (usr %lu.%03ld + sys %lu.%03ld)\n",
	       nproc, n, nfile * nproc, do_unlink ? "do" : "no",
	       s[2].ts.tv_sec, s[2].ts.tv_nsec / 1000000,
	       s[2].ru.ru_utime.tv_sec, s[2].ru.ru_utime.tv_usec / 1000,
	       s[2].ru.ru_stime.tv_sec, s[2].ru.ru_stime.tv_usec / 1000);
}

int main(int argc, char *argv[])
{
	rootfd = open(argv[1], O_RDONLY | O_PATH);
	assert(rootfd >= 0);
	nproc = argton(argv[2]);
	assert(nproc < MAX_NPROC);
	nfile = argton(argv[3]);
	nfile /= nproc;
	do_unlink = argton(argv[4]);
	run();

	return 0;
}

/*
 * Local variables: ;
 * compile-command: "gcc -g -Wall -UNDEBUG -pthread -o /tmp/tmpfs-idr tmpfs-idr.c -lrt";
 * End: ;
 */
----------------------------------------

Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 include/linux/shmem_fs.h |    6 ++++--
 mm/shmem.c               |   37 +++++++++++++++++++++++++++++++------
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 4d1771c..4ba8b43 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -24,10 +24,12 @@ struct shmem_inode_info {
 };
 
 struct shmem_sb_info {
+	struct mutex idr_lock;
+	struct idr idr;		    /* manages inode-number */
 	unsigned long max_blocks;   /* How many blocks are allowed */
 	struct percpu_counter used_blocks;  /* How many are allocated */
-	unsigned long max_inodes;   /* How many inodes are allowed */
-	unsigned long free_inodes;  /* How many are left for allocation */
+	int max_inodes;		    /* How many inodes are allowed */
+	int free_inodes;	    /* How many are left for allocation */
 	spinlock_t stat_lock;	    /* Serialize shmem_sb_info changes */
 	kuid_t uid;		    /* Mount uid for root directory */
 	kgid_t gid;		    /* Mount gid for root directory */
diff --git a/mm/shmem.c b/mm/shmem.c
index 368f314..3ac613d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -107,9 +107,13 @@ static unsigned long shmem_default_max_blocks(void)
 	return totalram_pages / 2;
 }
 
-static unsigned long shmem_default_max_inodes(void)
+static int shmem_default_max_inodes(void)
 {
-	return min(totalram_pages - totalhigh_pages, totalram_pages / 2);
+	unsigned long ul;
+
+	ul = INT_MAX;
+	ul = min3(ul, totalram_pages - totalhigh_pages, totalram_pages / 2);
+	return ul;
 }
 #endif
 
@@ -569,6 +573,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -584,6 +589,11 @@ static void shmem_evict_inode(struct inode *inode)
 
 	simple_xattrs_free(&info->xattrs);
 	WARN_ON(inode->i_blocks);
+	if (inode->i_ino) {
+		mutex_lock(&sbinfo->idr_lock);
+		idr_remove(&sbinfo->idr, inode->i_ino);
+		mutex_unlock(&sbinfo->idr_lock);
+	}
 	shmem_free_inode(inode->i_sb);
 	clear_inode(inode);
 }
@@ -1315,13 +1325,13 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	struct inode *inode;
 	struct shmem_inode_info *info;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+	int ino;
 
 	if (shmem_reserve_inode(sb))
 		return NULL;
 
 	inode = new_inode(sb);
 	if (inode) {
-		inode->i_ino = get_next_ino();
 		inode_init_owner(inode, dir, mode);
 		inode->i_blocks = 0;
 		inode->i_mapping->backing_dev_info = &shmem_backing_dev_info;
@@ -1362,6 +1372,18 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 			mpol_shared_policy_init(&info->policy, NULL);
 			break;
 		}
+
+		/* inum 0 and 1 are unused */
+		mutex_lock(&sbinfo->idr_lock);
+		ino = idr_alloc(&sbinfo->idr, inode, 2, INT_MAX, GFP_NOFS);
+		if (ino > 0) {
+			inode->i_ino = ino;
+			mutex_unlock(&sbinfo->idr_lock);
+		} else {
+			mutex_unlock(&sbinfo->idr_lock);
+			iput(inode);	/* shmem_free_inode() will be called */
+			inode = NULL;
+		}
 	} else
 		shmem_free_inode(sb);
 	return inode;
@@ -2385,7 +2407,7 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 				goto bad_val;
 		} else if (!strcmp(this_char,"nr_inodes")) {
 			sbinfo->max_inodes = memparse(value, &rest);
-			if (*rest)
+			if (*rest || sbinfo->max_inodes < 2)
 				goto bad_val;
 		} else if (!strcmp(this_char,"mode")) {
 			if (remount)
@@ -2438,7 +2460,7 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 	struct shmem_sb_info config = *sbinfo;
-	unsigned long inodes;
+	int inodes;
 	int error = -EINVAL;
 
 	config.mpol = NULL;
@@ -2486,7 +2508,7 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 		seq_printf(seq, ",size=%luk",
 			sbinfo->max_blocks << (PAGE_CACHE_SHIFT - 10));
 	if (sbinfo->max_inodes != shmem_default_max_inodes())
-		seq_printf(seq, ",nr_inodes=%lu", sbinfo->max_inodes);
+		seq_printf(seq, ",nr_inodes=%d", sbinfo->max_inodes);
 	if (sbinfo->mode != (S_IRWXUGO | S_ISVTX))
 		seq_printf(seq, ",mode=%03ho", sbinfo->mode);
 	if (!uid_eq(sbinfo->uid, GLOBAL_ROOT_UID))
@@ -2504,6 +2526,7 @@ static void shmem_put_super(struct super_block *sb)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 
+	idr_destroy(&sbinfo->idr);
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
 	kfree(sbinfo);
@@ -2522,6 +2545,8 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
 	if (!sbinfo)
 		return -ENOMEM;
 
+	mutex_init(&sbinfo->idr_lock);
+	idr_init(&sbinfo->idr);
 	sbinfo->mode = S_IRWXUGO | S_ISVTX;
 	sbinfo->uid = current_fsuid();
 	sbinfo->gid = current_fsgid();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH v4 2/2] tmpfs: refine a file handle for NFS-exporting
  2014-06-05 12:27                 ` [RFC PATCH v4 0/2] tmpfs: manage the inode-number by IDR (performance measure) J. R. Okajima
  2014-06-05 12:27                   ` [RFC PATCH v4 1/2] tmpfs: manage the inode-number by IDR, signed int inum J. R. Okajima
@ 2014-06-05 12:27                   ` J. R. Okajima
  1 sibling, 0 replies; 26+ messages in thread
From: J. R. Okajima @ 2014-06-05 12:27 UTC (permalink / raw)
  To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins,
	Christoph Hellwig, Andreas Dilger, Jan Kara

The inode number in tmpfs is always 32bit.
It is unnecessary to put 64bit into NFS handle, and this commit puts
only 32bit.
While it is also possible to replace ilookup5() by idr_find(), I don't
want to confirm and maintain i_state. So let's keep ilookup5().

Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 mm/shmem.c |   34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 3ac613d..507444a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1379,6 +1379,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		if (ino > 0) {
 			inode->i_ino = ino;
 			mutex_unlock(&sbinfo->idr_lock);
+			__insert_inode_hash(inode, inode->i_ino);
 		} else {
 			mutex_unlock(&sbinfo->idr_lock);
 			iput(inode);	/* shmem_free_inode() will be called */
@@ -2288,8 +2289,7 @@ static struct dentry *shmem_get_parent(struct dentry *child)
 static int shmem_match(struct inode *ino, void *vfh)
 {
 	__u32 *fh = vfh;
-	__u64 inum = fh[2];
-	inum = (inum << 32) | fh[1];
+	__u64 inum = fh[1];
 	return ino->i_ino == inum && fh[0] == ino->i_generation;
 }
 
@@ -2300,14 +2300,11 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
 	struct dentry *dentry = NULL;
 	u64 inum;
 
-	if (fh_len < 3)
+	if (fh_len < 2)
 		return NULL;
 
-	inum = fid->raw[2];
-	inum = (inum << 32) | fid->raw[1];
-
-	inode = ilookup5(sb, (unsigned long)(inum + fid->raw[0]),
-			shmem_match, fid->raw);
+	inum = fid->raw[1];
+	inode = ilookup5(sb, inum, shmem_match, fid->raw);
 	if (inode) {
 		dentry = d_find_alias(inode);
 		iput(inode);
@@ -2319,30 +2316,15 @@ static struct dentry *shmem_fh_to_dentry(struct super_block *sb,
 static int shmem_encode_fh(struct inode *inode, __u32 *fh, int *len,
 				struct inode *parent)
 {
-	if (*len < 3) {
-		*len = 3;
+	if (*len < 2) {
+		*len = 2;
 		return FILEID_INVALID;
 	}
 
-	if (inode_unhashed(inode)) {
-		/* Unfortunately insert_inode_hash is not idempotent,
-		 * so as we hash inodes here rather than at creation
-		 * time, we need a lock to ensure we only try
-		 * to do it once
-		 */
-		static DEFINE_SPINLOCK(lock);
-		spin_lock(&lock);
-		if (inode_unhashed(inode))
-			__insert_inode_hash(inode,
-					    inode->i_ino + inode->i_generation);
-		spin_unlock(&lock);
-	}
-
 	fh[0] = inode->i_generation;
 	fh[1] = inode->i_ino;
-	fh[2] = ((__u64)inode->i_ino) >> 32;
 
-	*len = 3;
+	*len = 2;
 	return 1;
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-06-05 12:27 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-21 18:48 [RFC 0/3] vfs: get_next_ino(), never inum=0 and uniqueness hooanon05g
2014-05-21 18:48 ` [RFC 1/3] vfs: get_next_ino(), never inum=0 hooanon05g
2014-05-28  4:28   ` Hugh Dickins
2014-05-28  5:53     ` Eric Dumazet
2014-05-21 18:48 ` [RFC 2/3] vfs: get_next_ino(), support for the uniqueness hooanon05g
2014-05-22 11:56   ` Jan Kara
2014-05-22 15:03     ` J. R. Okajima
2014-05-22 15:12       ` Jan Kara
2014-05-22 15:14         ` Christoph Hellwig
2014-05-22 16:06           ` Jan Kara
2014-05-29 15:46           ` [PATCH v2 1/2] tmpfs: manage the inode-number by IDR J. R. Okajima
2014-05-29 15:46             ` [PATCH v2 2/2] tmpfs: refine a file handle for NFS-exporting J. R. Okajima
2014-05-31  2:43             ` [PATCH v2 1/2] tmpfs: manage the inode-number by IDR J. R. Okajima
2014-06-01 16:18           ` [RFC PATCH v3 0/2] the uniquness of tmpfs inode-number J. R. Okajima
2014-06-01 16:18             ` [RFC PATCH v3 1/2] tmpfs: manage the inode-number by IDR, signed int inum J. R. Okajima
2014-06-03  9:04               ` Jan Kara
2014-06-03 14:36                 ` J. R. Okajima
2014-06-05 12:27                 ` [RFC PATCH v4 0/2] tmpfs: manage the inode-number by IDR (performance measure) J. R. Okajima
2014-06-05 12:27                   ` [RFC PATCH v4 1/2] tmpfs: manage the inode-number by IDR, signed int inum J. R. Okajima
2014-06-05 12:27                   ` [RFC PATCH v4 2/2] tmpfs: refine a file handle for NFS-exporting J. R. Okajima
2014-06-01 16:18             ` [RFC PATCH v3 " J. R. Okajima
2014-05-21 18:49 ` [RFC 3/3] uniqueness of inode number, configfs, debugfs, procfs, ramfs and tmpfs hooanon05g
2014-05-22  1:03   ` J. R. Okajima
2014-05-22 11:53     ` Jan Kara
2014-05-22 14:58       ` J. R. Okajima
2014-05-22 15:09         ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.