All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] nilfs2: remove own inode hash table
@ 2010-08-22 10:05 Ryusuke Konishi
  2010-08-22 10:05 ` [PATCH 3/6] nilfs2: keep zero value in i_cno except for gc-inodes Ryusuke Konishi
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-22 10:05 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Nick Piggin, Al Viro, Ryusuke Konishi,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

This is a patchset to remove own inode hash table from nilfs.

The current version of nilfs uses inode not only to manage regular
files, directories, symlinks but also for some types of metadata and
for caching file blocks relocated by GC.

The last type of inodes is called gc-inodes, and nilfs keeps them with
an own hash table.

With this patchset, the gc-inodes will be stored in vfs inode cache
like regular inodes, and the own inode hash is deleted.

I hope this would be helpful for the vfs scalability work.

The patchset is also available from "remove-own-inode-hash" branch on:

 git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2.git


My next goal is to remove own inode allocator used to manage metadata
as files, but it still looks to need more effort.

I'd like to queue this for the next merge window if there are no
objections.

Thanks,
Ryusuke Konishi
--
Ryusuke Konishi (6):
      nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes
      nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirty
      nilfs2: keep zero value in i_cno except for gc-inodes
      nilfs2: use iget5_locked to get inode
      nilfs2: separate initializer of metadata file inode
      nilfs2: remove own inode hash used for GC

 fs/nilfs2/gcinode.c       |  136 ++++++++------------------------------------
 fs/nilfs2/inode.c         |   64 ++++++++++++++++++++-
 fs/nilfs2/ioctl.c         |   17 +++--
 fs/nilfs2/mdt.c           |   46 +++++++++------
 fs/nilfs2/mdt.h           |    4 +-
 fs/nilfs2/nilfs.h         |    9 +--
 fs/nilfs2/segment.c       |   32 +++++------
 fs/nilfs2/segment.h       |    3 +-
 fs/nilfs2/super.c         |    7 ++
 fs/nilfs2/the_nilfs.c     |    8 +--
 fs/nilfs2/the_nilfs.h     |    7 +--
 include/linux/nilfs2_fs.h |    8 +++
 12 files changed, 164 insertions(+), 177 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes
       [not found] ` <1282471506-29618-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2010-08-22 10:05   ` Ryusuke Konishi
  2010-08-22 10:05   ` [PATCH 2/6] nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirty Ryusuke Konishi
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-22 10:05 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Nick Piggin, Al Viro, Ryusuke Konishi,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

The current nilfs_destroy_inode() doesn't handle metadata file inodes
including gc inodes (dummy inodes used for garbage collection).

This allows nilfs_destroy_inode() to destroy inodes of metadata files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
---
 fs/nilfs2/mdt.c   |    2 --
 fs/nilfs2/super.c |    6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index d01aff4..ee943a3 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -577,7 +577,5 @@ void nilfs_mdt_destroy(struct inode *inode)
 		nilfs_palloc_destroy_cache(inode);
 	nilfs_mdt_clear(inode);
 
-	kfree(mdi->mi_bgl); /* kfree(NULL) is safe */
-	kfree(mdi);
 	nilfs_destroy_inode(inode);
 }
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 9222633..6febd32 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -168,6 +168,12 @@ struct inode *nilfs_alloc_inode(struct super_block *sb)
 
 void nilfs_destroy_inode(struct inode *inode)
 {
+	struct nilfs_mdt_info *mdi = NILFS_MDT(inode);
+
+	if (mdi) {
+		kfree(mdi->mi_bgl); /* kfree(NULL) is safe */
+		kfree(mdi);
+	}
 	kmem_cache_free(nilfs_inode_cachep, NILFS_I(inode));
 }
 
-- 
1.6.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirty
       [not found] ` <1282471506-29618-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2010-08-22 10:05   ` [PATCH 1/6] nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes Ryusuke Konishi
@ 2010-08-22 10:05   ` Ryusuke Konishi
  2010-08-22 10:05   ` [PATCH 4/6] nilfs2: use iget5_locked to get inode Ryusuke Konishi
  2010-08-22 10:05   ` [PATCH 6/6] nilfs2: remove own inode hash used for GC Ryusuke Konishi
  3 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-22 10:05 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Nick Piggin, Al Viro, Ryusuke Konishi,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

This allows sop->dirty_inode callback function (nilfs_dirty_inode) to
handle metadata file inodes.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
---
 fs/nilfs2/inode.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index eccb2f2..f1750ca 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -808,6 +808,7 @@ int nilfs_mark_inode_dirty(struct inode *inode)
 void nilfs_dirty_inode(struct inode *inode)
 {
 	struct nilfs_transaction_info ti;
+	struct nilfs_mdt_info *mdi = NILFS_MDT(inode);
 
 	if (is_bad_inode(inode)) {
 		nilfs_warning(inode->i_sb, __func__,
@@ -815,6 +816,10 @@ void nilfs_dirty_inode(struct inode *inode)
 		dump_stack();
 		return;
 	}
+	if (mdi) {
+		nilfs_mdt_mark_dirty(inode);
+		return;
+	}
 	nilfs_transaction_begin(inode->i_sb, &ti, 0);
 	nilfs_mark_inode_dirty(inode);
 	nilfs_transaction_commit(inode->i_sb); /* never fails */
-- 
1.6.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] nilfs2: keep zero value in i_cno except for gc-inodes
  2010-08-22 10:05 [PATCH 0/6] nilfs2: remove own inode hash table Ryusuke Konishi
@ 2010-08-22 10:05 ` Ryusuke Konishi
  2010-08-22 10:05 ` [PATCH 5/6] nilfs2: separate initializer of metadata file inode Ryusuke Konishi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-22 10:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Nick Piggin, Al Viro, Ryusuke Konishi, linux-nilfs

On-memory inode structures of nilfs have a member "i_cno" which stores
a checkpoint number related to the inode.  For gc-inodes, this field
indicates version of data each gc-inode caches for GC.  Log writer
temporarily uses "i_cno" to transfer the latest checkpoint number.

This stops the latter use and lets only gc-inodes use it.

The purpose of this patch is to allow the successive change use
"i_cno" for inode lookup.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/segment.c       |   29 ++++++++++++++---------------
 fs/nilfs2/segment.h       |    3 ++-
 include/linux/nilfs2_fs.h |    8 ++++++++
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9fd051a..eee4b22 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -366,8 +366,7 @@ static int nilfs_segctor_reset_segment_buffer(struct nilfs_sc_info *sci)
 
 	if (nilfs_doing_gc())
 		flags = NILFS_SS_GC;
-	err = nilfs_segbuf_reset(segbuf, flags, sci->sc_seg_ctime,
-				 sci->sc_sbi->s_nilfs->ns_cno);
+	err = nilfs_segbuf_reset(segbuf, flags, sci->sc_seg_ctime, sci->sc_cno);
 	if (unlikely(err))
 		return err;
 
@@ -440,17 +439,26 @@ static void nilfs_segctor_end_finfo(struct nilfs_sc_info *sci,
 	struct nilfs_finfo *finfo;
 	struct nilfs_inode_info *ii;
 	struct nilfs_segment_buffer *segbuf;
+	__u64 cno;
 
 	if (sci->sc_blk_cnt == 0)
 		return;
 
 	ii = NILFS_I(inode);
+
+	if (test_bit(NILFS_I_GCINODE, &ii->i_state))
+		cno = ii->i_cno;
+	else if (NILFS_ROOT_METADATA_FILE(inode->i_ino))
+		cno = 0;
+	else
+		cno = sci->sc_cno;
+
 	finfo = nilfs_segctor_map_segsum_entry(sci, &sci->sc_finfo_ptr,
 						 sizeof(*finfo));
 	finfo->fi_ino = cpu_to_le64(inode->i_ino);
 	finfo->fi_nblocks = cpu_to_le32(sci->sc_blk_cnt);
 	finfo->fi_ndatablk = cpu_to_le32(sci->sc_datablk_cnt);
-	finfo->fi_cno = cpu_to_le64(ii->i_cno);
+	finfo->fi_cno = cpu_to_le64(cno);
 
 	segbuf = sci->sc_curseg;
 	segbuf->sb_sum.sumbytes = sci->sc_binfo_ptr.offset +
@@ -1976,7 +1984,6 @@ static int nilfs_segctor_check_in_files(struct nilfs_sc_info *sci,
 					struct nilfs_sb_info *sbi)
 {
 	struct nilfs_inode_info *ii, *n;
-	__u64 cno = sbi->s_nilfs->ns_cno;
 
 	spin_lock(&sbi->s_inode_lock);
  retry:
@@ -2002,7 +2009,6 @@ static int nilfs_segctor_check_in_files(struct nilfs_sc_info *sci,
 				brelse(ibh);
 			goto retry;
 		}
-		ii->i_cno = cno;
 
 		clear_bit(NILFS_I_QUEUED, &ii->i_state);
 		set_bit(NILFS_I_BUSY, &ii->i_state);
@@ -2011,8 +2017,6 @@ static int nilfs_segctor_check_in_files(struct nilfs_sc_info *sci,
 	}
 	spin_unlock(&sbi->s_inode_lock);
 
-	NILFS_I(sbi->s_ifile)->i_cno = cno;
-
 	return 0;
 }
 
@@ -2021,19 +2025,13 @@ static void nilfs_segctor_check_out_files(struct nilfs_sc_info *sci,
 {
 	struct nilfs_transaction_info *ti = current->journal_info;
 	struct nilfs_inode_info *ii, *n;
-	__u64 cno = sbi->s_nilfs->ns_cno;
 
 	spin_lock(&sbi->s_inode_lock);
 	list_for_each_entry_safe(ii, n, &sci->sc_dirty_files, i_dirty) {
 		if (!test_and_clear_bit(NILFS_I_UPDATED, &ii->i_state) ||
-		    test_bit(NILFS_I_DIRTY, &ii->i_state)) {
-			/* The current checkpoint number (=nilfs->ns_cno) is
-			   changed between check-in and check-out only if the
-			   super root is written out.  So, we can update i_cno
-			   for the inodes that remain in the dirty list. */
-			ii->i_cno = cno;
+		    test_bit(NILFS_I_DIRTY, &ii->i_state))
 			continue;
-		}
+
 		clear_bit(NILFS_I_BUSY, &ii->i_state);
 		brelse(ii->i_bh);
 		ii->i_bh = NULL;
@@ -2054,6 +2052,7 @@ static int nilfs_segctor_do_construct(struct nilfs_sc_info *sci, int mode)
 	int err;
 
 	sci->sc_stage.scnt = NILFS_ST_INIT;
+	sci->sc_cno = nilfs->ns_cno;
 
 	err = nilfs_segctor_check_in_files(sci, sbi);
 	if (unlikely(err))
diff --git a/fs/nilfs2/segment.h b/fs/nilfs2/segment.h
index 17c487b..675d932 100644
--- a/fs/nilfs2/segment.h
+++ b/fs/nilfs2/segment.h
@@ -107,6 +107,7 @@ struct nilfs_segsum_pointer {
  * @sc_datablk_cnt: Data block count of a file
  * @sc_nblk_this_inc: Number of blocks included in the current logical segment
  * @sc_seg_ctime: Creation time
+ * @sc_cno: checkpoint number of current log
  * @sc_flags: Internal flags
  * @sc_state_lock: spinlock for sc_state and so on
  * @sc_state: Segctord state flags
@@ -156,7 +157,7 @@ struct nilfs_sc_info {
 	unsigned long		sc_datablk_cnt;
 	unsigned long		sc_nblk_this_inc;
 	time_t			sc_seg_ctime;
-
+	__u64			sc_cno;
 	unsigned long		sc_flags;
 
 	spinlock_t		sc_state_lock;
diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index f5487b6..e30bde6 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -270,6 +270,14 @@ struct nilfs_super_block {
 					   segments */
 
 /*
+ * We call DAT, cpfile, and sufile root metadata files.  Inodes of
+ * these files are written in super root block instead of ifile, and
+ * garbage collector doesn't keep any past versions of these files.
+ */
+#define NILFS_ROOT_METADATA_FILE(ino) \
+	((ino) >= NILFS_DAT_INO && (ino) <= NILFS_SUFILE_INO)
+
+/*
  * bytes offset of secondary super block
  */
 #define NILFS_SB2_OFFSET_BYTES(devsize)	((((devsize) >> 12) - 1) << 12)
-- 
1.6.6.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] nilfs2: use iget5_locked to get inode
       [not found] ` <1282471506-29618-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2010-08-22 10:05   ` [PATCH 1/6] nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes Ryusuke Konishi
  2010-08-22 10:05   ` [PATCH 2/6] nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirty Ryusuke Konishi
@ 2010-08-22 10:05   ` Ryusuke Konishi
  2010-08-22 10:05   ` [PATCH 6/6] nilfs2: remove own inode hash used for GC Ryusuke Konishi
  3 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-22 10:05 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Nick Piggin, Al Viro, Ryusuke Konishi,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

This uses iget5_locked instead of iget_locked so that gc cache can
look up inodes with an inode number and an optional checkpoint number.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
---
 fs/nilfs2/inode.c |   37 ++++++++++++++++++++++++++++++++++---
 fs/nilfs2/super.c |    1 +
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index f1750ca..6e9df85 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -34,6 +34,11 @@
 #include "cpfile.h"
 #include "ifile.h"
 
+struct nilfs_iget_args {
+	u64 ino;
+	__u64 cno;
+	int for_gc;
+};
 
 /**
  * nilfs_get_block() - get a file block on the filesystem (callback function)
@@ -320,7 +325,6 @@ struct inode *nilfs_new_inode(struct inode *dir, int mode)
 	/* ii->i_file_acl = 0; */
 	/* ii->i_dir_acl = 0; */
 	ii->i_dir_start_lookup = 0;
-	ii->i_cno = 0;
 	nilfs_set_inode_flags(inode);
 	spin_lock(&sbi->s_next_gen_lock);
 	inode->i_generation = sbi->s_next_generation++;
@@ -410,7 +414,6 @@ int nilfs_read_inode_common(struct inode *inode,
 		0 : le32_to_cpu(raw_inode->i_dir_acl);
 #endif
 	ii->i_dir_start_lookup = 0;
-	ii->i_cno = 0;
 	inode->i_generation = le32_to_cpu(raw_inode->i_generation);
 
 	if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
@@ -476,12 +479,40 @@ static int __nilfs_read_inode(struct super_block *sb, unsigned long ino,
 	return err;
 }
 
+static int nilfs_iget_test(struct inode *inode, void *opaque)
+{
+	struct nilfs_iget_args *args = opaque;
+	struct nilfs_inode_info *ii;
+
+	if (args->ino != inode->i_ino)
+		return 0;
+
+	ii = NILFS_I(inode);
+	if (!test_bit(NILFS_I_GCINODE, &ii->i_state))
+		return !args->for_gc;
+
+	return args->for_gc && args->cno == ii->i_cno;
+}
+
+static int nilfs_iget_set(struct inode *inode, void *opaque)
+{
+	struct nilfs_iget_args *args = opaque;
+
+	inode->i_ino = args->ino;
+	if (args->for_gc) {
+		NILFS_I(inode)->i_state = 1 << NILFS_I_GCINODE;
+		NILFS_I(inode)->i_cno = args->cno;
+	}
+	return 0;
+}
+
 struct inode *nilfs_iget(struct super_block *sb, unsigned long ino)
 {
+	struct nilfs_iget_args args = { .ino = ino, .cno = 0, .for_gc = 0 };
 	struct inode *inode;
 	int err;
 
-	inode = iget_locked(sb, ino);
+	inode = iget5_locked(sb, ino, nilfs_iget_test, nilfs_iget_set, &args);
 	if (unlikely(!inode))
 		return ERR_PTR(-ENOMEM);
 	if (!(inode->i_state & I_NEW))
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6febd32..452004b 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -156,6 +156,7 @@ struct inode *nilfs_alloc_inode_common(struct the_nilfs *nilfs)
 		return NULL;
 	ii->i_bh = NULL;
 	ii->i_state = 0;
+	ii->i_cno = 0;
 	ii->vfs_inode.i_version = 1;
 	nilfs_btnode_cache_init(&ii->i_btnode_cache, nilfs->ns_bdi);
 	return &ii->vfs_inode;
-- 
1.6.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] nilfs2: separate initializer of metadata file inode
  2010-08-22 10:05 [PATCH 0/6] nilfs2: remove own inode hash table Ryusuke Konishi
  2010-08-22 10:05 ` [PATCH 3/6] nilfs2: keep zero value in i_cno except for gc-inodes Ryusuke Konishi
@ 2010-08-22 10:05 ` Ryusuke Konishi
       [not found] ` <1282471506-29618-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2010-08-24  7:48 ` [PATCH 0/6] nilfs2: remove own inode hash table Nick Piggin
  3 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-22 10:05 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Nick Piggin, Al Viro, Ryusuke Konishi, linux-nilfs

This separates a part of initialization code of metadata file inode,
and makes it available from the nilfs iget function that a later patch
will add to.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/gcinode.c |    6 +++++-
 fs/nilfs2/mdt.c     |   44 +++++++++++++++++++++++++++-----------------
 fs/nilfs2/mdt.h     |    4 +++-
 3 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
index bed3a78..cd19a37 100644
--- a/fs/nilfs2/gcinode.c
+++ b/fs/nilfs2/gcinode.c
@@ -225,10 +225,14 @@ static struct inode *alloc_gcinode(struct the_nilfs *nilfs, ino_t ino,
 	struct inode *inode;
 	struct nilfs_inode_info *ii;
 
-	inode = nilfs_mdt_new_common(nilfs, NULL, ino, GFP_NOFS, 0);
+	inode = nilfs_mdt_new_common(nilfs, NULL, ino);
 	if (!inode)
 		return NULL;
 
+	if (nilfs_mdt_init(inode, nilfs, GFP_NOFS, 0) < 0) {
+		nilfs_destroy_inode(inode);
+		return NULL;
+	}
 	inode->i_op = NULL;
 	inode->i_fop = NULL;
 	inode->i_mapping->a_ops = &def_gcinode_aops;
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index ee943a3..73e5da3 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -439,6 +439,27 @@ static const struct address_space_operations def_mdt_aops = {
 static const struct inode_operations def_mdt_iops;
 static const struct file_operations def_mdt_fops;
 
+
+int nilfs_mdt_init(struct inode *inode, struct the_nilfs *nilfs,
+		   gfp_t gfp_mask, size_t objsz)
+{
+	struct nilfs_mdt_info *mi;
+
+	mi = kzalloc(max(sizeof(*mi), objsz), GFP_NOFS);
+	if (!mi)
+		return -ENOMEM;
+
+	mi->mi_nilfs = nilfs;
+	init_rwsem(&mi->mi_sem);
+	inode->i_private = mi;
+
+	inode->i_mode = S_IFREG;
+	mapping_set_gfp_mask(inode->i_mapping, gfp_mask);
+	inode->i_mapping->backing_dev_info = nilfs->ns_bdi;
+
+	return 0;
+}
+
 /*
  * NILFS2 uses pseudo inodes for meta data files such as DAT, cpfile, sufile,
  * ifile, or gcinodes.  This allows the B-tree code and segment constructor
@@ -454,12 +475,10 @@ static const struct file_operations def_mdt_fops;
  * @nilfs: nilfs object
  * @sb: super block instance the metadata file belongs to
  * @ino: inode number
- * @gfp_mask: gfp mask for data pages
- * @objsz: size of the private object attached to inode->i_private
  */
 struct inode *
 nilfs_mdt_new_common(struct the_nilfs *nilfs, struct super_block *sb,
-		     ino_t ino, gfp_t gfp_mask, size_t objsz)
+		     ino_t ino)
 {
 	struct inode *inode = nilfs_alloc_inode_common(nilfs);
 
@@ -467,15 +486,6 @@ nilfs_mdt_new_common(struct the_nilfs *nilfs, struct super_block *sb,
 		return NULL;
 	else {
 		struct address_space * const mapping = &inode->i_data;
-		struct nilfs_mdt_info *mi;
-
-		mi = kzalloc(max(sizeof(*mi), objsz), GFP_NOFS);
-		if (!mi) {
-			nilfs_destroy_inode(inode);
-			return NULL;
-		}
-		mi->mi_nilfs = nilfs;
-		init_rwsem(&mi->mi_sem);
 
 		inode->i_sb = sb; /* sb may be NULL for some meta data files */
 		inode->i_blkbits = nilfs->ns_blocksize_bits;
@@ -483,8 +493,6 @@ nilfs_mdt_new_common(struct the_nilfs *nilfs, struct super_block *sb,
 		atomic_set(&inode->i_count, 1);
 		inode->i_nlink = 1;
 		inode->i_ino = ino;
-		inode->i_mode = S_IFREG;
-		inode->i_private = mi;
 
 #ifdef INIT_UNUSED_INODE_FIELDS
 		atomic_set(&inode->i_writecount, 0);
@@ -515,9 +523,7 @@ nilfs_mdt_new_common(struct the_nilfs *nilfs, struct super_block *sb,
 
 		mapping->host = NULL;  /* instead of inode */
 		mapping->flags = 0;
-		mapping_set_gfp_mask(mapping, gfp_mask);
 		mapping->assoc_mapping = NULL;
-		mapping->backing_dev_info = nilfs->ns_bdi;
 
 		inode->i_mapping = mapping;
 	}
@@ -530,10 +536,14 @@ struct inode *nilfs_mdt_new(struct the_nilfs *nilfs, struct super_block *sb,
 {
 	struct inode *inode;
 
-	inode = nilfs_mdt_new_common(nilfs, sb, ino, NILFS_MDT_GFP, objsz);
+	inode = nilfs_mdt_new_common(nilfs, sb, ino);
 	if (!inode)
 		return NULL;
 
+	if (nilfs_mdt_init(inode, nilfs, NILFS_MDT_GFP, objsz) < 0) {
+		nilfs_destroy_inode(inode);
+		return NULL;
+	}
 	inode->i_op = &def_mdt_iops;
 	inode->i_fop = &def_mdt_fops;
 	inode->i_mapping->a_ops = &def_mdt_aops;
diff --git a/fs/nilfs2/mdt.h b/fs/nilfs2/mdt.h
index 6c4bbb0..f445602 100644
--- a/fs/nilfs2/mdt.h
+++ b/fs/nilfs2/mdt.h
@@ -76,10 +76,12 @@ int nilfs_mdt_forget_block(struct inode *, unsigned long);
 int nilfs_mdt_mark_block_dirty(struct inode *, unsigned long);
 int nilfs_mdt_fetch_dirty(struct inode *);
 
+int nilfs_mdt_init(struct inode *inode, struct the_nilfs *nilfs,
+		   gfp_t gfp_mask, size_t objsz);
 struct inode *nilfs_mdt_new(struct the_nilfs *, struct super_block *, ino_t,
 			    size_t);
 struct inode *nilfs_mdt_new_common(struct the_nilfs *, struct super_block *,
-				   ino_t, gfp_t, size_t);
+				   ino_t);
 void nilfs_mdt_destroy(struct inode *);
 void nilfs_mdt_set_entry_size(struct inode *, unsigned, unsigned);
 void nilfs_mdt_set_shadow(struct inode *, struct inode *);
-- 
1.6.6.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] nilfs2: remove own inode hash used for GC
       [not found] ` <1282471506-29618-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
                     ` (2 preceding siblings ...)
  2010-08-22 10:05   ` [PATCH 4/6] nilfs2: use iget5_locked to get inode Ryusuke Konishi
@ 2010-08-22 10:05   ` Ryusuke Konishi
  3 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-22 10:05 UTC (permalink / raw)
  To: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Nick Piggin, Al Viro, Ryusuke Konishi,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

This uses inode hash function that vfs provides instead of the own
hash table for caching gc inodes.  This finally removes the own inode
hash from nilfs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
---
 fs/nilfs2/gcinode.c   |  140 +++++++++----------------------------------------
 fs/nilfs2/inode.c     |   22 ++++++++
 fs/nilfs2/ioctl.c     |   17 ++++---
 fs/nilfs2/nilfs.h     |    9 ++--
 fs/nilfs2/segment.c   |    3 +-
 fs/nilfs2/the_nilfs.c |    8 +---
 fs/nilfs2/the_nilfs.h |    7 +--
 7 files changed, 64 insertions(+), 142 deletions(-)

diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
index cd19a37..34f8f84 100644
--- a/fs/nilfs2/gcinode.c
+++ b/fs/nilfs2/gcinode.c
@@ -28,13 +28,6 @@
  * gcinodes), and this file provides lookup function of the dummy
  * inodes and their buffer read function.
  *
- * Since NILFS2 keeps up multiple checkpoints/snapshots across GC, it
- * has to treat blocks that belong to a same file but have different
- * checkpoint numbers.  To avoid interference among generations, dummy
- * inodes are managed separately from actual inodes, and their lookup
- * function (nilfs_gc_iget) is designed to be specified with a
- * checkpoint number argument as well as an inode number.
- *
  * Buffers and pages held by the dummy inodes will be released each
  * time after they are copied to a new log.  Dirty blocks made on the
  * current generation and the blocks to be moved by GC never overlap
@@ -180,124 +173,41 @@ int nilfs_gccache_wait_and_mark_dirty(struct buffer_head *bh)
 	return 0;
 }
 
-/*
- * nilfs_init_gccache() - allocate and initialize gc_inode hash table
- * @nilfs - the_nilfs
- *
- * Return Value: On success, 0.
- * On error, a negative error code is returned.
- */
-int nilfs_init_gccache(struct the_nilfs *nilfs)
-{
-	int loop;
-
-	BUG_ON(nilfs->ns_gc_inodes_h);
-
-	INIT_LIST_HEAD(&nilfs->ns_gc_inodes);
-
-	nilfs->ns_gc_inodes_h =
-		kmalloc(sizeof(struct hlist_head) * NILFS_GCINODE_HASH_SIZE,
-			GFP_NOFS);
-	if (nilfs->ns_gc_inodes_h == NULL)
-		return -ENOMEM;
-
-	for (loop = 0; loop < NILFS_GCINODE_HASH_SIZE; loop++)
-		INIT_HLIST_HEAD(&nilfs->ns_gc_inodes_h[loop]);
-	return 0;
-}
-
-/*
- * nilfs_destroy_gccache() - free gc_inode hash table
- * @nilfs - the nilfs
- */
-void nilfs_destroy_gccache(struct the_nilfs *nilfs)
+int nilfs_init_gcinode(struct inode *inode)
 {
-	if (nilfs->ns_gc_inodes_h) {
-		nilfs_remove_all_gcinode(nilfs);
-		kfree(nilfs->ns_gc_inodes_h);
-		nilfs->ns_gc_inodes_h = NULL;
-	}
-}
-
-static struct inode *alloc_gcinode(struct the_nilfs *nilfs, ino_t ino,
-				   __u64 cno)
-{
-	struct inode *inode;
-	struct nilfs_inode_info *ii;
-
-	inode = nilfs_mdt_new_common(nilfs, NULL, ino);
-	if (!inode)
-		return NULL;
-
-	if (nilfs_mdt_init(inode, nilfs, GFP_NOFS, 0) < 0) {
-		nilfs_destroy_inode(inode);
-		return NULL;
-	}
-	inode->i_op = NULL;
-	inode->i_fop = NULL;
-	inode->i_mapping->a_ops = &def_gcinode_aops;
-
-	ii = NILFS_I(inode);
-	ii->i_cno = cno;
-	ii->i_flags = 0;
-	ii->i_state = 1 << NILFS_I_GCINODE;
-	ii->i_bh = NULL;
-	nilfs_bmap_init_gc(ii->i_bmap);
+	struct nilfs_inode_info *ii = NILFS_I(inode);
+	struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
+	int ret;
 
-	return inode;
-}
+	ret = nilfs_mdt_init(inode, nilfs, GFP_NOFS, 0);
+	if (!ret) {
+		inode->i_mapping->a_ops = &def_gcinode_aops;
 
-static unsigned long ihash(ino_t ino, __u64 cno)
-{
-	return hash_long((unsigned long)((ino << 2) + cno),
-			 NILFS_GCINODE_HASH_BITS);
-}
+		ii->i_flags = 0;
+		nilfs_bmap_init_gc(ii->i_bmap);
 
-/*
- * nilfs_gc_iget() - find or create gc inode with specified (ino,cno)
- */
-struct inode *nilfs_gc_iget(struct the_nilfs *nilfs, ino_t ino, __u64 cno)
-{
-	struct hlist_head *head = nilfs->ns_gc_inodes_h + ihash(ino, cno);
-	struct hlist_node *node;
-	struct inode *inode;
-
-	hlist_for_each_entry(inode, node, head, i_hash) {
-		if (inode->i_ino == ino && NILFS_I(inode)->i_cno == cno)
-			return inode;
-	}
-
-	inode = alloc_gcinode(nilfs, ino, cno);
-	if (likely(inode)) {
-		hlist_add_head(&inode->i_hash, head);
+		/*
+		 * Add the inode to GC inode list. Garbage Collection
+		 * is serialized and no two processes manipulate the
+		 * list simultaneously.
+		 */
+		igrab(inode);
 		list_add(&NILFS_I(inode)->i_dirty, &nilfs->ns_gc_inodes);
 	}
-	return inode;
-}
-
-/*
- * nilfs_clear_gcinode() - clear and free a gc inode
- */
-void nilfs_clear_gcinode(struct inode *inode)
-{
-	nilfs_mdt_destroy(inode);
+	return ret;
 }
 
-/*
- * nilfs_remove_all_gcinode() - remove all inodes from the_nilfs
+/**
+ * nilfs_remove_all_gcinodes() - remove all unprocessed gc inodes
  */
-void nilfs_remove_all_gcinode(struct the_nilfs *nilfs)
+void nilfs_remove_all_gcinodes(struct the_nilfs *nilfs)
 {
-	struct hlist_head *head = nilfs->ns_gc_inodes_h;
-	struct hlist_node *node, *n;
-	struct inode *inode;
-	int loop;
+	struct list_head *head = &nilfs->ns_gc_inodes;
+	struct nilfs_inode_info *ii;
 
-	for (loop = 0; loop < NILFS_GCINODE_HASH_SIZE; loop++, head++) {
-		hlist_for_each_entry_safe(inode, node, n, head, i_hash) {
-			hlist_del_init(&inode->i_hash);
-			list_del_init(&NILFS_I(inode)->i_dirty);
-			nilfs_clear_gcinode(inode); /* might sleep */
-		}
+	while (!list_empty(head)) {
+		ii = list_first_entry(head, struct nilfs_inode_info, i_dirty);
+		list_del_init(&ii->i_dirty);
+		iput(&ii->vfs_inode);
 	}
 }
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 6e9df85..82cfdbc 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -527,6 +527,28 @@ struct inode *nilfs_iget(struct super_block *sb, unsigned long ino)
 	return inode;
 }
 
+struct inode *nilfs_iget_for_gc(struct super_block *sb, unsigned long ino,
+				__u64 cno)
+{
+	struct nilfs_iget_args args = { .ino = ino, .cno = cno, .for_gc = 1 };
+	struct inode *inode;
+	int err;
+
+	inode = iget5_locked(sb, ino, nilfs_iget_test, nilfs_iget_set, &args);
+	if (unlikely(!inode))
+		return ERR_PTR(-ENOMEM);
+	if (!(inode->i_state & I_NEW))
+		return inode;
+
+	err = nilfs_init_gcinode(inode);
+	if (unlikely(err)) {
+		iget_failed(inode);
+		return ERR_PTR(err);
+	}
+	unlock_new_inode(inode);
+	return inode;
+}
+
 void nilfs_write_inode_common(struct inode *inode,
 			      struct nilfs_inode *raw_inode, int has_bmap)
 {
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index f90a33d..f0a99bd 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -334,7 +334,7 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode,
 	return 0;
 }
 
-static int nilfs_ioctl_move_blocks(struct the_nilfs *nilfs,
+static int nilfs_ioctl_move_blocks(struct super_block *sb,
 				   struct nilfs_argv *argv, void *buf)
 {
 	size_t nmembs = argv->v_nmembs;
@@ -349,7 +349,7 @@ static int nilfs_ioctl_move_blocks(struct the_nilfs *nilfs,
 	for (i = 0, vdesc = buf; i < nmembs; ) {
 		ino = vdesc->vd_ino;
 		cno = vdesc->vd_cno;
-		inode = nilfs_gc_iget(nilfs, ino, cno);
+		inode = nilfs_iget_for_gc(sb, ino, cno);
 		if (unlikely(inode == NULL)) {
 			ret = -ENOMEM;
 			goto failed;
@@ -357,11 +357,15 @@ static int nilfs_ioctl_move_blocks(struct the_nilfs *nilfs,
 		do {
 			ret = nilfs_ioctl_move_inode_block(inode, vdesc,
 							   &buffers);
-			if (unlikely(ret < 0))
+			if (unlikely(ret < 0)) {
+				iput(inode);
 				goto failed;
+			}
 			vdesc++;
 		} while (++i < nmembs &&
 			 vdesc->vd_ino == ino && vdesc->vd_cno == cno);
+
+		iput(inode); /* The inode still remains in GC inode list */
 	}
 
 	list_for_each_entry_safe(bh, n, &buffers, b_assoc_buffers) {
@@ -567,7 +571,7 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
 	}
 
 	/*
-	 * nilfs_ioctl_move_blocks() will call nilfs_gc_iget(),
+	 * nilfs_ioctl_move_blocks() will call nilfs_iget_for_gc(),
 	 * which will operates an inode list without blocking.
 	 * To protect the list from concurrent operations,
 	 * nilfs_ioctl_move_blocks should be atomic operation.
@@ -577,15 +581,14 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
 		goto out_free;
 	}
 
-	ret = nilfs_ioctl_move_blocks(nilfs, &argv[0], kbufs[0]);
+	ret = nilfs_ioctl_move_blocks(inode->i_sb, &argv[0], kbufs[0]);
 	if (ret < 0)
 		printk(KERN_ERR "NILFS: GC failed during preparation: "
 			"cannot read source blocks: err=%d\n", ret);
 	else
 		ret = nilfs_clean_segments(inode->i_sb, argv, kbufs);
 
-	if (ret < 0)
-		nilfs_remove_all_gcinode(nilfs);
+	nilfs_remove_all_gcinodes(nilfs);
 	clear_nilfs_gc_running(nilfs);
 
 out_free:
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index d3d5404..797cd43 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -248,6 +248,8 @@ extern void nilfs_set_inode_flags(struct inode *);
 extern int nilfs_read_inode_common(struct inode *, struct nilfs_inode *);
 extern void nilfs_write_inode_common(struct inode *, struct nilfs_inode *, int);
 extern struct inode *nilfs_iget(struct super_block *, unsigned long);
+extern struct inode *nilfs_iget_for_gc(struct super_block *sb,
+				       unsigned long ino, __u64 cno);
 extern void nilfs_update_inode(struct inode *, struct buffer_head *);
 extern void nilfs_truncate(struct inode *);
 extern void nilfs_evict_inode(struct inode *);
@@ -292,11 +294,8 @@ int nilfs_gccache_submit_read_data(struct inode *, sector_t, sector_t, __u64,
 int nilfs_gccache_submit_read_node(struct inode *, sector_t, __u64,
 				   struct buffer_head **);
 int nilfs_gccache_wait_and_mark_dirty(struct buffer_head *);
-int nilfs_init_gccache(struct the_nilfs *);
-void nilfs_destroy_gccache(struct the_nilfs *);
-void nilfs_clear_gcinode(struct inode *);
-struct inode *nilfs_gc_iget(struct the_nilfs *, ino_t, __u64);
-void nilfs_remove_all_gcinode(struct the_nilfs *);
+int nilfs_init_gcinode(struct inode *inode);
+void nilfs_remove_all_gcinodes(struct the_nilfs *nilfs);
 
 /* gcdat.c */
 int nilfs_init_gcdat_inode(struct the_nilfs *);
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index eee4b22..9cf7138 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2451,9 +2451,8 @@ nilfs_remove_written_gcinodes(struct the_nilfs *nilfs, struct list_head *head)
 	list_for_each_entry_safe(ii, n, head, i_dirty) {
 		if (!test_bit(NILFS_I_UPDATED, &ii->i_state))
 			continue;
-		hlist_del_init(&ii->vfs_inode.i_hash);
 		list_del_init(&ii->i_dirty);
-		nilfs_clear_gcinode(&ii->vfs_inode);
+		iput(&ii->vfs_inode);
 	}
 }
 
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 6af1c00..fa33ad1 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -87,8 +87,8 @@ static struct the_nilfs *alloc_nilfs(struct block_device *bdev)
 	init_rwsem(&nilfs->ns_writer_sem);
 	INIT_LIST_HEAD(&nilfs->ns_list);
 	INIT_LIST_HEAD(&nilfs->ns_supers);
+	INIT_LIST_HEAD(&nilfs->ns_gc_inodes);
 	spin_lock_init(&nilfs->ns_last_segment_lock);
-	nilfs->ns_gc_inodes_h = NULL;
 	init_rwsem(&nilfs->ns_segctor_sem);
 
 	return nilfs;
@@ -164,7 +164,6 @@ void put_nilfs(struct the_nilfs *nilfs)
 		nilfs_mdt_destroy(nilfs->ns_gc_dat);
 	}
 	if (nilfs_init(nilfs)) {
-		nilfs_destroy_gccache(nilfs);
 		brelse(nilfs->ns_sbh[0]);
 		brelse(nilfs->ns_sbh[1]);
 	}
@@ -735,11 +734,6 @@ int init_nilfs(struct the_nilfs *nilfs, struct nilfs_sb_info *sbi, char *data)
 	if (err)
 		goto failed_sbh;
 
-	/* Initialize gcinode cache */
-	err = nilfs_init_gccache(nilfs);
-	if (err)
-		goto failed_sbh;
-
 	set_nilfs_init(nilfs);
 	err = 0;
  out:
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index f785a7b..c7ecd0c 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -81,7 +81,6 @@ enum {
  * @ns_sufile: segusage file inode
  * @ns_gc_dat: shadow inode of the DAT file inode for GC
  * @ns_gc_inodes: dummy inodes to keep live blocks
- * @ns_gc_inodes_h: hash list to keep dummy inode holding live blocks
  * @ns_blocksize_bits: bit length of block size
  * @ns_blocksize: block size
  * @ns_nsegments: number of segments in filesystem
@@ -165,9 +164,8 @@ struct the_nilfs {
 	struct inode	       *ns_sufile;
 	struct inode	       *ns_gc_dat;
 
-	/* GC inode list and hash table head */
+	/* GC inode list */
 	struct list_head	ns_gc_inodes;
-	struct hlist_head      *ns_gc_inodes_h;
 
 	/* Disk layout information (static) */
 	unsigned int		ns_blocksize_bits;
@@ -182,9 +180,6 @@ struct the_nilfs {
 	u32			ns_crc_seed;
 };
 
-#define NILFS_GCINODE_HASH_BITS		8
-#define NILFS_GCINODE_HASH_SIZE		(1<<NILFS_GCINODE_HASH_BITS)
-
 #define THE_NILFS_FNS(bit, name)					\
 static inline void set_nilfs_##name(struct the_nilfs *nilfs)		\
 {									\
-- 
1.6.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
  2010-08-22 10:05 [PATCH 0/6] nilfs2: remove own inode hash table Ryusuke Konishi
                   ` (2 preceding siblings ...)
       [not found] ` <1282471506-29618-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2010-08-24  7:48 ` Nick Piggin
  2010-08-24  8:30   ` Christoph Hellwig
  2010-08-24 12:19   ` Ryusuke Konishi
  3 siblings, 2 replies; 18+ messages in thread
From: Nick Piggin @ 2010-08-24  7:48 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-fsdevel, Nick Piggin, Al Viro, linux-nilfs

On Sun, Aug 22, 2010 at 07:05:00PM +0900, Ryusuke Konishi wrote:
> This is a patchset to remove own inode hash table from nilfs.
> 
> The current version of nilfs uses inode not only to manage regular
> files, directories, symlinks but also for some types of metadata and
> for caching file blocks relocated by GC.
> 
> The last type of inodes is called gc-inodes, and nilfs keeps them with
> an own hash table.
> 
> With this patchset, the gc-inodes will be stored in vfs inode cache
> like regular inodes, and the own inode hash is deleted.
> 
> I hope this would be helpful for the vfs scalability work.

This looks like a reasonable direction to me, and you would get
more of the benefits of the inode cache scalability work, but this
wasn't the biggest prolem for my series, because I'm not going digging
into the filesystems themselves.

The reason I broke nilfs2 is because it duplicates a lot of the
generic inode initialisation code. This really should go in core
code because nilfs2 does not own the generic inode fields.

It just needs some helper function to do the non-sb related init
stuff for you.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
  2010-08-24  7:48 ` [PATCH 0/6] nilfs2: remove own inode hash table Nick Piggin
@ 2010-08-24  8:30   ` Christoph Hellwig
       [not found]     ` <20100824083027.GA27594-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2010-08-24 12:19   ` Ryusuke Konishi
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2010-08-24  8:30 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ryusuke Konishi, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 24, 2010 at 05:48:44PM +1000, Nick Piggin wrote:
> The reason I broke nilfs2 is because it duplicates a lot of the
> generic inode initialisation code. This really should go in core
> code because nilfs2 does not own the generic inode fields.
> 
> It just needs some helper function to do the non-sb related init
> stuff for you.

No, the problem is much deeper than that.  Having inodes outlive
superblock is fundamentally broken.  All that crap simply needs
to go away.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
  2010-08-24  7:48 ` [PATCH 0/6] nilfs2: remove own inode hash table Nick Piggin
  2010-08-24  8:30   ` Christoph Hellwig
@ 2010-08-24 12:19   ` Ryusuke Konishi
  2010-08-24 12:58     ` Nick Piggin
  2010-08-24 13:24     ` Al Viro
  1 sibling, 2 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-24 12:19 UTC (permalink / raw)
  To: npiggin-tSWWG44O7X1aa/9Udqfwiw
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 24 Aug 2010 17:48:44 +1000, Nick Piggin <npiggin-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> wrote:
> On Sun, Aug 22, 2010 at 07:05:00PM +0900, Ryusuke Konishi wrote:
> > This is a patchset to remove own inode hash table from nilfs.
> > 
> > The current version of nilfs uses inode not only to manage regular
> > files, directories, symlinks but also for some types of metadata and
> > for caching file blocks relocated by GC.
> > 
> > The last type of inodes is called gc-inodes, and nilfs keeps them with
> > an own hash table.
> > 
> > With this patchset, the gc-inodes will be stored in vfs inode cache
> > like regular inodes, and the own inode hash is deleted.
> > 
> > I hope this would be helpful for the vfs scalability work.
> 
> This looks like a reasonable direction to me, and you would get
> more of the benefits of the inode cache scalability work, but this
> wasn't the biggest prolem for my series, because I'm not going digging
> into the filesystems themselves.
> 
> The reason I broke nilfs2 is because it duplicates a lot of the
> generic inode initialisation code. This really should go in core
> code because nilfs2 does not own the generic inode fields.

Yes, that is what I want to do next.
I really want to eliminate the duplication.

> It just needs some helper function to do the non-sb related init
> stuff for you.

I'm seeking the way to make that special type of inodes hold a valid
sb.  If it's not feasible, I think nilfs should not borrow inode
object and should define just enough structure instead.

Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
       [not found]     ` <20100824083027.GA27594-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2010-08-24 12:58       ` Ryusuke Konishi
       [not found]         ` <20100824.215823.121933465.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-24 12:58 UTC (permalink / raw)
  To: hch-wEGCiKHe2LqWVfeAwA7xHQ
  Cc: npiggin-tSWWG44O7X1aa/9Udqfwiw,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 24 Aug 2010 04:30:28 -0400, Christoph Hellwig wrote:
> On Tue, Aug 24, 2010 at 05:48:44PM +1000, Nick Piggin wrote:
> > The reason I broke nilfs2 is because it duplicates a lot of the
> > generic inode initialisation code. This really should go in core
> > code because nilfs2 does not own the generic inode fields.
> > 
> > It just needs some helper function to do the non-sb related init
> > stuff for you.
> 
> No, the problem is much deeper than that.  Having inodes outlive
> superblock is fundamentally broken.  All that crap simply needs
> to go away.

I agree..  I will try to resolve this problem, or will separate three
metadata files outlive superblock if I cannot find a proper way.

Actually, I recently wrote a patchset to unify all super block
instances on one device into one sb, which makes the inodes outlive
superblock eliminable.

But the patchset has a problem that it cannot shrink a number of
versions of namespace when unmounting all snapshots and a current time
instance.

If this implementation issue is avoidable, unifying per-snapshot sb
instances seems an ideal way.

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
  2010-08-24 12:19   ` Ryusuke Konishi
@ 2010-08-24 12:58     ` Nick Piggin
  2010-08-24 13:24     ` Al Viro
  1 sibling, 0 replies; 18+ messages in thread
From: Nick Piggin @ 2010-08-24 12:58 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: npiggin, konishi.ryusuke, linux-fsdevel, viro, linux-nilfs

On Tue, Aug 24, 2010 at 09:19:31PM +0900, Ryusuke Konishi wrote:
> On Tue, 24 Aug 2010 17:48:44 +1000, Nick Piggin <npiggin@kernel.dk> wrote:
> > On Sun, Aug 22, 2010 at 07:05:00PM +0900, Ryusuke Konishi wrote:
> > > This is a patchset to remove own inode hash table from nilfs.
> > > 
> > > The current version of nilfs uses inode not only to manage regular
> > > files, directories, symlinks but also for some types of metadata and
> > > for caching file blocks relocated by GC.
> > > 
> > > The last type of inodes is called gc-inodes, and nilfs keeps them with
> > > an own hash table.
> > > 
> > > With this patchset, the gc-inodes will be stored in vfs inode cache
> > > like regular inodes, and the own inode hash is deleted.
> > > 
> > > I hope this would be helpful for the vfs scalability work.
> > 
> > This looks like a reasonable direction to me, and you would get
> > more of the benefits of the inode cache scalability work, but this
> > wasn't the biggest prolem for my series, because I'm not going digging
> > into the filesystems themselves.
> > 
> > The reason I broke nilfs2 is because it duplicates a lot of the
> > generic inode initialisation code. This really should go in core
> > code because nilfs2 does not own the generic inode fields.
> 
> Yes, that is what I want to do next.
> I really want to eliminate the duplication.
> 
> > It just needs some helper function to do the non-sb related init
> > stuff for you.
> 
> I'm seeking the way to make that special type of inodes hold a valid
> sb.  If it's not feasible, I think nilfs should not borrow inode
> object and should define just enough structure instead.

Like Christoph said, inode fundamentally has a sb, so a different
data structure would make sense. You really shouldn't be using vfs
for sb-less inodes, so therefore you should be able to use your
own type for this.

But for vfs scale, that doesn't seem to be a problem, the problem
for me is just where it duplicates inode init stuff.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
  2010-08-24 12:19   ` Ryusuke Konishi
  2010-08-24 12:58     ` Nick Piggin
@ 2010-08-24 13:24     ` Al Viro
       [not found]       ` <20100824132453.GU31363-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2010-08-24 13:24 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: npiggin, konishi.ryusuke, linux-fsdevel, linux-nilfs

On Tue, Aug 24, 2010 at 09:19:31PM +0900, Ryusuke Konishi wrote:
> > It just needs some helper function to do the non-sb related init
> > stuff for you.
> 
> I'm seeking the way to make that special type of inodes hold a valid
> sb.  If it's not feasible, I think nilfs should not borrow inode
> object and should define just enough structure instead.

For fsck sake, just grab an active reference to superblock for the
duration of GC and use _normal_ inodes...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
       [not found]         ` <20100824.215823.121933465.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2010-08-24 15:32           ` Christoph Hellwig
       [not found]             ` <20100824153214.GA25897-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2010-08-24 15:32 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: hch-wEGCiKHe2LqWVfeAwA7xHQ, npiggin-tSWWG44O7X1aa/9Udqfwiw,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 24, 2010 at 09:58:23PM +0900, Ryusuke Konishi wrote:
> Actually, I recently wrote a patchset to unify all super block
> instances on one device into one sb, which makes the inodes outlive
> superblock eliminable.

Yes, that's how we handle things everywhere, including btrfs with
it's subvolumes and snapshots.

> But the patchset has a problem that it cannot shrink a number of
> versions of namespace when unmounting all snapshots and a current time
> instance.

Can you explain the issue in more detail?

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
       [not found]       ` <20100824132453.GU31363-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2010-08-24 15:39         ` Ryusuke Konishi
  0 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-24 15:39 UTC (permalink / raw)
  To: viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn
  Cc: npiggin-tSWWG44O7X1aa/9Udqfwiw,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, Christoph Hellwig,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 24 Aug 2010 14:24:53 +0100, Al Viro wrote:
> On Tue, Aug 24, 2010 at 09:19:31PM +0900, Ryusuke Konishi wrote:
> > > It just needs some helper function to do the non-sb related init
> > > stuff for you.
> > 
> > I'm seeking the way to make that special type of inodes hold a valid
> > sb.  If it's not feasible, I think nilfs should not borrow inode
> > object and should define just enough structure instead.
> 
> For fsck sake, just grab an active reference to superblock for the
> duration of GC and use _normal_ inodes...

GC is only allowed while a read/write mount exists, and the "remove
own inode hash table" patchset already solved the sb-less inode issue
for GC.

The remaining problem is that nilfs allocates different sb per
snapshot instead of increasing reference count of one super block.
(to be precise, identical snapshots or current mounts share the same
sb like regular mount)

Since snapshots or a current filesystem are mounted or unmounted
independently sharing the three metadata files, these metadata files
can live across the lifetime of a certain sb.

All other inodes are binded to an sb.


And, my thought is, in the first place, allocating different sb for a
device is not sane.

Thanks,
Ryusuke Konishi

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
       [not found]             ` <20100824153214.GA25897-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2010-08-24 16:20               ` Ryusuke Konishi
       [not found]                 ` <20100825.012012.179486471.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-24 16:20 UTC (permalink / raw)
  To: hch-wEGCiKHe2LqWVfeAwA7xHQ
  Cc: npiggin-tSWWG44O7X1aa/9Udqfwiw,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 24 Aug 2010 11:32:14 -0400, Christoph Hellwig wrote:
> On Tue, Aug 24, 2010 at 09:58:23PM +0900, Ryusuke Konishi wrote:
> > Actually, I recently wrote a patchset to unify all super block
> > instances on one device into one sb, which makes the inodes outlive
> > superblock eliminable.
> 
> Yes, that's how we handle things everywhere, including btrfs with
> it's subvolumes and snapshots.
> 
> > But the patchset has a problem that it cannot shrink a number of
> > versions of namespace when unmounting all snapshots and a current time
> > instance.
> 
> Can you explain the issue in more detail?

Ok, I'll try.

The current nilfs attaches each snapshot on a mountpoint, instead of
making a sub directory like ".snap/xxx/" in a namespace.

In the patchset, I binded

 a) The root directory of the current filesystem onto sb->s_root and
    mnt->mnt_root.

 b) The root directory of each snapshot onto its mnt->mnt_root.

Then,

When I unmounted all snapshots and current filesystem,
nilfs_put_super() was called, but

generic_shutdown_super() only shrunk the dentry tree of sb->s_root and
didn't shrink dentry trees of snapshots.

Thus their inodes and dentries were left in memory despite my
intention.

So, my implementation issue here is how to free dentries and inodes of
snapshots like shrink_dcache_for_umount(sb) does for a single
namespace.

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
       [not found]                 ` <20100825.012012.179486471.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2010-08-24 16:42                   ` Al Viro
  2010-08-24 17:00                     ` Ryusuke Konishi
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2010-08-24 16:42 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: hch-wEGCiKHe2LqWVfeAwA7xHQ, npiggin-tSWWG44O7X1aa/9Udqfwiw,
	konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, Aug 25, 2010 at 01:20:12AM +0900, Ryusuke Konishi wrote:
> On Tue, 24 Aug 2010 11:32:14 -0400, Christoph Hellwig wrote:
> > On Tue, Aug 24, 2010 at 09:58:23PM +0900, Ryusuke Konishi wrote:
> > > Actually, I recently wrote a patchset to unify all super block
> > > instances on one device into one sb, which makes the inodes outlive
> > > superblock eliminable.
> > 
> > Yes, that's how we handle things everywhere, including btrfs with
> > it's subvolumes and snapshots.
> > 
> > > But the patchset has a problem that it cannot shrink a number of
> > > versions of namespace when unmounting all snapshots and a current time
> > > instance.
> > 
> > Can you explain the issue in more detail?
> 
> Ok, I'll try.
> 
> The current nilfs attaches each snapshot on a mountpoint, instead of
> making a sub directory like ".snap/xxx/" in a namespace.
> 
> In the patchset, I binded
> 
>  a) The root directory of the current filesystem onto sb->s_root and
>     mnt->mnt_root.
> 
>  b) The root directory of each snapshot onto its mnt->mnt_root.
> 
> Then,
> 
> When I unmounted all snapshots and current filesystem,
> nilfs_put_super() was called, but
> 
> generic_shutdown_super() only shrunk the dentry tree of sb->s_root and
> didn't shrink dentry trees of snapshots.
> 
> Thus their inodes and dentries were left in memory despite my
> intention.
> 
> So, my implementation issue here is how to free dentries and inodes of
> snapshots like shrink_dcache_for_umount(sb) does for a single
> namespace.

Use d_obtain_alias() to allocate roots of unattached subtrees.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/6] nilfs2: remove own inode hash table
  2010-08-24 16:42                   ` Al Viro
@ 2010-08-24 17:00                     ` Ryusuke Konishi
  0 siblings, 0 replies; 18+ messages in thread
From: Ryusuke Konishi @ 2010-08-24 17:00 UTC (permalink / raw)
  To: viro; +Cc: hch, npiggin, konishi.ryusuke, linux-fsdevel, linux-nilfs

On Tue, 24 Aug 2010 17:42:16 +0100, Al Viro wrote:
> On Wed, Aug 25, 2010 at 01:20:12AM +0900, Ryusuke Konishi wrote:
> > So, my implementation issue here is how to free dentries and inodes of
> > snapshots like shrink_dcache_for_umount(sb) does for a single
> > namespace.
> 
> Use d_obtain_alias() to allocate roots of unattached subtrees.

OK, I'll try that.

Thank you!

Ryusuke Konishi

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-08-24 17:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-22 10:05 [PATCH 0/6] nilfs2: remove own inode hash table Ryusuke Konishi
2010-08-22 10:05 ` [PATCH 3/6] nilfs2: keep zero value in i_cno except for gc-inodes Ryusuke Konishi
2010-08-22 10:05 ` [PATCH 5/6] nilfs2: separate initializer of metadata file inode Ryusuke Konishi
     [not found] ` <1282471506-29618-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2010-08-22 10:05   ` [PATCH 1/6] nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes Ryusuke Konishi
2010-08-22 10:05   ` [PATCH 2/6] nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirty Ryusuke Konishi
2010-08-22 10:05   ` [PATCH 4/6] nilfs2: use iget5_locked to get inode Ryusuke Konishi
2010-08-22 10:05   ` [PATCH 6/6] nilfs2: remove own inode hash used for GC Ryusuke Konishi
2010-08-24  7:48 ` [PATCH 0/6] nilfs2: remove own inode hash table Nick Piggin
2010-08-24  8:30   ` Christoph Hellwig
     [not found]     ` <20100824083027.GA27594-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2010-08-24 12:58       ` Ryusuke Konishi
     [not found]         ` <20100824.215823.121933465.ryusuke-sG5X7nlA6pw@public.gmane.org>
2010-08-24 15:32           ` Christoph Hellwig
     [not found]             ` <20100824153214.GA25897-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2010-08-24 16:20               ` Ryusuke Konishi
     [not found]                 ` <20100825.012012.179486471.ryusuke-sG5X7nlA6pw@public.gmane.org>
2010-08-24 16:42                   ` Al Viro
2010-08-24 17:00                     ` Ryusuke Konishi
2010-08-24 12:19   ` Ryusuke Konishi
2010-08-24 12:58     ` Nick Piggin
2010-08-24 13:24     ` Al Viro
     [not found]       ` <20100824132453.GU31363-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2010-08-24 15:39         ` Ryusuke Konishi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.