All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mmotm 0/5] nilfs2 filesystem updates
@ 2008-12-12  5:16 Ryusuke Konishi
  2008-12-12  5:16 ` [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl Ryusuke Konishi
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-12  5:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-kernel

This is a patch set to update nilfs2 filesystem in mmotm (applicable
to mmotm-2008-Dec-12).

The following bugfix and changes which follow comments offered in the
previous submission:

* fix problems around nilfs2 ioctl which may cause system memory shortage.
* avoid double error handlings brought by nilfs_transaction_end() by
  separating nilfs_transaction_abort() function.
* insert description of gcinode in the source file.
* add a maintainer entry of nilfs2 to the MAINTAINERS file (to apply as
  needed).

Regards,
Ryusuke Konishi
---

 MAINTAINERS           |    7 ++++
 fs/nilfs2/gcinode.c   |   21 +++++++++++++-
 fs/nilfs2/inode.c     |   23 +++++++++------
 fs/nilfs2/ioctl.c     |   75 ++++++++++++++++++++++++++++++-------------------
 fs/nilfs2/mdt.c       |    5 ++-
 fs/nilfs2/namei.c     |   74 +++++++++++++++++++++++++++++++++---------------
 fs/nilfs2/nilfs.h     |    3 +-
 fs/nilfs2/segment.c   |   43 ++++++++++++++++++----------
 fs/nilfs2/super.c     |   11 +------
 fs/nilfs2/the_nilfs.h |    4 +-
 10 files changed, 175 insertions(+), 91 deletions(-)



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-12  5:16 [PATCH mmotm 0/5] nilfs2 filesystem updates Ryusuke Konishi
@ 2008-12-12  5:16 ` Ryusuke Konishi
  2008-12-12  5:16   ` [PATCH mmotm 2/5] nilfs2: cleanup nilfs_clear_inode Ryusuke Konishi
  2008-12-12 13:47   ` [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl Andi Kleen
  0 siblings, 2 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-12  5:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-kernel, Ryusuke Konishi

The current memory copy function of nilfs2 ioctl has following
problems:

(1) It tries to allocate 128KB size of memory even for small objects.

(2) Though the function repeatedly tries large memory allocations
    while reducing the size, GFP_NOWAIT flag is not specified.
    This increases the possibility of system memory shortage.

(3) During the retries of (2), verbose warnings are printed
    because _GFP_NOWARN flag is not used for the kmalloc calls.

This will fix these problems.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/ioctl.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 35ba60e..5ba6e4e 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -51,13 +51,20 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
 	if (argv->v_nmembs == 0)
 		return 0;
 
-	for (ksize = KMALLOC_SIZE_MAX; ksize >= KMALLOC_SIZE_MIN; ksize /= 2) {
-		buf = kmalloc(ksize, GFP_NOFS);
-		if (buf != NULL)
-			break;
+	ksize = min_t(unsigned long, argv->v_nmembs * argv->v_size,
+		      KMALLOC_SIZE_MAX);
+	while (ksize >= KMALLOC_SIZE_MIN) {
+		buf = kmalloc(ksize, GFP_NOWAIT | __GFP_NOWARN);
+		if (buf)
+			goto allocated;
+		ksize >>= 1;
 	}
-	if (ksize < KMALLOC_SIZE_MIN)
+	ksize = max_t(size_t, ksize, argv->v_size);
+	buf = kmalloc(ksize, GFP_NOFS);
+	if (unlikely(!buf))
 		return -ENOMEM;
+
+ allocated:
 	maxmembs = ksize / argv->v_size;
 
 	ret = 0;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH mmotm 2/5] nilfs2: cleanup nilfs_clear_inode
  2008-12-12  5:16 ` [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl Ryusuke Konishi
@ 2008-12-12  5:16   ` Ryusuke Konishi
  2008-12-12  5:16     ` [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end Ryusuke Konishi
  2008-12-12 13:47   ` [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl Andi Kleen
  1 sibling, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-12  5:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-kernel, Ryusuke Konishi

This will remove the following unnecessary locks and cleanup code in
nilfs_clear_inode():

- unnecessary protection using nilfs_transaction_begin() and
  nilfs_transaction_end().

- cleanup code of i_dirty list field which is never chained
  when this function is called.

- spinlock used when releasing i_bh field.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/super.c |   11 +----------
 1 files changed, 1 insertions(+), 10 deletions(-)

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 046368b..6d5ca65 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -184,8 +184,6 @@ static inline void nilfs_destroy_inode_cache(void)
 static void nilfs_clear_inode(struct inode *inode)
 {
 	struct nilfs_inode_info *ii = NILFS_I(inode);
-	struct nilfs_transaction_info ti;
-	struct nilfs_sb_info *sbi = NILFS_SB(inode->i_sb);
 
 #ifdef CONFIG_NILFS_POSIX_ACL
 	if (ii->i_acl && ii->i_acl != NILFS_ACL_NOT_CACHED) {
@@ -200,21 +198,14 @@ static void nilfs_clear_inode(struct inode *inode)
 	/*
 	 * Free resources allocated in nilfs_read_inode(), here.
 	 */
-	nilfs_transaction_begin(inode->i_sb, &ti, 0);
-
-	spin_lock(&sbi->s_inode_lock);
-	if (!list_empty(&ii->i_dirty))
-		list_del_init(&ii->i_dirty);
+	BUG_ON(!list_empty(&ii->i_dirty));
 	brelse(ii->i_bh);
 	ii->i_bh = NULL;
-	spin_unlock(&sbi->s_inode_lock);
 
 	if (test_bit(NILFS_I_BMAP, &ii->i_state))
 		nilfs_bmap_clear(ii->i_bmap);
 
 	nilfs_btnode_cache_clear(&ii->i_btnode_cache);
-
-	nilfs_transaction_end(inode->i_sb, 0);
 }
 
 /**
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end
  2008-12-12  5:16   ` [PATCH mmotm 2/5] nilfs2: cleanup nilfs_clear_inode Ryusuke Konishi
@ 2008-12-12  5:16     ` Ryusuke Konishi
  2008-12-12  5:17       ` [PATCH mmotm 4/5] nilfs2: insert explanations in gcinode file Ryusuke Konishi
  2008-12-12  7:50       ` [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end Pekka Enberg
  0 siblings, 2 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-12  5:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-kernel, Ryusuke Konishi, Pekka Enberg

Pekka Enberg pointed out that double error handlings found after
nilfs_transaction_end() can be avoided by separating abort operation:

 OK, I don't understand this. The only way nilfs_transaction_end() can
 fail is if we have NILFS_TI_SYNC set and we fail to construct the
 segment. But why do we want to construct a segment if we don't commit?

 I guess what I'm asking is why don't we have a separate
 nilfs_transaction_abort() function that can't fail for the erroneous
 case to avoid this double error value tracking thing?

This does the separation and renames nilfs_transaction_end() to
nilfs_transaction_commit() for clarification.

Since, some calls of these functions were used just for exclusion
control against the segment constructor, they are replaced with
semaphore operations.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/inode.c     |   23 +++++++++------
 fs/nilfs2/ioctl.c     |   58 ++++++++++++++++++++++----------------
 fs/nilfs2/mdt.c       |    5 ++-
 fs/nilfs2/namei.c     |   74 +++++++++++++++++++++++++++++++++---------------
 fs/nilfs2/nilfs.h     |    3 +-
 fs/nilfs2/segment.c   |   43 ++++++++++++++++++----------
 fs/nilfs2/the_nilfs.h |    4 +-
 7 files changed, 135 insertions(+), 75 deletions(-)

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index b4697d9..e29e329 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -76,7 +76,6 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
 			goto out;
 		err = nilfs_bmap_insert(ii->i_bmap, (unsigned long)blkoff,
 					(unsigned long)bh_result);
-		nilfs_transaction_end(inode->i_sb, !err);
 		if (unlikely(err != 0)) {
 			if (err == -EEXIST) {
 				/*
@@ -99,8 +98,10 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
 					    inode->i_ino);
 				err = -EIO;
 			}
+			nilfs_transaction_abort(inode->i_sb);
 			goto out;
 		}
+		nilfs_transaction_commit(inode->i_sb); /* never fails */
 		/* Error handling should be detailed */
 		set_buffer_new(bh_result);
 		map_bh(bh_result, inode->i_sb, 0); /* dbn must be changed
@@ -196,7 +197,7 @@ static int nilfs_write_begin(struct file *file, struct address_space *mapping,
 	err = block_write_begin(file, mapping, pos, len, flags, pagep,
 				fsdata, nilfs_get_block);
 	if (unlikely(err))
-		nilfs_transaction_end(inode->i_sb, 0);
+		nilfs_transaction_abort(inode->i_sb);
 	return err;
 }
 
@@ -214,7 +215,7 @@ static int nilfs_write_end(struct file *file, struct address_space *mapping,
 	copied = generic_write_end(file, mapping, pos, len, copied, page,
 				   fsdata);
 	nilfs_set_file_dirty(NILFS_SB(inode->i_sb), inode, nr_dirty);
-	err = nilfs_transaction_end(inode->i_sb, 1);
+	err = nilfs_transaction_commit(inode->i_sb);
 	return err ? : copied;
 }
 
@@ -639,7 +640,7 @@ void nilfs_truncate(struct inode *inode)
 		nilfs_set_transaction_flag(NILFS_TI_SYNC);
 
 	nilfs_set_file_dirty(NILFS_SB(sb), inode, 0);
-	nilfs_transaction_end(sb, 1);
+	nilfs_transaction_commit(sb);
 	/* May construct a logical segment and may fail in sync mode.
 	   But truncate has no return value. */
 }
@@ -667,7 +668,7 @@ void nilfs_delete_inode(struct inode *inode)
 	/* nilfs_free_inode() marks inode buffer dirty */
 	if (IS_SYNC(inode))
 		nilfs_set_transaction_flag(NILFS_TI_SYNC);
-	nilfs_transaction_end(sb, 1);
+	nilfs_transaction_commit(sb);
 	/* May construct a logical segment and may fail in sync mode.
 	   But delete_inode has no return value. */
 }
@@ -677,7 +678,7 @@ int nilfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	struct nilfs_transaction_info ti;
 	struct inode *inode = dentry->d_inode;
 	struct super_block *sb = inode->i_sb;
-	int err, err2;
+	int err;
 
 	err = inode_change_ok(inode, iattr);
 	if (err)
@@ -689,8 +690,12 @@ int nilfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	err = inode_setattr(inode, iattr);
 	if (!err && (iattr->ia_valid & ATTR_MODE))
 		err = nilfs_acl_chmod(inode);
-	err2 = nilfs_transaction_end(sb, 1);
-	return err ? : err2;
+	if (likely(!err))
+		err = nilfs_transaction_commit(sb);
+	else
+		nilfs_transaction_abort(sb);
+
+	return err;
 }
 
 int nilfs_load_inode_block(struct nilfs_sb_info *sbi, struct inode *inode,
@@ -815,5 +820,5 @@ void nilfs_dirty_inode(struct inode *inode)
 	nilfs_transaction_begin(inode->i_sb, &ti, 0);
 	if (likely(inode->i_ino != NILFS_SKETCH_INO))
 		nilfs_mark_inode_dirty(inode);
-	nilfs_transaction_end(inode->i_sb, 1); /* never fails */
+	nilfs_transaction_commit(inode->i_sb); /* never fails */
 }
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 5ba6e4e..216bb30 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -116,7 +116,11 @@ static int nilfs_ioctl_change_cpmode(struct inode *inode, struct file *filp,
 	nilfs_transaction_begin(inode->i_sb, &ti, 0);
 	ret = nilfs_cpfile_change_cpmode(
 		cpfile, cpmode.cm_cno, cpmode.cm_mode);
-	nilfs_transaction_end(inode->i_sb, !ret);
+	if (unlikely(ret < 0)) {
+		nilfs_transaction_abort(inode->i_sb);
+		return ret;
+	}
+	nilfs_transaction_commit(inode->i_sb); /* never fails */
 	return ret;
 }
 
@@ -136,7 +140,11 @@ nilfs_ioctl_delete_checkpoint(struct inode *inode, struct file *filp,
 
 	nilfs_transaction_begin(inode->i_sb, &ti, 0);
 	ret = nilfs_cpfile_delete_checkpoint(cpfile, cno);
-	nilfs_transaction_end(inode->i_sb, !ret);
+	if (unlikely(ret < 0)) {
+		nilfs_transaction_abort(inode->i_sb);
+		return ret;
+	}
+	nilfs_transaction_commit(inode->i_sb); /* never fails */
 	return ret;
 }
 
@@ -153,16 +161,17 @@ static int nilfs_ioctl_get_cpinfo(struct inode *inode, struct file *filp,
 {
 	struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
 	struct nilfs_argv argv;
-	struct nilfs_transaction_info ti;
 	int ret;
 
 	if (copy_from_user(&argv, argp, sizeof(argv)))
 		return -EFAULT;
 
-	nilfs_transaction_begin(inode->i_sb, &ti, 0);
+	down_read(&nilfs->ns_segctor_sem);
 	ret = nilfs_ioctl_wrap_copy(nilfs, &argv, _IOC_DIR(cmd),
 				    nilfs_ioctl_do_get_cpinfo);
-	nilfs_transaction_end(inode->i_sb, 0);
+	up_read(&nilfs->ns_segctor_sem);
+	if (ret < 0)
+		return ret;
 
 	if (copy_to_user(argp, &argv, sizeof(argv)))
 		ret = -EFAULT;
@@ -172,14 +181,13 @@ static int nilfs_ioctl_get_cpinfo(struct inode *inode, struct file *filp,
 static int nilfs_ioctl_get_cpstat(struct inode *inode, struct file *filp,
 				  unsigned int cmd, void __user *argp)
 {
-	struct inode *cpfile = NILFS_SB(inode->i_sb)->s_nilfs->ns_cpfile;
+	struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
 	struct nilfs_cpstat cpstat;
-	struct nilfs_transaction_info ti;
 	int ret;
 
-	nilfs_transaction_begin(inode->i_sb, &ti, 0);
-	ret = nilfs_cpfile_get_stat(cpfile, &cpstat);
-	nilfs_transaction_end(inode->i_sb, 0);
+	down_read(&nilfs->ns_segctor_sem);
+	ret = nilfs_cpfile_get_stat(nilfs->ns_cpfile, &cpstat);
+	up_read(&nilfs->ns_segctor_sem);
 	if (ret < 0)
 		return ret;
 
@@ -200,16 +208,17 @@ static int nilfs_ioctl_get_suinfo(struct inode *inode, struct file *filp,
 {
 	struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
 	struct nilfs_argv argv;
-	struct nilfs_transaction_info ti;
 	int ret;
 
 	if (copy_from_user(&argv, argp, sizeof(argv)))
 		return -EFAULT;
 
-	nilfs_transaction_begin(inode->i_sb, &ti, 0);
+	down_read(&nilfs->ns_segctor_sem);
 	ret = nilfs_ioctl_wrap_copy(nilfs, &argv, _IOC_DIR(cmd),
 				    nilfs_ioctl_do_get_suinfo);
-	nilfs_transaction_end(inode->i_sb, 0);
+	up_read(&nilfs->ns_segctor_sem);
+	if (ret < 0)
+		return ret;
 
 	if (copy_to_user(argp, &argv, sizeof(argv)))
 		ret = -EFAULT;
@@ -219,14 +228,13 @@ static int nilfs_ioctl_get_suinfo(struct inode *inode, struct file *filp,
 static int nilfs_ioctl_get_sustat(struct inode *inode, struct file *filp,
 				  unsigned int cmd, void __user *argp)
 {
-	struct inode *sufile = NILFS_SB(inode->i_sb)->s_nilfs->ns_sufile;
+	struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
 	struct nilfs_sustat sustat;
-	struct nilfs_transaction_info ti;
 	int ret;
 
-	nilfs_transaction_begin(inode->i_sb, &ti, 0);
-	ret = nilfs_sufile_get_stat(sufile, &sustat);
-	nilfs_transaction_end(inode->i_sb, 0);
+	down_read(&nilfs->ns_segctor_sem);
+	ret = nilfs_sufile_get_stat(nilfs->ns_sufile, &sustat);
+	up_read(&nilfs->ns_segctor_sem);
 	if (ret < 0)
 		return ret;
 
@@ -247,16 +255,17 @@ static int nilfs_ioctl_get_vinfo(struct inode *inode, struct file *filp,
 {
 	struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
 	struct nilfs_argv argv;
-	struct nilfs_transaction_info ti;
 	int ret;
 
 	if (copy_from_user(&argv, argp, sizeof(argv)))
 		return -EFAULT;
 
-	nilfs_transaction_begin(inode->i_sb, &ti, 0);
+	down_read(&nilfs->ns_segctor_sem);
 	ret = nilfs_ioctl_wrap_copy(nilfs, &argv, _IOC_DIR(cmd),
 				    nilfs_ioctl_do_get_vinfo);
-	nilfs_transaction_end(inode->i_sb, 0);
+	up_read(&nilfs->ns_segctor_sem);
+	if (ret < 0)
+		return ret;
 
 	if (copy_to_user(argp, &argv, sizeof(argv)))
 		ret = -EFAULT;
@@ -291,16 +300,17 @@ static int nilfs_ioctl_get_bdescs(struct inode *inode, struct file *filp,
 {
 	struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
 	struct nilfs_argv argv;
-	struct nilfs_transaction_info ti;
 	int ret;
 
 	if (copy_from_user(&argv, argp, sizeof(argv)))
 		return -EFAULT;
 
-	nilfs_transaction_begin(inode->i_sb, &ti, 0);
+	down_read(&nilfs->ns_segctor_sem);
 	ret = nilfs_ioctl_wrap_copy(nilfs, &argv, _IOC_DIR(cmd),
 				    nilfs_ioctl_do_get_bdescs);
-	nilfs_transaction_end(inode->i_sb, 0);
+	up_read(&nilfs->ns_segctor_sem);
+	if (ret < 0)
+		return ret;
 
 	if (copy_to_user(argp, &argv, sizeof(argv)))
 		ret = -EFAULT;
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index 6ab8475..e0a632b 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -123,7 +123,10 @@ static int nilfs_mdt_create_block(struct inode *inode, unsigned long block,
 	brelse(bh);
 
  failed_unlock:
-	nilfs_transaction_end(sb, !err);
+	if (likely(!err))
+		err = nilfs_transaction_commit(sb);
+	else
+		nilfs_transaction_abort(sb);
 	if (writer)
 		nilfs_put_writer(nilfs);
  out:
diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
index 95d1b29..df70dad 100644
--- a/fs/nilfs2/namei.c
+++ b/fs/nilfs2/namei.c
@@ -109,7 +109,7 @@ static int nilfs_create(struct inode *dir, struct dentry *dentry, int mode,
 {
 	struct inode *inode;
 	struct nilfs_transaction_info ti;
-	int err, err2;
+	int err;
 
 	err = nilfs_transaction_begin(dir->i_sb, &ti, 1);
 	if (err)
@@ -123,8 +123,12 @@ static int nilfs_create(struct inode *dir, struct dentry *dentry, int mode,
 		mark_inode_dirty(inode);
 		err = nilfs_add_nondir(dentry, inode);
 	}
-	err2 = nilfs_transaction_end(dir->i_sb, !err);
-	return err ? : err2;
+	if (!err)
+		err = nilfs_transaction_commit(dir->i_sb);
+	else
+		nilfs_transaction_abort(dir->i_sb);
+
+	return err;
 }
 
 static int
@@ -132,7 +136,7 @@ nilfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t rdev)
 {
 	struct inode *inode;
 	struct nilfs_transaction_info ti;
-	int err, err2;
+	int err;
 
 	if (!new_valid_dev(rdev))
 		return -EINVAL;
@@ -147,8 +151,12 @@ nilfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t rdev)
 		mark_inode_dirty(inode);
 		err = nilfs_add_nondir(dentry, inode);
 	}
-	err2 = nilfs_transaction_end(dir->i_sb, !err);
-	return err ? : err2;
+	if (!err)
+		err = nilfs_transaction_commit(dir->i_sb);
+	else
+		nilfs_transaction_abort(dir->i_sb);
+
+	return err;
 }
 
 static int nilfs_symlink(struct inode *dir, struct dentry *dentry,
@@ -158,7 +166,7 @@ static int nilfs_symlink(struct inode *dir, struct dentry *dentry,
 	struct super_block *sb = dir->i_sb;
 	unsigned l = strlen(symname)+1;
 	struct inode *inode;
-	int err, err2;
+	int err;
 
 	if (l > sb->s_blocksize)
 		return -ENAMETOOLONG;
@@ -184,8 +192,12 @@ static int nilfs_symlink(struct inode *dir, struct dentry *dentry,
 
 	err = nilfs_add_nondir(dentry, inode);
 out:
-	err2 = nilfs_transaction_end(dir->i_sb, !err);
-	return err ? : err2;
+	if (!err)
+		err = nilfs_transaction_commit(dir->i_sb);
+	else
+		nilfs_transaction_abort(dir->i_sb);
+
+	return err;
 
 out_fail:
 	inode_dec_link_count(inode);
@@ -198,7 +210,7 @@ static int nilfs_link(struct dentry *old_dentry, struct inode *dir,
 {
 	struct inode *inode = old_dentry->d_inode;
 	struct nilfs_transaction_info ti;
-	int err, err2;
+	int err;
 
 	if (inode->i_nlink >= NILFS_LINK_MAX)
 		return -EMLINK;
@@ -212,15 +224,19 @@ static int nilfs_link(struct dentry *old_dentry, struct inode *dir,
 	atomic_inc(&inode->i_count);
 
 	err = nilfs_add_nondir(dentry, inode);
-	err2 = nilfs_transaction_end(dir->i_sb, !err);
-	return err ? : err2;
+	if (!err)
+		err = nilfs_transaction_commit(dir->i_sb);
+	else
+		nilfs_transaction_abort(dir->i_sb);
+
+	return err;
 }
 
 static int nilfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 {
 	struct inode *inode;
 	struct nilfs_transaction_info ti;
-	int err, err2;
+	int err;
 
 	if (dir->i_nlink >= NILFS_LINK_MAX)
 		return -EMLINK;
@@ -252,8 +268,12 @@ static int nilfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 
 	d_instantiate(dentry, inode);
 out:
-	err2 = nilfs_transaction_end(dir->i_sb, !err);
-	return err ? : err2;
+	if (!err)
+		err = nilfs_transaction_commit(dir->i_sb);
+	else
+		nilfs_transaction_abort(dir->i_sb);
+
+	return err;
 
 out_fail:
 	inode_dec_link_count(inode);
@@ -270,7 +290,7 @@ static int nilfs_unlink(struct inode *dir, struct dentry *dentry)
 	struct nilfs_dir_entry *de;
 	struct page *page;
 	struct nilfs_transaction_info ti;
-	int err, err2;
+	int err;
 
 	err = nilfs_transaction_begin(dir->i_sb, &ti, 0);
 	if (err)
@@ -300,15 +320,19 @@ static int nilfs_unlink(struct inode *dir, struct dentry *dentry)
 	inode_dec_link_count(inode);
 	err = 0;
 out:
-	err2 = nilfs_transaction_end(dir->i_sb, !err);
-	return err ? : err2;
+	if (!err)
+		err = nilfs_transaction_commit(dir->i_sb);
+	else
+		nilfs_transaction_abort(dir->i_sb);
+
+	return err;
 }
 
 static int nilfs_rmdir(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 	struct nilfs_transaction_info ti;
-	int err, err2;
+	int err;
 
 	err = nilfs_transaction_begin(dir->i_sb, &ti, 0);
 	if (err)
@@ -323,8 +347,12 @@ static int nilfs_rmdir(struct inode *dir, struct dentry *dentry)
 			inode_dec_link_count(dir);
 		}
 	}
-	err2 = nilfs_transaction_end(dir->i_sb, !err);
-	return err ? : err2;
+	if (!err)
+		err = nilfs_transaction_commit(dir->i_sb);
+	else
+		nilfs_transaction_abort(dir->i_sb);
+
+	return err;
 }
 
 static int nilfs_rename(struct inode *old_dir, struct dentry *old_dentry,
@@ -404,7 +432,7 @@ static int nilfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		inode_dec_link_count(old_dir);
 	}
 
-	err = nilfs_transaction_end(old_dir->i_sb, 1);
+	err = nilfs_transaction_commit(old_dir->i_sb);
 	return err;
 
 out_dir:
@@ -416,7 +444,7 @@ out_old:
 	kunmap(old_page);
 	page_cache_release(old_page);
 out:
-	nilfs_transaction_end(old_dir->i_sb, 0);
+	nilfs_transaction_abort(old_dir->i_sb);
 	return err;
 }
 
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index c33b8db..30c5341 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -166,7 +166,8 @@ struct nilfs_transaction_info {
 
 int nilfs_transaction_begin(struct super_block *,
 			    struct nilfs_transaction_info *, int);
-int nilfs_transaction_end(struct super_block *, int);
+int nilfs_transaction_commit(struct super_block *);
+void nilfs_transaction_abort(struct super_block *);
 
 static inline void nilfs_set_transaction_flag(unsigned int flag)
 {
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 2b6e44b..182fb26 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -163,8 +163,8 @@ static int nilfs_prepare_segment_lock(struct nilfs_transaction_info *ti)
 		else {
 			/*
 			 * If journal_info field is occupied by other FS,
-			 * we save it and restore on nilfs_transaction_end().
-			 * But this should never happen.
+			 * it is saved and will be restored on
+			 * nilfs_transaction_commit().
 			 */
 			printk(KERN_WARNING
 			       "NILFS warning: journal info from a different "
@@ -195,7 +195,7 @@ static int nilfs_prepare_segment_lock(struct nilfs_transaction_info *ti)
  *
  * nilfs_transaction_begin() acquires a reader/writer semaphore, called
  * the segment semaphore, to make a segment construction and write tasks
- * exclusive.  The function is used with nilfs_transaction_end() in pairs.
+ * exclusive.  The function is used with nilfs_transaction_commit() in pairs.
  * The region enclosed by these two functions can be nested.  To avoid a
  * deadlock, the semaphore is only acquired or released in the outermost call.
  *
@@ -212,8 +212,6 @@ static int nilfs_prepare_segment_lock(struct nilfs_transaction_info *ti)
  *
  * %-ENOMEM - Insufficient memory available.
  *
- * %-ERESTARTSYS - Interrupted
- *
  * %-ENOSPC - No space left on device
  */
 int nilfs_transaction_begin(struct super_block *sb,
@@ -248,16 +246,17 @@ int nilfs_transaction_begin(struct super_block *sb,
 }
 
 /**
- * nilfs_transaction_end - end indivisible file operations.
+ * nilfs_transaction_commit - commit indivisible file operations.
  * @sb: super block
- * @commit: commit flag (0 for no change)
  *
- * nilfs_transaction_end() releases the read semaphore which is
- * acquired by nilfs_transaction_begin(). Its releasing is only done
- * in outermost call of this function. If the nilfs_transaction_info
- * was allocated dynamically, it is given back to a slab cache.
+ * nilfs_transaction_commit() releases the read semaphore which is
+ * acquired by nilfs_transaction_begin(). This is only performed
+ * in outermost call of this function.  If a commit flag is set,
+ * nilfs_transaction_commit() sets a timer to start the segment
+ * constructor.  If a sync flag is set, it starts construction
+ * directly.
  */
-int nilfs_transaction_end(struct super_block *sb, int commit)
+int nilfs_transaction_commit(struct super_block *sb)
 {
 	struct nilfs_transaction_info *ti = current->journal_info;
 	struct nilfs_sb_info *sbi;
@@ -265,9 +264,7 @@ int nilfs_transaction_end(struct super_block *sb, int commit)
 	int err = 0;
 
 	BUG_ON(ti == NULL || ti->ti_magic != NILFS_TI_MAGIC);
-
-	if (commit)
-		ti->ti_flags |= NILFS_TI_COMMIT;
+	ti->ti_flags |= NILFS_TI_COMMIT;
 	if (ti->ti_count > 0) {
 		ti->ti_count--;
 		return 0;
@@ -291,6 +288,22 @@ int nilfs_transaction_end(struct super_block *sb, int commit)
 	return err;
 }
 
+void nilfs_transaction_abort(struct super_block *sb)
+{
+	struct nilfs_transaction_info *ti = current->journal_info;
+
+	BUG_ON(ti == NULL || ti->ti_magic != NILFS_TI_MAGIC);
+	if (ti->ti_count > 0) {
+		ti->ti_count--;
+		return;
+	}
+	up_read(&NILFS_SB(sb)->s_nilfs->ns_segctor_sem);
+
+	current->journal_info = ti->ti_save;
+	if (ti->ti_flags & NILFS_TI_DYNAMIC_ALLOC)
+		kmem_cache_free(nilfs_transaction_cachep, ti);
+}
+
 void nilfs_relax_pressure_in_lock(struct super_block *sb)
 {
 	struct nilfs_sb_info *sbi = NILFS_SB(sb);
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index dee8d83..9cd3c11 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -112,8 +112,8 @@ struct the_nilfs {
 	/*
 	 * Following fields are dedicated to a writable FS-instance.
 	 * Except for the period seeking checkpoint, code outside the segment
-	 * constructor must lock a segment semaphore with transaction_begin()
-	 * and transaction_end(), when accessing these fields.
+	 * constructor must lock a segment semaphore while accessing these
+	 * fields.
 	 * The writable FS-instance is sole during a lifetime of the_nilfs.
 	 */
 	u64			ns_seg_seq;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH mmotm 4/5] nilfs2: insert explanations in gcinode file
  2008-12-12  5:16     ` [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end Ryusuke Konishi
@ 2008-12-12  5:17       ` Ryusuke Konishi
  2008-12-12  5:17         ` [PATCH mmotm 5/5] nilfs2: add maintainer Ryusuke Konishi
  2008-12-12  7:50       ` [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end Pekka Enberg
  1 sibling, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-12  5:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-kernel, Ryusuke Konishi, Joern Engel

The file gcinode.c gives buffer cache functions for on-disk blocks
moved in garbage collection.  Joern Engel has suggested inserting its
explanations in the source file (Message-ID:
<20080917144146.GD8750@logfs.org> and
<20080917224953.GB14644@logfs.org>).

This follows the comment.

Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/gcinode.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
index 0013952..77615aa 100644
--- a/fs/nilfs2/gcinode.c
+++ b/fs/nilfs2/gcinode.c
@@ -1,5 +1,5 @@
 /*
- * gcinode.c - NILFS memory inode for GC
+ * gcinode.c - dummy inodes to buffer blocks for garbage collection
  *
  * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
  *
@@ -22,6 +22,25 @@
  * Revised by Ryusuke Konishi <ryusuke@osrg.net>.
  *
  */
+/*
+ * This file adds the cache of on-disk blocks to be moved in garbage
+ * collection.  The disk blocks are held with dummy inodes (called
+ * gcinodes), and this file provides lookup function of the dummy
+ * inodes and their buffer read function.
+ *
+ * Since NILFS2 keeps up multiple checkpoints/snapshots accross GC, it
+ * has to treat blocks that belong to a same file but have different
+ * checkpoint numbers.  To avoid interference among generations, dummy
+ * inodes are managed separatly from actual inodes, and their lookup
+ * function (nilfs_gc_iget) is designed to be specified with a
+ * checkpoint number argument as well as an inode number.
+ *
+ * Buffers and pages held by the dummy inodes will be released each
+ * time after they are copied to a new log.  Dirty blocks made on the
+ * current generation and the blocks to be moved by GC never overlap
+ * because the dirty blocks make a new generation; they rather must be
+ * written individually.
+ */
 
 #include <linux/buffer_head.h>
 #include <linux/mpage.h>
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH mmotm 5/5] nilfs2: add maintainer
  2008-12-12  5:17       ` [PATCH mmotm 4/5] nilfs2: insert explanations in gcinode file Ryusuke Konishi
@ 2008-12-12  5:17         ` Ryusuke Konishi
  0 siblings, 0 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-12  5:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-kernel, Ryusuke Konishi

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 MAINTAINERS |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 63d5587..2bf4e81 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3151,6 +3151,13 @@ M:	andi@lisas.de
 L:	netdev@vger.kernel.org
 S:	Maintained
 
+NILFS2 FILESYSTEM
+P:	KONISHI Ryusuke
+M:	konishi.ryusuke@lab.ntt.co.jp
+L:	users@nilfs.org
+W:	http://www.nilfs.org/en/
+S:	Supported
+
 NINJA SCSI-3 / NINJA SCSI-32Bi (16bit/CardBus) PCMCIA SCSI HOST ADAPTER DRIVER
 P:	YOKOTA Hiroshi
 M:	yokota@netlab.is.tsukuba.ac.jp
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end
  2008-12-12  5:16     ` [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end Ryusuke Konishi
  2008-12-12  5:17       ` [PATCH mmotm 4/5] nilfs2: insert explanations in gcinode file Ryusuke Konishi
@ 2008-12-12  7:50       ` Pekka Enberg
  1 sibling, 0 replies; 15+ messages in thread
From: Pekka Enberg @ 2008-12-12  7:50 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

Hi Ryusuke,

On Fri, Dec 12, 2008 at 7:16 AM, Ryusuke Konishi
<konishi.ryusuke@lab.ntt.co.jp> wrote:
> Pekka Enberg pointed out that double error handlings found after
> nilfs_transaction_end() can be avoided by separating abort operation:
>
>  OK, I don't understand this. The only way nilfs_transaction_end() can
>  fail is if we have NILFS_TI_SYNC set and we fail to construct the
>  segment. But why do we want to construct a segment if we don't commit?
>
>  I guess what I'm asking is why don't we have a separate
>  nilfs_transaction_abort() function that can't fail for the erroneous
>  case to avoid this double error value tracking thing?
>
> This does the separation and renames nilfs_transaction_end() to
> nilfs_transaction_commit() for clarification.
>
> Since, some calls of these functions were used just for exclusion
> control against the segment constructor, they are replaced with
> semaphore operations.
>
> Cc: Pekka Enberg <penberg@cs.helsinki.fi>
> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

Nice cleanup!

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-12  5:16 ` [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl Ryusuke Konishi
  2008-12-12  5:16   ` [PATCH mmotm 2/5] nilfs2: cleanup nilfs_clear_inode Ryusuke Konishi
@ 2008-12-12 13:47   ` Andi Kleen
  2008-12-12 18:48     ` Ryusuke Konishi
  1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-12-12 13:47 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> writes:

> The current memory copy function of nilfs2 ioctl has following
> problems:

Wouldn't it be easier/faster/more reliable to just use vmalloc for
those allocations?  The only reason to use kmalloc for such large
allocations would be if you want to do direct DMA (but even there are
ways to do this using virtual allocations and scatter-gather)

-Andi
>

-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-12 13:47   ` [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl Andi Kleen
@ 2008-12-12 18:48     ` Ryusuke Konishi
  2008-12-12 20:24       ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-12 18:48 UTC (permalink / raw)
  To: andi; +Cc: konishi.ryusuke, akpm, linux-fsdevel, linux-kernel

Hi, 
On Fri, 12 Dec 2008 14:47:58 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> writes:
> 
> > The current memory copy function of nilfs2 ioctl has following
> > problems:
> 
> Wouldn't it be easier/faster/more reliable to just use vmalloc for
> those allocations?  The only reason to use kmalloc for such large
> allocations would be if you want to do direct DMA (but even there are
> ways to do this using virtual allocations and scatter-gather)
> 
> -Andi

Thanks. Using vmalloc seems worthy of consideration.
I'll try to rewrite and test it to confirm whether it can make the
code simpler and robust enough.

By the way, is there any recommended way to exchange such a large
amount of data items between user space and kernel space?

In the current interface, each data item is copied twice: one is to
the allocated memory from user space (via copy_from_user), and another
is to on-memory structures or to buffers/pages from the allocated
memory.

This looks somehow inefficient.

Regards,
Ryusuke

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-12 18:48     ` Ryusuke Konishi
@ 2008-12-12 20:24       ` Andi Kleen
  2008-12-13  8:29         ` Ryusuke Konishi
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-12-12 20:24 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: andi, akpm, linux-fsdevel, linux-kernel

> By the way, is there any recommended way to exchange such a large
> amount of data items between user space and kernel space?
> 
> In the current interface, each data item is copied twice: one is to
> the allocated memory from user space (via copy_from_user), and another

For such large copies it is better to use multiple smaller (e.g. 4K) 
copy user, that gives better real time preempt latencies. Each cfu has a 
cond_resched(), but only one, not multiple times in the inner loop.

> is to on-memory structures or to buffers/pages from the allocated
> memory.

It depends how performance critical it is.

One way for example is to grab the user pages using get_user_pages()
and then reference those pages directly using kmap().
But you would be at the mercy of the user process not modifying in 
parallel then. Normally it is safer to work from copies in kernel
space to avoid races. As long as it doesn't happen too often a few
copies are also usually not a problem. I wouldn't worry about them
unless you see them prominently in profiler logs.

-Andi


-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-12 20:24       ` Andi Kleen
@ 2008-12-13  8:29         ` Ryusuke Konishi
  2008-12-14 12:08           ` Ryusuke Konishi
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-13  8:29 UTC (permalink / raw)
  To: andi; +Cc: konishi.ryusuke, akpm, linux-fsdevel, linux-kernel

On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> > In the current interface, each data item is copied twice: one is to
> > the allocated memory from user space (via copy_from_user), and another
> 
> For such large copies it is better to use multiple smaller (e.g. 4K) 
> copy user, that gives better real time preempt latencies. Each cfu has a 
> cond_resched(), but only one, not multiple times in the inner loop.

For the function in question, the buffer memory can be divided into a
smaller size (at least to 4K bytes) since the buffer is repeatedly
used for small objects, where the copy_from_user (and a copy_to_user)
is only once in each cycle.

So, just reducing the allocation size of the buffer seems good; it is
likely able to avoid both large preempt latencies and large memory
allocation, which also can leave off the use of vmalloc.

> > is to on-memory structures or to buffers/pages from the allocated
> > memory.
> 
> It depends how performance critical it is.
> 
> One way for example is to grab the user pages using get_user_pages()
> and then reference those pages directly using kmap().
> But you would be at the mercy of the user process not modifying in 
> parallel then. Normally it is safer to work from copies in kernel
> space to avoid races. As long as it doesn't happen too often a few
> copies are also usually not a problem. I wouldn't worry about them
> unless you see them prominently in profiler logs.
> 
> -Andi

I got it.  If need arises, then I'll recall get_user_pages(). At
present, there is likely no need to do like that.

Thank you for the informative advises.

With regards,
Ryusuke

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-13  8:29         ` Ryusuke Konishi
@ 2008-12-14 12:08           ` Ryusuke Konishi
  2008-12-14 15:13             ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-14 12:08 UTC (permalink / raw)
  To: akpm; +Cc: andi, konishi.ryusuke, linux-fsdevel, linux-kernel

This is the revised patch for fixing the following problems of a
memory copy function in nilfs2 ioctl.

(1) It tries to allocate 128KB size of memory even for small objects.

(2) Though the function repeatedly tries large memory allocations
    while reducing the size, GFP_NOWAIT flag is not specified.
    This increases the possibility of system memory shortage.

(3) During the retries of (2), verbose warnings are printed
    because _GFP_NOWARN flag is not used for the kmalloc calls.

The first patch was still doing large allocations by kmalloc which are
repeatedly tried while reducing the size.

Andi Kleen has pointed out that just using vmalloc would be
easy/faster/more reliable, and he also told me that using
copy_from_user for large memory is not good from the viewpoint of
preempt latency:

 On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen <andi@firstfloor.org> wrote:
 > > In the current interface, each data item is copied twice: one is to
 > > the allocated memory from user space (via copy_from_user), and another
 >
 > For such large copies it is better to use multiple smaller (e.g. 4K)
 > copy user, that gives better real time preempt latencies. Each cfu has a
 > cond_resched(), but only one, not multiple times in the inner loop.

For the function in question, the size of buffer memory can be reduced
since the buffer is repeatedly used for a number of small objects.  On
the other hand, it may incur large preempt latencies for larger buffer
because a copy_from_user (and a copy_to_user) was applied only once
each cycle.

So, this revision avoids the latency issue as well as fixes the
original problems merely by reducing allocation size of the buffer.

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/ioctl.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 35ba60e..23378c3 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -34,8 +34,7 @@
 #include "dat.h"
 
 
-#define KMALLOC_SIZE_MIN	4096	/* 4KB */
-#define KMALLOC_SIZE_MAX	131072	/* 128 KB */
+#define NILFS_IOCTL_KMALLOC_SIZE	8192	/* 8KB */
 
 static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
 				 struct nilfs_argv *argv, int dir,
@@ -51,12 +50,9 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
 	if (argv->v_nmembs == 0)
 		return 0;
 
-	for (ksize = KMALLOC_SIZE_MAX; ksize >= KMALLOC_SIZE_MIN; ksize /= 2) {
-		buf = kmalloc(ksize, GFP_NOFS);
-		if (buf != NULL)
-			break;
-	}
-	if (ksize < KMALLOC_SIZE_MIN)
+	ksize = max_t(size_t, NILFS_IOCTL_KMALLOC_SIZE, argv->v_size);
+	buf = kmalloc(ksize, GFP_NOFS);
+	if (unlikely(!buf))
 		return -ENOMEM;
 	maxmembs = ksize / argv->v_size;
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-14 12:08           ` Ryusuke Konishi
@ 2008-12-14 15:13             ` Andi Kleen
  2008-12-15  6:58               ` Ryusuke Konishi
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2008-12-14 15:13 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: akpm, andi, linux-fsdevel, linux-kernel

> -#define KMALLOC_SIZE_MIN	4096	/* 4KB */
> -#define KMALLOC_SIZE_MAX	131072	/* 128 KB */
> +#define NILFS_IOCTL_KMALLOC_SIZE	8192	/* 8KB */

Better would be if you could go to PAGE_SIZE. order 0 allocations
are typically the fastest / least likely to stall.

Also in this case it's a good idea to use __get_free_pages()
directly, kmalloc tends to be become less efficient at larger
sizes.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-14 15:13             ` Andi Kleen
@ 2008-12-15  6:58               ` Ryusuke Konishi
  2008-12-15 10:17                 ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2008-12-15  6:58 UTC (permalink / raw)
  To: andi; +Cc: konishi.ryusuke, akpm, linux-fsdevel, linux-kernel

On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> > -#define KMALLOC_SIZE_MIN	4096	/* 4KB */
> > -#define KMALLOC_SIZE_MAX	131072	/* 128 KB */
> > +#define NILFS_IOCTL_KMALLOC_SIZE	8192	/* 8KB */
> 
> Better would be if you could go to PAGE_SIZE. order 0 allocations
> are typically the fastest / least likely to stall.
> 
> Also in this case it's a good idea to use __get_free_pages()
> directly, kmalloc tends to be become less efficient at larger
> sizes.

Thanks again.  I've rewritten the patch to use __get_free_pages().

Regards,
Ryusuke Konishi

---
nilfs2: fix problems of memory allocation in ioctl 3

This is another patch for fixing the following problems of a memory
copy function in nilfs2 ioctl:

(1) It tries to allocate 128KB size of memory even for small objects.

(2) Though the function repeatedly tries large memory allocations
    while reducing the size, GFP_NOWAIT flag is not specified.
    This increases the possibility of system memory shortage.

(3) During the retries of (2), verbose warnings are printed
    because _GFP_NOWARN flag is not used for the kmalloc calls.

The first patch was still doing large allocations by kmalloc which are
repeatedly tried while reducing the size.

Andi Kleen told me that using copy_from_user for large memory is not
good from the viewpoint of preempt latency:

 On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen <andi@firstfloor.org> wrote:
 > > In the current interface, each data item is copied twice: one is to
 > > the allocated memory from user space (via copy_from_user), and another
 >
 > For such large copies it is better to use multiple smaller (e.g. 4K)
 > copy user, that gives better real time preempt latencies. Each cfu has a
 > cond_resched(), but only one, not multiple times in the inner loop.

He also advised me that:

 On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen <andi@firstfloor.org> wrote:
 > Better would be if you could go to PAGE_SIZE. order 0 allocations
 > are typically the fastest / least likely to stall.
 >
 > Also in this case it's a good idea to use __get_free_pages()
 > directly, kmalloc tends to be become less efficient at larger
 > sizes.

For the function in question, the size of buffer memory can be reduced
since the buffer is repeatedly used for a number of small objects.  On
the other hand, it may incur large preempt latencies for larger buffer
because a copy_from_user (and a copy_to_user) was applied only once
each cycle.

With that, this revision uses the order 0 allocations with
__get_free_pages() to fix the original problems.

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/ioctl.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 35ba60e..02e91e1 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -34,9 +34,6 @@
 #include "dat.h"
 
 
-#define KMALLOC_SIZE_MIN	4096	/* 4KB */
-#define KMALLOC_SIZE_MAX	131072	/* 128 KB */
-
 static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
 				 struct nilfs_argv *argv, int dir,
 				 ssize_t (*dofunc)(struct the_nilfs *,
@@ -44,21 +41,20 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
 						   void *, size_t, size_t))
 {
 	void *buf;
-	size_t ksize, maxmembs, total, n;
+	size_t maxmembs, total, n;
 	ssize_t nr;
 	int ret, i;
 
 	if (argv->v_nmembs == 0)
 		return 0;
 
-	for (ksize = KMALLOC_SIZE_MAX; ksize >= KMALLOC_SIZE_MIN; ksize /= 2) {
-		buf = kmalloc(ksize, GFP_NOFS);
-		if (buf != NULL)
-			break;
-	}
-	if (ksize < KMALLOC_SIZE_MIN)
+	if (argv->v_size > PAGE_SIZE)
+		return -EINVAL;
+
+	buf = (void *)__get_free_pages(GFP_NOFS, 0);
+	if (unlikely(!buf))
 		return -ENOMEM;
-	maxmembs = ksize / argv->v_size;
+	maxmembs = PAGE_SIZE / argv->v_size;
 
 	ret = 0;
 	total = 0;
@@ -89,7 +85,7 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
 	}
 	argv->v_nmembs = total;
 
-	kfree(buf);
+	free_pages((unsigned long)buf, 0);
 	return ret;
 }
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl
  2008-12-15  6:58               ` Ryusuke Konishi
@ 2008-12-15 10:17                 ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2008-12-15 10:17 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: andi, akpm, linux-fsdevel, linux-kernel

On Mon, Dec 15, 2008 at 03:58:40PM +0900, Ryusuke Konishi wrote:
> On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> > > -#define KMALLOC_SIZE_MIN	4096	/* 4KB */
> > > -#define KMALLOC_SIZE_MAX	131072	/* 128 KB */
> > > +#define NILFS_IOCTL_KMALLOC_SIZE	8192	/* 8KB */
> > 
> > Better would be if you could go to PAGE_SIZE. order 0 allocations
> > are typically the fastest / least likely to stall.
> > 
> > Also in this case it's a good idea to use __get_free_pages()
> > directly, kmalloc tends to be become less efficient at larger
> > sizes.
> 
> Thanks again.  I've rewritten the patch to use __get_free_pages().

Thanks looks good now.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2008-12-15 10:05 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-12  5:16 [PATCH mmotm 0/5] nilfs2 filesystem updates Ryusuke Konishi
2008-12-12  5:16 ` [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl Ryusuke Konishi
2008-12-12  5:16   ` [PATCH mmotm 2/5] nilfs2: cleanup nilfs_clear_inode Ryusuke Konishi
2008-12-12  5:16     ` [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end Ryusuke Konishi
2008-12-12  5:17       ` [PATCH mmotm 4/5] nilfs2: insert explanations in gcinode file Ryusuke Konishi
2008-12-12  5:17         ` [PATCH mmotm 5/5] nilfs2: add maintainer Ryusuke Konishi
2008-12-12  7:50       ` [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end Pekka Enberg
2008-12-12 13:47   ` [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl Andi Kleen
2008-12-12 18:48     ` Ryusuke Konishi
2008-12-12 20:24       ` Andi Kleen
2008-12-13  8:29         ` Ryusuke Konishi
2008-12-14 12:08           ` Ryusuke Konishi
2008-12-14 15:13             ` Andi Kleen
2008-12-15  6:58               ` Ryusuke Konishi
2008-12-15 10:17                 ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.