* [PATCH] fs/direct-io.c: Calcuate fs_count correctly in get_more_blocks.
@ 2011-09-19  8:25 Tao Ma
  2011-09-19 22:31 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Tao Ma @ 2011-09-19  8:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Theodore Ts'o, Christoph Hellwig, Al Viro, Andrew Morton

From: Tao Ma <boyu.mt@taobao.com>

In get_more_blocks, we use dio_count to calculate fs_count and do some
tricky things to increase fs_count if dio_count isn't aligned. But
it still has a corner case that isn't covered. See the
following example:
./dio_write foo -s 1024 -w 4096 (direct write of 4096 bytes at offset 1024).
The same happens whenever the offset isn't aligned to fs_blocksize.

In this case, the old calculation sets fs_count to 1, but we actually
write into 2 different blocks (if fs_blocksize=4096). The old code
still works, since it calls get_block twice (and may have to allocate
and create an extent twice on file systems like ext4). So it is better
to call get_block just once with the proper fs_count.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/direct-io.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 44a360c..b05f24e 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -569,9 +569,8 @@ static int get_more_blocks(struct dio *dio)
 	int ret;
 	struct buffer_head *map_bh = &dio->map_bh;
 	sector_t fs_startblk;	/* Into file, in filesystem-sized blocks */
+	sector_t fs_endblk;	/* Into file, in filesystem-sized blocks */
 	unsigned long fs_count;	/* Number of filesystem-sized blocks */
-	unsigned long dio_count;/* Number of dio_block-sized blocks */
-	unsigned long blkmask;
 	int create;
 
 	/*
@@ -582,11 +581,8 @@ static int get_more_blocks(struct dio *dio)
 	if (ret == 0) {
 		BUG_ON(dio->block_in_file >= dio->final_block_in_request);
 		fs_startblk = dio->block_in_file >> dio->blkfactor;
-		dio_count = dio->final_block_in_request - dio->block_in_file;
-		fs_count = dio_count >> dio->blkfactor;
-		blkmask = (1 << dio->blkfactor) - 1;
-		if (dio_count & blkmask)	
-			fs_count++;
+		fs_endblk = (dio->final_block_in_request - 1) >> dio->blkfactor;
+		fs_count = fs_endblk - fs_startblk + 1;
 
 		map_bh->b_state = 0;
 		map_bh->b_size = fs_count << dio->inode->i_blkbits;
-- 
1.7.0.4



* Re: [PATCH] fs/direct-io.c: Calcuate fs_count correctly in get_more_blocks.
  2011-09-19  8:25 [PATCH] fs/direct-io.c: Calcuate fs_count correctly in get_more_blocks Tao Ma
@ 2011-09-19 22:31 ` Andrew Morton
  2011-09-20  2:13   ` Tao Ma
  2011-09-23  4:49   ` Tao Ma
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Morton @ 2011-09-19 22:31 UTC (permalink / raw)
  To: Tao Ma
  Cc: linux-kernel, Theodore Ts'o, Christoph Hellwig, Al Viro,
	Andrew Morton

On Mon, 19 Sep 2011 16:25:39 +0800
Tao Ma <tm@tao.ma> wrote:

> In get_more_blocks, we use dio_count to calculate fs_count and do some
> tricky things to increase fs_count if dio_count isn't aligned. But
> it still has a corner case that isn't covered. See the
> following example:
> ./dio_write foo -s 1024 -w 4096 (direct write of 4096 bytes at offset 1024).
> The same happens whenever the offset isn't aligned to fs_blocksize.
> 
> In this case, the old calculation sets fs_count to 1, but we actually
> write into 2 different blocks (if fs_blocksize=4096). The old code
> still works, since it calls get_block twice (and may have to allocate
> and create an extent twice on file systems like ext4). So it is better
> to call get_block just once with the proper fs_count.

Has this been carefully tested with more than just ext4?  If so, which?

Thanks.


* Re: [PATCH] fs/direct-io.c: Calcuate fs_count correctly in get_more_blocks.
  2011-09-19 22:31 ` Andrew Morton
@ 2011-09-20  2:13   ` Tao Ma
  2011-09-23  4:49   ` Tao Ma
  1 sibling, 0 replies; 4+ messages in thread
From: Tao Ma @ 2011-09-20  2:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Theodore Ts'o, Christoph Hellwig, Al Viro,
	Andrew Morton

On 09/20/2011 06:31 AM, Andrew Morton wrote:
> On Mon, 19 Sep 2011 16:25:39 +0800
> Tao Ma <tm@tao.ma> wrote:
> 
>> In get_more_blocks, we use dio_count to calculate fs_count and do some
>> tricky things to increase fs_count if dio_count isn't aligned. But
>> it still has a corner case that isn't covered. See the
>> following example:
>> ./dio_write foo -s 1024 -w 4096 (direct write of 4096 bytes at offset 1024).
>> The same happens whenever the offset isn't aligned to fs_blocksize.
>>
>> In this case, the old calculation sets fs_count to 1, but we actually
>> write into 2 different blocks (if fs_blocksize=4096). The old code
>> still works, since it calls get_block twice (and may have to allocate
>> and create an extent twice on file systems like ext4). So it is better
>> to call get_block just once with the proper fs_count.
> 
> Has this been carefully tested with more than just ext4?  If so, which?
Only ext4 so far, with xfstests, fs_mark, postmark, ffsb, dbench and
sysbench. But I can try xfs later. I will update you with the test results.

Thanks
Tao


* Re: [PATCH] fs/direct-io.c: Calcuate fs_count correctly in get_more_blocks.
  2011-09-19 22:31 ` Andrew Morton
  2011-09-20  2:13   ` Tao Ma
@ 2011-09-23  4:49   ` Tao Ma
  1 sibling, 0 replies; 4+ messages in thread
From: Tao Ma @ 2011-09-23  4:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Theodore Ts'o, Christoph Hellwig, Al Viro,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

Hi Andrew,
On 09/20/2011 06:31 AM, Andrew Morton wrote:
> On Mon, 19 Sep 2011 16:25:39 +0800
> Tao Ma <tm@tao.ma> wrote:
> 
>> In get_more_blocks, we use dio_count to calculate fs_count and do some
>> tricky things to increase fs_count if dio_count isn't aligned. But
>> it still has a corner case that isn't covered. See the
>> following example:
>> ./dio_write foo -s 1024 -w 4096 (direct write of 4096 bytes at offset 1024).
>> The same happens whenever the offset isn't aligned to fs_blocksize.
>>
>> In this case, the old calculation sets fs_count to 1, but we actually
>> write into 2 different blocks (if fs_blocksize=4096). The old code
>> still works, since it calls get_block twice (and may have to allocate
>> and create an extent twice on file systems like ext4). So it is better
>> to call get_block just once with the proper fs_count.
> 
> Has this been carefully tested with more than just ext4?  If so, which?
I have done some more tests on both raw devices and file systems
(ext4, btrfs and xfs).
I have attached the fio and ffsb test cases I used.
Besides this, I also ran ffsb -I -s 2G against all 3 of these file systems.
So far, no kernel errors.

Thanks
Tao

[-- Attachment #2: rndrw-bs-16k-filesize-64G --]
[-- Type: text/plain, Size: 176 bytes --]

[global]
direct=1
ioengine=psync
bs=16k
filename=/mnt/ext4/testfile
size=64G
runtime=600
group_reporting
loops=50

[read]
rw=randread
numjobs=8

[write]
rw=randwrite
numjobs=8

[-- Attachment #3: dio_profile --]
[-- Type: text/plain, Size: 2299 bytes --]

directio	= 1
time		= 1800

[filesystem]
	location	= FFSB_DIR

	num_dirs	= 100

	size_weight	4k	33
	size_weight	8k	21
	size_weight	16k	13
	size_weight	32k	10
	size_weight	64k	8
	size_weight	128k	5
	size_weight	256k	4
	size_weight	512k	3
	size_weight	8m	2
	size_weight	32m	1
	size_weight	1g	1

#	min_filesize	= 4k
#	max_filesize	= 10m

	num_files	= 1000
	init_size	= 100m
#	init_size	= 6GB
#	init_size	= 1gb
#	init_util	= 0.002

[end]

[threadgroup0]
	num_threads	= 16

	append_weight		= 1
	append_fsync_weight	= 1
	stat_weight		= 1
	write_weight		= 1
	write_fsync_weight	= 1
	read_weight		= 1
	create_weight		= 1
	create_fsync_weight	= 1
	delete_weight		= 1
	readall_weight		= 1
	writeall_weight		= 1
	writeall_fsync_weight	= 1
	open_close_weight	= 1

	read_random	= 1
	write_random	= 1

	write_size	= 4k
	write_blocksize	= 4k
	read_size	= 4k
	read_blocksize	= 4k

	op_delay	= 0

	[stats]
		enable_stats	= 1
		enable_range	= 0

#		ignore		= close
#		ignore		= open
#		ignore		= lseek
#		ignore		= write
#		ignore		= read

		msec_range	0.00 0.01
		msec_range	0.01 0.02
		msec_range	0.02 0.03
		msec_range	0.03 0.04
		msec_range	0.04 0.05
		msec_range	0.05 0.1
		msec_range	0.1 0.2
		msec_range	0.2 0.5
		msec_range	0.5 1.0
		msec_range	1.0 2.0
		msec_range	2.0 3.0
		msec_range	3.0 4.0
		msec_range	4.0 5.0
		msec_range	5.0 10.0
		msec_range	10.0 10000.0
	[end]
[end]

[threadgroup1]
	num_threads	= 16

	append_weight		= 1
	append_fsync_weight	= 1
	stat_weight		= 1
	write_weight		= 1
	write_fsync_weight	= 1
	read_weight		= 1
	create_weight		= 1
	create_fsync_weight	= 1
	delete_weight		= 1
	readall_weight		= 1
	writeall_weight		= 1
	writeall_fsync_weight	= 1
	open_close_weight	= 1

	read_random	= 0
	write_random	= 0

	write_size	= 4k
	write_blocksize	= 4k
	read_size	= 4k
	read_blocksize	= 4k

	op_delay	= 0

	[stats]
		enable_stats	= 1
		enable_range	= 0

#		ignore		= close
#		ignore		= open
#		ignore		= lseek
#		ignore		= write
#		ignore		= read

		msec_range	0.00 0.01
		msec_range	0.01 0.02
		msec_range	0.02 0.03
		msec_range	0.03 0.04
		msec_range	0.04 0.05
		msec_range	0.05 0.1
		msec_range	0.1 0.2
		msec_range	0.2 0.5
		msec_range	0.5 1.0
		msec_range	1.0 2.0
		msec_range	2.0 3.0
		msec_range	3.0 4.0
		msec_range	4.0 5.0
		msec_range	5.0 10.0
		msec_range	10.0 10000.0
	[end]
[end]
