* Bad performance of ext4 with kernel 3.0.17
From: Xupeng Yun @ 2012-03-01 5:31 UTC
To: Ext4 development
I just set up a new server (Gentoo 64-bit with kernel 3.0.17) with 4 x
15000 RPM SAS disks (sdc, sdd, sde and sdf), and created a software
RAID 10 array on top of them; the partitions are aligned at 1 MB:
# fdisk -lu /dev/sd{c,e,d,f}
Disk /dev/sdc: 600.1 GB, 600127266816 bytes
255 heads, 63 sectors/track, 72961 cylinders, total 1172123568 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xdd96eace
Device Boot Start End Blocks Id System
/dev/sdc1 2048 1172123567 586060760 fd Linux raid autodetect
Disk /dev/sde: 600.1 GB, 600127266816 bytes
3 heads, 63 sectors/track, 6201712 cylinders, total 1172123568 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xf869ba1c
Device Boot Start End Blocks Id System
/dev/sde1 2048 1172123567 586060760 fd Linux raid autodetect
Disk /dev/sdd: 600.1 GB, 600127266816 bytes
81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xf869ba1c
Device Boot Start End Blocks Id System
/dev/sdd1 2048 1172123567 586060760 fd Linux raid autodetect
Disk /dev/sdf: 600.1 GB, 600127266816 bytes
81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xb4893c3c
Device Boot Start End Blocks Id System
/dev/sdf1 2048 1172123567 586060760 fd Linux raid autodetect
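(The 2048-sector start works out to exactly 1 MiB: 2048 x 512 bytes =
1 MiB. A quick scripted check -- a sketch that assumes the fdisk
output format shown above:)
for part in /dev/sd{c,d,e,f}1; do
    # the second field of the matching fdisk line is the start sector
    start=$(fdisk -lu "${part%1}" | awk -v p="$part" '$1 == p {print $2}')
    # a start sector divisible by 2048 sits on a 1 MiB boundary
    echo "$part start=$start 1MiB-aligned=$((start % 2048 == 0))"
done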
and here is the RAID 10 array (md3) with a 64 KB chunk size:
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10]
md3 : active raid10 sdf1[3] sde1[2] sdd1[1] sdc1[0]
1172121344 blocks 64K chunks 2 near-copies [4/4] [UUUU]
md1 : active raid1 sda1[0] sdb1[1]
112320 blocks [2/2] [UU]
md2 : active raid1 sda2[0] sdb2[1]
41953664 blocks [2/2] [UU]
unused devices: <none>
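(For reference, an array with this geometry could be created with
something like the following -- the exact mdadm invocation is an
assumption, not taken from the setup above:)
# mdadm --create /dev/md3 --level=10 --layout=n2 --chunk=64 --raid-devices=4 /dev/sd{c,d,e,f}1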
I did IO testing with `fio` against the raw RAID device (md3), and the
result looks good (read IOPS 1723 / write IOPS 168):
# fio --filename=/dev/md3 --direct=1 --rw=randrw --bs=16k
--size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1
--rwmixread=90 --thread --ioengine=psync
file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
...
file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
fio 2.0.3
Starting 16 threads
Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [28234K/2766K/s] [1723 /168 iops] [eta 00m:00s]
file1: (groupid=0, jobs=16): err= 0: pid=17107
read : io=1606.3MB, bw=27406KB/s, iops=1712 , runt= 60017msec
clat (usec): min=221 , max=123233 , avg=7693.00, stdev=7734.82
lat (usec): min=221 , max=123233 , avg=7693.12, stdev=7734.82
clat percentiles (usec):
| 1.00th=[ 1128], 5.00th=[ 1560], 10.00th=[ 1928], 20.00th=[ 2640],
| 30.00th=[ 3376], 40.00th=[ 4128], 50.00th=[ 4896], 60.00th=[ 6304],
| 70.00th=[ 8256], 80.00th=[11200], 90.00th=[16768], 95.00th=[23168],
| 99.00th=[38656], 99.50th=[45824], 99.90th=[62720]
bw (KB/s) : min= 888, max=13093, per=7.59%, avg=2079.11, stdev=922.54
write: io=183840KB, bw=3063.2KB/s, iops=191 , runt= 60017msec
clat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
lat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
clat percentiles (usec):
| 1.00th=[ 1816], 5.00th=[ 2544], 10.00th=[ 3248], 20.00th=[ 4512],
| 30.00th=[ 5728], 40.00th=[ 7648], 50.00th=[ 9536], 60.00th=[12480],
| 70.00th=[16320], 80.00th=[22144], 90.00th=[32640], 95.00th=[43264],
| 99.00th=[71168], 99.50th=[82432], 99.90th=[111104]
bw (KB/s) : min= 90, max= 5806, per=33.81%, avg=1035.45, stdev=973.10
lat (usec) : 250=0.05%, 500=0.09%, 750=0.05%, 1000=0.19%
lat (msec) : 2=9.61%, 4=26.05%, 10=38.46%, 20=16.82%, 50=8.02%
lat (msec) : 100=0.63%, 250=0.03%
cpu : usr=1.02%, sys=2.87%, ctx=1926728, majf=0, minf=288891
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
64=0.0%, >=64=0.0%
issued : total=r=102801/w=11490/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=1606.3MB, aggrb=27405KB/s, minb=28063KB/s,
maxb=28063KB/s, mint=60017msec, maxt=60017msec
WRITE: io=183840KB, aggrb=3063KB/s, minb=3136KB/s,
maxb=3136KB/s, mint=60017msec, maxt=60017msec
Disk stats (read/write):
md3: ios=102753/11469, merge=0/0, ticks=0/0, in_queue=0,
util=0.00%, aggrios=25764/5746, aggrmerge=0/0, aggrticks=197378/51351,
aggrin_queue=248718, aggrutil=99.31%
sdc: ios=26256/5723, merge=0/0, ticks=204328/68364,
in_queue=272668, util=99.20%
sdd: ios=25290/5723, merge=0/0, ticks=187572/61628,
in_queue=249188, util=98.73%
sde: ios=25689/5769, merge=0/0, ticks=197340/71828,
in_queue=269172, util=99.31%
sdf: ios=25822/5769, merge=0/0, ticks=200272/3584,
in_queue=203844, util=97.87%
then I created an ext4 filesystem on top of the RAID device and mounted
it at /mnt/test:
# mkfs.ext4 -E stride=16,stripe-width=32 /dev/md3
# mount /dev/md3 /mnt/test -o noatime,nodiratime,data=writeback,nobarrier
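(How those -E values line up with the array geometry -- a sketch that
assumes the default 4 KiB ext4 block size:)
# stride       = chunk size / fs block size = 64 KiB / 4 KiB = 16
# stripe-width = stride * data disks        = 16 * 2         = 32
#   (a 4-disk RAID 10 with 2 near-copies stripes across 2 data disks)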
after that I ran the very same IO test, but the result looks very
bad (read IOPS 926 / write IOPS 97):
# fio --filename=/mnt/test/test --direct=1 --rw=randrw --bs=16k
--size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1
--rwmixread=90 --thread --ioengine=psync
file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
...
file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
fio 2.0.3
Starting 16 threads
file1: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [15172K/1604K/s] [926 /97 iops] [eta 00m:00s]
file1: (groupid=0, jobs=16): err= 0: pid=18764
read : io=838816KB, bw=13974KB/s, iops=873 , runt= 60025msec
clat (usec): min=228 , max=111583 , avg=16412.46, stdev=11632.03
lat (usec): min=228 , max=111583 , avg=16412.60, stdev=11632.03
clat percentiles (usec):
| 1.00th=[ 1384], 5.00th=[ 2320], 10.00th=[ 3376], 20.00th=[ 5216],
| 30.00th=[ 8256], 40.00th=[11456], 50.00th=[14656], 60.00th=[17792],
| 70.00th=[21376], 80.00th=[25472], 90.00th=[32128], 95.00th=[37632],
| 99.00th=[50944], 99.50th=[56576], 99.90th=[70144]
bw (KB/s) : min= 308, max= 4448, per=6.90%, avg=964.30, stdev=339.53
write: io=94208KB, bw=1569.5KB/s, iops=98 , runt= 60025msec
clat (msec): min=1 , max=89 , avg=16.91, stdev=10.24
lat (msec): min=1 , max=89 , avg=16.92, stdev=10.24
clat percentiles (usec):
| 1.00th=[ 2384], 5.00th=[ 3888], 10.00th=[ 5088], 20.00th=[ 7776],
| 30.00th=[10304], 40.00th=[12736], 50.00th=[15296], 60.00th=[17792],
| 70.00th=[20864], 80.00th=[24960], 90.00th=[30848], 95.00th=[35584],
| 99.00th=[47360], 99.50th=[51456], 99.90th=[62208]
bw (KB/s) : min= 31, max= 4676, per=62.37%, avg=978.64, stdev=896.53
lat (usec) : 250=0.01%, 500=0.03%, 750=0.01%, 1000=0.06%
lat (msec) : 2=3.15%, 4=9.42%, 10=22.23%, 20=31.61%, 50=32.39%
lat (msec) : 100=1.08%, 250=0.01%
cpu : usr=0.59%, sys=2.63%, ctx=1700318, majf=0, minf=19888
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
64=0.0%, >=64=0.0%
issued : total=r=52426/w=5888/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=838816KB, aggrb=13974KB/s, minb=14309KB/s,
maxb=14309KB/s, mint=60025msec, maxt=60025msec
WRITE: io=94208KB, aggrb=1569KB/s, minb=1607KB/s, maxb=1607KB/s,
mint=60025msec, maxt=60025msec
Disk stats (read/write):
md3: ios=58848/13987, merge=0/0, ticks=0/0, in_queue=0,
util=0.00%, aggrios=14750/4159, aggrmerge=0/2861,
aggrticks=112418/28260, aggrin_queue=140664, aggrutil=84.95%
sdc: ios=17688/4221, merge=0/2878, ticks=148664/37972,
in_queue=186628, util=84.95%
sdd: ios=11801/4219, merge=0/2880, ticks=79396/29192,
in_queue=108572, util=70.71%
sde: ios=16427/4099, merge=0/2843, ticks=129072/35252,
in_queue=164304, util=81.57%
sdf: ios=13086/4097, merge=0/2845, ticks=92540/10624,
in_queue=103152, util=60.02%
Does anything look wrong here?
--
Xupeng Yun
http://about.me/xupeng
* Re: Bad performance of ext4 with kernel 3.0.17
From: Ted Ts'o @ 2012-03-01 19:47 UTC
To: Xupeng Yun; +Cc: Ext4 development
Two things I'd try:
#1) If this is a freshly created file system, the kernel may be
initializing the inode table in the background, and this could be
interfering with your benchmark workload. To address this, you can
either (a) add the mount option noinit_itable, (b) add the mke2fs
option "-E lazy_itable_init=0" --- but this will cause the mke2fs to
take a lot longer, or (c) mount the file system and wait until
"dumpe2fs /dev/md3 | tail" shows that the last block group has the
ITABLE_ZEROED flag set. For benchmarking purposes on a scratch
workload, option (a) above is the fastest thing to do.
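(A minimal sketch of options (a) and (c), using the filesystem from
earlier in the thread:)
# mount -o noinit_itable /dev/md3 /mnt/test
# dumpe2fs /dev/md3 | tail
(in the dumpe2fs output, look for the ITABLE_ZEROED flag on the last
block group)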
#2) It could be that the file system is choosing blocks farther away
from the beginning of the disk, which is slower, whereas the fio run on
the raw disk will use the blocks closest to the beginning of the disk,
which are the fastest ones. You could try creating the file system so
it is only 10 GB, and then try running fio on that small, truncated
file system, and see if that makes a difference.
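(One way to do this -- mkfs.ext4 accepts an optional size argument in
filesystem blocks, and 10 GiB at the default 4 KiB block size is
2621440 blocks:)
# mkfs.ext4 -E stride=16,stripe-width=32 /dev/md3 2621440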
- Ted
On Thu, Mar 01, 2012 at 01:31:58PM +0800, Xupeng Yun wrote:
> I just set up a new server (Gentoo 64-bit with kernel 3.0.17) with 4 x
> 15000 RPM SAS disks (sdc, sdd, sde and sdf), and created a software
> RAID 10 array on top of them; the partitions are aligned at 1 MB:
> [...]
> Does anything look wrong here?
* Re: Bad performance of ext4 with kernel 3.0.17
From: Xupeng Yun @ 2012-03-02 0:50 UTC
To: Ted Ts'o; +Cc: Ext4 development
On Fri, Mar 2, 2012 at 03:47, Ted Ts'o <tytso@mit.edu> wrote:
> Two things I'd try:
>
> #1) If this is a freshly created file system, the kernel may be
> initializing the inode table in the background, and this could be
> interfering with your benchmark workload. To address this, you can
> either (a) add the mount option noinit_itable, (b) add the mke2fs
> option "-E lazy_itable_init=0" --- but this will cause the mke2fs to
> take a lot longer, or (c) mount the file system and wait until
> "dumpe2fs /dev/md3 | tail" shows that the last block group has the
> ITABLE_ZEROED flag set. For benchmarking purposes on a scratch
> workload, option (a) above is the fastest thing to do.
>
Thank you Ted, I followed this and got the same result (read IOPS ~950
/ write IOPS ~100).
> #2) It could be that the file system is choosing blocks farther away
> from the beginning of the disk, which is slower, whereas the fio run on
> the raw disk will use the blocks closest to the beginning of the disk,
> which are the fastest ones. You could try creating the file system so
> it is only 10 GB, and then try running fio on that small, truncated
> file system, and see if that makes a difference.
I created LVM on top of the RAID 10 device, then created a smaller
LV (20 GB). After that I benchmarked the very same LV with different
filesystems, and the results are interesting:
xfs (read IOPS ~1700 / write IOPS ~200)
ext4 (read IOPS ~950 / write IOPS ~100)
ext3 (read IOPS ~900 / write IOPS ~100)
reiserfs (read IOPS ~930 / write IOPS ~100)
btrfs (read IOPS ~1200 / write IOPS ~120)
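(Roughly how such a comparison could be scripted -- a sketch with a
hypothetical LV path /dev/vg0/bench; each mkfs may need a force flag to
overwrite the previous filesystem:)
for fs in xfs ext4 ext3 reiserfs btrfs; do
    mkfs.$fs /dev/vg0/bench            # hypothetical 20 GB LV
    mount /dev/vg0/bench /mnt/test
    fio --filename=/mnt/test/test --direct=1 --rw=randrw --bs=16k \
        --size=5G --numjobs=16 --runtime=60 --group_reporting \
        --name=file1 --rwmixread=90 --thread --ioengine=psync
    umount /mnt/test
done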
I got very bad performance from XFS
(http://www.spinics.net/lists/xfs/msg08688.html) about two months ago,
which was caused by known XFS bugs, so I tried ext4 on some of my
servers; it worked very well until I set up this new server with
software RAID 10.
What should I look at to understand what's happening? Any suggestion
is appreciated.
--
Xupeng Yun
http://about.me/xupeng
* Re: Bad performance of ext4 with kernel 3.0.17
From: Ted Ts'o @ 2012-03-02 2:45 UTC
To: Xupeng Yun; +Cc: Ext4 development
Hmm, it sounds like we're hitting some kind of scaling problem. How
many CPUs/cores do you have on your server? And it would be
interesting to try varying the --numjobs parameter and see how the
various file systems behave with 1, 2, 4, 8, and 16 threads.
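(A sweep like that could be scripted as follows -- a sketch reusing the
workload from earlier in the thread:)
for n in 1 2 4 8 16; do
    fio --filename=/mnt/test/test --direct=1 --rw=randrw --bs=16k \
        --size=5G --numjobs=$n --runtime=60 --group_reporting \
        --name=file1 --rwmixread=90 --thread --ioengine=psync
done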
The other thing that's worth checking is to try using filefrag -v on
the test file after the benchmark has finished, just to make sure the
file layout is sane. It should be, but I just want to double-check...
- Ted
On Fri, Mar 02, 2012 at 08:50:55AM +0800, Xupeng Yun wrote:
> [...]
> What should I look at to understand what's happening? Any suggestion
> is appreciated.
* Re: Bad performance of ext4 with kernel 3.0.17
From: Xupeng Yun @ 2012-03-02 7:06 UTC
To: Ted Ts'o; +Cc: Ext4 development
On Fri, Mar 2, 2012 at 10:45, Ted Ts'o <tytso@mit.edu> wrote:
> Hmm, it sounds like we're hitting some kind of scaling problem. How
> many CPUs/cores do you have on your server? And it would be
> interesting to try varying the --numjobs parameter and see how the
> various file systems behave with 1, 2, 4, 8, and 16 threads.
>
I ran the benchmarks with 1, 2, 4, 8, 16 and 32 threads; here are
the results:
http://blog.xupeng.me/misc/4-15krpm-sas-raid10/
> The other thing that's worth checking is to try using filefrag -v on
> the test file after the benchmark has finished, just to make sure the
> file layout is sane. It should be, but I just want to double-check...
Output of `filefrag -v` after the benchmark has finished:
# filefrag -v /mnt/ext4/test
Filesystem type is: ef53
File size of /mnt/ext4/test is 5368709120 (1310720 blocks, blocksize 4096)
ext logical physical expected length flags
0 0 2424832 32758
1 32758 2457600 2457590 32758
2 65516 2490368 2490358 32758
3 98274 2523136 2523126 32758
4 131032 2555904 2555894 32758
5 163790 2588672 2588662 32758
6 196548 2719744 2621430 32758
7 229306 2752512 2752502 32758
8 262064 2785280 2785270 32758
9 294822 2818048 2818038 32758
10 327580 2850816 2850806 32758
11 360338 2883584 2883574 32758
12 393096 2916352 2916342 32758
13 425854 2949120 2949110 32758
14 458612 2981888 2981878 32758
15 491370 3014656 3014646 32758
16 524128 3047424 3047414 32758
17 556886 3080192 3080182 32758
18 589644 3112960 3112950 32758
19 622402 3178496 3145718 32758
20 655160 3211264 3211254 32758
21 687918 3244032 3244022 32758
22 720676 3276800 3276790 32758
23 753434 3309568 3309558 32758
24 786192 3342336 3342326 32758
25 818950 3375104 3375094 32758
26 851708 3407872 3407862 32758
27 884466 3440640 3440630 32758
28 917224 3473408 3473398 32758
29 949982 3506176 3506166 32758
30 982740 3538944 3538934 32758
31 1015498 3571712 3571702 32758
32 1048256 3604480 3604470 32758
33 1081014 3637248 3637238 32758
34 1113772 3702784 3670006 32758
35 1146530 3735552 3735542 32758
36 1179288 3768320 3768310 32758
37 1212046 3801088 3801078 32758
38 1244804 3833856 3833846 32758
39 1277562 3866624 3866614 32758
40 1310320 3899392 3899382 400 eof
/mnt/ext4/test: 41 extents found
--
Xupeng Yun
http://about.me/xupeng
* Re: Bad performance of ext4 with kernel 3.0.17
From: Xupeng Yun @ 2012-03-03 3:56 UTC
To: Ted Ts'o; +Cc: Ext4 development
On Fri, Mar 2, 2012 at 10:45, Ted Ts'o <tytso@mit.edu> wrote:
> Hmm, it sounds like we're hitting some kind of scaling problem. How
> many CPUs/cores do you have on your server? And it would be
> interesting to try varying the --numjobs parameter and see how the
> various file systems behave with 1, 2, 4, 8, and 16 threads.
Oh, there are 8 physical cores with hyper-threading on the server.
--
Xupeng Yun
http://about.me/xupeng