* Bad performance of ext4 with kernel 3.0.17
@ 2012-03-01  5:31 Xupeng Yun
  2012-03-01 19:47 ` Ted Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Xupeng Yun @ 2012-03-01  5:31 UTC (permalink / raw)
  To: Ext4 development

I just set up a new server (Gentoo 64-bit with kernel 3.0.17) with 4 x
15000 RPM SAS disks (sdc, sdd, sde and sdf) and created a software RAID 10
array on top of them; the partitions are aligned at 1 MB:

    # fdisk -lu /dev/sd{c,e,d,f}

    Disk /dev/sdc: 600.1 GB, 600127266816 bytes
    255 heads, 63 sectors/track, 72961 cylinders, total 1172123568 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0xdd96eace

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1            2048  1172123567   586060760   fd  Linux raid autodetect

    Disk /dev/sde: 600.1 GB, 600127266816 bytes
    3 heads, 63 sectors/track, 6201712 cylinders, total 1172123568 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0xf869ba1c

       Device Boot      Start         End      Blocks   Id  System
    /dev/sde1            2048  1172123567   586060760   fd  Linux raid autodetect

    Disk /dev/sdd: 600.1 GB, 600127266816 bytes
    81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0xf869ba1c

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdd1            2048  1172123567   586060760   fd  Linux raid autodetect

    Disk /dev/sdf: 600.1 GB, 600127266816 bytes
    81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0xb4893c3c

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdf1            2048  1172123567   586060760   fd  Linux raid autodetect
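
(Each partition starts at sector 2048, i.e. 2048 x 512 bytes = 1 MiB, so
the stated 1 MB alignment checks out.)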


and here is the RAID 10 array (md3) with a 64K chunk size:

    cat /proc/mdstat
    Personalities : [raid0] [raid1] [raid10]
    md3 : active raid10 sdf1[3] sde1[2] sdd1[1] sdc1[0]
          1172121344 blocks 64K chunks 2 near-copies [4/4] [UUUU]

    md1 : active raid1 sda1[0] sdb1[1]
          112320 blocks [2/2] [UU]

    md2 : active raid1 sda2[0] sdb2[1]
          41953664 blocks [2/2] [UU]

    unused devices: <none>

I ran IO tests with `fio` against the raw RAID device (md3), and the
result looks good (read IOPS 1723 / write IOPS 168):

    # fio --filename=/dev/md3 --direct=1 --rw=randrw --bs=16k --size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1 --rwmixread=90 --thread --ioengine=psync
    file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
    ...
    file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
    fio 2.0.3
    Starting 16 threads
    Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [28234K/2766K/s] [1723 /168  iops] [eta 00m:00s]
    file1: (groupid=0, jobs=16): err= 0: pid=17107
      read : io=1606.3MB, bw=27406KB/s, iops=1712 , runt= 60017msec
        clat (usec): min=221 , max=123233 , avg=7693.00, stdev=7734.82
         lat (usec): min=221 , max=123233 , avg=7693.12, stdev=7734.82
        clat percentiles (usec):
         |  1.00th=[ 1128],  5.00th=[ 1560], 10.00th=[ 1928], 20.00th=[ 2640],
         | 30.00th=[ 3376], 40.00th=[ 4128], 50.00th=[ 4896], 60.00th=[ 6304],
         | 70.00th=[ 8256], 80.00th=[11200], 90.00th=[16768], 95.00th=[23168],
         | 99.00th=[38656], 99.50th=[45824], 99.90th=[62720]
        bw (KB/s)  : min=  888, max=13093, per=7.59%, avg=2079.11, stdev=922.54
      write: io=183840KB, bw=3063.2KB/s, iops=191 , runt= 60017msec
        clat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
         lat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
        clat percentiles (usec):
         |  1.00th=[ 1816],  5.00th=[ 2544], 10.00th=[ 3248], 20.00th=[ 4512],
         | 30.00th=[ 5728], 40.00th=[ 7648], 50.00th=[ 9536], 60.00th=[12480],
         | 70.00th=[16320], 80.00th=[22144], 90.00th=[32640], 95.00th=[43264],
         | 99.00th=[71168], 99.50th=[82432], 99.90th=[111104]
        bw (KB/s)  : min=   90, max= 5806, per=33.81%, avg=1035.45, stdev=973.10
        lat (usec) : 250=0.05%, 500=0.09%, 750=0.05%, 1000=0.19%
        lat (msec) : 2=9.61%, 4=26.05%, 10=38.46%, 20=16.82%, 50=8.02%
        lat (msec) : 100=0.63%, 250=0.03%
      cpu          : usr=1.02%, sys=2.87%, ctx=1926728, majf=0, minf=288891
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=102801/w=11490/d=0, short=r=0/w=0/d=0

    Run status group 0 (all jobs):
       READ: io=1606.3MB, aggrb=27405KB/s, minb=28063KB/s, maxb=28063KB/s, mint=60017msec, maxt=60017msec
      WRITE: io=183840KB, aggrb=3063KB/s, minb=3136KB/s, maxb=3136KB/s, mint=60017msec, maxt=60017msec

    Disk stats (read/write):
        md3: ios=102753/11469, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=25764/5746, aggrmerge=0/0, aggrticks=197378/51351, aggrin_queue=248718, aggrutil=99.31%
      sdc: ios=26256/5723, merge=0/0, ticks=204328/68364, in_queue=272668, util=99.20%
      sdd: ios=25290/5723, merge=0/0, ticks=187572/61628, in_queue=249188, util=98.73%
      sde: ios=25689/5769, merge=0/0, ticks=197340/71828, in_queue=269172, util=99.31%
      sdf: ios=25822/5769, merge=0/0, ticks=200272/3584, in_queue=203844, util=97.87%

Then I created an ext4 filesystem on top of the RAID device and mounted
it at /mnt/test:

    mkfs.ext4 -E stride=16,stripe-width=32 /dev/md3
    mount /dev/md3 /mnt/test -o noatime,nodiratime,data=writeback,nobarrier
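
For reference, those ext4 RAID hints follow from the array geometry shown
in /proc/mdstat above; a quick sanity check, assuming the default 4K
filesystem block size, looks like this:

    # stride       = chunk size / fs block size   = 64K / 4K = 16
    # stripe-width = stride * data-bearing disks  = 16 * 2   = 32
    #   (a 4-disk RAID 10 with 2 near-copies spreads each stripe over 2 disks)
    mdadm --detail /dev/md3 | grep -i chunk    # confirms the 64K chunk size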

After that I ran the very same IO test, but the result looks very
bad (read IOPS 926 / write IOPS 97):

    # fio --filename=/mnt/test/test --direct=1 --rw=randrw --bs=16k --size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1 --rwmixread=90 --thread --ioengine=psync
    file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
    ...
    file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
    fio 2.0.3
    Starting 16 threads
    file1: Laying out IO file(s) (1 file(s) / 5120MB)
    Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [15172K/1604K/s] [926 /97  iops] [eta 00m:00s]
    file1: (groupid=0, jobs=16): err= 0: pid=18764
      read : io=838816KB, bw=13974KB/s, iops=873 , runt= 60025msec
        clat (usec): min=228 , max=111583 , avg=16412.46, stdev=11632.03
         lat (usec): min=228 , max=111583 , avg=16412.60, stdev=11632.03
        clat percentiles (usec):
         |  1.00th=[ 1384],  5.00th=[ 2320], 10.00th=[ 3376], 20.00th=[ 5216],
         | 30.00th=[ 8256], 40.00th=[11456], 50.00th=[14656], 60.00th=[17792],
         | 70.00th=[21376], 80.00th=[25472], 90.00th=[32128], 95.00th=[37632],
         | 99.00th=[50944], 99.50th=[56576], 99.90th=[70144]
        bw (KB/s)  : min=  308, max= 4448, per=6.90%, avg=964.30, stdev=339.53
      write: io=94208KB, bw=1569.5KB/s, iops=98 , runt= 60025msec
        clat (msec): min=1 , max=89 , avg=16.91, stdev=10.24
         lat (msec): min=1 , max=89 , avg=16.92, stdev=10.24
        clat percentiles (usec):
         |  1.00th=[ 2384],  5.00th=[ 3888], 10.00th=[ 5088], 20.00th=[ 7776],
         | 30.00th=[10304], 40.00th=[12736], 50.00th=[15296], 60.00th=[17792],
         | 70.00th=[20864], 80.00th=[24960], 90.00th=[30848], 95.00th=[35584],
         | 99.00th=[47360], 99.50th=[51456], 99.90th=[62208]
        bw (KB/s)  : min=   31, max= 4676, per=62.37%, avg=978.64, stdev=896.53
        lat (usec) : 250=0.01%, 500=0.03%, 750=0.01%, 1000=0.06%
        lat (msec) : 2=3.15%, 4=9.42%, 10=22.23%, 20=31.61%, 50=32.39%
        lat (msec) : 100=1.08%, 250=0.01%
      cpu          : usr=0.59%, sys=2.63%, ctx=1700318, majf=0, minf=19888
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=52426/w=5888/d=0, short=r=0/w=0/d=0

    Run status group 0 (all jobs):
       READ: io=838816KB, aggrb=13974KB/s, minb=14309KB/s, maxb=14309KB/s, mint=60025msec, maxt=60025msec
      WRITE: io=94208KB, aggrb=1569KB/s, minb=1607KB/s, maxb=1607KB/s, mint=60025msec, maxt=60025msec

    Disk stats (read/write):
        md3: ios=58848/13987, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=14750/4159, aggrmerge=0/2861, aggrticks=112418/28260, aggrin_queue=140664, aggrutil=84.95%
      sdc: ios=17688/4221, merge=0/2878, ticks=148664/37972, in_queue=186628, util=84.95%
      sdd: ios=11801/4219, merge=0/2880, ticks=79396/29192, in_queue=108572, util=70.71%
      sde: ios=16427/4099, merge=0/2843, ticks=129072/35252, in_queue=164304, util=81.57%
      sdf: ios=13086/4097, merge=0/2845, ticks=92540/10624, in_queue=103152, util=60.02%

Is anything going wrong here?


--
Xupeng Yun
http://about.me/xupeng


* Re: Bad performance of ext4 with kernel 3.0.17
  2012-03-01  5:31 Bad performance of ext4 with kernel 3.0.17 Xupeng Yun
@ 2012-03-01 19:47 ` Ted Ts'o
  2012-03-02  0:50   ` Xupeng Yun
  0 siblings, 1 reply; 6+ messages in thread
From: Ted Ts'o @ 2012-03-01 19:47 UTC (permalink / raw)
  To: Xupeng Yun; +Cc: Ext4 development

Two things I'd try:

#1) If this is a freshly created file system, the kernel may be
initializing the inode table in the background, and this could be
interfering with your benchmark workload.  To address this, you can
either (a) add the mount option noinit_itable, (b) add the mke2fs
option "-E lazy_itable_init=0" --- but this will cause the mke2fs to
take a lot longer, or (c) mount the file system and wait until
"dumpe2fs /dev/md3 | tail" shows that the last block group has the
ITABLE_ZEROED flag set.  For benchmarking purposes on a scratch
workload, option (a) above is the fastest thing to do.
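
For reference, a minimal sketch of those three options, reusing the mkfs
and mount lines from the original report (option spellings as in e2fsprogs
and the ext4 mount documentation):

    # (a) skip background inode table zeroing for this benchmark run
    mount /dev/md3 /mnt/test -o noatime,data=writeback,nobarrier,noinit_itable
    # (b) zero the inode tables up front at mkfs time (slower mkfs)
    mkfs.ext4 -E stride=16,stripe-width=32,lazy_itable_init=0 /dev/md3
    # (c) or wait until the last block group reports ITABLE_ZEROED
    dumpe2fs /dev/md3 | tail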

#2) It could be that the file system is choosing blocks farther away
from the beginning of the disk, which is slower, whereas the fio run on
the raw device will use the blocks closest to the beginning of the disk,
which are the fastest ones.  You could try creating the file system so
it is only 10GB, and then try running fio on that small, truncated
file system, and see if that makes a difference.
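
One way to do that, as a sketch (mke2fs takes an optional blocks-count
after the device; 2621440 is 10 GiB expressed in 4K blocks):

    # build a deliberately small ext4 fs at the start of the array
    mkfs.ext4 -E stride=16,stripe-width=32 /dev/md3 2621440
    mount /dev/md3 /mnt/test -o noatime,data=writeback,nobarrier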

     	     	     	     	   - Ted


* Re: Bad performance of ext4 with kernel 3.0.17
  2012-03-01 19:47 ` Ted Ts'o
@ 2012-03-02  0:50   ` Xupeng Yun
  2012-03-02  2:45     ` Ted Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Xupeng Yun @ 2012-03-02  0:50 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Ext4 development

On Fri, Mar 2, 2012 at 03:47, Ted Ts'o <tytso@mit.edu> wrote:
> Two things I'd try:
>
> #1) If this is a freshly created file system, the kernel may be
> initializing the inode table in the background, and this could be
> interfering with your benchmark workload.  To address this, you can
> either (a) add the mount option noinit_itable, (b) add the mke2fs
> option "-E lazy_itable_init=0" --- but this will cause the mke2fs to
> take a lot longer, or (c) mount the file system and wait until
> "dumpe2fs /dev/md3 | tail" shows that the last block group has the
> ITABLE_ZEROED flag set.  For benchmarking purposes on a scratch
> workload, option (a) above is the fast thing to do.
>

Thank you, Ted. I followed this and got the same result (read IOPS ~950
/ write IOPS ~100).

> #2) It could be that the file system is choosing blocks farther away
> from the beginning of the disk, which is slower, whereas the fio on
> the raw disk will use the blocks closest to the beginning of the disk,
> which are the fastest one.  You could try creating the file system so
> it is only 10GB, and then try running fio on that small, truncated
> file system, and see if that makes a difference.

I created LVM on top of the RAID 10 device and then created a smaller
LV (20 GB). After that I benchmarked the very same LV with different
filesystems, and the results are interesting:

xfs (read IOPS ~1700 / write IOPS ~200)
ext4 (read IOPS ~950 / write IOPS ~100)
ext3 (read IOPS ~900 / write IOPS ~100)
reiserfs (read IOPS ~930 / write IOPS ~100)
btrfs (read IOPS ~1200 / write IOPS ~120)
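
For completeness, a sketch of the kind of LVM setup described above (the
volume group and LV names here are hypothetical, not the exact ones used):

    pvcreate /dev/md3
    vgcreate vg0 /dev/md3
    lvcreate -L 20G -n bench vg0
    mkfs.ext4 -E stride=16,stripe-width=32 /dev/vg0/bench
    mount /dev/vg0/bench /mnt/test -o noatime,data=writeback,nobarrier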

I got very bad performance from XFS
(http://www.spinics.net/lists/xfs/msg08688.html) about two months ago,
which was caused by known XFS bugs, so I tried ext4 on some of my
servers; it worked very well until I set up this new server with
software RAID 10.

What should I look into to understand what's happening? Any suggestion
is appreciated.

-- 
Xupeng Yun
http://about.me/xupeng


* Re: Bad performance of ext4 with kernel 3.0.17
  2012-03-02  0:50   ` Xupeng Yun
@ 2012-03-02  2:45     ` Ted Ts'o
  2012-03-02  7:06       ` Xupeng Yun
  2012-03-03  3:56       ` Xupeng Yun
  0 siblings, 2 replies; 6+ messages in thread
From: Ted Ts'o @ 2012-03-02  2:45 UTC (permalink / raw)
  To: Xupeng Yun; +Cc: Ext4 development

Hmm, it sounds like we're hitting some kind of scaling problem.  How
many CPUs/cores do you have on your server?  And it would be
interesting to try varying the --numjobs parameter and see how the
various file systems behave with 1, 2, 4, 8, and 16 threads.
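
For example, something like this (a sketch reusing the fio invocation from
the original report):

    for n in 1 2 4 8 16; do
        fio --filename=/mnt/test/test --direct=1 --rw=randrw --bs=16k \
            --size=5G --numjobs=$n --runtime=60 --group_reporting \
            --name=file1 --rwmixread=90 --thread --ioengine=psync
    done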

The other thing that's worth checking is to try using filefrag -v on
the test file after the benchmark has finished, just to make sure the
file layout is sane.  It should be, but I just want to double check...

          	       	      	 	- Ted


* Re: Bad performance of ext4 with kernel 3.0.17
  2012-03-02  2:45     ` Ted Ts'o
@ 2012-03-02  7:06       ` Xupeng Yun
  2012-03-03  3:56       ` Xupeng Yun
  1 sibling, 0 replies; 6+ messages in thread
From: Xupeng Yun @ 2012-03-02  7:06 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Ext4 development

On Fri, Mar 2, 2012 at 10:45, Ted Ts'o <tytso@mit.edu> wrote:
> Hmm, it sounds like we're hitting some kind of scaling problem.  How
> many CPU's/cores do you have on your server?  And it would be
> interesting to try varying the --numjobs parameter and see how the
> various file systems behave with 1, 2, 4, 8, and 16 threads.
>

I ran benchmarks with 1, 2, 4, 8, 16 and 32 threads; here are
the results:
http://blog.xupeng.me/misc/4-15krpm-sas-raid10/

> The other thing that's worth checking is to try using filefrag -v on
> the test file after the benchmark has finished, just to make sure the
> file layout is sane.  It should be, but I just want to double check...

Output of `filefrag -v` after the benchmark has finished:

    # filefrag -v /mnt/ext4/test
    Filesystem type is: ef53
    File size of /mnt/ext4/test is 5368709120 (1310720 blocks, blocksize 4096)
     ext logical physical expected length flags
       0       0  2424832           32758
       1   32758  2457600  2457590  32758
       2   65516  2490368  2490358  32758
       3   98274  2523136  2523126  32758
       4  131032  2555904  2555894  32758
       5  163790  2588672  2588662  32758
       6  196548  2719744  2621430  32758
       7  229306  2752512  2752502  32758
       8  262064  2785280  2785270  32758
       9  294822  2818048  2818038  32758
      10  327580  2850816  2850806  32758
      11  360338  2883584  2883574  32758
      12  393096  2916352  2916342  32758
      13  425854  2949120  2949110  32758
      14  458612  2981888  2981878  32758
      15  491370  3014656  3014646  32758
      16  524128  3047424  3047414  32758
      17  556886  3080192  3080182  32758
      18  589644  3112960  3112950  32758
      19  622402  3178496  3145718  32758
      20  655160  3211264  3211254  32758
      21  687918  3244032  3244022  32758
      22  720676  3276800  3276790  32758
      23  753434  3309568  3309558  32758
      24  786192  3342336  3342326  32758
      25  818950  3375104  3375094  32758
      26  851708  3407872  3407862  32758
      27  884466  3440640  3440630  32758
      28  917224  3473408  3473398  32758
      29  949982  3506176  3506166  32758
      30  982740  3538944  3538934  32758
      31 1015498  3571712  3571702  32758
      32 1048256  3604480  3604470  32758
      33 1081014  3637248  3637238  32758
      34 1113772  3702784  3670006  32758
      35 1146530  3735552  3735542  32758
      36 1179288  3768320  3768310  32758
      37 1212046  3801088  3801078  32758
      38 1244804  3833856  3833846  32758
      39 1277562  3866624  3866614  32758
      40 1310320  3899392  3899382    400 eof
    /mnt/ext4/test: 41 extents found
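
(Each extent above covers 32758 blocks x 4 KB, roughly 128 MB, so the 5 GB
file maps to 41 large, mostly contiguous extents.)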

-- 
Xupeng Yun
http://about.me/xupeng


* Re: Bad performance of ext4 with kernel 3.0.17
  2012-03-02  2:45     ` Ted Ts'o
  2012-03-02  7:06       ` Xupeng Yun
@ 2012-03-03  3:56       ` Xupeng Yun
  1 sibling, 0 replies; 6+ messages in thread
From: Xupeng Yun @ 2012-03-03  3:56 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: Ext4 development

On Fri, Mar 2, 2012 at 10:45, Ted Ts'o <tytso@mit.edu> wrote:
> Hmm, it sounds like we're hitting some kind of scaling problem.  How
> many CPU's/cores do you have on your server?  And it would be
> interesting to try varying the --numjobs parameter and see how the
> various file systems behave with 1, 2, 4, 8, and 16 threads.

Oh, there are 8 physical cores with hyper-threading on the server.

-- 
Xupeng Yun
http://about.me/xupeng