* Low IOOP Performance

From: John Marrett @ 2017-02-27 13:20 UTC
To: linux-btrfs

In preparation for a system and storage upgrade I performed some btrfs
performance tests. I created a ten-disk raid1 using 7.2k 3 TB SAS drives
and used aio to test IOOP rates. I was surprised to measure 215 read and
72 write IOOPs on the clean new filesystem. Sequential writes ran as
expected at roughly 650 MB/s. For comparison with another checksumming
filesystem, I created a zfs pool with the same layout and measured IOOP
rates of 4315 read and 1449 write with sync enabled (without sync it is
clearly just writing to RAM); sequential performance was comparable to
btrfs.

While I started testing with an Ubuntu 16.04 kernel, I also tested with
4.10.0 (Ubuntu packaged, from
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/) and observed
identical performance with both kernels.

Reviewing the mailing list I found a previous discussion [1] about
disappointing fio results; I tried the ba=4k option discussed in that
thread, but it does not appear to have any impact on performance.

In case the issue is hardware related: the server has 24GB of RAM, dual
E5620 CPUs and 12 SAS bays behind a Dell H700 controller, with each disk
configured as a single-drive RAID0 volume. This controller configuration
is clearly suboptimal and I will replace it with an LSI JBOD controller
(a reflashed H300) once it arrives.

Is the performance I'm experiencing expected, or is there an issue with
my configuration, tools or testing methodology?

Below you will find the commands used to create and test each
filesystem, along with their output.

[1] http://linux-btrfs.vger.kernel.narkive.com/2lwzu0Is/why-does-btrfs-benchmark-so-badly-in-this-case

Thanks in advance for any help you can offer,

-JohnF

BTRFS Filesystem Creation and Testing

johnf@altered-carbon:~$ uname -a
Linux altered-carbon 4.10.0-041000-generic #201702191831 SMP Sun Feb 19 23:33:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
johnf@altered-carbon:~$ sudo mkfs.btrfs -f /dev/sdg1 /dev/sdh1 /dev/sdd1 /dev/sdi1 /dev/sdf1 /dev/sdl1 /dev/sde1 /dev/sdm1 /dev/sdj1 /dev/sdk1
[sudo] password for johnf:
btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.
Label:              (null)
UUID:               ded602d6-1b78-4149-ae29-b58735d2f3f1
Node size:          16384
Sector size:        4096
Filesystem size:    27.28TiB
Block group profiles:
  Data:             RAID0            10.01GiB
  Metadata:         RAID1             1.01GiB
  System:           RAID1            12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  10
Devices:
   ID        SIZE  PATH
    1     2.73TiB  /dev/sdg1
    2     2.73TiB  /dev/sdh1
    3     2.73TiB  /dev/sdd1
    4     2.73TiB  /dev/sdi1
    5     2.73TiB  /dev/sdf1
    6     2.73TiB  /dev/sdl1
    7     2.73TiB  /dev/sde1
    8     2.73TiB  /dev/sdm1
    9     2.73TiB  /dev/sdj1
   10     2.73TiB  /dev/sdk1

johnf@altered-carbon:~$ sudo mount /dev/sdg1 /btrfs/
johnf@altered-carbon:~$ sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /btrfs
Done, had to relocate 3 out of 3 chunks
johnf@altered-carbon:~$ cd /btrfs/
johnf@altered-carbon:/btrfs$ sudo mkdir johnf; sudo chown johnf:johnf johnf; cd johnf
johnf@altered-carbon:/btrfs/johnf$ fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m(1)] [99.9% done] [4988KB/1664KB/0KB /s] [1247/416/0 iops] [eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=5521: Sun Feb 26 08:23:14 2017
  read : io=784996KB, bw=883989B/s, iops=215, runt=909327msec
  write: io=263580KB, bw=296819B/s, iops=72, runt=909327msec
  cpu          : usr=0.33%, sys=1.60%, ctx=196681, majf=0, minf=22
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=784996KB, aggrb=863KB/s, minb=863KB/s, maxb=863KB/s, mint=909327msec, maxt=909327msec
  WRITE: io=263580KB, aggrb=289KB/s, minb=289KB/s, maxb=289KB/s, mint=909327msec, maxt=909327msec

johnf@altered-carbon:/btrfs/johnf$ dd if=/dev/zero of=/btrfs/johnf/test bs=1M count=250000 conv=fsync
250000+0 records in
250000+0 records out
262144000000 bytes (262 GB, 244 GiB) copied, 404.289 s, 648 MB/s

johnf@altered-carbon:/btrfs/johnf$ sudo btrfs fi show
[sudo] password for johnf:
Label: none  uuid: 330a8131-9f68-4d2d-a141-e83ea829de79
        Total devices 2 FS bytes used 7.04GiB
        devid    1 size 2.73TiB used 8.03GiB path /dev/sdb3
        devid    2 size 2.73TiB used 8.03GiB path /dev/sdc3

Label: none  uuid: ded602d6-1b78-4149-ae29-b58735d2f3f1
        Total devices 10 FS bytes used 244.45GiB
        devid    1 size 2.73TiB used 50.00GiB path /dev/sdg1
        devid    2 size 2.73TiB used 49.00GiB path /dev/sdh1
        devid    3 size 2.73TiB used 49.03GiB path /dev/sdd1
        devid    4 size 2.73TiB used 49.00GiB path /dev/sdi1
        devid    5 size 2.73TiB used 49.00GiB path /dev/sdf1
        devid    6 size 2.73TiB used 49.00GiB path /dev/sdl1
        devid    7 size 2.73TiB used 49.00GiB path /dev/sde1
        devid    8 size 2.73TiB used 49.03GiB path /dev/sdm1
        devid    9 size 2.73TiB used 49.00GiB path /dev/sdj1
        devid   10 size 2.73TiB used 50.00GiB path /dev/sdk1

Label: none  uuid: 5a99aa45-40b6-4400-a0d8-74e9636fe209
        Total devices 1 FS bytes used 1.39GiB
        devid    1 size 2.73TiB used 4.02GiB path /dev/sdb1

ZFS Filesystem Creation and Testing

johnf@altered-carbon:~$ sudo zpool create -f zfs mirror /dev/sdg1 /dev/sdh1 mirror /dev/sdd1 /dev/sdi1 mirror /dev/sdf1 /dev/sdl1 mirror /dev/sde1 /dev/sdm1 mirror /dev/sdj1 /dev/sdk1
johnf@altered-carbon:~$ cd /zfs/
johnf@altered-carbon:/zfs$ sudo mkdir johnf; sudo chown johnf:johnf johnf; cd johnf
johnf@altered-carbon:/zfs/johnf$ fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --sync=1
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [19492KB/6396KB/0KB /s] [4873/1599/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=7431: Sat Feb 25 17:12:34 2017
  read : io=784996KB, bw=17263KB/s, iops=4315, runt= 45474msec
  write: io=263580KB, bw=5796.3KB/s, iops=1449, runt= 45474msec
  cpu          : usr=2.24%, sys=16.44%, ctx=131967, majf=0, minf=561
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=784996KB, aggrb=17262KB/s, minb=17262KB/s, maxb=17262KB/s, mint=45474msec, maxt=45474msec
  WRITE: io=263580KB, aggrb=5796KB/s, minb=5796KB/s, maxb=5796KB/s, mint=45474msec, maxt=45474msec

johnf@altered-carbon:/zfs/johnf$ dd if=/dev/zero of=test bs=1M count=250000 conv=fsync
250000+0 records in
250000+0 records out
262144000000 bytes (262 GB, 244 GiB) copied, 401.401 s, 653 MB/s

johnf@altered-carbon:/zfs/johnf$ sudo zpool status
  pool: zfs
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdg1    ONLINE       0     0     0
            sdh1    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd1    ONLINE       0     0     0
            sdi1    ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sdf1    ONLINE       0     0     0
            sdl1    ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            sde1    ONLINE       0     0     0
            sdm1    ONLINE       0     0     0
          mirror-4  ONLINE       0     0     0
            sdj1    ONLINE       0     0     0
            sdk1    ONLINE       0     0     0

errors: No known data errors
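A rough sanity check on the numbers above, using assumed per-drive
figures rather than anything measured in this thread: a 7.2k-rpm drive
is usually good for something like 75-120 random 4KiB IOPS, so for ten
of them in a mirrored layout one might expect on the order of

  reads spread across all drives:  ~10 x 75-120 = 750-1200 IOPS
  writes landing on both copies:   ~ 5 x 75-120 = 375-600  IOPS

which is why 215 read / 72 write looks low for this hardware.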
* Re: Low IOOP Performance

From: Liu Bo @ 2017-02-27 15:46 UTC
To: John Marrett; +Cc: linux-btrfs

On Mon, Feb 27, 2017 at 08:20:49AM -0500, John Marrett wrote:
> In preparation for a system and storage upgrade I performed some btrfs
> performance tests. I created a ten-disk raid1 using 7.2k 3 TB SAS
> drives and used aio to test IOOP rates. I was surprised to measure 215
> read and 72 write IOOPs on the clean new filesystem. Sequential writes
> ran as expected at roughly 650 MB/s.
[ ... ]
> Is the performance I'm experiencing expected, or is there an issue
> with my configuration, tools or testing methodology?

For iops testing, libaio + direct=1 is preferred to bypass page cache.

Thanks,

-liubo

[ ... ]
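Concretely, the suggestion amounts to adding --direct=1 to the original
fio invocation (fio's buffered=0 is the same switch, inverted); this is
the command used for the direct-IO results later in the thread:

  fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test \
      --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --direct=1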
* Re: Low IOOP Performance

From: Peter Grandi @ 2017-02-27 16:59 UTC
To: linux-btrfs

[ ... ]

> a ten-disk raid1 using 7.2k 3 TB SAS drives

Those are really low IOPS-per-TB devices, but a good choice for SAS, as
they will have SCT/ERC.

> and used aio to test IOOP rates. I was surprised to measure 215 read
> and 72 write IOOPs on the clean new filesystem.

For that you really want to use the 'raid10' profile; 'raid1' is quite
different, and has an odd recovery "gotcha". Also, so far 'raid1' in
Btrfs only reads from one of the two mirrors per thread.

Anyhow, the 72 write IOPS look like a single member device's IOPS rate,
which is puzzling, as if Btrfs were not spreading the work across a
many-device 'raid1' profile volume. I have a 6-device test setup at
home, I have tried various layouts, and I think I got rather better
than that.

> Sequential writes ran as expected at roughly 650 MB/s.

That's a bit too high: on a single similar drive I get around 65MB/s
average with relatively large files, so I would expect around 4-5x that
from a 10-device mirrored profile, regardless of filesystem type.

I strongly suspect that we have different notions of "IOPS", perhaps
logical vs. physical IOPS, or randomish vs. sequentialish IOPS. I'll
have a look at your attachments in more detail.

> For comparison with another checksumming filesystem, I created a zfs
> pool with the same layout and measured IOOP rates of 4315 read and
> 1449 write with sync enabled (without sync it is clearly just writing
> to RAM); sequential performance was comparable to btrfs.

It seems unlikely to me that you got that with a 10-device mirror
'vdev'; most likely you configured it as a stripe of 5x 2-device mirror
vdevs, that is RAID10.
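For reference, moving an existing filesystem to the 'raid10' profile is
an online conversion done with a balance, analogous to the raid1
conversion in the original post; a minimal sketch, assuming the same
/btrfs mount point:

  # Rewrite all data and metadata chunks as raid10; this touches every
  # chunk, so it can take a long time on a well-filled filesystem.
  sudo btrfs balance start -dconvert=raid10 -mconvert=raid10 /btrfs
  # Data and Metadata should now report RAID10.
  sudo btrfs fi df /btrfs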
* Re: Low IOOP Performance

From: John Marrett @ 2017-02-27 19:15 UTC
To: Peter Grandi, bo.li.liu; +Cc: linux-btrfs

Liubo correctly identified direct IO as the fix for my test performance
issues; with it in use I achieved 908 read and 305 write IOOPs, not
quite as fast as ZFS but more than adequate for my needs. I then applied
Peter's recommendation of switching to raid10 and roughly tripled
performance again, up to 3000 read and 1000 write IOOPs.

I do not understand why the page cache had such a large negative impact
on performance; it seems like it should have no impact, or help slightly
with caching, rather than severely hurting both read and write IO. Is
this expected behaviour, and what is the real-world impact on
applications that don't use direct IO?

With regard to RAID10, my understanding is that I can't mix drive sizes
and use their full capacity in a RAID10 volume. My current server runs a
mixture of drive sizes and I am likely to need to do so again in the
future. Can I do this and still enjoy the performance benefits of
RAID10?

>> a ten-disk raid1 using 7.2k 3 TB SAS drives
>
> Those are really low IOPS-per-TB devices, but a good choice for SAS,
> as they will have SCT/ERC.

I don't expect the best IOOP performance from them, they are intended
for bulk data storage, but the results I had previously didn't seem
acceptable or normal.

> I strongly suspect that we have different notions of "IOPS", perhaps
> logical vs. physical IOPS, or randomish vs. sequentialish IOPS. I'll
> have a look at your attachments in more detail.

I did not achieve 650 MB/s with random IO, nor do I expect to; it was a
sequential write of 250 GB performed using dd with the conv=fsync
option to ensure that all writes were complete before reporting write
speed.

> It seems unlikely to me that you got that with a 10-device mirror
> 'vdev'; most likely you configured it as a stripe of 5x 2-device
> mirror vdevs, that is RAID10.

This is correct, it was a RAID10 across 5 mirrored volumes.
Thank you both very much for your help with my testing,

-JohnF

RAID1 Direct IO Test Results

johnf@altered-carbon:/btrfs/johnf$ fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --direct=1
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [8336KB/2732KB/0KB /s] [2084/683/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=12270: Mon Feb 27 11:49:04 2017
  read : io=784996KB, bw=3634.6KB/s, iops=908, runt=215981msec
  write: io=263580KB, bw=1220.4KB/s, iops=305, runt=215981msec
  cpu          : usr=1.50%, sys=8.18%, ctx=244134, majf=0, minf=116
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=784996KB, aggrb=3634KB/s, minb=3634KB/s, maxb=3634KB/s, mint=215981msec, maxt=215981msec
  WRITE: io=263580KB, aggrb=1220KB/s, minb=1220KB/s, maxb=1220KB/s, mint=215981msec, maxt=215981msec

RAID10 Direct IO Test Results

johnf@altered-carbon:/btrfs/johnf$ fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --ba=4k --direct=1
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [16136KB/5312KB/0KB /s] [4034/1328/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=12644: Mon Feb 27 13:50:35 2017
  read : io=784996KB, bw=12003KB/s, iops=3000, runt= 65401msec
  write: io=263580KB, bw=4030.3KB/s, iops=1007, runt= 65401msec
  cpu          : usr=3.66%, sys=19.54%, ctx=188302, majf=0, minf=22
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=784996KB, aggrb=12002KB/s, minb=12002KB/s, maxb=12002KB/s, mint=65401msec, maxt=65401msec
  WRITE: io=263580KB, aggrb=4030KB/s, minb=4030KB/s, maxb=4030KB/s, mint=65401msec, maxt=65401msec
* Re: Low IOOP Performance

From: Austin S. Hemmelgarn @ 2017-02-27 19:43 UTC
To: John Marrett; +Cc: Peter Grandi, bo.li.liu, linux-btrfs

On 2017-02-27 14:15, John Marrett wrote:
> Liubo correctly identified direct IO as the fix for my test
> performance issues; with it in use I achieved 908 read and 305 write
> IOOPs, not quite as fast as ZFS but more than adequate for my needs.
> I then applied Peter's recommendation of switching to raid10 and
> roughly tripled performance again, up to 3000 read and 1000 write
> IOOPs.
>
> I do not understand why the page cache had such a large negative
> impact on performance; it seems like it should have no impact, or
> help slightly with caching, rather than severely hurting both read
> and write IO. Is this expected behaviour, and what is the real-world
> impact on applications that don't use direct IO?

Generally yes, it is expected behavior, but it's not really all that
high-impact for most things that don't use direct IO, since most such
things either:
1. Care more about bulk streaming throughput than IOPS.
2. Aren't performance-sensitive enough for it to matter.
3. Actually need the page cache for performance reasons (the Linux page
   cache actually does pretty well for read-heavy workloads with
   consistent access patterns).

If you look, you should actually see lower bulk streaming throughput
with direct IO than without on most devices, especially when dealing
with ATA or USB disks, since the page cache functionally reduces the
number of IO requests that get sent to a device even if it's all new
data. The read-ahead it does to achieve this only works for purely or
mostly sequential workloads, though, so it ends up being detrimental to
random-access or very sparsely sequential workloads, which in turn are
the ones that usually care about IOPS over streaming throughput.

> With regard to RAID10, my understanding is that I can't mix drive
> sizes and use their full capacity in a RAID10 volume. My current
> server runs a mixture of drive sizes and I am likely to need to do so
> again in the future. Can I do this and still enjoy the performance
> benefits of RAID10?

In theory, you should be fine. BTRFS will use narrower stripes when it
has to, as long as it has at least 4 disks to put data on. If you can
make sure you have even numbers of drives of each size (and ideally an
even total number of drives), you should get close to full utilization.
Keep in mind though that as the FS gets more and more full (and the
stripes therefore get narrower), you'll start to see odd, seemingly
arbitrary performance differences based on what you're accessing.

That said, if you can manage to use an even number of identically sized
disks, you can get even more performance by running BTRFS in raid1 mode
on top of two LVM or MD RAID0 volumes. That gives you the same data
safety as BTRFS raid10 mode but, depending on the workload, can
increase performance pretty significantly (I see about a 10-20%
difference currently, but I don't have any particularly write-intensive
workloads). Note that doing so improves sequential access more than
random access, so it may not be worth the effort in your case.

[ ... ]
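The MD RAID0 + Btrfs raid1 layout Austin describes would look roughly
like this for the ten drives in this thread; a sketch only, with
untested, illustrative commands (the partition names are taken from the
original post):

  # Two five-disk RAID0 stripes...
  sudo mdadm --create /dev/md0 --level=0 --raid-devices=5 \
      /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
  sudo mdadm --create /dev/md1 --level=0 --raid-devices=5 \
      /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1
  # ...then Btrfs mirrors data and metadata between the two stripes,
  # giving the same redundancy as btrfs raid10 across all ten disks.
  sudo mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1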
* Re: Low IOOP Performance

From: Peter Grandi @ 2017-02-27 22:11 UTC
To: linux-btrfs

[ ... ]

> I have a 6-device test setup at home, I have tried various layouts,
> and I think I got rather better than that.

* 'raid1' profile:

soft# btrfs fi df /mnt/sdb5
Data, RAID1: total=273.00GiB, used=269.94GiB
System, RAID1: total=32.00MiB, used=56.00KiB
Metadata, RAID1: total=1.00GiB, used=510.70MiB
GlobalReserve, single: total=176.00MiB, used=0.00B

soft# fio --directory=/mnt/sdb5 --runtime=30 --status-interval=10 blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
   READ: io=105508KB, aggrb=3506KB/s, minb=266KB/s, maxb=311KB/s, mint=30009msec, maxt=30090msec
  WRITE: io=100944KB, aggrb=3354KB/s, minb=256KB/s, maxb=296KB/s, mint=30009msec, maxt=30090msec

* 'raid10' profile:

soft# btrfs fi df /mnt/sdb6
Data, RAID10: total=276.00GiB, used=272.49GiB
System, RAID10: total=96.00MiB, used=48.00KiB
Metadata, RAID10: total=3.00GiB, used=512.06MiB
GlobalReserve, single: total=176.00MiB, used=0.00B

soft# fio --directory=/mnt/sdb6 --runtime=30 --status-interval=10 blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
   READ: io=89056KB, aggrb=2961KB/s, minb=225KB/s, maxb=271KB/s, mint=30009msec, maxt=30076msec
  WRITE: io=85248KB, aggrb=2834KB/s, minb=212KB/s, maxb=261KB/s, mint=30009msec, maxt=30076msec

* 'single' profile on MD RAID10:

soft# btrfs fi df /mnt/md0
Data, single: total=278.01GiB, used=274.32GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=2.01GiB, used=615.73MiB
GlobalReserve, single: total=208.00MiB, used=0.00B

soft# grep -A1 md0 /proc/mdstat
md0 : active raid10 sdg1[6] sdb1[0] sdd1[2] sdf1[4] sdc1[1] sde1[3]
      364904232 blocks super 1.0 8K chunks 2 near-copies [6/6] [UUUUUU]

soft# fio --directory=/mnt/md0 --runtime=30 --status-interval=10 blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
   READ: io=160928KB, aggrb=5357KB/s, minb=271KB/s, maxb=615KB/s, mint=30012msec, maxt=30038msec
  WRITE: io=158892KB, aggrb=5289KB/s, minb=261KB/s, maxb=616KB/s, mint=30012msec, maxt=30038msec

That's a range of 700-1300 4KiB random mixed-rw IOPS, quite reasonable
for 6x 1TB 7200RPM SATA drives, each capable of 100-120. It helps that
the test file is just 100G, 10% of the total drive extent, so arm
movement is limited.

It is not surprising that the much more mature MD RAID has an edge; it
is a bit stranger that here the 'raid1' profile seems a bit faster than
the 'raid10' profile.

The much smaller numbers seem to happen to me too (probably some
misfeature of 'fio') with 'buffered=1', and the larger numbers for
ZFSonLinux are "suspicious".

> It seems unlikely to me that you got that with a 10-device mirror
> 'vdev'; most likely you configured it as a stripe of 5x 2-device
> mirror vdevs, that is RAID10.

Indeed, I double-checked the end of the attached log and that was the
case.

My FIO config file:

# vim:set ft=ini:

[global]
filename=FIO-TEST
fallocate=keep
size=100G
buffered=0
ioengine=libaio
io_submit_mode=offload
iodepth=2
numjobs=12
blocksize=4K
kb_base=1024

[rand-mixed]
rw=randrw
stonewall
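The 700-1300 range quoted above follows directly from the fio bandwidth
lines: with 4 KiB blocks (and kb_base=1024), aggrb divided by the block
size gives the per-direction IOPS, for example for the 'raid10' and MD
RAID10 runs:

  READ  2961KB/s / 4KiB ~  740 IOPS    WRITE 2834KB/s / 4KiB ~  708 IOPS   ('raid10')
  READ  5357KB/s / 4KiB ~ 1339 IOPS    WRITE 5289KB/s / 4KiB ~ 1322 IOPS   (MD RAID10)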
* Re: Low IOOP Performance

From: Peter Grandi @ 2017-02-27 22:32 UTC
To: linux-btrfs

>>> On Mon, 27 Feb 2017 22:11:29 +0000, pg@btrfs.list.sabi.co.UK (Peter Grandi) said:

> [ ... ]
>> I have a 6-device test setup at home, I have tried various layouts,
>> and I think I got rather better than that.
[ ... ]
> That's a range of 700-1300 4KiB random mixed-rw IOPS,

Rerun with 1M blocksize:

soft# fio --directory=/mnt/sdb5 --runtime=30 --status-interval=10 --blocksize=1M blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
   READ: io=2646.0MB, aggrb=89372KB/s, minb=7130KB/s, maxb=7776KB/s, mint=30081msec, maxt=30317msec
  WRITE: io=2297.0MB, aggrb=77584KB/s, minb=6082KB/s, maxb=6796KB/s, mint=30081msec, maxt=30317msec

soft# fio --directory=/mnt/sdb6 --runtime=30 --status-interval=10 --blocksize=1M blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
   READ: io=2781.0MB, aggrb=94015KB/s, minb=5932KB/s, maxb=10290KB/s, mint=30121msec, maxt=30290msec
  WRITE: io=2431.0MB, aggrb=82183KB/s, minb=4779KB/s, maxb=9102KB/s, mint=30121msec, maxt=30290msec

soft# killall -9 fio
fio: no process found

soft# fio --directory=/mnt/md0 --runtime=30 --status-interval=10 --blocksize=1M blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
   READ: io=1504.0MB, aggrb=50402KB/s, minb=3931KB/s, maxb=4387KB/s, mint=30343msec, maxt=30556msec
  WRITE: io=1194.0MB, aggrb=40013KB/s, minb=3158KB/s, maxb=3475KB/s, mint=30343msec, maxt=30556msec

Interesting that Btrfs 'single' on MD RAID10 becomes rather slower (I
guess low level of intrinsic parallelism).

For comparison, the same on a JFS on top of MD RAID10:

soft# grep -A1 md40 /proc/mdstat
md40 : active raid10 sdg4[5] sdd4[2] sdb4[0] sdf4[4] sdc4[1] sde4[3]
      486538240 blocks super 1.0 512K chunks 3 near-copies [6/6] [UUUUUU]

soft# fio --directory=/mnt/md40 --runtime=30 --status-interval=10 --blocksize=4K blocks-randomish.fio | grep -A2 '(all jobs)' | tail -3
Run status group 0 (all jobs):
   READ: io=31408KB, aggrb=1039KB/s, minb=80KB/s, maxb=90KB/s, mint=30206msec, maxt=30227msec
  WRITE: io=27800KB, aggrb=919KB/s, minb=70KB/s, maxb=81KB/s, mint=30206msec, maxt=30227msec

soft# fio --directory=/mnt/md40 --runtime=30 --status-interval=10 --blocksize=1M blocks-randomish.fio | grep -A2 '(all jobs)' | tail -3
Run status group 0 (all jobs):
   READ: io=2151.0MB, aggrb=72619KB/s, minb=5865KB/s, maxb=6383KB/s, mint=30134msec, maxt=30331msec
  WRITE: io=1772.0MB, aggrb=59824KB/s, minb=4712KB/s, maxb=5365KB/s, mint=30134msec, maxt=30331msec

XFS is usually better at multithreaded workloads within the same file
(rather than across files).