* Low IOPS Performance
@ 2017-02-27 13:20 John Marrett
2017-02-27 15:46 ` Liu Bo
2017-02-27 16:59 ` Peter Grandi
0 siblings, 2 replies; 7+ messages in thread
From: John Marrett @ 2017-02-27 13:20 UTC (permalink / raw)
To: linux-btrfs
In preparation for a system and storage upgrade I performed some btrfs
performance tests. I created a ten-disk raid1 using 7.2k 3 TB SAS
drives and used aio to test IOPS rates. I was surprised to measure
only 215 read and 72 write IOPS on the clean new filesystem.
Sequential writes ran as expected at roughly 650 MB/s. For comparison
with another checksumming filesystem, I created a zfs filesystem
using the same layout and measured 4315 read and 1449 write IOPS with
sync enabled (without sync it is clearly just writing to RAM);
sequential performance was comparable to btrfs.
While I started testing with an Ubuntu 16.04 kernel, I also tested
with 4.10.0 (Ubuntu packaged, from
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/) and observed
identical performance with both kernels.
Reviewing the mailing list I found a previous discussion [1] about
disappointing fio results; I attempted the ba=4k option discussed in
that thread in an effort to improve performance, but it does not
appear to have any impact.
In case the issue is hardware related: the server has 24 GB of RAM,
dual E5620 CPUs, and 12 SAS bays driven by a Dell H700 controller,
with each disk configured as its own single-drive RAID0 volume. This
controller configuration is clearly suboptimal and I will replace it
with an LSI JBOD controller (a reflashed H300) once it arrives.
Is the performance I'm experiencing expected, or is there an issue
with my configuration, tools, or testing methodology?
Below you will find the commands used to create and test each
filesystem as well as their output.
[1] http://linux-btrfs.vger.kernel.narkive.com/2lwzu0Is/why-does-btrfs-benchmark-so-badly-in-this-case
Thanks in advance for any help you can offer,
-JohnF
BTRFS Filesystem Creation and Testing
johnf@altered-carbon:~$ uname -a
Linux altered-carbon 4.10.0-041000-generic #201702191831 SMP Sun Feb
19 23:33:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
johnf@altered-carbon:~$ sudo mkfs.btrfs -f /dev/sdg1 /dev/sdh1
/dev/sdd1 /dev/sdi1 /dev/sdf1 /dev/sdl1 /dev/sde1 /dev/sdm1 /dev/sdj1
/dev/sdk1
[sudo] password for johnf:
btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: ded602d6-1b78-4149-ae29-b58735d2f3f1
Node size: 16384
Sector size: 4096
Filesystem size: 27.28TiB
Block group profiles:
Data: RAID0 10.01GiB
Metadata: RAID1 1.01GiB
System: RAID1 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 10
Devices:
ID SIZE PATH
1 2.73TiB /dev/sdg1
2 2.73TiB /dev/sdh1
3 2.73TiB /dev/sdd1
4 2.73TiB /dev/sdi1
5 2.73TiB /dev/sdf1
6 2.73TiB /dev/sdl1
7 2.73TiB /dev/sde1
8 2.73TiB /dev/sdm1
9 2.73TiB /dev/sdj1
10 2.73TiB /dev/sdk1
johnf@altered-carbon:~$ sudo mount /dev/sdg1 /btrfs/
johnf@altered-carbon:~$ sudo btrfs balance start -dconvert=raid1
-mconvert=raid1 /btrfs
Done, had to relocate 3 out of 3 chunks
johnf@altered-carbon:~$ cd /btrfs/
johnf@altered-carbon:/btrfs$ sudo mkdir johnf; sudo chown johnf:johnf
johnf; cd johnf
johnf@altered-carbon:/btrfs/johnf$ fio --randrepeat=1
--ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k
--iodepth=64 --size=1G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m(1)] [99.9% done] [4988KB/1664KB/0KB /s] [1247/416/0
iops] [eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=5521: Sun Feb 26 08:23:14 2017
read : io=784996KB, bw=883989B/s, iops=215, runt=909327msec
write: io=263580KB, bw=296819B/s, iops=72, runt=909327msec
cpu : usr=0.33%, sys=1.60%, ctx=196681, majf=0, minf=22
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=784996KB, aggrb=863KB/s, minb=863KB/s, maxb=863KB/s,
mint=909327msec, maxt=909327msec
WRITE: io=263580KB, aggrb=289KB/s, minb=289KB/s, maxb=289KB/s,
mint=909327msec, maxt=909327msec
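As a sanity check, the iops figures fio reports follow directly from
the issued I/O counts and the runtime in the summary above (values
copied from the fio output):

```shell
# Values from the fio summary above: 196249 reads and 65895 writes
# issued over a 909327 ms runtime.
reads=196249; writes=65895; runt_ms=909327
echo "read_iops=$(( reads * 1000 / runt_ms ))"    # prints read_iops=215
echo "write_iops=$(( writes * 1000 / runt_ms ))"  # prints write_iops=72
```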
johnf@altered-carbon:/btrfs/johnf$ dd if=/dev/zero
of=/btrfs/johnf/test bs=1M count=250000 conv=fsync
250000+0 records in
250000+0 records out
262144000000 bytes (262 GB, 244 GiB) copied, 404.289 s, 648 MB/s
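(For reference, the 648 MB/s figure is simply bytes copied divided by
elapsed time, as reported by dd above:)

```shell
# 262144000000 bytes written in 404.289 s, per the dd output above.
awk 'BEGIN { printf "%d MB/s\n", 262144000000 / 404.289 / 1000000 }'
# prints 648 MB/s
```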
johnf@altered-carbon:/btrfs/johnf$ sudo btrfs fi show
[sudo] password for johnf:
Label: none uuid: 330a8131-9f68-4d2d-a141-e83ea829de79
Total devices 2 FS bytes used 7.04GiB
devid 1 size 2.73TiB used 8.03GiB path /dev/sdb3
devid 2 size 2.73TiB used 8.03GiB path /dev/sdc3
Label: none uuid: ded602d6-1b78-4149-ae29-b58735d2f3f1
Total devices 10 FS bytes used 244.45GiB
devid 1 size 2.73TiB used 50.00GiB path /dev/sdg1
devid 2 size 2.73TiB used 49.00GiB path /dev/sdh1
devid 3 size 2.73TiB used 49.03GiB path /dev/sdd1
devid 4 size 2.73TiB used 49.00GiB path /dev/sdi1
devid 5 size 2.73TiB used 49.00GiB path /dev/sdf1
devid 6 size 2.73TiB used 49.00GiB path /dev/sdl1
devid 7 size 2.73TiB used 49.00GiB path /dev/sde1
devid 8 size 2.73TiB used 49.03GiB path /dev/sdm1
devid 9 size 2.73TiB used 49.00GiB path /dev/sdj1
devid 10 size 2.73TiB used 50.00GiB path /dev/sdk1
Label: none uuid: 5a99aa45-40b6-4400-a0d8-74e9636fe209
Total devices 1 FS bytes used 1.39GiB
devid 1 size 2.73TiB used 4.02GiB path /dev/sdb1
ZFS Filesystem Creation and Testing
johnf@altered-carbon:~$ sudo zpool create -f zfs mirror /dev/sdg1
/dev/sdh1 mirror /dev/sdd1 /dev/sdi1 mirror /dev/sdf1 /dev/sdl1
mirror /dev/sde1 /dev/sdm1 mirror /dev/sdj1 /dev/sdk1
johnf@altered-carbon:~$ cd /zfs/
johnf@altered-carbon:/zfs$ sudo mkdir johnf; sudo chown johnf:johnf
johnf; cd johnf
johnf@altered-carbon:/zfs/johnf$ fio --randrepeat=1 --ioengine=libaio
--gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64
--size=1G --readwrite=randrw --rwmixread=75 --sync=1
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [19492KB/6396KB/0KB /s]
[4873/1599/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=7431: Sat Feb 25 17:12:34 2017
read : io=784996KB, bw=17263KB/s, iops=4315, runt= 45474msec
write: io=263580KB, bw=5796.3KB/s, iops=1449, runt= 45474msec
cpu : usr=2.24%, sys=16.44%, ctx=131967, majf=0, minf=561
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=784996KB, aggrb=17262KB/s, minb=17262KB/s, maxb=17262KB/s,
mint=45474msec, maxt=45474msec
WRITE: io=263580KB, aggrb=5796KB/s, minb=5796KB/s, maxb=5796KB/s,
mint=45474msec, maxt=45474msec
johnf@altered-carbon:/zfs/johnf$ dd if=/dev/zero of=test bs=1M
count=250000 conv=fsync
250000+0 records in
250000+0 records out
262144000000 bytes (262 GB, 244 GiB) copied, 401.401 s, 653 MB/s
johnf@altered-carbon:/zfs/johnf$ sudo zpool status
pool: zfs
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
zfs ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdg1 ONLINE 0 0 0
sdh1 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdd1 ONLINE 0 0 0
sdi1 ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
sdf1 ONLINE 0 0 0
sdl1 ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
sde1 ONLINE 0 0 0
sdm1 ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
sdj1 ONLINE 0 0 0
sdk1 ONLINE 0 0 0
errors: No known data errors
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Low IOPS Performance
2017-02-27 13:20 Low IOPS Performance John Marrett
@ 2017-02-27 15:46 ` Liu Bo
2017-02-27 16:59 ` Peter Grandi
1 sibling, 0 replies; 7+ messages in thread
From: Liu Bo @ 2017-02-27 15:46 UTC (permalink / raw)
To: John Marrett; +Cc: linux-btrfs
On Mon, Feb 27, 2017 at 08:20:49AM -0500, John Marrett wrote:
> In preparation for a system and storage upgrade I performed some btrfs
> performance tests. I created a ten-disk raid1 using 7.2k 3 TB SAS
> drives and used aio to test IOPS rates. I was surprised to measure
> only 215 read and 72 write IOPS on the clean new filesystem.
> Sequential writes ran as expected at roughly 650 MB/s. For comparison
> with another checksumming filesystem, I created a zfs filesystem
> using the same layout and measured 4315 read and 1449 write IOPS with
> sync enabled (without sync it is clearly just writing to RAM);
> sequential performance was comparable to btrfs.
>
> While I started testing with an Ubuntu 16.04 kernel, I also tested
> with 4.10.0 (Ubuntu packaged, from
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/) and observed
> identical performance with both kernels.
>
> Reviewing the mailing list I found a previous discussion [1] about
> disappointing fio results; I attempted the ba=4k option discussed in
> that thread in an effort to improve performance, but it does not
> appear to have any impact.
>
> In case the issue is hardware related: the server has 24 GB of RAM,
> dual E5620 CPUs, and 12 SAS bays driven by a Dell H700 controller,
> with each disk configured as its own single-drive RAID0 volume. This
> controller configuration is clearly suboptimal and I will replace it
> with an LSI JBOD controller (a reflashed H300) once it arrives.
>
> Is the performance I'm experiencing expected, or is there an issue
> with my configuration, tools, or testing methodology?
>
For iops testing, libaio + direct=1 is preferred to bypass page cache.
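Concretely, that means re-running the job from the original post with
--direct=1 added (same parameters as before; run from the filesystem
under test):

```shell
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test \
    --filename=test --bs=4k --iodepth=64 --size=1G \
    --readwrite=randrw --rwmixread=75 --direct=1
```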
Thanks,
-liubo
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Low IOPS Performance
2017-02-27 13:20 Low IOPS Performance John Marrett
2017-02-27 15:46 ` Liu Bo
@ 2017-02-27 16:59 ` Peter Grandi
[not found] ` <CAAafysEG7qXF3ZRQPnOjALNCd0jLTHJKRi5ns8xrH4s-6eYgog@mail.gmail.com>
2017-02-27 22:11 ` Peter Grandi
1 sibling, 2 replies; 7+ messages in thread
From: Peter Grandi @ 2017-02-27 16:59 UTC (permalink / raw)
To: linux-btrfs
[ ... ]
> a ten disk raid1 using 7.2k 3 TB SAS drives
Those are really low IOPS-per-TB devices, but a good choice for
SAS, as they will have SCT/ERC support.
> and used aio to test IOPS rates. I was surprised to measure
> only 215 read and 72 write IOPS on the clean new filesystem.
For that you really want to use the 'raid10' profile; 'raid1' is
quite different, and has an odd recovery "gotcha". Also, so far
'raid1' in Btrfs only reads from one of the two mirrors per thread.
Anyhow, the 72 write IOPS looks like the IOPS rate of a single
member device, which is puzzling, as if Btrfs is not issuing I/O in
parallel across a many-device 'raid1' profile volume.
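For reference, a conversion to the 'raid10' profiles can be done in
place with a rebalance, mirroring the convert command used earlier in
the thread (a sketch; it needs enough unallocated space on the
devices and will rewrite all existing chunks):

```shell
sudo btrfs balance start -dconvert=raid10 -mconvert=raid10 /btrfs
```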
I have a 6-device test setup at home, tried various setups, and I
think I got rather better results than that.
> Sequential writes ran as expected at roughly 650 MB/s.
That's a bit too high: on a single similar drive I get around
65 MB/s average with relatively large files, so I would expect
around 4-5x that from a 10-device mirrored profile, regardless of
filesystem type.
I strongly suspect that we have a different notion of "IOPS",
perhaps either logical vs. physical IOPS, or randomish vs.
sequentialish IOPS. I'll have a look at your attachments in more
detail.
> For comparison with another checksumming filesystem, I created a
> zfs filesystem using the same layout and measured 4315 read and
> 1449 write IOPS with sync enabled (without sync it is clearly just
> writing to RAM); sequential performance was comparable to btrfs.
It seems unlikely to me that you got that with a 10-device
mirror 'vdev', most likely you configured it as a stripe of 5x
2-device mirror vdevs, that is RAID10.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Low IOPS Performance
[not found] ` <CAAafysEG7qXF3ZRQPnOjALNCd0jLTHJKRi5ns8xrH4s-6eYgog@mail.gmail.com>
@ 2017-02-27 19:15 ` John Marrett
2017-02-27 19:43 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 7+ messages in thread
From: John Marrett @ 2017-02-27 19:15 UTC (permalink / raw)
To: Peter Grandi, bo.li.liu; +Cc: linux-btrfs
Liu Bo correctly identified direct IO as the solution to my test
performance issues; with it in use I achieved 908 read and 305 write
IOPS, not quite as fast as ZFS but more than adequate for my needs.
I then applied Peter's recommendation of switching to raid10 and
tripled performance again, up to 3000 read and 1000 write IOPS.
I do not understand why the page cache had such a large negative
impact on performance; it seems like it should have no impact, or
help slightly with caching, rather than severely degrade both read
and write IO. Is this expected behaviour, and what is the real-world
impact on applications that don't use direct IO?
With regard to RAID10, my understanding is that I can't mix drive
sizes and use their full capacity in a RAID10 volume. My current
server runs a mixture of drive sizes and I am likely to need to do
so again in the future. Can I do this and still enjoy the
performance benefits of RAID10?
> > a ten disk raid1 using 7.2k 3 TB SAS drives
>
> Those are really low IOPS-per-TB devices, but good choice for
> SAS, as they will have SCT/ERC.
I don't expect the best IOPS performance from them; they are
intended for bulk data storage. However, the results I had
previously didn't seem acceptable or normal.
>
> I strongly suspect that we have a different notion of "IOPS",
> perhaps either logical vs. physical IOPS, or randomish vs.
> sequentialish IOPS. I'll have a look at your attachments in more
> detail.
I did not achieve 650 MB/s with random IO, nor do I expect to; that
figure was a sequential write of 250 GB performed using dd with the
conv=fsync option to ensure that all writes were complete before the
write speed was reported.
>
> > For comparison with another checksumming filesystem, I created a
> > zfs filesystem using the same layout and measured 4315 read and
> > 1449 write IOPS with sync enabled (without sync it is clearly
> > just writing to RAM); sequential performance was comparable to
> > btrfs.
>
> It seems unlikely to me that you got that with a 10-device
> mirror 'vdev', most likely you configured it as a stripe of 5x
> 2-device mirror vdevs, that is RAID10.
This is correct; it was a RAID10 across 5 mirrored volumes.
Thank you both very much for your help with my testing,
-JohnF
RAID1 Direct IO Test Results
johnf@altered-carbon:/btrfs/johnf$ fio --randrepeat=1
--ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k
--iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --direct=1
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [8336KB/2732KB/0KB /s] [2084/683/0
iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=12270: Mon Feb 27 11:49:04 2017
read : io=784996KB, bw=3634.6KB/s, iops=908, runt=215981msec
write: io=263580KB, bw=1220.4KB/s, iops=305, runt=215981msec
cpu : usr=1.50%, sys=8.18%, ctx=244134, majf=0, minf=116
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=784996KB, aggrb=3634KB/s, minb=3634KB/s, maxb=3634KB/s,
mint=215981msec, maxt=215981msec
WRITE: io=263580KB, aggrb=1220KB/s, minb=1220KB/s, maxb=1220KB/s,
mint=215981msec, maxt=215981msec
RAID10 Direct IO Test Results
johnf@altered-carbon:/btrfs/johnf$ fio --randrepeat=1
--ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k
--iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --ba=4k
--direct=1
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [16136KB/5312KB/0KB /s]
[4034/1328/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=12644: Mon Feb 27 13:50:35 2017
read : io=784996KB, bw=12003KB/s, iops=3000, runt= 65401msec
write: io=263580KB, bw=4030.3KB/s, iops=1007, runt= 65401msec
cpu : usr=3.66%, sys=19.54%, ctx=188302, majf=0, minf=22
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=196249/w=65895/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=784996KB, aggrb=12002KB/s, minb=12002KB/s, maxb=12002KB/s,
mint=65401msec, maxt=65401msec
WRITE: io=263580KB, aggrb=4030KB/s, minb=4030KB/s, maxb=4030KB/s,
mint=65401msec, maxt=65401msec
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Low IOPS Performance
2017-02-27 19:15 ` John Marrett
@ 2017-02-27 19:43 ` Austin S. Hemmelgarn
0 siblings, 0 replies; 7+ messages in thread
From: Austin S. Hemmelgarn @ 2017-02-27 19:43 UTC (permalink / raw)
To: John Marrett; +Cc: Peter Grandi, bo.li.liu, linux-btrfs
On 2017-02-27 14:15, John Marrett wrote:
> Liu Bo correctly identified direct IO as the solution to my test
> performance issues; with it in use I achieved 908 read and 305 write
> IOPS, not quite as fast as ZFS but more than adequate for my needs.
> I then applied Peter's recommendation of switching to raid10 and
> tripled performance again, up to 3000 read and 1000 write IOPS.
>
> I do not understand why the page cache had such a large negative
> impact on performance; it seems like it should have no impact, or
> help slightly with caching, rather than severely degrade both read
> and write IO. Is this expected behaviour, and what is the real-world
> impact on applications that don't use direct IO?
Generally yes, it is expected behavior, but it's not really all that
high impact for most things that don't use direct IO, since most
applications that don't fall into one of three groups:
1. They care more about bulk streaming throughput than IOPS.
2. They aren't performance-sensitive enough for it to matter.
3. They actually need the page cache for performance reasons (the
Linux page cache does pretty well for read-heavy workloads with
consistent access patterns).
If you look, you should actually see lower bulk streaming throughput
with direct IO than without on most devices, especially with ATA or
USB disks, since the page cache functionally reduces the number of
IO requests that get sent to a device even when it's all new data.
The read-ahead it performs to achieve this, though, only works for
purely or mostly sequential workloads, so it ends up being
detrimental to random-access or very sparsely sequential workloads,
which in turn are the ones that usually care about IOPS over
streaming throughput.
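The effect is easy to see with dd (a sketch; the path is the test
file from earlier in the thread, and iflag=direct requires a
filesystem that supports O_DIRECT):

```shell
# Sequential read through the page cache (read-ahead helps):
dd if=/btrfs/johnf/test of=/dev/null bs=1M count=1000
# The same read bypassing the page cache entirely:
dd if=/btrfs/johnf/test of=/dev/null bs=1M count=1000 iflag=direct
```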
>
> With regards to RAID10 my understanding is that I can't mix drive
> sizes and use their full capacity on a RAID10 volume. My current
> server runs a mixture of drive sizes and I am likely to need to do so
> again in the future. Can I do this and still enjoy the performance
> benefits of RAID 10?
In theory, you should be fine. BTRFS will use narrower stripes when
it has to, as long as it has at least 4 disks to put data on. If you
can make sure you have even numbers of drives of each size (and
ideally an even total number of drives), you should get close to
full utilization. Keep in mind, though, that as the FS gets more and
more full (and the stripes therefore get narrower), you'll start to
see odd, seemingly arbitrary performance differences based on what
you're accessing.
That said, if you can manage to use an even number of identically
sized disks, you can get even more performance by running BTRFS in
raid1 mode on top of two LVM or MD RAID0 volumes. That will give you
the same data safety as BTRFS raid10 mode, but depending on the
workload it can increase performance pretty significantly (I see
about a 10-20% difference currently, but I don't have any
particularly write-intensive workloads). Note that doing so will
improve sequential access performance more than random access, so it
may not be worth the effort in your case.
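That layout could be built along these lines (a sketch using the
ten-drive setup from this thread; the device names are illustrative,
and the two MD arrays must be equal-sized for full utilization):

```shell
# Two 5-disk MD RAID0 stripes...
sudo mdadm --create /dev/md0 --level=0 --raid-devices=5 \
    /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
sudo mdadm --create /dev/md1 --level=0 --raid-devices=5 \
    /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1
# ...then btrfs raid1 (data and metadata) across the two stripes.
sudo mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1
```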
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Low IOPS Performance
2017-02-27 16:59 ` Peter Grandi
[not found] ` <CAAafysEG7qXF3ZRQPnOjALNCd0jLTHJKRi5ns8xrH4s-6eYgog@mail.gmail.com>
@ 2017-02-27 22:11 ` Peter Grandi
2017-02-27 22:32 ` Peter Grandi
1 sibling, 1 reply; 7+ messages in thread
From: Peter Grandi @ 2017-02-27 22:11 UTC (permalink / raw)
To: linux-btrfs
[ ... ]
> I have a 6-device test setup at home, tried various setups, and I
> think I got rather better results than that.
* 'raid1' profile:
soft# btrfs fi df /mnt/sdb5
Data, RAID1: total=273.00GiB, used=269.94GiB
System, RAID1: total=32.00MiB, used=56.00KiB
Metadata, RAID1: total=1.00GiB, used=510.70MiB
GlobalReserve, single: total=176.00MiB, used=0.00B
soft# fio --directory=/mnt/sdb5 --runtime=30 --status-interval=10 blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
READ: io=105508KB, aggrb=3506KB/s, minb=266KB/s, maxb=311KB/s, mint=30009msec, maxt=30090msec
WRITE: io=100944KB, aggrb=3354KB/s, minb=256KB/s, maxb=296KB/s, mint=30009msec, maxt=30090msec
* 'raid10' profile:
soft# btrfs fi df /mnt/sdb6
Data, RAID10: total=276.00GiB, used=272.49GiB
System, RAID10: total=96.00MiB, used=48.00KiB
Metadata, RAID10: total=3.00GiB, used=512.06MiB
GlobalReserve, single: total=176.00MiB, used=0.00B
soft# fio --directory=/mnt/sdb6 --runtime=30 --status-interval=10 blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
READ: io=89056KB, aggrb=2961KB/s, minb=225KB/s, maxb=271KB/s, mint=30009msec, maxt=30076msec
WRITE: io=85248KB, aggrb=2834KB/s, minb=212KB/s, maxb=261KB/s, mint=30009msec, maxt=30076msec
* 'single' profile on MD RAID10:
soft# btrfs fi df /mnt/md0
Data, single: total=278.01GiB, used=274.32GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=2.01GiB, used=615.73MiB
GlobalReserve, single: total=208.00MiB, used=0.00B
soft# grep -A1 md0 /proc/mdstat
md0 : active raid10 sdg1[6] sdb1[0] sdd1[2] sdf1[4] sdc1[1] sde1[3]
364904232 blocks super 1.0 8K chunks 2 near-copies [6/6] [UUUUUU]
soft# fio --directory=/mnt/md0 --runtime=30 --status-interval=10 blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
READ: io=160928KB, aggrb=5357KB/s, minb=271KB/s, maxb=615KB/s, mint=30012msec, maxt=30038msec
WRITE: io=158892KB, aggrb=5289KB/s, minb=261KB/s, maxb=616KB/s, mint=30012msec, maxt=30038msec
That's a range of 700-1300 4KiB random mixed-rw IOPS, quite
reasonable for 6x 1TB 7200RPM SATA drives, each capable of
100-120 IOPS on its own. It helps that the test file is just 100G,
about 10% of the total drive extent, so arm movement is limited.
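As a back-of-the-envelope check (a sketch, assuming the 100-120 IOPS per drive stated above and two copies per write under the 'raid1'/'raid10' profiles):

```python
# Naive drive-op budget for the mixed random test above: each read
# costs one drive operation, each write costs one per mirror copy.
def mixed_iops_bound(drives, per_drive_iops, read_frac, copies=2):
    ops_per_io = read_frac + (1 - read_frac) * copies
    return drives * per_drive_iops / ops_per_io

low = mixed_iops_bound(6, 100, read_frac=0.5)   # 400 IOPS
high = mixed_iops_bound(6, 120, read_frac=0.5)  # 480 IOPS
```

That the measured 700-1300 exceeds this naive full-stroke bound is consistent with the short-stroking effect: with the test file on only 10% of each drive, per-drive IOPS are well above the nominal 100-120.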
It is not surprising that the much more mature MD RAID has an
edge; it is a bit stranger that here the 'raid1' profile seems
slightly faster than the 'raid10' profile.
The much smaller numbers happen for me too with 'buffered=1'
(probably some misfeature of 'fio'), and the much larger numbers
reported for ZFSonLinux are "suspicious".
> It seems unlikely to me that you got that with a 10-device
> mirror 'vdev', most likely you configured it as a stripe of 5x
> 2-device mirror vdevs, that is RAID10.
Indeed, I double-checked the end of the attached list, and that
was the case.
My FIO config file:
# vim:set ft=ini:
[global]
filename=FIO-TEST
fallocate=keep
size=100G
buffered=0
ioengine=libaio
io_submit_mode=offload
iodepth=2
numjobs=12
blocksize=4K
kb_base=1024
[rand-mixed]
rw=randrw
stonewall
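One implication of the job file worth spelling out (a sketch; the values are read from the config above):

```python
# Effective concurrency implied by the fio job file above:
# numjobs independent workers, each keeping iodepth I/Os in flight.
numjobs, iodepth = 12, 2
outstanding = numjobs * iodepth
print(outstanding)  # 24 I/Os in flight across the 6 drives
```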
* Re: Low IOOP Performance
2017-02-27 22:11 ` Peter Grandi
@ 2017-02-27 22:32 ` Peter Grandi
0 siblings, 0 replies; 7+ messages in thread
From: Peter Grandi @ 2017-02-27 22:32 UTC (permalink / raw)
To: linux-btrfs
>>> On Mon, 27 Feb 2017 22:11:29 +0000, pg@btrfs.list.sabi.co.UK (Peter Grandi) said:
> [ ... ]
>> I have a 6-device test setup at home and I tried various setups
>> and I think I got rather better than that.
[ ... ]
> That's a range of 700-1300 4KiB random mixed-rw IOPS,
Rerun with 1M blocksize:
soft# fio --directory=/mnt/sdb5 --runtime=30 --status-interval=10 --blocksize=1M blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
READ: io=2646.0MB, aggrb=89372KB/s, minb=7130KB/s, maxb=7776KB/s, mint=30081msec, maxt=30317msec
WRITE: io=2297.0MB, aggrb=77584KB/s, minb=6082KB/s, maxb=6796KB/s, mint=30081msec, maxt=30317msec
soft# fio --directory=/mnt/sdb6 --runtime=30 --status-interval=10 --blocksize=1M blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
READ: io=2781.0MB, aggrb=94015KB/s, minb=5932KB/s, maxb=10290KB/s, mint=30121msec, maxt=30290msec
WRITE: io=2431.0MB, aggrb=82183KB/s, minb=4779KB/s, maxb=9102KB/s, mint=30121msec, maxt=30290msec
soft# killall -9 fio
fio: no process found
soft# fio --directory=/mnt/md0 --runtime=30 --status-interval=10 --blocksize=1M blocks-randomish.fio | tail -3
Run status group 0 (all jobs):
READ: io=1504.0MB, aggrb=50402KB/s, minb=3931KB/s, maxb=4387KB/s, mint=30343msec, maxt=30556msec
WRITE: io=1194.0MB, aggrb=40013KB/s, minb=3158KB/s, maxb=3475KB/s, mint=30343msec, maxt=30556msec
Interesting that Btrfs 'single' on MD RAID10 becomes rather
slower at the larger block size (I guess because of its low level
of intrinsic parallelism).
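The same division applied at 1 MiB blocks (a sketch using the aggregate bandwidths above) shows why these runs are transfer-bound rather than seek-bound:

```python
# At 1 MiB per I/O the aggregate bandwidths above correspond to only
# a few dozen operations per second, so transfer time dominates seeks.
def iops_from_bw(bw_kib_s, block_kib):
    return bw_kib_s / block_kib

print(round(iops_from_bw(89372, 1024)))  # btrfs 'raid1' reads: ~87 IO/s
print(round(iops_from_bw(50402, 1024)))  # 'single' on MD reads: ~49 IO/s
```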
For comparison, the same on a JFS on top of MD RAID10:
soft# grep -A1 md40 /proc/mdstat
md40 : active raid10 sdg4[5] sdd4[2] sdb4[0] sdf4[4] sdc4[1] sde4[3]
486538240 blocks super 1.0 512K chunks 3 near-copies [6/6] [UUUUUU]
soft# fio --directory=/mnt/md40 --runtime=30 --status-interval=10 --blocksize=4K blocks-randomish.fio | grep -A2 '(all jobs)' | tail -3
Run status group 0 (all jobs):
READ: io=31408KB, aggrb=1039KB/s, minb=80KB/s, maxb=90KB/s, mint=30206msec, maxt=30227msec
WRITE: io=27800KB, aggrb=919KB/s, minb=70KB/s, maxb=81KB/s, mint=30206msec, maxt=30227msec
soft# fio --directory=/mnt/md40 --runtime=30 --status-interval=10 --blocksize=1M blocks-randomish.fio | grep -A2 '(all jobs)' | tail -3
Run status group 0 (all jobs):
READ: io=2151.0MB, aggrb=72619KB/s, minb=5865KB/s, maxb=6383KB/s, mint=30134msec, maxt=30331msec
WRITE: io=1772.0MB, aggrb=59824KB/s, minb=4712KB/s, maxb=5365KB/s, mint=30134msec, maxt=30331msec
XFS is usually better at multithreaded workloads within the same
file (rather than across files).
Thread overview: 7+ messages
2017-02-27 13:20 Low IOOP Performance John Marrett
2017-02-27 15:46 ` Liu Bo
2017-02-27 16:59 ` Peter Grandi
[not found] ` <CAAafysEG7qXF3ZRQPnOjALNCd0jLTHJKRi5ns8xrH4s-6eYgog@mail.gmail.com>
2017-02-27 19:15 ` John Marrett
2017-02-27 19:43 ` Austin S. Hemmelgarn
2017-02-27 22:11 ` Peter Grandi
2017-02-27 22:32 ` Peter Grandi