* 4.2.1: Poor write performance for DomU.
@ 2013-02-20 2:10 Steven Haigh
2013-02-20 8:26 ` Roger Pau Monné
0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-02-20 2:10 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 5443 bytes --]
Hi guys,
Firstly, please CC me in to any replies as I'm not a subscriber these days.
I've been trying to debug a problem with Xen 4.2.1 where I am unable to
achieve more than ~50MB/sec sustained sequential write to a disk. The
DomU is configured as such:
name = "zeus.vm"
memory = 1024
vcpus = 2
cpus = "1-3"
disk = [ 'phy:/dev/RAID1/zeus.vm,xvda,w',
'phy:/dev/vg_raid6/fileshare,xvdb,w' ]
vif = [ "mac=02:16:36:35:35:09, bridge=br203,
vifname=vm.zeus.203", "mac=10:16:36:35:35:09, bridge=br10,
vifname=vm.zeus.10" ]
bootloader = "pygrub"
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'
I have tested the underlying LVM config by mounting
/dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as a
benchmark:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xenhost.lan.crc. 2G   667  96 186976  21 80430  14   956  95 290591  26 373.7   8
Latency             26416us    212ms    168ms   35494us   35989us   83759us
Version  1.96       ------Sequential Create------ --------Random Create--------
xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 14901  32 +++++ +++ 19672  39 15307  34 +++++ +++ 18158  37
Latency             17838us     141us     298us     365us     133us     296us
~186MB/sec write, ~290MB/sec read. Awesome.
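(For clarity: bonnie++ reports throughput in K/sec, i.e. kilobytes per
second; the summary figures here are just those columns divided by 1000.
A quick sketch of the conversion, using the Dom0 block write/read values:)

```shell
# bonnie++'s "K/sec" columns are kilobytes per second; dividing by 1000
# gives the approximate MB/s figures quoted in the text.
awk 'BEGIN { printf "write ~%d MB/s, read ~%d MB/s\n", 186976/1000, 290591/1000 }'
```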
I then started a single DomU which gets /dev/vg_raid6/fileshare passed
through as xvdb, where it is mounted at /mnt/fileshare/. I then ran
bonnie++ again in the DomU:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   658  96 50618   8 42398  10  1138  99 267568  30 494.9  11
Latency             22959us    226ms    311ms   14617us   41816us   72814us
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 21749  59 +++++ +++ 31089  73 23283  64 +++++ +++ 31114  75
Latency             18989us     164us     928us     480us      26us      87us
~50MB/sec write, ~267MB/sec read. Not so awesome.
/dev/vg_raid6/fileshare exists as an LV on /dev/md2:
# lvdisplay vg_raid6/fileshare
--- Logical volume ---
LV Path /dev/vg_raid6/fileshare
LV Name fileshare
VG Name vg_raid6
LV UUID cwC0yK-Xr56-WB5v-10bw-3AZT-pYj0-piWett
LV Write Access read/write
LV Creation host, time xenhost.lan.crc.id.au, 2013-02-18 20:59:40 +1100
LV Status available
# open 1
LV Size 2.50 TiB
Current LE 655360
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 1024
Block device 253:5
md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
      3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
Here's a quick output of 'xm info' - although the full VM load is running
now:
# xm info
host : xenhost.lan.crc.id.au
release : 3.7.9-1.el6xen.x86_64
version : #1 SMP Mon Feb 18 14:46:35 EST 2013
machine : x86_64
nr_cpus : 4
nr_nodes : 1
cores_per_socket : 4
threads_per_core : 1
cpu_mhz : 3303
hw_caps :
bfebfbff:28100800:00000000:00003f40:179ae3bf:00000000:00000001:00000000
virt_caps : hvm
total_memory : 8116
free_memory : 1346
free_cpus : 0
xen_major : 4
xen_minor : 2
xen_extra : .1
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32
hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : unavailable
xen_commandline : dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1
dom0_vcpus_pin
cc_compiler : gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
cc_compile_by : mockbuild
cc_compile_domain : crc.id.au
cc_compile_date : Sat Feb 16 19:16:38 EST 2013
xend_config_format : 4
In a nutshell, does anyone know *why* I am only able to get ~50MB/sec
sequential writes in the DomU? It certainly isn't a problem getting
normal speeds to the LV while mounted in the Dom0.
All OS are Scientific Linux 6.3. The Dom0 runs packages from my
kernel-xen repo (http://au1.mirror.crc.id.au/repo/el6/x86_64/). The DomU
is completely stock packages.
--
Steven Haigh
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4240 bytes --]
[-- Attachment #2: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 2:10 4.2.1: Poor write performance for DomU Steven Haigh
@ 2013-02-20 8:26 ` Roger Pau Monné
2013-02-20 8:49 ` Steven Haigh
0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2013-02-20 8:26 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On 20/02/13 03:10, Steven Haigh wrote:
> Hi guys,
>
> Firstly, please CC me in to any replies as I'm not a subscriber these days.
>
> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
> achieve more than ~50Mb/sec sustained sequential write to a disk. The
> DomU is configured as such:
Since you mention 4.2.1 explicitly, is this a performance regression
from previous versions? (4.2.0 or the 4.1 branch)
> name = "zeus.vm"
> memory = 1024
> vcpus = 2
> cpus = "1-3"
> disk = [ 'phy:/dev/RAID1/zeus.vm,xvda,w',
> 'phy:/dev/vg_raid6/fileshare,xvdb,w' ]
> vif = [ "mac=02:16:36:35:35:09, bridge=br203,
> vifname=vm.zeus.203", "mac=10:16:36:35:35:09, bridge=br10,
> vifname=vm.zeus.10" ]
> bootloader = "pygrub"
>
> on_poweroff = 'destroy'
> on_reboot = 'restart'
> on_crash = 'restart'
>
> I have tested the underlying LVM config by mounting
> /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as a
> benchmark:
>
> Version 1.96 ------Sequential Output------ --Sequential Input-
> --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> /sec %CP
> xenhost.lan.crc. 2G 667 96 186976 21 80430 14 956 95 290591 26
> 373.7 8
> Latency 26416us 212ms 168ms 35494us 35989us 83759us
> Version 1.96 ------Sequential Create------ --------Random
> Create--------
> xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> /sec %CP
> 16 14901 32 +++++ +++ 19672 39 15307 34 +++++ +++
> 18158 37
> Latency 17838us 141us 298us 365us 133us 296us
>
> ~186Mb/sec write, ~290Mb/sec read. Awesome.
>
> I then started a single DomU which gets passed /dev/vg_raid6/fileshare
> through as xvdb. It is then mounted in /mnt/fileshare/. I then ran
> bonnie++ again in the DomU:
>
> Version 1.96 ------Sequential Output------ --Sequential Input-
> --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> /sec %CP
> zeus.crc.id.au 2G 658 96 50618 8 42398 10 1138 99 267568 30
> 494.9 11
> Latency 22959us 226ms 311ms 14617us 41816us 72814us
> Version 1.96 ------Sequential Create------ --------Random
> Create--------
> zeus.crc.id.au -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> /sec %CP
> 16 21749 59 +++++ +++ 31089 73 23283 64 +++++ +++
> 31114 75
> Latency 18989us 164us 928us 480us 26us 87us
>
> ~50Mb/sec write, ~267Mb/sec read. Not so awesome.
We are currently working on improving the speed of pv block drivers, I
will look into this difference between the read/write speed, but I would
guess this is due to the size of the request/ring.
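(As a back-of-envelope sketch of that ring limit: the classic blkif
protocol allows 32 requests in flight, each carrying at most 11 segments
of one 4 KiB page. The constants are from my reading of the protocol
headers, so treat this as an illustration rather than a measurement:)

```shell
# Classic blkif ring capacity: 32 outstanding requests, each with at
# most 11 segments of one 4 KiB page -> maximum data in flight.
ring_slots=32
segs_per_req=11
page_kib=4
echo "max in-flight: $(( ring_slots * segs_per_req * page_kib )) KiB"
```

With only ~1.4 MiB outstanding at once, write throughput is sensitive to
round-trip latency between frontend and backend.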
>
> /dev/vg_raid6/fileshare exists as an LV on /dev/md2:
>
> # lvdisplay vg_raid6/fileshare
> --- Logical volume ---
> LV Path /dev/vg_raid6/fileshare
> LV Name fileshare
> VG Name vg_raid6
> LV UUID cwC0yK-Xr56-WB5v-10bw-3AZT-pYj0-piWett
> LV Write Access read/write
> LV Creation host, time xenhost.lan.crc.id.au, 2013-02-18 20:59:40 +1100
> LV Status available
> # open 1
> LV Size 2.50 TiB
> Current LE 655360
> Segments 1
> Allocation inherit
> Read ahead sectors auto
> - currently set to 1024
> Block device 253:5
>
>
> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
> 3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
> [4/4] [UUUU]
>
> Heres a quick output of 'xm info' - although its full VM load is running
> now:
> # xm info
> host : xenhost.lan.crc.id.au
> release : 3.7.9-1.el6xen.x86_64
> version : #1 SMP Mon Feb 18 14:46:35 EST 2013
> machine : x86_64
> nr_cpus : 4
> nr_nodes : 1
> cores_per_socket : 4
> threads_per_core : 1
> cpu_mhz : 3303
> hw_caps :
> bfebfbff:28100800:00000000:00003f40:179ae3bf:00000000:00000001:00000000
> virt_caps : hvm
> total_memory : 8116
> free_memory : 1346
> free_cpus : 0
> xen_major : 4
> xen_minor : 2
> xen_extra : .1
> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32
> hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler : credit
> xen_pagesize : 4096
> platform_params : virt_start=0xffff800000000000
> xen_changeset : unavailable
> xen_commandline : dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1
> dom0_vcpus_pin
> cc_compiler : gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
> cc_compile_by : mockbuild
> cc_compile_domain : crc.id.au
> cc_compile_date : Sat Feb 16 19:16:38 EST 2013
> xend_config_format : 4
>
> In a nutshell, does anyone know *why* I am only able to get ~50Mb/sec
> sequential writes to the DomU? It certainly isn't a problem getting
> normal speeds to the LV while mounted in the Dom0.
>
> All OS are Scientific Linux 6.3. The Dom0 runs packages from my
> kernel-xen repo (http://au1.mirror.crc.id.au/repo/el6/x86_64/). The DomU
> is completely stock packages.
>
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 8:26 ` Roger Pau Monné
@ 2013-02-20 8:49 ` Steven Haigh
2013-02-20 9:49 ` Steven Haigh
0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-02-20 8:49 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel
On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
> On 20/02/13 03:10, Steven Haigh wrote:
>> Hi guys,
>>
>> Firstly, please CC me in to any replies as I'm not a subscriber these days.
>>
>> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
>> achieve more than ~50Mb/sec sustained sequential write to a disk. The
>> DomU is configured as such:
>
> Since you mention 4.2.1 explicitly, is this a performance regression
> from previous versions? (4.2.0 or the 4.1 branch)
This is actually a very good question. I've reinstalled my older Xen
4.1.3 packages on the system, rebooted into that hypervisor, and started
the single DomU again. I then ran bonnie++ again on the DomU:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zeus.crc.id.au   2G   658  97 54893   9 40845  10  1056  97 280453  33 561.2  13
Latency             27145us    426ms    257ms   31900us   24701us     222ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 19281  52 +++++ +++ +++++ +++ 24435  66 +++++ +++ +++++ +++
Latency             22860us     182us     706us   14803us      28us     300us
Still around 50MB/sec - so this doesn't seem to be a regression, but
something else?
>> ~50Mb/sec write, ~267Mb/sec read. Not so awesome.
>
> We are currently working on improving the speed of pv block drivers, I
> will look into this difference between the read/write speed, but I would
> guess this is due to the size of the request/ring.
I would assume this would be in the DomU kernel?
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 8:49 ` Steven Haigh
@ 2013-02-20 9:49 ` Steven Haigh
2013-02-20 10:12 ` Jan Beulich
2013-03-08 8:54 ` Steven Haigh
0 siblings, 2 replies; 29+ messages in thread
From: Steven Haigh @ 2013-02-20 9:49 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel
On 20/02/2013 7:49 PM, Steven Haigh wrote:
> On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
>> On 20/02/13 03:10, Steven Haigh wrote:
>>> Hi guys,
>>>
>>> Firstly, please CC me in to any replies as I'm not a subscriber these
>>> days.
>>>
>>> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
>>> achieve more than ~50Mb/sec sustained sequential write to a disk. The
>>> DomU is configured as such:
>>
>> Since you mention 4.2.1 explicitly, is this a performance regression
>> from previous versions? (4.2.0 or the 4.1 branch)
>
> This is actually a very good question. I've reinstalled my older
> packages of Xen 4.1.3 back on the system. Rebooting into the new
> hypervisor, then starting the single DomU again. Ran bonnie++ again on
> the DomU:
>
> Still around 50Mb/sec - so this doesn't seem to be a regression, but
> something else?
I've actually done a bit of thinking about this... A recent thread on
the linux-raid kernel mailing list about Xen and DomU throughput made me
revisit my setup. I know I used to be able to saturate GigE both ways
(send and receive) to the samba share served by this DomU - which would
mean at least 90-100MB/sec. Exactly what config and kernel/Xen versions
were in use at that point, I cannot say.
As such, I had a bit of a play and recreated my RAID6 with a 64KB chunk
size. This seemed to make rebuild/resync speeds far worse - so I
reverted to a 128KB chunk size.
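(For reference, the full-stripe write size implied by each geometry - 4
devices with 2 parity blocks in RAID6. The arithmetic below is just that
relationship, not output from the array itself:)

```shell
# RAID6 full-stripe size = chunk size * (devices - 2 parity devices).
# For the 4-disk md2 array, both chunk sizes tried give:
devices=4
for chunk_kib in 64 128; do
  echo "chunk ${chunk_kib}KB -> full stripe $(( chunk_kib * (devices - 2) )) KB"
done
```

Sequential writes smaller than, or misaligned to, the full stripe force
read-modify-write cycles on a parity RAID, which is one reason chunk
size matters here.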
The benchmarks I am getting from the Dom0 are about what I'd expect - but
I wouldn't expect to lose ~130MB/sec of write speed through the phy:/
passthrough of the LV.
From my known config where I could saturate the GigE connection, I have
changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
kernels - currently 3.7.9.
My build of Xen 4.2.1 also has all of the recent security advisories
patched. It is interesting to note, though, that downgrading to Xen
4.1.2 made no difference to write speeds.
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 9:49 ` Steven Haigh
@ 2013-02-20 10:12 ` Jan Beulich
2013-02-20 11:06 ` Andrew Cooper
2013-03-08 8:54 ` Steven Haigh
1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2013-02-20 10:12 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel, roger.pau
>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
> My build of Xen 4.2.1 also has all of the recent security advisories
> patched as well. Although it is interesting to note that downgrading to
> Xen 4.1.2 made no difference to write speeds.
Not surprising at all, considering that the hypervisor is only a passive
library for all PV I/O purposes. You're likely hunting for a kernel side
regression (and hence the mentioning of the hypervisor version as
the main factor in the subject is probably misleading).
Jan
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 10:12 ` Jan Beulich
@ 2013-02-20 11:06 ` Andrew Cooper
2013-02-20 11:08 ` Steven Haigh
0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2013-02-20 11:06 UTC (permalink / raw)
To: Steven Haigh; +Cc: Roger Pau Monne, Jan Beulich, xen-devel
On 20/02/13 10:12, Jan Beulich wrote:
>>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
>> My build of Xen 4.2.1 also has all of the recent security advisories
>> patched as well. Although it is interesting to note that downgrading to
>> Xen 4.1.2 made no difference to write speeds.
> Not surprising at all, considering that the hypervisor is only a passive
> library for all PV I/O purposes. You're likely hunting for a kernel side
> regression (and hence the mentioning of the hypervisor version as
> the main factor in the subject is probably misleading).
>
> Jan
Further to this, do try to verify whether your disk driver has recently
changed to use >0 order page allocations for DMA. If it has, speed will
be much slower, as there will now be the swiotlb CPU-copy overhead.
~Andrew
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 11:06 ` Andrew Cooper
@ 2013-02-20 11:08 ` Steven Haigh
2013-02-20 12:48 ` Andrew Cooper
2013-02-20 13:18 ` Pasi Kärkkäinen
0 siblings, 2 replies; 29+ messages in thread
From: Steven Haigh @ 2013-02-20 11:08 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monne, Jan Beulich, xen-devel
On 20/02/2013 10:06 PM, Andrew Cooper wrote:
> On 20/02/13 10:12, Jan Beulich wrote:
>>>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
>>> My build of Xen 4.2.1 also has all of the recent security advisories
>>> patched as well. Although it is interesting to note that downgrading to
>>> Xen 4.1.2 made no difference to write speeds.
>> Not surprising at all, considering that the hypervisor is only a passive
>> library for all PV I/O purposes. You're likely hunting for a kernel side
>> regression (and hence the mentioning of the hypervisor version as
>> the main factor in the subject is probably misleading).
>>
>> Jan
>
> Further to this, do try to verify if your disk driver has changed
> recently to use >0 order page allocations for DMA. If it has, then
> speed will be much slower as there will now be the swiotlb cpu-copy
> overhead.
Any hints on how to do this? ;)
The kernel modules in use for my SATA drives are ahci and sata_mv. There
are 6 drives in total on the system.
sda + sdb = RAID1
sd[c-f] = RAID6
sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
sde, sdf are on the sata_mv 4x PCIe controller.
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 11:08 ` Steven Haigh
@ 2013-02-20 12:48 ` Andrew Cooper
2013-02-20 13:18 ` Pasi Kärkkäinen
1 sibling, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2013-02-20 12:48 UTC (permalink / raw)
To: Steven Haigh; +Cc: Roger Pau Monne, Jan Beulich, xen-devel
On 20/02/13 11:08, Steven Haigh wrote:
> On 20/02/2013 10:06 PM, Andrew Cooper wrote:
>> On 20/02/13 10:12, Jan Beulich wrote:
>>>>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
>>>> My build of Xen 4.2.1 also has all of the recent security advisories
>>>> patched as well. Although it is interesting to note that
>>>> downgrading to
>>>> Xen 4.1.2 made no difference to write speeds.
>>> Not surprising at all, considering that the hypervisor is only a
>>> passive
>>> library for all PV I/O purposes. You're likely hunting for a kernel
>>> side
>>> regression (and hence the mentioning of the hypervisor version as
>>> the main factor in the subject is probably misleading).
>>>
>>> Jan
>>
>> Further to this, do try to verify if your disk driver has changed
>> recently to use >0 order page allocations for DMA. If it has, then
>> speed will be much slower as there will now be the swiotlb cpu-copy
>> overhead.
>
> Any hints on how to do this? ;)
>
> The kernel modules in use for my SATA drives are ahci and sata_mv.
> There are 6 drives in total on the system.
>
> sda + sdb = RAID1
> sd[c-f] = RAID6
>
> sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
> sde, sdf are on the sata_mv 4x PCIe controller.
>
Sadly that is a hard question to answer, and it is driver specific. I
can't suggest an easy way other than digging into the source.
~Andrew
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 11:08 ` Steven Haigh
2013-02-20 12:48 ` Andrew Cooper
@ 2013-02-20 13:18 ` Pasi Kärkkäinen
2013-03-08 20:42 ` Konrad Rzeszutek Wilk
1 sibling, 1 reply; 29+ messages in thread
From: Pasi Kärkkäinen @ 2013-02-20 13:18 UTC (permalink / raw)
To: Steven Haigh; +Cc: Andrew Cooper, xen-devel, Jan Beulich, Roger Pau Monne
On Wed, Feb 20, 2013 at 10:08:58PM +1100, Steven Haigh wrote:
> On 20/02/2013 10:06 PM, Andrew Cooper wrote:
> >On 20/02/13 10:12, Jan Beulich wrote:
> >>>>>On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
> >>>My build of Xen 4.2.1 also has all of the recent security advisories
> >>>patched as well. Although it is interesting to note that downgrading to
> >>>Xen 4.1.2 made no difference to write speeds.
> >>Not surprising at all, considering that the hypervisor is only a passive
> >>library for all PV I/O purposes. You're likely hunting for a kernel side
> >>regression (and hence the mentioning of the hypervisor version as
> >>the main factor in the subject is probably misleading).
> >>
> >>Jan
> >
> >Further to this, do try to verify if your disk driver has changed
> >recently to use >0 order page allocations for DMA. If it has, then
> >speed will be much slower as there will now be the swiotlb cpu-copy
> >overhead.
>
> Any hints on how to do this? ;)
>
> The kernel modules in use for my SATA drives are ahci and sata_mv.
> There are 6 drives in total on the system.
>
> sda + sdb = RAID1
> sd[c-f] = RAID6
>
> sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
> sde, sdf are on the sata_mv 4x PCIe controller.
>
Can you try using only the disks on the ahci controller?
sata_mv is known to be buggy and problematic.
I'm not sure if that's the case here, but if you can easily try using
only ahci, it's worth a shot.
-- Pasi
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 9:49 ` Steven Haigh
2013-02-20 10:12 ` Jan Beulich
@ 2013-03-08 8:54 ` Steven Haigh
2013-03-08 9:43 ` Roger Pau Monné
2013-03-08 20:49 ` Konrad Rzeszutek Wilk
1 sibling, 2 replies; 29+ messages in thread
From: Steven Haigh @ 2013-03-08 8:54 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel
On 20/02/2013 8:49 PM, Steven Haigh wrote:
> On 20/02/2013 7:49 PM, Steven Haigh wrote:
>> On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
>>> On 20/02/13 03:10, Steven Haigh wrote:
>>>> Hi guys,
>>>>
>>>> Firstly, please CC me in to any replies as I'm not a subscriber these
>>>> days.
>>>>
>>>> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
>>>> achieve more than ~50Mb/sec sustained sequential write to a disk. The
>>>> DomU is configured as such:
>>>
>>> Since you mention 4.2.1 explicitly, is this a performance regression
>>> from previous versions? (4.2.0 or the 4.1 branch)
>>
>> This is actually a very good question. I've reinstalled my older
>> packages of Xen 4.1.3 back on the system. Rebooting into the new
>> hypervisor, then starting the single DomU again. Ran bonnie++ again on
>> the DomU:
>>
>> Still around 50Mb/sec - so this doesn't seem to be a regression, but
>> something else?
>
> I've actually done a bit of thinking about this... A recent thread on
> linux-raid kernel mailing list about Xen and DomU throughput made me
> revisit my setup. I know I used to be able to saturate GigE both ways
> (send and receive) to the samba share served by this DomU. This would
> mean I'd get at least 90-100Mbyte/sec. What exact config and kernel/xen
> versions this was as this point in time I cannot say.
>
> As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk
> size. This seemed to make rebuild/resync speeds way worse - so I
> reverted to 128Kb chunk size.
>
> The benchmarks I am getting from the Dom0 is about what I'd expect - but
> I wouldn't expect to lose 130Mb/sec write speed to the phy:/ pass
> through of the LV.
>
> From my known config where I could saturate the GigE connection, I have
> changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
> kernels - currently 3.7.9.
>
> My build of Xen 4.2.1 also has all of the recent security advisories
> patched as well. Although it is interesting to note that downgrading to
> Xen 4.1.2 made no difference to write speeds.
>
Just wondering if there is any further news or tests that I might be
able to do on this?
* Re: 4.2.1: Poor write performance for DomU.
2013-03-08 8:54 ` Steven Haigh
@ 2013-03-08 9:43 ` Roger Pau Monné
2013-03-08 9:46 ` Steven Haigh
2013-03-08 20:49 ` Konrad Rzeszutek Wilk
1 sibling, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2013-03-08 9:43 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On 08/03/13 09:54, Steven Haigh wrote:
> Just wondering if there is any further news or tests that I might be
> able to do on this?
I have been working on speed improvements for blkfront/blkback, and
submitted the first RFC series of patches last week, which can be found
at http://thread.gmane.org/gmane.linux.kernel/1448584. This is still a
WIP, so if you want to test them please be aware there might be hidden
bugs. I've also pushed them to a branch in my git repo:
git://xenbits.xen.org/people/royger/linux.git xen-block-indirect
You will need to recompile both the Dom0/DomU kernels (if they are not
the same) if you want to test them.
* Re: 4.2.1: Poor write performance for DomU.
2013-03-08 9:43 ` Roger Pau Monné
@ 2013-03-08 9:46 ` Steven Haigh
2013-03-08 9:54 ` Roger Pau Monné
0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-03-08 9:46 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel
On 8/03/2013 8:43 PM, Roger Pau Monné wrote:
> On 08/03/13 09:54, Steven Haigh wrote:
>> Just wondering if there is any further news or tests that I might be
>> able to do on this?
>
> I have been working on speed improvements for blkfront/blkback, and
> submitted the first RFC series of patches last week, which can be found
> at http://thread.gmane.org/gmane.linux.kernel/1448584. This is still a
> WIP, so if you want to test them please be aware there might be hidden
> bugs. I've also pushed them to a branch in my git repo:
>
> git://xenbits.xen.org/people/royger/linux.git xen-block-indirect
>
> You will need to recompile both the Dom0/DomU kernels (if they are not
> the same) if you want to test them.
>
Hmm - how will this behave with, say, a stock kernel in the DomU (i.e.
an EL6.3 kernel) but these changes in the Dom0?
* Re: 4.2.1: Poor write performance for DomU.
2013-03-08 9:46 ` Steven Haigh
@ 2013-03-08 9:54 ` Roger Pau Monné
0 siblings, 0 replies; 29+ messages in thread
From: Roger Pau Monné @ 2013-03-08 9:54 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
On 08/03/13 10:46, Steven Haigh wrote:
> On 8/03/2013 8:43 PM, Roger Pau Monné wrote:
>> On 08/03/13 09:54, Steven Haigh wrote:
>>> Just wondering if there is any further news or tests that I might be
>>> able to do on this?
>>
>> I have been working on speed improvements for blkfront/blkback, and
>> submitted the first RFC series of patches last week, which can be found
>> at http://thread.gmane.org/gmane.linux.kernel/1448584. This is still a
>> WIP, so if you want to test them please be aware there might be hidden
>> bugs. I've also pushed them to a branch in my git repo:
>>
>> git://xenbits.xen.org/people/royger/linux.git xen-block-indirect
>>
>> You will need to recompile both the Dom0/DomU kernels (if they are not
>> the same) if you want to test them.
>>
>
> Hmm - how will this react with using say, a stock kernel in the DomU (ie
> EL6.3 kernel) but these changes in the Dom0?
It should work, but you won't see much performance improvement (if any
at all). Anyway, I've only referred to this series for testing; it
should not be used on anything that's supposed to be stable/production.
* Re: 4.2.1: Poor write performance for DomU.
2013-02-20 13:18 ` Pasi Kärkkäinen
@ 2013-03-08 20:42 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-08 20:42 UTC (permalink / raw)
To: Pasi Kärkkäinen
Cc: Andrew Cooper, Roger Pau Monne, Steven Haigh, Jan Beulich, xen-devel
On Wed, Feb 20, 2013 at 03:18:46PM +0200, Pasi Kärkkäinen wrote:
> On Wed, Feb 20, 2013 at 10:08:58PM +1100, Steven Haigh wrote:
> > On 20/02/2013 10:06 PM, Andrew Cooper wrote:
> > >On 20/02/13 10:12, Jan Beulich wrote:
> > >>>>>On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
> > >>>My build of Xen 4.2.1 also has all of the recent security advisories
> > >>>patched as well. Although it is interesting to note that downgrading to
> > >>>Xen 4.1.2 made no difference to write speeds.
> > >>Not surprising at all, considering that the hypervisor is only a passive
> > >>library for all PV I/O purposes. You're likely hunting for a kernel side
> > >>regression (and hence the mentioning of the hypervisor version as
> > >>the main factor in the subject is probably misleading).
> > >>
> > >>Jan
> > >
> > >Further to this, do try to verify if your disk driver has changed
> > >recently to use >0 order page allocations for DMA. If it has, then
> > >speed will be much slower as there will now be the swiotlb cpu-copy
> > >overhead.
> >
> > Any hints on how to do this? ;)
They are fine. They use the SG DMA API:
konrad@phenom:~/linux/drivers/ata$ grep "dma_map" *
libata-core.c: n_elem = dma_map_sg(ap->dev, qc->sg, qc->n_elem, qc->dma_dir);
> >
> > The kernel modules in use for my SATA drives are ahci and sata_mv.
> > There are 6 drives in total on the system.
> >
> > sda + sdb = RAID1
> > sd[c-f] = RAID6
> >
> > sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
> > sde, sdf are on the sata_mv 4x PCIe controller.
> >
>
> Can you try using only the disks on the ahci controller?
>
> sata_mv is known to be buggy and problematic..
> I'm not sure if that's the case here, but if you're able to easily
> try using only ahci, it's worth a shot.
>
> -- Pasi
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
* Re: 4.2.1: Poor write performance for DomU.
2013-03-08 8:54 ` Steven Haigh
2013-03-08 9:43 ` Roger Pau Monné
@ 2013-03-08 20:49 ` Konrad Rzeszutek Wilk
2013-03-08 22:30 ` Steven Haigh
1 sibling, 1 reply; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-08 20:49 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel, Roger Pau Monné
On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote:
> On 20/02/2013 8:49 PM, Steven Haigh wrote:
> >On 20/02/2013 7:49 PM, Steven Haigh wrote:
> >>On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
> >>>On 20/02/13 03:10, Steven Haigh wrote:
> >>>>Hi guys,
> >>>>
> >>>>Firstly, please CC me in to any replies as I'm not a subscriber these
> >>>>days.
> >>>>
> >>>>I've been trying to debug a problem with Xen 4.2.1 where I am unable to
> >>>>achieve more than ~50Mb/sec sustained sequential write to a disk. The
> >>>>DomU is configured as such:
> >>>
> >>>Since you mention 4.2.1 explicitly, is this a performance regression
> >>>from previous versions? (4.2.0 or the 4.1 branch)
> >>
> >>This is actually a very good question. I've reinstalled my older
> >>packages of Xen 4.1.3 back on the system. Rebooting into the new
> >>hypervisor, then starting the single DomU again. Ran bonnie++ again on
> >>the DomU:
> >>
> >>Still around 50Mb/sec - so this doesn't seem to be a regression, but
> >>something else?
> >
> >I've actually done a bit of thinking about this... A recent thread on
> >linux-raid kernel mailing list about Xen and DomU throughput made me
> >revisit my setup. I know I used to be able to saturate GigE both ways
> >(send and receive) to the samba share served by this DomU. This would
> >mean I'd get at least 90-100Mbyte/sec. Exactly what config and kernel/Xen
> >versions were in use at that point in time, I cannot say.
> >
> >As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk
> >size. This seemed to make rebuild/resync speeds way worse - so I
> >reverted to 128Kb chunk size.
> >
> >The benchmarks I am getting from the Dom0 are about what I'd expect - but
> >I wouldn't expect to lose 130MB/sec of write speed to the phy:/
> >pass-through of the LV.
> >
> > From my known config where I could saturate the GigE connection, I have
> >changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
> >kernels - currently 3.7.9.
> >
> >My build of Xen 4.2.1 also has all of the recent security advisories
> >patched as well. Although it is interesting to note that downgrading to
> >Xen 4.1.2 made no difference to write speeds.
> >
>
> Just wondering if there is any further news or tests that I might be
> able to do on this?
So usually with a problem like this the approach is to peel back the
layers and find out which of them is at fault. You have a stacked block
system - LVM on top of RAID6 on top of block devices.
To figure out which layer is interfering with the speeds, I would
recommend you fault one of the RAID6 disks (that is, take it out of the
RAID6). Pass it to the guest as a raw disk (/dev/sdX as /dev/xvd) and
then run 'fio'. Run 'fio' in dom0 on /dev/sdX as well and check
whether the write performance differs.
This is how I do it:
[/dev/xvdXXX]
rw=write
direct=1
size=4g
ioengine=libaio
iodepth=32
Then progress up the stack: put the disk back in the RAID6 and run it
on the RAID6, then on the LVM, and so on.
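Putting the snippet above into a complete fio job file might look like the following - a sketch only; the device path is illustrative, and with rw=write fio will destroy whatever is on the target:

```ini
# Sequential-write job against a raw block device.
# WARNING: destructive - overwrites the target device.
# The device path below is illustrative.
[seqwrite]
filename=/dev/xvdc
rw=write
direct=1
size=4g
bs=1m
ioengine=libaio
iodepth=32
```

Run it with 'fio seqwrite.fio' in the DomU against the raw xvd device, then again in dom0 with filename pointed at /dev/sdX, and compare the reported bandwidth.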
* Re: 4.2.1: Poor write performance for DomU.
2013-03-08 20:49 ` Konrad Rzeszutek Wilk
@ 2013-03-08 22:30 ` Steven Haigh
2013-03-11 13:30 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-03-08 22:30 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: Roger Pau Monné, xen-devel
On 9/03/2013 7:49 AM, Konrad Rzeszutek Wilk wrote:
> [...snip - history quoted above...]
> So usually the problem like this is to unpeel the layers and find out
> which of them is at fault. You have a stacked block system - LVM on
> top of RAID6 on top of block devices.
>
> To figure out who is interferring with the speeds I would recommend
> you fault one of the RAID6 disks (so take it out of the RAID6). Pass
> it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then
> run 'fio'. Run 'fio' as well in dom0 on the /dev/sdX and check
> whether the write performance is different.
>
> This is how I do it:
>
> [/dev/xvdXXX]
> rw=write
> direct=1
> size=4g
> ioengine=libaio
> iodepth=32
>
> Then progress up the stack. Try sticking the disk back in RAID6
> and doing it on the RAID6. Then on the LVM and so on.
I did try to peel it back a single layer at a time. My test was simply
using the same XFS filesystem in the Dom0 instead of the DomU.
I tested the underlying LVM config by mounting /dev/vg_raid6/fileshare
from within the Dom0 and running bonnie++ as a benchmark:
Version 1.96 ------Sequential Output------ --Sequential Input-
--Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
xenhost.lan.crc. 2G 667 96 186976 21 80430 14 956 95 290591 26
373.7 8
Latency 26416us 212ms 168ms 35494us 35989us 83759us
Version 1.96 ------Sequential Create------ --------Random
Create--------
xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read---
-Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
/sec %CP
16 14901 32 +++++ +++ 19672 39 15307 34 +++++ +++
18158 37
Latency 17838us 141us 298us 365us 133us 296us
~186MB/sec write, ~290MB/sec read. Awesome.
I then started a single DomU which gets passed /dev/vg_raid6/fileshare
through as xvdb. It is then mounted in /mnt/fileshare/. I then ran
bonnie++ again in the DomU:
Version 1.96 ------Sequential Output------ --Sequential Input-
--Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au 2G 658 96 50618 8 42398 10 1138 99 267568 30
494.9 11
Latency 22959us 226ms 311ms 14617us 41816us 72814us
Version 1.96 ------Sequential Create------ --------Random
Create--------
zeus.crc.id.au -Create-- --Read--- -Delete-- -Create-- --Read---
-Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
/sec %CP
16 21749 59 +++++ +++ 31089 73 23283 64 +++++ +++
31114 75
Latency 18989us 164us 928us 480us 26us 87us
~50MB/sec write, ~267MB/sec read. Not so awesome.
As such, the filesystem, RAID6, etc are completely unchanged. The only
change is the access method Dom0 vs DomU.
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
* Re: 4.2.1: Poor write performance for DomU.
2013-03-08 22:30 ` Steven Haigh
@ 2013-03-11 13:30 ` Konrad Rzeszutek Wilk
2013-03-11 13:37 ` Steven Haigh
0 siblings, 1 reply; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 13:30 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel, Roger Pau Monné
On Sat, Mar 09, 2013 at 09:30:54AM +1100, Steven Haigh wrote:
> [...snip - history quoted above...]
>
> I did try to peel it back a single layer at a time. My test was
> simply using the same XFS filesystem in the Dom0 instead of the
> DomU.
Right, you are using a filesystem. That is another layer :-)
And depending on what version of QEMU you have, you might be using
QEMU as the block PV backend instead of the kernel one. There
were versions of QEMU that had highly inferior performance.
Hence I was thinking of just using a raw disk to test that.
>
> I tested the underlying LVM config by mounting
> /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as
> a benchmark:
So still a filesystem. fio can do it at the block level.
What do 'xenstore-ls' and 'losetup -a' show you? I am really
curious as to whether the file you are providing to the guest as a
disk is being handled via 'loop' or via 'QEMU'.
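One way to answer this from dom0 is the following command sketch - the guest name is illustrative, and it assumes the xm toolstack used elsewhere in this thread:

```shell
# Find the guest's domid, then list its block-backend nodes in xenstore
# (guest name illustrative).
domid=$(xm domid zeus.vm)
xenstore-ls "/local/domain/0/backend/vbd/$domid"

# Loop-backed file disks would show up here; empty output means no loop
# devices are in use.
losetup -a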
* Re: 4.2.1: Poor write performance for DomU.
2013-03-11 13:30 ` Konrad Rzeszutek Wilk
@ 2013-03-11 13:37 ` Steven Haigh
2013-03-12 13:04 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-03-11 13:37 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Roger Pau Monné
On 12/03/2013 12:30 AM, Konrad Rzeszutek Wilk wrote:
> [...snip - history quoted above...]
> Right, you are using a filesystem. That is another layer :-)
>
> And depending on what version of QEMU you have you might be using
> QEMU as the block PV backend instead of the kernel one. There
> were versions of QEMU that had highly inferior performance.
>
> Hence I was thinking of just using a raw disk to test that.
>
>>
>> I tested the underlying LVM config by mounting
>> /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as
>> a benchmark:
>
>
> So still filesystem. Fio can do it on a block level.
>
> What does 'xenstore-ls' show you and 'losetup -a'? I am really
> curious as to where that file you are providing to the guest as
> disk is being handled via 'loop' or via 'QEMU'.
>
I've picked out what I believe is the most relevant from xenstore-ls
that belongs to the DomU in question:
1 = ""
51712 = ""
domain = "zeus.vm"
frontend = "/local/domain/1/device/vbd/51712"
uuid = "3aa72be1-0e83-1ee2-a346-8ccef71e9d34"
bootable = "1"
dev = "xvda"
state = "4"
params = "/dev/RAID1/zeus.vm"
mode = "w"
online = "1"
frontend-id = "1"
type = "phy"
physical-device = "fd:6"
hotplug-status = "connected"
feature-flush-cache = "1"
feature-discard = "0"
feature-barrier = "1"
feature-persistent = "1"
sectors = "135397376"
info = "0"
sector-size = "512"
51728 = ""
domain = "zeus.vm"
frontend = "/local/domain/1/device/vbd/51728"
uuid = "28375672-321c-0e33-4549-d64ee4daadec"
bootable = "0"
dev = "xvdb"
state = "4"
params = "/dev/vg_raid6/fileshare"
mode = "w"
online = "1"
frontend-id = "1"
type = "phy"
physical-device = "fd:5"
hotplug-status = "connected"
feature-flush-cache = "1"
feature-discard = "0"
feature-barrier = "1"
feature-persistent = "1"
sectors = "5368709120"
info = "0"
sector-size = "512"
losetup -a returns nothing.
* Re: 4.2.1: Poor write performance for DomU.
2013-03-11 13:37 ` Steven Haigh
@ 2013-03-12 13:04 ` Konrad Rzeszutek Wilk
2013-03-12 14:08 ` Steven Haigh
[not found] ` <514EA337.7030303@crc.id.au>
0 siblings, 2 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-12 13:04 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel, Roger Pau Monné
> >So still filesystem. Fio can do it on a block level.
> >
> >What does 'xenstore-ls' show you and 'losetup -a'? I am really
> >curious as to where that file you are providing to the guest as
> >disk is being handled via 'loop' or via 'QEMU'.
> >
>
> I've picked out what I believe is the most relevant from xenstore-ls
> that belongs to the DomU in question:
Great.
.. snip..
> params = "/dev/vg_raid6/fileshare"
> mode = "w"
> online = "1"
> frontend-id = "1"
> type = "phy"
> physical-device = "fd:5"
> hotplug-status = "connected"
> feature-flush-cache = "1"
> feature-discard = "0"
> feature-barrier = "1"
> feature-persistent = "1"
> sectors = "5368709120"
> info = "0"
> sector-size = "512"
OK, so the flow of data from the guest is:
bonnie++ -> FS -> xen-blkfront -> xen-blkback -> LVM -> RAID6 -> multiple disks.
Any way you can restructure this to be:
fio -> xen-blkfront -> xen-blkback -> one disk from the raid.
to see if the issue is in the "LVM -> RAID6" part or the "bonnie++ -> FS" part?
Is the CPU load quite high when you do these writes?
What are the RAID6 disks you have? How many?
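Concretely, the single-disk test being suggested could look something like this from dom0 - a sketch only: the array, device, and guest names are illustrative, and failing a member degrades the array:

```shell
# Fault one RAID6 member and pull it from the array (destructive;
# names illustrative).
mdadm /dev/md2 --fail /dev/sdf
mdadm /dev/md2 --remove /dev/sdf

# Pass the bare disk through to the guest.
xm block-attach zeus.vm phy:/dev/sdf xvdc w

# Inside the DomU, run fio against the raw device, then repeat the same
# run in dom0 against /dev/sdf and compare bandwidth.
fio --name=seqwrite --filename=/dev/xvdc --rw=write --direct=1 \
    --size=4g --ioengine=libaio --iodepth=32
```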
* Re: 4.2.1: Poor write performance for DomU.
2013-03-12 13:04 ` Konrad Rzeszutek Wilk
@ 2013-03-12 14:08 ` Steven Haigh
[not found] ` <514EA337.7030303@crc.id.au>
1 sibling, 0 replies; 29+ messages in thread
From: Steven Haigh @ 2013-03-12 14:08 UTC (permalink / raw)
To: xen-devel
On 13/03/13 00:04, Konrad Rzeszutek Wilk wrote:
> [...snip - history quoted above...]
>
> OK, so the flow of data from the guest is:
> bonnie++ -> FS -> xen-blkfront -> xen-blkback -> LVM -> RAID6 -> multiple disks.
>
> Any way you can restructure this to be:
>
> fio -> xen-blkfront -> xen-blkback -> one disk from the raid.
>
>
> to see if the issue is between "LVM -> RAID6" or the "bonnie++ -> FS" part?
> Is the cpu load quite high when you do these writes?
Maybe I'm missing something, but running this directly from the Dom0
gives the flow:
bonnie++ -> FS -> LVM -> RAID6
These figures were well over 200MB/sec read and well over 100MB/sec write.
This takes out only xen-blkfront and xen-blkback - which I thought
was the aim?
Or is the point to make sure that we can replicate it with a single
disk, and that it isn't some weird interaction between
blkfront/blkback and the LVM/RAID6?
CPU usage doesn't seem to be a limiting factor. I certainly don't see
massive load when writing.
>
> What are the RAID6 disks you have? How many?
The RAID6 is made up of 4 x 2TB 7200RPM Seagate SATA drives...
Model Family: Seagate SV35
Device Model: ST2000VX000-9YW164
Serial Number: Z1E10QQJ
LU WWN Device Id: 5 000c50 04dd3a1f1
Firmware Version: CV13
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Then in /proc/mdstat:
md2 : active raid6 sdd[4] sdc[0] sdf[5] sde[1]
3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
[4/4] [UUUU]
I decided to use whole disks so that I don't run into alignment issues.
The VG is using 4MiB extents, so that should be fine too:
# vgdisplay vg_raid6
--- Volume group ---
VG Name vg_raid6
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 7
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 5
Open LV 5
Max PV 0
Cur PV 1
Act PV 1
VG Size 3.64 TiB
PE Size 4.00 MiB
Total PE 953863
Alloc PE / Size 688640 / 2.63 TiB
Free PE / Size 265223 / 1.01 TiB
VG UUID md7G8X-F2mT-JBQa-f5qm-TN4O-kOqs-KWHGR1
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
* Re: 4.2.1: Poor write performance for DomU.
[not found] ` <514EA741.7050403@crc.id.au>
@ 2013-03-24 9:10 ` Steven Haigh
2013-03-24 9:54 ` Steven Haigh
2013-03-25 2:21 ` Steven Haigh
0 siblings, 2 replies; 29+ messages in thread
From: Steven Haigh @ 2013-03-24 9:10 UTC (permalink / raw)
To: konrad.wilk; +Cc: xen-devel, roger.pau
On 24/03/13 18:12, Steven Haigh wrote:
> In fact, I just thought of something else.... I have an eSATA caddy that
> connects to the same SATA controller. With this, I can slot any SATA
> drive into it - and I should easily be able to pass this to any DomU.
>
> I'll throw in a 1TB SATA drive so that I don't have to break the
> existing RAID6 array - the testing on this drive can be destructive,
> as the drive is otherwise blank.
Disk info:
Model Family: Seagate Barracuda 7200.12
Device Model: ST31000528AS
Serial Number: 9VP3BE9W
LU WWN Device Id: 5 000c50 01a238fd0
Firmware Version: CC49
User Capacity: 1,000,203,804,160 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Results...
Dom0 (host machine):
# dd if=/dev/zero of=/dev/sdi bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 33.6909 s, 127 MB/s
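As an aside, oflag=direct is what makes this number trustworthy: it bypasses the page cache, so dd reports device speed rather than RAM speed. When writing to a file on a filesystem instead of a raw device, conv=fdatasync gives a comparable cache-free figure - a sketch on a scratch file (path illustrative):

```shell
# conv=fdatasync makes dd flush the data to stable storage before it
# reports, so the page cache cannot inflate the result.
dd if=/dev/zero of=/tmp/ddtest.img bs=1M count=64 conv=fdatasync
rm -f /tmp/ddtest.img
```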
Created an ext4 filesystem on /dev/sdi1...
# mkfs.ext4 -j /dev/sdi1
Run bonnie++ on the filesystem:
# mount /dev/sdi1 /mnt/esata
# cd /mnt/esata/
# bonnie++ -u 0:0
Version 1.96 ------Sequential Output------ --Sequential Input-
--Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
xenhost.lan.crc. 2G 433 95 119107 22 36723 7 960 95 145026 12
191.9 4
Latency 33231us 39824us 211ms 31466us 17459us
5073ms
Version 1.96 ------Sequential Create------ --------Random
Create--------
xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read---
-Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
/sec %CP
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
+++++ +++
Latency 285us 642us 315us 217us 349us
127us
We get ~145MB/sec block read, ~119MB/sec block write.
Now, lets pass the whole device through to a DomU.
# xm block-attach zeus.vm phy:/dev/sdi xvdc w
From the DomU now:
Firstly, the same dd as above:
# dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 33.6708 s, 128 MB/s
Create the ext4 filesystem again:
# mkfs.ext4 -j /dev/xvdc1
Run bonnie++ on the DomU:
Version 1.96 ------Sequential Output------ --Sequential Input-
--Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au 2G 387 99 121891 24 47759 14 992 98 141103 17
248.9 7
Latency 40518us 126ms 152ms 47174us 30061us
250ms
Version 1.96 ------Sequential Create------ --------Random
Create--------
zeus.crc.id.au -Create-- --Read--- -Delete-- -Create-- --Read---
-Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
/sec %CP
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
+++++ +++
Latency 174us 839us 2249us 113us 42us
185us
Interesting - we're at almost full speed in the DomU: 121MB/sec write,
141MB/sec read.
So my question now is: why, with the RAID6, do we get 180MB/sec+ from
the Dom0, but only 50MB/sec from the DomU on the same filesystem?
Any further testing that may indicate something?
* Re: 4.2.1: Poor write performance for DomU.
2013-03-24 9:10 ` Steven Haigh
@ 2013-03-24 9:54 ` Steven Haigh
2013-03-25 2:21 ` Steven Haigh
1 sibling, 0 replies; 29+ messages in thread
From: Steven Haigh @ 2013-03-24 9:54 UTC (permalink / raw)
To: xen-devel
On 24/03/13 20:10, Steven Haigh wrote:
> So my wonder is now... Why when put in a RAID6 do we have a 180Mb/sec+
> from the Dom0, but only 50Mb/sec from the DomU of the same filesystem...
I should clarify that this is 180MB/sec write speed from the
Dom0 and 50MB/sec from the DomU. I'm really not sure why.
The filesystem in question is XFS - the tests in my previous post were
on ext4.
>
> Any further testing that may indicate something?
>
* Re: 4.2.1: Poor write performance for DomU.
2013-03-24 9:10 ` Steven Haigh
2013-03-24 9:54 ` Steven Haigh
@ 2013-03-25 2:21 ` Steven Haigh
2013-08-20 16:48 ` Konrad Rzeszutek Wilk
1 sibling, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-03-25 2:21 UTC (permalink / raw)
To: konrad.wilk; +Cc: roger.pau, xen-devel
So, based on my tests yesterday, I decided to break the RAID6 and pull a
drive out of it to test directly on the 2TB drives in question.
The array in question:
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
[4/4] [UUUU]
# mdadm /dev/md2 --fail /dev/sdf
mdadm: set /dev/sdf faulty in /dev/md2
# mdadm /dev/md2 --remove /dev/sdf
mdadm: hot removed /dev/sdf from /dev/md2
So, all tests are to be done on /dev/sdf.
Model Family: Seagate SV35
Device Model: ST2000VX000-9YW164
Serial Number: Z1E17C3X
LU WWN Device Id: 5 000c50 04e1bc6f0
Firmware Version: CV13
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
From the Dom0:
# dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
Create a single partition on the drive, and format it with ext4:
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x98d8baaf
Device Boot Start End Blocks Id System
/dev/sdf1 2048 3907029167 1953513560 83 Linux
Command (m for help): w
# mkfs.ext4 -j /dev/sdf1
......
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Mount it on the Dom0:
# mount /dev/sdf1 /mnt/esata/
# cd /mnt/esata/
# bonnie++ -d . -u 0:0
....
Version 1.96 ------Sequential Output------ --Sequential Input-
--Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
xenhost.lan.crc. 2G 425 94 133607 24 60544 12 973 95 209114 17
296.4 6
Latency 70971us 190ms 221ms 40369us 17657us
164ms
So from the Dom0: 133MB/sec write, 209MB/sec read.
Now, I'll attach the full disk to a DomU:
# xm block-attach zeus.vm phy:/dev/sdf xvdc w
And we'll test from the DomU.
# dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
Partition the same as in the Dom0 and create an ext4 filesystem on it:
I notice something interesting here. In the Dom0, the device is seen as:
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
In the DomU, it is seen as:
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Not sure if this could be related - but continuing testing:
Device Boot Start End Blocks Id System
/dev/xvdc1 2048 3907029167 1953513560 83 Linux
# mkfs.ext4 -j /dev/xvdc1
....
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
# mount /dev/xvdc1 /mnt/esata/
# cd /mnt/esata/
# bonnie++ -d . -u 0:0
....
Version 1.96 ------Sequential Output------ --Sequential Input-
--Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au 2G 396 99 116530 23 50451 15 1035 99 176407 23
313.4 9
Latency 34615us 130ms 128ms 33316us 74401us
130ms
So still... 116MB/sec write, 176MB/sec read to the physical device from
the DomU. More than acceptable.
It leaves me to wonder... could the Dom0 seeing the drives as 4096-byte
sectors, while the DomU sees them as 512-byte sectors, be causing an
issue?
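For reference, the sector sizes each domain sees can be read straight from sysfs; a minimal sketch, using the device names from above (the helper name is an illustration, not a tool from this thread) - run it in the Dom0 against sdf and in the DomU against xvdc:

```shell
# Print the sector sizes and I/O hints the kernel reports for a block
# device, using the standard /sys/block/<dev>/queue attributes.
show_sectors() {
    local q=/sys/block/$1/queue
    [ -d "$q" ] || { echo "no such device: $1" >&2; return 1; }
    echo "logical sector:  $(cat "$q/logical_block_size") bytes"
    echo "physical sector: $(cat "$q/physical_block_size") bytes"
    echo "minimum I/O:     $(cat "$q/minimum_io_size") bytes"
    echo "optimal I/O:     $(cat "$q/optimal_io_size") bytes"
}
# usage: show_sectors sdf     (in the DomU: show_sectors xvdc)
```

If blkfront reports 512/512 while the Dom0 reports 512/4096, the DomU loses the alignment hint, although a partition starting at sector 2048 stays 4K-aligned regardless.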
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
* Re: 4.2.1: Poor write performance for DomU.
2013-03-25 2:21 ` Steven Haigh
@ 2013-08-20 16:48 ` Konrad Rzeszutek Wilk
2013-08-20 18:25 ` Steven Haigh
2013-09-05 8:28 ` Steven Haigh
0 siblings, 2 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-20 16:48 UTC (permalink / raw)
To: Steven Haigh; +Cc: roger.pau, xen-devel
On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
> [...snip...]
> It leaves me to wonder.... Could there be something in the Dom0
> seeing the drives as 4096 byte sectors, but the DomU seeing it as
> 512 byte sectors cause an issue?
There is a certain overhead in it. I still have this in my mailbox,
so I am not sure whether this issue ever got resolved? I know that the
indirect descriptor patches in Xen blkback and blkfront are meant to
resolve some of these issues - by being able to carry a bigger payload.
Did you ever try the v3.11 kernel in both dom0 and domU? Thanks.
>
> --
> Steven Haigh
>
> Email: netwiz@crc.id.au
> Web: https://www.crc.id.au
> Phone: (03) 9001 6090 - 0412 935 897
> Fax: (03) 8338 0299
* Re: 4.2.1: Poor write performance for DomU.
2013-08-20 16:48 ` Konrad Rzeszutek Wilk
@ 2013-08-20 18:25 ` Steven Haigh
2013-09-05 8:28 ` Steven Haigh
1 sibling, 0 replies; 29+ messages in thread
From: Steven Haigh @ 2013-08-20 18:25 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: roger.pau, xen-devel
On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>> [...snip...]
>
> There is certain overhead in it. I still have this in my mailbox
> so I am not sure whether this issue got ever resolved? I know that the
> indirect patches in Xen blkback and xen blkfront are meant to resolve
> some of these issues - by being able to carry a bigger payload.
>
> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
Hi Konrad,
I don't believe I ever fixed it - however I haven't tried kernel 3.11 in
Dom0 OR DomU...
I'll keep this in my inbox and try to build a 3.11 kernel for both in
the near future for testing...
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
* Re: 4.2.1: Poor write performance for DomU.
2013-08-20 16:48 ` Konrad Rzeszutek Wilk
2013-08-20 18:25 ` Steven Haigh
@ 2013-09-05 8:28 ` Steven Haigh
2013-09-06 13:33 ` Konrad Rzeszutek Wilk
1 sibling, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-09-05 8:28 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: roger.pau, xen-devel
On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>> [...snip...]
>
> There is certain overhead in it. I still have this in my mailbox
> so I am not sure whether this issue got ever resolved? I know that the
> indirect patches in Xen blkback and xen blkfront are meant to resolve
> some of these issues - by being able to carry a bigger payload.
>
> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
Ok, so I finally got around to building kernel 3.11 RPMs today for
testing. I upgraded both the Dom0 and DomU to the same kernel:
DomU:
# dmesg | grep blkfront
blkfront: xvda: flush diskcache: enabled; persistent grants: enabled;
indirect descriptors: enabled;
blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled;
indirect descriptors: enabled;
Looks good.
Transfer tests using bonnie++ as per before:
# bonnie -d . -u 0:0
Version 1.96 ------Sequential Output------ --Sequential Input-
--Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au 2G 603 92 58250 9 62248 14 886 99 295757 30
492.3 13
Latency 27305us 124ms 158ms 34222us 16865us
374ms
Version 1.96 ------Sequential Create------ --------Random
Create--------
zeus.crc.id.au -Create-- --Read--- -Delete-- -Create-- --Read---
-Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
/sec %CP
16 10048 22 +++++ +++ 17849 29 11109 25 +++++ +++
18389 31
Latency 17775us 154us 180us 16008us 38us
58us
There still seems to be a massive discrepancy between Dom0 and DomU
write speeds. Interestingly, sequential block reads are nearly
300MB/sec, yet sequential writes were only ~58MB/sec.
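One way to narrow this down further (a rough sketch, not from the thread - the helper name is hypothetical, and the target must be a scratch device or file) is to sweep O_DIRECT write sizes against the raw block device from the DomU, separating per-request/ring effects from filesystem behaviour:

```shell
# Time sequential zero-writes at several block sizes; the final dd
# status line reports throughput for each. DESTRUCTIVE to the target -
# point it at a scratch device or throwaway file only.
sweep_writes() {    # sweep_writes <target> [extra dd flags...]
    local target=$1; shift
    local bs
    for bs in 4k 64k 1M; do
        echo "== bs=$bs =="
        dd if=/dev/zero of="$target" bs="$bs" count=16 "$@" 2>&1 | tail -n1
    done
}
# from the DomU, against the scratch disk attached earlier:
# sweep_writes /dev/xvdc oflag=direct
```

If throughput collapses only at small block sizes in the DomU but not in the Dom0, per-request overhead on the ring is the more likely culprit than the device itself.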
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: 4.2.1: Poor write performance for DomU.
2013-09-05 8:28 ` Steven Haigh
@ 2013-09-06 13:33 ` Konrad Rzeszutek Wilk
2013-09-06 23:06 ` Steven Haigh
0 siblings, 1 reply; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-06 13:33 UTC (permalink / raw)
To: Steven Haigh; +Cc: roger.pau, xen-devel
On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote:
> On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
> > On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
> >> [...snip...]
> >
> > There is certain overhead in it. I still have this in my mailbox
> > so I am not sure whether this issue got ever resolved? I know that the
> > indirect patches in Xen blkback and xen blkfront are meant to resolve
> > some of these issues - by being able to carry a bigger payload.
> >
> > Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
>
> Ok, so I finally got around to building kernel 3.11 RPMs today for
> testing. I upgraded both the Dom0 and DomU to the same kernel:
Woohoo!
>
> DomU:
> # dmesg | grep blkfront
> blkfront: xvda: flush diskcache: enabled; persistent grants: enabled;
> indirect descriptors: enabled;
> blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled;
> indirect descriptors: enabled;
>
> Looks good.
>
> Transfer tests using bonnie++ as per before:
> # bonnie -d . -u 0:0
> Version 1.96 ------Sequential Output------ --Sequential Input-
> --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> /sec %CP
> zeus.crc.id.au 2G 603 92 58250 9 62248 14 886 99 295757 30
> 492.3 13
> Latency 27305us 124ms 158ms 34222us 16865us
> 374ms
> Version 1.96 ------Sequential Create------ --------Random
> Create--------
> zeus.crc.id.au -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> /sec %CP
> 16 10048 22 +++++ +++ 17849 29 11109 25 +++++ +++
> 18389 31
> Latency 17775us 154us 180us 16008us 38us
> 58us
>
> Still seems to be a massive discrepancy between Dom0 and DomU write
> speeds. Interesting is that sequential block reads are nearly 300MB/sec,
> yet sequential writes were only ~58MB/sec.
OK, so the other thing that people were pointing out is that you
can use the xen-blkfront.max parameter. By default it is 32, but try 8.
Or 64. Or 256.
The indirect descriptor allows us to put more I/Os on the ring - and
I am hoping that will:
a) solve your problem
b) not solve your problem, but demonstrate that the issue is not with
the ring, but with something else making your writes slower.
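For reference, a module parameter like this would typically be set on the domU kernel command line or at module load time - a sketch assuming the parameter name given above (verify with `modinfo xen-blkfront` on the running kernel):

```shell
# On the domU kernel command line (e.g. in the grub.conf pygrub reads):
#     kernel /vmlinuz-3.11 ... xen-blkfront.max=64
# Or, if xen-blkfront is built as a module:
#     modprobe xen-blkfront max=64
# The active value can then be checked via:
#     cat /sys/module/xen_blkfront/parameters/max
```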
Hmm, are you by any chance using O_DIRECT when running bonnie++ in
dom0? xen-blkback tacks O_DIRECT onto all write requests. This is
done to avoid using the dom0 page cache - otherwise you end up with
a double buffer where the writes are insanely fast - but with absolutely
no safety.
If you want to try disabling that (so no O_DIRECT), I would do this
little change:
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index bf4b9d2..823b629 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
break;
case BLKIF_OP_WRITE:
blkif->st_wr_req++;
- operation = WRITE_ODIRECT;
+ operation = WRITE;
break;
case BLKIF_OP_WRITE_BARRIER:
drain = true;
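The double-buffering effect described above is easy to see from the Dom0 with dd alone - a rough illustration (the scratch file path is a placeholder, and O_DIRECT needs a filesystem that supports it, so tmpfs won't work):

```shell
# Compare page-cache-buffered vs O_DIRECT writes to the same target.
# The buffered run reports cache speed unless fsync time is included;
# the direct run waits for the device on every block.
target=./dd-scratch.img
echo "buffered (+fsync):"
dd if=/dev/zero of="$target" bs=1M count=32 conv=fsync  2>&1 | tail -n1
echo "O_DIRECT:"
dd if=/dev/zero of="$target" bs=1M count=32 oflag=direct 2>&1 | tail -n1
rm -f "$target"
```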
* Re: 4.2.1: Poor write performance for DomU.
2013-09-06 13:33 ` Konrad Rzeszutek Wilk
@ 2013-09-06 23:06 ` Steven Haigh
2013-09-06 23:37 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-09-06 23:06 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: roger.pau, xen-devel
On 06/09/13 23:33, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote:
>> On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>>>> So, based on my tests yesterday, I decided to break the RAID6 and
>>>> pull a drive out of it to test directly on the 2Tb drives in
>>>> question.
>>>>
>>>> The array in question:
>>>> # cat /proc/mdstat
>>>> Personalities : [raid1] [raid6] [raid5] [raid4]
>>>> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>>>> 3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
>>>> [4/4] [UUUU]
>>>>
>>>> # mdadm /dev/md2 --fail /dev/sdf
>>>> mdadm: set /dev/sdf faulty in /dev/md2
>>>> # mdadm /dev/md2 --remove /dev/sdf
>>>> mdadm: hot removed /dev/sdf from /dev/md2
>>>>
>>>> So, all tests are to be done on /dev/sdf.
>>>> Model Family: Seagate SV35
>>>> Device Model: ST2000VX000-9YW164
>>>> Serial Number: Z1E17C3X
>>>> LU WWN Device Id: 5 000c50 04e1bc6f0
>>>> Firmware Version: CV13
>>>> User Capacity: 2,000,398,934,016 bytes [2.00 TB]
>>>> Sector Sizes: 512 bytes logical, 4096 bytes physical
>>>>
>>>> From the Dom0:
>>>> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
>>>>
>>>> Create a single partition on the drive, and format it with ext4:
>>>> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
>>>> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>> Disk identifier: 0x98d8baaf
>>>>
>>>> Device Boot Start End Blocks Id System
>>>> /dev/sdf1 2048 3907029167 1953513560 83 Linux
>>>>
>>>> Command (m for help): w
>>>>
>>>> # mkfs.ext4 -j /dev/sdf1
>>>> ......
>>>> Writing inode tables: done
>>>> Creating journal (32768 blocks): done
>>>> Writing superblocks and filesystem accounting information: done
>>>>
>>>> Mount it on the Dom0:
>>>> # mount /dev/sdf1 /mnt/esata/
>>>> # cd /mnt/esata/
>>>> # bonnie++ -d . -u 0:0
>>>> ....
>>>> Version 1.96 ------Sequential Output------ --Sequential
>>>> Input- --Random-
>>>> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr-
>>>> --Block-- --Seeks--
>>>> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>>>> %CP /sec %CP
>>>> xenhost.lan.crc. 2G 425 94 133607 24 60544 12 973 95 209114
>>>> 17 296.4 6
>>>> Latency 70971us 190ms 221ms 40369us 17657us
>>>> 164ms
>>>>
>>>> So from the Dom0: 133Mb/sec write, 209Mb/sec read.
>>>>
>>>> Now, I'll attach the full disk to a DomU:
>>>> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
>>>>
>>>> And we'll test from the DomU.
>>>>
>>>> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
>>>>
>>>> Partition the same as in the Dom0 and create an ext4 filesystem on it:
>>>>
>>>> I notice something interesting here. In the Dom0, the device is seen as:
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>>
>>>> In the DomU, it is seen as:
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 512 bytes
>>>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>>>
>>>> Not sure if this could be related - but continuing testing:
>>>> Device Boot Start End Blocks Id System
>>>> /dev/xvdc1 2048 3907029167 1953513560 83 Linux
>>>>
>>>> # mkfs.ext4 -j /dev/xvdc1
>>>> ....
>>>> Allocating group tables: done
>>>> Writing inode tables: done
>>>> Creating journal (32768 blocks): done
>>>> Writing superblocks and filesystem accounting information: done
>>>>
>>>> # mount /dev/xvdc1 /mnt/esata/
>>>> # cd /mnt/esata/
>>>> # bonnie++ -d . -u 0:0
>>>> ....
>>>> Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
>>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>>>> zeus.crc.id.au   2G   396  99 116530 23 50451  15  1035  99 176407 23 313.4   9
>>>> Latency             34615us     130ms     128ms   33316us   74401us     130ms
>>>>
>>>> So still... 116MB/sec write, 176MB/sec read to the physical device
>>>> from the DomU. More than acceptable.
>>>>
>>>> It leaves me to wonder... Could the Dom0 seeing the drive as
>>>> 4096-byte sectors, while the DomU sees it as 512-byte sectors,
>>>> be causing an issue?
>>>
>>> There is a certain overhead in it. I still have this in my mailbox,
>>> so I am not sure whether this issue ever got resolved? I know that the
>>> indirect descriptor patches in Xen blkback and blkfront are meant to
>>> resolve some of these issues - by being able to carry a bigger payload.
>>>
>>> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
>>
>> Ok, so I finally got around to building kernel 3.11 RPMs today for
>> testing. I upgraded both the Dom0 and DomU to the same kernel:
>
> Woohoo!
>>
>> DomU:
>> # dmesg | grep blkfront
>> blkfront: xvda: flush diskcache: enabled; persistent grants: enabled;
>> indirect descriptors: enabled;
>> blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled;
>> indirect descriptors: enabled;
>>
>> Looks good.
>>
>> Transfer tests using bonnie++ as per before:
>> # bonnie -d . -u 0:0
>> Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> zeus.crc.id.au   2G   603  92 58250   9 62248  14   886  99 295757 30 492.3  13
>> Latency             27305us     124ms     158ms   34222us   16865us     374ms
>> Version 1.96        ------Sequential Create------ --------Random Create--------
>> zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                  16 10048  22 +++++ +++ 17849  29 11109  25 +++++ +++ 18389  31
>> Latency             17775us     154us     180us   16008us      38us      58us
>>
>> Still seems to be a massive discrepancy between Dom0 and DomU write
>> speeds. Interestingly, sequential block reads are nearly 300MB/sec,
>> yet sequential writes were only ~58MB/sec.
>
> OK, so the other thing people were pointing out is that you can use
> the xen-blkfront.max parameter. By default it is 32, but try 8.
> Or 64. Or 256.
Ahh - interesting.
I used the following:
Kernel command line: ro root=/dev/xvda rd_NO_LUKS rd_NO_DM
LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us
crashkernel=auto console=hvc0 xen-blkfront.max=X
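(Side note: since the guest boots via pygrub, the parameter can be made
persistent by appending it to the kernel line in the DomU's own grub
config - a sketch only, with the file path and kernel version assumed:)

```
# /boot/grub/grub.conf inside the DomU (read by pygrub at boot)
kernel /vmlinuz-3.11.0 ro root=/dev/xvda console=hvc0 xen-blkfront.max=128
```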
Version 1.96        ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP

8:
zeus.crc.id.au   2G   696  92 50906   7 46102  11  1013  97 256784 27 496.5  10
Latency             24374us     199ms     117ms   30855us   38008us   85175us

16:
zeus.crc.id.au   2G   675  92 58078   8 57585  13  1005  97 262735 25 505.6  10
Latency             24412us     187ms     183ms   23661us   53850us     232ms

32:
zeus.crc.id.au   2G   698  92 57416   8 63328  13  1063  97 267154 24 498.2  12
Latency             24264us     199ms   81362us   33144us   22526us     237ms

64:
zeus.crc.id.au   2G   574  86 88447  13 68988  17   897  97 265128 27 493.7  13

128:
zeus.crc.id.au   2G   702  97 107638 14 70158  15  1045  97 255596 24 491.0  12
Latency             27279us   17553us     134ms   29771us   38392us   65761us

256:
zeus.crc.id.au   2G   689  91 102554 14 67337  15  1012  97 262475 24 484.4  12
Latency             20642us     104ms     189ms   36624us   45286us   80023us
So, as a nice summary of the sequential block writes:
8: 50MB/sec
16: 58MB/sec
32: 57MB/sec
64: 88MB/sec
128: 107MB/sec
256: 102MB/sec
So, maybe it's coincidence, maybe it isn't - but the best result
(factoring in margin of error) seems to be 128 - which happens to match
the 128k chunk size of the underlying RAID6 array on the Dom0.
# cat /proc/mdstat
md2 : active raid6 sdd[5] sdc[4] sdf[1] sde[0]
3906766592 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4]
[UUUU]
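As a rough sanity check on those numbers (my own arithmetic, assuming
blkfront's usual one-grant-page, 4 KiB segments):

```python
PAGE_SIZE = 4096  # bytes per blkfront segment (one grant page)

def payload_per_request(max_segments: int) -> int:
    """KiB a single indirect request can carry at a given xen-blkfront.max."""
    return max_segments * PAGE_SIZE // 1024

for segs in (8, 16, 32, 64, 128, 256):
    print(f"xen-blkfront.max={segs:3d} -> {payload_per_request(segs):4d} KiB per request")
```

So at max=128 each request can carry 512 KiB - several full 128k chunks
in one request - which may be why throughput keeps climbing until around
there.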
> The indirect descriptor allows us to put more I/Os on the ring - and
> I am hoping that will:
> a) solve your problem
Well, it looks like this solves the issue - at least, increasing the max
almost doubles the write speed, with no change to read speeds (within
margin of error).
> b) not solve your problem, but demonstrate that the issue is not with
> the ring, but with something else making your writes slower.
>
> Hmm, are you by any chance using O_DIRECT when running bonnie++ in
> dom0? xen-blkback tacks O_DIRECT onto all write requests. This is
> done to bypass the dom0 page cache - otherwise you end up with double
> buffering, where writes are insanely fast - but with absolutely no
> safety.
>
> If you want to try disabling that (so no O_DIRECT), I would do this
> little change:
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index bf4b9d2..823b629 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> break;
> case BLKIF_OP_WRITE:
> blkif->st_wr_req++;
> - operation = WRITE_ODIRECT;
> + operation = WRITE;
> break;
> case BLKIF_OP_WRITE_BARRIER:
> drain = true;
With the above results, is this still useful?
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: 4.2.1: Poor write performance for DomU.
2013-09-06 23:06 ` Steven Haigh
@ 2013-09-06 23:37 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-06 23:37 UTC (permalink / raw)
To: Steven Haigh; +Cc: roger.pau, xen-devel
Steven Haigh <netwiz@crc.id.au> wrote:
>With the above results, is this still useful?
No. There is no need - awesome that this fixed it. Roger had mentioned that he had seen similar behavior. We should probably do a patch that interrogates the backend for the optimal segment size and informs the frontend, so it can set it automatically.
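Something along these lines (a sketch of the policy only, not actual
blkback code - the attribute name is the generic block-queue
optimal_io_size that dom0 could read from the backing device's sysfs):

```python
PAGE_SIZE = 4096  # one segment = one 4 KiB grant page

def segments_for_device(optimal_io_size: int, default: int = 32) -> int:
    """Pick an indirect-segment count that covers the backing device's
    optimal I/O size (e.g. a RAID stripe); keep the default if unknown."""
    if optimal_io_size <= 0:  # 0 means the device reported nothing useful
        return default
    needed = -(-optimal_io_size // PAGE_SIZE)  # ceiling division
    return max(default, needed)

# e.g. a 4-disk RAID6 with 128k chunks (two data disks per stripe)
# reports a 256 KiB optimal I/O size:
print(segments_for_device(256 * 1024))  # -> 64
```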
end of thread, other threads:[~2013-09-06 23:37 UTC | newest]
Thread overview: 29+ messages
2013-02-20 2:10 4.2.1: Poor write performance for DomU Steven Haigh
2013-02-20 8:26 ` Roger Pau Monné
2013-02-20 8:49 ` Steven Haigh
2013-02-20 9:49 ` Steven Haigh
2013-02-20 10:12 ` Jan Beulich
2013-02-20 11:06 ` Andrew Cooper
2013-02-20 11:08 ` Steven Haigh
2013-02-20 12:48 ` Andrew Cooper
2013-02-20 13:18 ` Pasi Kärkkäinen
2013-03-08 20:42 ` Konrad Rzeszutek Wilk
2013-03-08 8:54 ` Steven Haigh
2013-03-08 9:43 ` Roger Pau Monné
2013-03-08 9:46 ` Steven Haigh
2013-03-08 9:54 ` Roger Pau Monné
2013-03-08 20:49 ` Konrad Rzeszutek Wilk
2013-03-08 22:30 ` Steven Haigh
2013-03-11 13:30 ` Konrad Rzeszutek Wilk
2013-03-11 13:37 ` Steven Haigh
2013-03-12 13:04 ` Konrad Rzeszutek Wilk
2013-03-12 14:08 ` Steven Haigh
[not found] ` <514EA337.7030303@crc.id.au>
[not found] ` <514EA6B0.8010504@crc.id.au>
[not found] ` <514EA741.7050403@crc.id.au>
2013-03-24 9:10 ` Steven Haigh
2013-03-24 9:54 ` Steven Haigh
2013-03-25 2:21 ` Steven Haigh
2013-08-20 16:48 ` Konrad Rzeszutek Wilk
2013-08-20 18:25 ` Steven Haigh
2013-09-05 8:28 ` Steven Haigh
2013-09-06 13:33 ` Konrad Rzeszutek Wilk
2013-09-06 23:06 ` Steven Haigh
2013-09-06 23:37 ` Konrad Rzeszutek Wilk