* 4.2.1: Poor write performance for DomU.
@ 2013-02-20  2:10 Steven Haigh
  2013-02-20  8:26 ` Roger Pau Monné
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-02-20  2:10 UTC (permalink / raw)
  To: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5443 bytes --]

Hi guys,

Firstly, please CC me in to any replies as I'm not a subscriber these days.

I've been trying to debug a problem with Xen 4.2.1 where I am unable to 
achieve more than ~50MB/sec sustained sequential write to a disk. The 
DomU is configured as such:

name            = "zeus.vm"
memory          = 1024
vcpus           = 2
cpus            = "1-3"
disk            = [ 'phy:/dev/RAID1/zeus.vm,xvda,w', 
'phy:/dev/vg_raid6/fileshare,xvdb,w' ]
vif             = [ "mac=02:16:36:35:35:09, bridge=br203, 
vifname=vm.zeus.203", "mac=10:16:36:35:35:09, bridge=br10, 
vifname=vm.zeus.10" ]
bootloader      = "pygrub"

on_poweroff     = 'destroy'
on_reboot       = 'restart'
on_crash        = 'restart'

I have tested the underlying LVM config by mounting 
/dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as a 
benchmark:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
xenhost.lan.crc. 2G   667  96 186976  21 80430  14   956  95 290591  26 373.7   8
Latency             26416us     212ms     168ms   35494us   35989us 83759us
Version  1.96       ------Sequential Create------ --------Random Create--------
xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
                  16 14901  32 +++++ +++ 19672  39 15307  34 +++++ +++ 18158  37
Latency             17838us     141us     298us     365us     133us 296us

~186MB/sec write, ~290MB/sec read. Awesome.
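A quick way to cross-check bonnie++'s sequential-write figure, in the Dom0 or later in the DomU, is a plain dd run that forces a flush before exiting. This is a sketch, not part of the original test run; the target path is an assumption and should point at a scratch file on the filesystem under test:

```shell
# TARGET is an assumption - point it at a scratch file on the mounted
# filesystem under test (e.g. /mnt/fileshare/ddtest.bin in the DomU).
TARGET="${TARGET:-./ddtest.bin}"
# conv=fdatasync flushes the data before dd exits, so the reported rate
# reflects the block path rather than the page cache.
dd if=/dev/zero of="$TARGET" bs=1M count=64 conv=fdatasync
```

dd prints the sustained rate on its last line; a figure far below the Dom0 number reproduces the problem without bonnie++'s file-metadata phases.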

I then started a single DomU which gets passed /dev/vg_raid6/fileshare 
through as xvdb. It is then mounted in /mnt/fileshare/. I then ran 
bonnie++ again in the DomU:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
zeus.crc.id.au   2G   658  96 50618   8 42398  10  1138  99 267568  30 494.9  11
Latency             22959us     226ms     311ms   14617us   41816us 72814us
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
                  16 21749  59 +++++ +++ 31089  73 23283  64 +++++ +++ 31114  75
Latency             18989us     164us     928us     480us      26us  87us

~50MB/sec write, ~267MB/sec read. Not so awesome.

/dev/vg_raid6/fileshare exists as an LV on /dev/md2:

# lvdisplay vg_raid6/fileshare
   --- Logical volume ---
   LV Path                /dev/vg_raid6/fileshare
   LV Name                fileshare
   VG Name                vg_raid6
   LV UUID                cwC0yK-Xr56-WB5v-10bw-3AZT-pYj0-piWett
   LV Write Access        read/write
   LV Creation host, time xenhost.lan.crc.id.au, 2013-02-18 20:59:40 +1100
   LV Status              available
   # open                 1
   LV Size                2.50 TiB
   Current LE             655360
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     1024
   Block device           253:5


md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 
[4/4] [UUUU]
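Given the LV-on-md2 stacking above, one quick sanity check is to compare the block-layer request-size limits along the stack. This is a sketch under the assumption that the device names from this setup (md2 in the Dom0, xvdb in the DomU) are present on the host where it is run:

```shell
# Compare per-device request-size limits; a much smaller limit on the
# DomU's xvd* device than on md2 would mean smaller writes reach the
# RAID6 than when benchmarking in the Dom0. Device names are from this
# setup and are skipped if absent.
for dev in md2 xvdb; do
    q=/sys/block/$dev/queue
    [ -d "$q" ] || { echo "$dev: not present on this host"; continue; }
    echo "$dev: max_sectors_kb=$(cat "$q/max_sectors_kb")" \
         "max_segments=$(cat "$q/max_segments" 2>/dev/null || echo n/a)"
done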

Here's a quick output of 'xm info' - although the full VM load is running 
now:
# xm info
host                   : xenhost.lan.crc.id.au
release                : 3.7.9-1.el6xen.x86_64
version                : #1 SMP Mon Feb 18 14:46:35 EST 2013
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 3303
hw_caps                : 
bfebfbff:28100800:00000000:00003f40:179ae3bf:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 8116
free_memory            : 1346
free_cpus              : 0
xen_major              : 4
xen_minor              : 2
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 
hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1 
dom0_vcpus_pin
cc_compiler            : gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
cc_compile_by          : mockbuild
cc_compile_domain      : crc.id.au
cc_compile_date        : Sat Feb 16 19:16:38 EST 2013
xend_config_format     : 4

In a nutshell, does anyone know *why* I am only able to get ~50MB/sec 
sequential writes in the DomU? It certainly isn't a problem getting 
normal speeds to the LV while mounted in the Dom0.

All OS are Scientific Linux 6.3. The Dom0 runs packages from my 
kernel-xen repo (http://au1.mirror.crc.id.au/repo/el6/x86_64/). The DomU 
is completely stock packages.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299


[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4240 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20  2:10 4.2.1: Poor write performance for DomU Steven Haigh
@ 2013-02-20  8:26 ` Roger Pau Monné
  2013-02-20  8:49   ` Steven Haigh
  0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2013-02-20  8:26 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel

On 20/02/13 03:10, Steven Haigh wrote:
> Hi guys,
> 
> Firstly, please CC me in to any replies as I'm not a subscriber these days.
> 
> I've been trying to debug a problem with Xen 4.2.1 where I am unable to 
> achieve more than ~50Mb/sec sustained sequential write to a disk. The 
> DomU is configured as such:

Since you mention 4.2.1 explicitly, is this a performance regression
from previous versions? (4.2.0 or the 4.1 branch)

> name            = "zeus.vm"
> memory          = 1024
> vcpus           = 2
> cpus            = "1-3"
> disk            = [ 'phy:/dev/RAID1/zeus.vm,xvda,w', 
> 'phy:/dev/vg_raid6/fileshare,xvdb,w' ]
> vif             = [ "mac=02:16:36:35:35:09, bridge=br203, 
> vifname=vm.zeus.203", "mac=10:16:36:35:35:09, bridge=br10, 
> vifname=vm.zeus.10" ]
> bootloader      = "pygrub"
> 
> on_poweroff     = 'destroy'
> on_reboot       = 'restart'
> on_crash        = 'restart'
> 
> I have tested the underlying LVM config by mounting 
> /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as a 
> benchmark:
> 
> Version  1.96       ------Sequential Output------ --Sequential Input- 
> --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
> /sec %CP
> xenhost.lan.crc. 2G   667  96 186976  21 80430  14   956  95 290591  26 
> 373.7   8
> Latency             26416us     212ms     168ms   35494us   35989us 83759us
> Version  1.96       ------Sequential Create------ --------Random 
> Create--------
> xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
> /sec %CP
>                   16 14901  32 +++++ +++ 19672  39 15307  34 +++++ +++ 
> 18158  37
> Latency             17838us     141us     298us     365us     133us 296us
> 
> ~186Mb/sec write, ~290Mb/sec read. Awesome.
> 
> I then started a single DomU which gets passed /dev/vg_raid6/fileshare 
> through as xvdb. It is then mounted in /mnt/fileshare/. I then ran 
> bonnie++ again in the DomU:
> 
> Version  1.96       ------Sequential Output------ --Sequential Input- 
> --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
> /sec %CP
> zeus.crc.id.au   2G   658  96 50618   8 42398  10  1138  99 267568  30 
> 494.9  11
> Latency             22959us     226ms     311ms   14617us   41816us 72814us
> Version  1.96       ------Sequential Create------ --------Random 
> Create--------
> zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
> /sec %CP
>                   16 21749  59 +++++ +++ 31089  73 23283  64 +++++ +++ 
> 31114  75
> Latency             18989us     164us     928us     480us      26us  87us
> 
> ~50Mb/sec write, ~267Mb/sec read. Not so awesome.

We are currently working on improving the speed of the PV block drivers. I
will look into this difference between the read and write speeds, but I would
guess it is due to the size of the request/ring.
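For context on the request/ring guess, the in-flight I/O of the single-page blkif ring can be bounded with a back-of-the-envelope calculation. The slot and segment counts below are quoted as an illustration of the era's Linux blkfront, not taken from this thread:

```shell
# Classic blkif ring: one shared 4 KiB page holds 32 request slots,
# and each request carries at most 11 segments of 4 KiB.
RING_SLOTS=32
SEGS_PER_REQ=11
SEG_KB=4
echo "max in-flight I/O: $(( RING_SLOTS * SEGS_PER_REQ * SEG_KB )) KiB"
```

With only ~1.4 MiB in flight per device, large sequential writes are throttled by ring round trips; the indirect-descriptor work mentioned later in the thread raises exactly these limits.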

> 
> /dev/vg_raid6/fileshare exists as an LV on /dev/md2:
> 
> # lvdisplay vg_raid6/fileshare
>    --- Logical volume ---
>    LV Path                /dev/vg_raid6/fileshare
>    LV Name                fileshare
>    VG Name                vg_raid6
>    LV UUID                cwC0yK-Xr56-WB5v-10bw-3AZT-pYj0-piWett
>    LV Write Access        read/write
>    LV Creation host, time xenhost.lan.crc.id.au, 2013-02-18 20:59:40 +1100
>    LV Status              available
>    # open                 1
>    LV Size                2.50 TiB
>    Current LE             655360
>    Segments               1
>    Allocation             inherit
>    Read ahead sectors     auto
>    - currently set to     1024
>    Block device           253:5
> 
> 
> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>        3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 
> [4/4] [UUUU]
> 
> Heres a quick output of 'xm info' - although its full VM load is running 
> now:
> # xm info
> host                   : xenhost.lan.crc.id.au
> release                : 3.7.9-1.el6xen.x86_64
> version                : #1 SMP Mon Feb 18 14:46:35 EST 2013
> machine                : x86_64
> nr_cpus                : 4
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 3303
> hw_caps                : 
> bfebfbff:28100800:00000000:00003f40:179ae3bf:00000000:00000001:00000000
> virt_caps              : hvm
> total_memory           : 8116
> free_memory            : 1346
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 2
> xen_extra              : .1
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 
> hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          : unavailable
> xen_commandline        : dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1 
> dom0_vcpus_pin
> cc_compiler            : gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
> cc_compile_by          : mockbuild
> cc_compile_domain      : crc.id.au
> cc_compile_date        : Sat Feb 16 19:16:38 EST 2013
> xend_config_format     : 4
> 
> In a nutshell, does anyone know *why* I am only able to get ~50Mb/sec 
> sequential writes to the DomU? It certainly isn't a problem getting 
> normal speeds to the LV while mounted in the Dom0.
> 
> All OS are Scientific Linux 6.3. The Dom0 runs packages from my 
> kernel-xen repo (http://au1.mirror.crc.id.au/repo/el6/x86_64/). The DomU 
> is completely stock packages.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20  8:26 ` Roger Pau Monné
@ 2013-02-20  8:49   ` Steven Haigh
  2013-02-20  9:49     ` Steven Haigh
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-02-20  8:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2086 bytes --]

On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
> On 20/02/13 03:10, Steven Haigh wrote:
>> Hi guys,
>>
>> Firstly, please CC me in to any replies as I'm not a subscriber these days.
>>
>> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
>> achieve more than ~50Mb/sec sustained sequential write to a disk. The
>> DomU is configured as such:
>
> Since you mention 4.2.1 explicitly, is this a performance regression
> from previous versions? (4.2.0 or the 4.1 branch)

This is actually a very good question. I've reinstalled my older 
packages of Xen 4.1.3 on the system, rebooted into that hypervisor, 
and started the single DomU again. I then ran bonnie++ again on 
the DomU:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
zeus.crc.id.au   2G   658  97 54893   9 40845  10  1056  97 280453  33 561.2  13
Latency             27145us     426ms     257ms   31900us   24701us 222ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
                  16 19281  52 +++++ +++ +++++ +++ 24435  66 +++++ +++ +++++ +++
Latency             22860us     182us     706us   14803us      28us 300us


Still around 50MB/sec - so this doesn't seem to be a regression, but 
something else?


>> ~50Mb/sec write, ~267Mb/sec read. Not so awesome.
>
> We are currently working on improving the speed of pv block drivers, I
> will look into this difference between the read/write speed, but I would
> guess this is due to the size of the request/ring.

I would assume this would be in the DomU kernel?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20  8:49   ` Steven Haigh
@ 2013-02-20  9:49     ` Steven Haigh
  2013-02-20 10:12       ` Jan Beulich
  2013-03-08  8:54       ` Steven Haigh
  0 siblings, 2 replies; 29+ messages in thread
From: Steven Haigh @ 2013-02-20  9:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2143 bytes --]

On 20/02/2013 7:49 PM, Steven Haigh wrote:
> On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
>> On 20/02/13 03:10, Steven Haigh wrote:
>>> Hi guys,
>>>
>>> Firstly, please CC me in to any replies as I'm not a subscriber these
>>> days.
>>>
>>> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
>>> achieve more than ~50Mb/sec sustained sequential write to a disk. The
>>> DomU is configured as such:
>>
>> Since you mention 4.2.1 explicitly, is this a performance regression
>> from previous versions? (4.2.0 or the 4.1 branch)
>
> This is actually a very good question. I've reinstalled my older
> packages of Xen 4.1.3 back on the system. Rebooting into the new
> hypervisor, then starting the single DomU again. Ran bonnie++ again on
> the DomU:
>
> Still around 50Mb/sec - so this doesn't seem to be a regression, but
> something else?

I've actually done a bit of thinking about this... A recent thread on 
the linux-raid kernel mailing list about Xen and DomU throughput made me 
revisit my setup. I know I used to be able to saturate GigE both ways 
(send and receive) to the Samba share served by this DomU, which would 
mean I'd get at least 90-100MB/sec. Exactly what config and kernel/Xen 
versions were in use at that point in time, I cannot say.

As such, I had a bit of a play and recreated my RAID6 with a 64KB chunk 
size. This seemed to make rebuild/resync speeds way worse - so I 
reverted to the 128KB chunk size.

The benchmarks I am getting from the Dom0 are about what I'd expect - but 
I wouldn't expect to lose 130MB/sec of write speed through the phy:/ pass 
through of the LV.

From my known-good config where I could saturate the GigE connection, I have 
changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla 
kernels - currently 3.7.9.

My build of Xen 4.2.1 also has all of the recent security advisories 
patched as well. Although it is interesting to note that downgrading to 
Xen 4.1.2 made no difference to write speeds.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20  9:49     ` Steven Haigh
@ 2013-02-20 10:12       ` Jan Beulich
  2013-02-20 11:06         ` Andrew Cooper
  2013-03-08  8:54       ` Steven Haigh
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2013-02-20 10:12 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel, roger.pau

>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
> My build of Xen 4.2.1 also has all of the recent security advisories 
> patched as well. Although it is interesting to note that downgrading to 
> Xen 4.1.2 made no difference to write speeds.

Not surprising at all, considering that the hypervisor is only a passive
library for all PV I/O purposes. You're likely hunting for a kernel-side
regression (and hence mentioning the hypervisor version as
the main factor in the subject is probably misleading).

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20 10:12       ` Jan Beulich
@ 2013-02-20 11:06         ` Andrew Cooper
  2013-02-20 11:08           ` Steven Haigh
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2013-02-20 11:06 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Roger Pau Monne, Jan Beulich, xen-devel

On 20/02/13 10:12, Jan Beulich wrote:
>>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
>> My build of Xen 4.2.1 also has all of the recent security advisories 
>> patched as well. Although it is interesting to note that downgrading to 
>> Xen 4.1.2 made no difference to write speeds.
> Not surprising at all, considering that the hypervisor is only a passive
> library for all PV I/O purposes. You're likely hunting for a kernel side
> regression (and hence the mentioning of the hypervisor version as
> the main factor in the subject is probably misleading).
>
> Jan

Further to this, do try to verify whether your disk driver has recently
changed to use order >0 page allocations for DMA.  If it has, then
speed will be much slower, as there will now be the swiotlb CPU-copy
overhead.

~Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20 11:06         ` Andrew Cooper
@ 2013-02-20 11:08           ` Steven Haigh
  2013-02-20 12:48             ` Andrew Cooper
  2013-02-20 13:18             ` Pasi Kärkkäinen
  0 siblings, 2 replies; 29+ messages in thread
From: Steven Haigh @ 2013-02-20 11:08 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monne, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1322 bytes --]

On 20/02/2013 10:06 PM, Andrew Cooper wrote:
> On 20/02/13 10:12, Jan Beulich wrote:
>>>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
>>> My build of Xen 4.2.1 also has all of the recent security advisories
>>> patched as well. Although it is interesting to note that downgrading to
>>> Xen 4.1.2 made no difference to write speeds.
>> Not surprising at all, considering that the hypervisor is only a passive
>> library for all PV I/O purposes. You're likely hunting for a kernel side
>> regression (and hence the mentioning of the hypervisor version as
>> the main factor in the subject is probably misleading).
>>
>> Jan
>
> Further to this, do try to verify if your disk driver has changed
> recently to use >0 order page allocations for DMA.  If it has, then
> speed will be much slower as there will now be the swiotlb cpu-copy
> overhead.

Any hints on how to do this? ;)

The kernel modules in use for my SATA drives are ahci and sata_mv. There 
are 6 drives in total on the system.

sda + sdb = RAID1
sd[c-f] = RAID6

sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
sde, sdf are on the sata_mv 4x PCIe controller.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20 11:08           ` Steven Haigh
@ 2013-02-20 12:48             ` Andrew Cooper
  2013-02-20 13:18             ` Pasi Kärkkäinen
  1 sibling, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2013-02-20 12:48 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Roger Pau Monne, Jan Beulich, xen-devel

On 20/02/13 11:08, Steven Haigh wrote:
> On 20/02/2013 10:06 PM, Andrew Cooper wrote:
>> On 20/02/13 10:12, Jan Beulich wrote:
>>>>>> On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
>>>> My build of Xen 4.2.1 also has all of the recent security advisories
>>>> patched as well. Although it is interesting to note that
>>>> downgrading to
>>>> Xen 4.1.2 made no difference to write speeds.
>>> Not surprising at all, considering that the hypervisor is only a
>>> passive
>>> library for all PV I/O purposes. You're likely hunting for a kernel
>>> side
>>> regression (and hence the mentioning of the hypervisor version as
>>> the main factor in the subject is probably misleading).
>>>
>>> Jan
>>
>> Further to this, do try to verify if your disk driver has changed
>> recently to use >0 order page allocations for DMA.  If it has, then
>> speed will be much slower as there will now be the swiotlb cpu-copy
>> overhead.
>
> Any hints on how to do this? ;)
>
> The kernel modules in use for my SATA drives are ahci and sata_mv.
> There are 6 drives in total on the system.
>
> sda + sdb = RAID1
> sd[c-f] = RAID6
>
> sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
> sde, sdf are on the sata_mv 4x PCIe controller.
>

Sadly that is a hard question to answer, and it is driver-specific.  I can't
suggest an easy way other than digging into the source.
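As a crude starting point for that digging, one can grep the relevant drivers for the usual multi-page allocation helpers. This is only a sketch: the source-tree path is an assumption, and hits merely flag places to inspect, since the allocation order argument must still be checked by hand:

```shell
# KSRC is an assumption - point it at the kernel tree you build from.
KSRC="${KSRC:-/usr/src/linux}"
# Hits are only leads: check whether the order argument is non-zero and
# whether the buffer is actually used for DMA.
grep -nE '__get_free_pages|alloc_pages' \
    "$KSRC/drivers/ata/ahci.c" "$KSRC/drivers/ata/sata_mv.c" 2>/dev/null \
    || echo "no matches (or no kernel source at $KSRC)"
```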

~Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20 11:08           ` Steven Haigh
  2013-02-20 12:48             ` Andrew Cooper
@ 2013-02-20 13:18             ` Pasi Kärkkäinen
  2013-03-08 20:42               ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 29+ messages in thread
From: Pasi Kärkkäinen @ 2013-02-20 13:18 UTC (permalink / raw)
  To: Steven Haigh; +Cc: Andrew Cooper, xen-devel, Jan Beulich, Roger Pau Monne

On Wed, Feb 20, 2013 at 10:08:58PM +1100, Steven Haigh wrote:
> On 20/02/2013 10:06 PM, Andrew Cooper wrote:
> >On 20/02/13 10:12, Jan Beulich wrote:
> >>>>>On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
> >>>My build of Xen 4.2.1 also has all of the recent security advisories
> >>>patched as well. Although it is interesting to note that downgrading to
> >>>Xen 4.1.2 made no difference to write speeds.
> >>Not surprising at all, considering that the hypervisor is only a passive
> >>library for all PV I/O purposes. You're likely hunting for a kernel side
> >>regression (and hence the mentioning of the hypervisor version as
> >>the main factor in the subject is probably misleading).
> >>
> >>Jan
> >
> >Further to this, do try to verify if your disk driver has changed
> >recently to use >0 order page allocations for DMA.  If it has, then
> >speed will be much slower as there will now be the swiotlb cpu-copy
> >overhead.
> 
> Any hints on how to do this? ;)
> 
> The kernel modules in use for my SATA drives are ahci and sata_mv.
> There are 6 drives in total on the system.
> 
> sda + sdb = RAID1
> sd[c-f] = RAID6
> 
> sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
> sde, sdf are on the sata_mv 4x PCIe controller.
> 

Can you try using only the disks on the ahci controller?

sata_mv is known to be buggy and problematic. I'm not sure if that's 
the case here, but if you're able to easily try using only ahci, 
it's worth a shot.

-- Pasi

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20  9:49     ` Steven Haigh
  2013-02-20 10:12       ` Jan Beulich
@ 2013-03-08  8:54       ` Steven Haigh
  2013-03-08  9:43         ` Roger Pau Monné
  2013-03-08 20:49         ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 29+ messages in thread
From: Steven Haigh @ 2013-03-08  8:54 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2336 bytes --]

On 20/02/2013 8:49 PM, Steven Haigh wrote:
> On 20/02/2013 7:49 PM, Steven Haigh wrote:
>> On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
>>> On 20/02/13 03:10, Steven Haigh wrote:
>>>> Hi guys,
>>>>
>>>> Firstly, please CC me in to any replies as I'm not a subscriber these
>>>> days.
>>>>
>>>> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
>>>> achieve more than ~50Mb/sec sustained sequential write to a disk. The
>>>> DomU is configured as such:
>>>
>>> Since you mention 4.2.1 explicitly, is this a performance regression
>>> from previous versions? (4.2.0 or the 4.1 branch)
>>
>> This is actually a very good question. I've reinstalled my older
>> packages of Xen 4.1.3 back on the system. Rebooting into the new
>> hypervisor, then starting the single DomU again. Ran bonnie++ again on
>> the DomU:
>>
>> Still around 50Mb/sec - so this doesn't seem to be a regression, but
>> something else?
>
> I've actually done a bit of thinking about this... A recent thread on
> linux-raid kernel mailing list about Xen and DomU throughput made me
> revisit my setup. I know I used to be able to saturate GigE both ways
> (send and receive) to the samba share served by this DomU. This would
> mean I'd get at least 90-100Mbyte/sec. What exact config and kernel/xen
> versions this was as this point in time I cannot say.
>
> As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk
> size. This seemed to make rebuild/resync speeds way worse - so I
> reverted to 128Kb chunk size.
>
> The benchmarks I am getting from the Dom0 is about what I'd expect - but
> I wouldn't expect to lose 130Mb/sec write speed to the phy:/ pass
> through of the LV.
>
>  From my known config where I could saturate the GigE connection, I have
> changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
> kernels - currently 3.7.9.
>
> My build of Xen 4.2.1 also has all of the recent security advisories
> patched as well. Although it is interesting to note that downgrading to
> Xen 4.1.2 made no difference to write speeds.
>

Just wondering if there is any further news or tests that I might be 
able to do on this?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-03-08  8:54       ` Steven Haigh
@ 2013-03-08  9:43         ` Roger Pau Monné
  2013-03-08  9:46           ` Steven Haigh
  2013-03-08 20:49         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2013-03-08  9:43 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel

On 08/03/13 09:54, Steven Haigh wrote:
> Just wondering if there is any further news or tests that I might be
> able to do on this?

I have been working on speed improvements for blkfront/blkback, and
submitted the first RFC series of patches last week, which can be found
at http://thread.gmane.org/gmane.linux.kernel/1448584. This is still a
WIP, so if you want to test them please be aware there might be hidden
bugs. I've also pushed them to a branch in my git repo:

git://xenbits.xen.org/people/royger/linux.git xen-block-indirect

You will need to recompile both the Dom0/DomU kernels (if they are not
the same) if you want to test them.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-03-08  9:43         ` Roger Pau Monné
@ 2013-03-08  9:46           ` Steven Haigh
  2013-03-08  9:54             ` Roger Pau Monné
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-03-08  9:46 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 991 bytes --]

On 8/03/2013 8:43 PM, Roger Pau Monné wrote:
> On 08/03/13 09:54, Steven Haigh wrote:
>> Just wondering if there is any further news or tests that I might be
>> able to do on this?
>
> I have been working on speed improvements for blkfront/blkback, and
> submitted the first RFC series of patches last week, which can be found
> at http://thread.gmane.org/gmane.linux.kernel/1448584. This is still a
> WIP, so if you want to test them please be aware there might be hidden
> bugs. I've also pushed them to a branch in my git repo:
>
> git://xenbits.xen.org/people/royger/linux.git xen-block-indirect
>
> You will need to recompile both the Dom0/DomU kernels (if they are not
> the same) if you want to test them.
>

Hmm - how will this react with, say, a stock kernel in the DomU (i.e. the 
EL6.3 kernel) but these changes in the Dom0?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-03-08  9:46           ` Steven Haigh
@ 2013-03-08  9:54             ` Roger Pau Monné
  0 siblings, 0 replies; 29+ messages in thread
From: Roger Pau Monné @ 2013-03-08  9:54 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel

On 08/03/13 10:46, Steven Haigh wrote:
> On 8/03/2013 8:43 PM, Roger Pau Monné wrote:
>> On 08/03/13 09:54, Steven Haigh wrote:
>>> Just wondering if there is any further news or tests that I might be
>>> able to do on this?
>>
>> I have been working on speed improvements for blkfront/blkback, and
>> submitted the first RFC series of patches last week, which can be found
>> at http://thread.gmane.org/gmane.linux.kernel/1448584. This is still a
>> WIP, so if you want to test them please be aware there might be hidden
>> bugs. I've also pushed them to a branch in my git repo:
>>
>> git://xenbits.xen.org/people/royger/linux.git xen-block-indirect
>>
>> You will need to recompile both the Dom0/DomU kernels (if they are not
>> the same) if you want to test them.
>>
> 
> Hmm - how will this react with using say, a stock kernel in the DomU (ie
> EL6.3 kernel) but these changes in the Dom0?

It should work, but you won't see much performance improvement (if any
at all). Anyway, I've referred to this series for testing only; it
should not be used on anything that's supposed to be stable/production.


* Re: 4.2.1: Poor write performance for DomU.
  2013-02-20 13:18             ` Pasi Kärkkäinen
@ 2013-03-08 20:42               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-08 20:42 UTC (permalink / raw)
  To: Pasi Kärkkäinen
  Cc: Andrew Cooper, Roger Pau Monne, Steven Haigh, Jan Beulich, xen-devel

On Wed, Feb 20, 2013 at 03:18:46PM +0200, Pasi Kärkkäinen wrote:
> On Wed, Feb 20, 2013 at 10:08:58PM +1100, Steven Haigh wrote:
> > On 20/02/2013 10:06 PM, Andrew Cooper wrote:
> > >On 20/02/13 10:12, Jan Beulich wrote:
> > >>>>>On 20.02.13 at 10:49, Steven Haigh <netwiz@crc.id.au> wrote:
> > >>>My build of Xen 4.2.1 also has all of the recent security advisories
> > >>>patched as well. Although it is interesting to note that downgrading to
> > >>>Xen 4.1.2 made no difference to write speeds.
> > >>Not surprising at all, considering that the hypervisor is only a passive
> > >>library for all PV I/O purposes. You're likely hunting for a kernel side
> > >>regression (and hence the mentioning of the hypervisor version as
> > >>the main factor in the subject is probably misleading).
> > >>
> > >>Jan
> > >
> > >Further to this, do try to verify if your disk driver has changed
> > >recently to use >0 order page allocations for DMA.  If it has, then
> > >speed will be much slower as there will now be the swiotlb cpu-copy
> > >overhead.
> > 
> > Any hints on how to do this? ;)

They are fine. They use the SG DMA API:

konrad@phenom:~/linux/drivers/ata$ grep "dma_map" *
libata-core.c:	n_elem = dma_map_sg(ap->dev, qc->sg, qc->n_elem, qc->dma_dir);

> > 
> > The kernel modules in use for my SATA drives are ahci and sata_mv.
> > There are 6 drives in total on the system.
> > 
> > sda + sdb = RAID1
> > sd[c-f] = RAID6
> > 
> > sda, sdb, sdc and sdd are on the onboard SATA controller (ahci)
> > sde, sdf are on the sata_mv 4x PCIe controller.
> > 
> 
> Can you try using only the disks on the ahci controller?
> 
> sata_mv is known to be buggy and problematic.. 
> I'm not sure if that's the case here, but if you're able to easily 
> try using only ahci, it's worth a shot. 
> 
> -- Pasi
> 


* Re: 4.2.1: Poor write performance for DomU.
  2013-03-08  8:54       ` Steven Haigh
  2013-03-08  9:43         ` Roger Pau Monné
@ 2013-03-08 20:49         ` Konrad Rzeszutek Wilk
  2013-03-08 22:30           ` Steven Haigh
  1 sibling, 1 reply; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-08 20:49 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel, Roger Pau Monné

On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote:
> On 20/02/2013 8:49 PM, Steven Haigh wrote:
> >On 20/02/2013 7:49 PM, Steven Haigh wrote:
> >>On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
> >>>On 20/02/13 03:10, Steven Haigh wrote:
> >>>>Hi guys,
> >>>>
> >>>>Firstly, please CC me in to any replies as I'm not a subscriber these
> >>>>days.
> >>>>
> >>>>I've been trying to debug a problem with Xen 4.2.1 where I am unable to
> >>>>achieve more than ~50Mb/sec sustained sequential write to a disk. The
> >>>>DomU is configured as such:
> >>>
> >>>Since you mention 4.2.1 explicitly, is this a performance regression
> >>>from previous versions? (4.2.0 or the 4.1 branch)
> >>
> >>This is actually a very good question. I've reinstalled my older
> >>packages of Xen 4.1.3 back on the system. Rebooting into the new
> >>hypervisor, then starting the single DomU again. Ran bonnie++ again on
> >>the DomU:
> >>
> >>Still around 50Mb/sec - so this doesn't seem to be a regression, but
> >>something else?
> >
> >I've actually done a bit of thinking about this... A recent thread on
> >linux-raid kernel mailing list about Xen and DomU throughput made me
> >revisit my setup. I know I used to be able to saturate GigE both ways
> >(send and receive) to the samba share served by this DomU. This would
> >mean I'd get at least 90-100Mbyte/sec. What exact config and kernel/xen
> >versions this was as this point in time I cannot say.
> >
> >As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk
> >size. This seemed to make rebuild/resync speeds way worse - so I
> >reverted to 128Kb chunk size.
> >
> >The benchmarks I am getting from the Dom0 is about what I'd expect - but
> >I wouldn't expect to lose 130Mb/sec write speed to the phy:/ pass
> >through of the LV.
> >
> > From my known config where I could saturate the GigE connection, I have
> >changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
> >kernels - currently 3.7.9.
> >
> >My build of Xen 4.2.1 also has all of the recent security advisories
> >patched as well. Although it is interesting to note that downgrading to
> >Xen 4.1.2 made no difference to write speeds.
> >
> 
> Just wondering if there is any further news or tests that I might be
> able to do on this?

The usual approach to a problem like this is to peel back the layers and
find out which of them is at fault. You have a stacked block system - LVM
on top of RAID6 on top of block devices.

To figure out what is interfering with the speeds I would recommend
you fault one of the RAID6 disks (so take it out of the RAID6). Pass
it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then
run 'fio'. Run 'fio' as well in dom0 on the /dev/sdX and check
whether the write performance is different.

This is how I do it:

[/dev/xvdXXX]
rw=write
direct=1
size=4g
ioengine=libaio
iodepth=32

Then progress up the stack. Try sticking the disk back in RAID6
and doing it on the RAID6. Then on the LVM and so on.


* Re: 4.2.1: Poor write performance for DomU.
  2013-03-08 20:49         ` Konrad Rzeszutek Wilk
@ 2013-03-08 22:30           ` Steven Haigh
  2013-03-11 13:30             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-03-08 22:30 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Roger Pau Monné, xen-devel



On 9/03/2013 7:49 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote:
>> On 20/02/2013 8:49 PM, Steven Haigh wrote:
>>> On 20/02/2013 7:49 PM, Steven Haigh wrote:
>>>> On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
>>>>> On 20/02/13 03:10, Steven Haigh wrote:
>>>>>> Hi guys,
>>>>>>
>>>>>> Firstly, please CC me in to any replies as I'm not a subscriber these
>>>>>> days.
>>>>>>
>>>>>> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
>>>>>> achieve more than ~50Mb/sec sustained sequential write to a disk. The
>>>>>> DomU is configured as such:
>>>>>
>>>>> Since you mention 4.2.1 explicitly, is this a performance regression
>>>>> from previous versions? (4.2.0 or the 4.1 branch)
>>>>
>>>> This is actually a very good question. I've reinstalled my older
>>>> packages of Xen 4.1.3 back on the system. Rebooting into the new
>>>> hypervisor, then starting the single DomU again. Ran bonnie++ again on
>>>> the DomU:
>>>>
>>>> Still around 50Mb/sec - so this doesn't seem to be a regression, but
>>>> something else?
>>>
>>> I've actually done a bit of thinking about this... A recent thread on
>>> linux-raid kernel mailing list about Xen and DomU throughput made me
>>> revisit my setup. I know I used to be able to saturate GigE both ways
>>> (send and receive) to the samba share served by this DomU. This would
>>> mean I'd get at least 90-100Mbyte/sec. What exact config and kernel/xen
>>> versions this was as this point in time I cannot say.
>>>
>>> As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk
>>> size. This seemed to make rebuild/resync speeds way worse - so I
>>> reverted to 128Kb chunk size.
>>>
>>> The benchmarks I am getting from the Dom0 is about what I'd expect - but
>>> I wouldn't expect to lose 130Mb/sec write speed to the phy:/ pass
>>> through of the LV.
>>>
>>>  From my known config where I could saturate the GigE connection, I have
>>> changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
>>> kernels - currently 3.7.9.
>>>
>>> My build of Xen 4.2.1 also has all of the recent security advisories
>>> patched as well. Although it is interesting to note that downgrading to
>>> Xen 4.1.2 made no difference to write speeds.
>>>
>>
>> Just wondering if there is any further news or tests that I might be
>> able to do on this?
>
> So usually the problem like this is to unpeel the layers and find out
> which of them is at fault. You have a stacked block system - LVM on
> top of RAID6 on top of block devices.
>
> To figure out who is interferring with the speeds I would recommend
> you fault one of the RAID6 disks (so take it out of the RAID6). Pass
> it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then
> run 'fio'. Run 'fio' as well in dom0 on the /dev/sdX and check
> whether the write performance is different.
>
> This is how I how do it:
>
> [/dev/xvdXXX]
> rw=write
> direct=1
> size=4g
> ioengine=libaio
> iodepth=32
>
> Then progress up the stack. Try sticking the disk back in RAID6
> and doing it on the RAID6. Then on the LVM and so on.

I did try to peel it back a single layer at a time. My test was simply 
using the same XFS filesystem in the Dom0 instead of the DomU.

I tested the underlying LVM config by mounting /dev/vg_raid6/fileshare 
from within the Dom0 and running bonnie++ as a benchmark:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
xenhost.lan.crc. 2G   667  96 186976  21 80430  14   956  95 290591  26 373.7   8
Latency             26416us     212ms     168ms   35494us   35989us 83759us
Version  1.96       ------Sequential Create------ --------Random Create--------
xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
                  16 14901  32 +++++ +++ 19672  39 15307  34 +++++ +++ 18158  37
Latency             17838us     141us     298us     365us     133us 296us

~186Mb/sec write, ~290Mb/sec read. Awesome.

I then started a single DomU which gets passed /dev/vg_raid6/fileshare 
through as xvdb. It is then mounted in /mnt/fileshare/. I then ran 
bonnie++ again in the DomU:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
zeus.crc.id.au   2G   658  96 50618   8 42398  10  1138  99 267568  30 494.9  11
Latency             22959us     226ms     311ms   14617us   41816us 72814us
Version  1.96       ------Sequential Create------ --------Random Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
                  16 21749  59 +++++ +++ 31089  73 23283  64 +++++ +++ 31114  75
Latency             18989us     164us     928us     480us      26us  87us

~50Mb/sec write, ~267Mb/sec read. Not so awesome.

As such, the filesystem, RAID6, etc. are completely unchanged. The only 
change is the access method: Dom0 vs DomU.
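For scale, a quick check on the quoted bonnie++ block-write figures (editor's arithmetic; the values are the K/sec numbers from the two runs above):

```shell
# domU block write (50618 K/sec) as a fraction of dom0 block write (186976 K/sec).
awk 'BEGIN { printf "%.2f\n", 50618 / 186976 }'    # prints 0.27 -- ~27% of dom0 throughput
```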

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299




* Re: 4.2.1: Poor write performance for DomU.
  2013-03-08 22:30           ` Steven Haigh
@ 2013-03-11 13:30             ` Konrad Rzeszutek Wilk
  2013-03-11 13:37               ` Steven Haigh
  0 siblings, 1 reply; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 13:30 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel, Roger Pau Monné

On Sat, Mar 09, 2013 at 09:30:54AM +1100, Steven Haigh wrote:
> On 9/03/2013 7:49 AM, Konrad Rzeszutek Wilk wrote:
> >On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote:
> >>On 20/02/2013 8:49 PM, Steven Haigh wrote:
> >>>On 20/02/2013 7:49 PM, Steven Haigh wrote:
> >>>>On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
> >>>>>On 20/02/13 03:10, Steven Haigh wrote:
> >>>>>>Hi guys,
> >>>>>>
> >>>>>>Firstly, please CC me in to any replies as I'm not a subscriber these
> >>>>>>days.
> >>>>>>
> >>>>>>I've been trying to debug a problem with Xen 4.2.1 where I am unable to
> >>>>>>achieve more than ~50Mb/sec sustained sequential write to a disk. The
> >>>>>>DomU is configured as such:
> >>>>>
> >>>>>Since you mention 4.2.1 explicitly, is this a performance regression
> >>>>>from previous versions? (4.2.0 or the 4.1 branch)
> >>>>
> >>>>This is actually a very good question. I've reinstalled my older
> >>>>packages of Xen 4.1.3 back on the system. Rebooting into the new
> >>>>hypervisor, then starting the single DomU again. Ran bonnie++ again on
> >>>>the DomU:
> >>>>
> >>>>Still around 50Mb/sec - so this doesn't seem to be a regression, but
> >>>>something else?
> >>>
> >>>I've actually done a bit of thinking about this... A recent thread on
> >>>linux-raid kernel mailing list about Xen and DomU throughput made me
> >>>revisit my setup. I know I used to be able to saturate GigE both ways
> >>>(send and receive) to the samba share served by this DomU. This would
> >>>mean I'd get at least 90-100Mbyte/sec. What exact config and kernel/xen
> >>>versions this was as this point in time I cannot say.
> >>>
> >>>As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk
> >>>size. This seemed to make rebuild/resync speeds way worse - so I
> >>>reverted to 128Kb chunk size.
> >>>
> >>>The benchmarks I am getting from the Dom0 is about what I'd expect - but
> >>>I wouldn't expect to lose 130Mb/sec write speed to the phy:/ pass
> >>>through of the LV.
> >>>
> >>> From my known config where I could saturate the GigE connection, I have
> >>>changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
> >>>kernels - currently 3.7.9.
> >>>
> >>>My build of Xen 4.2.1 also has all of the recent security advisories
> >>>patched as well. Although it is interesting to note that downgrading to
> >>>Xen 4.1.2 made no difference to write speeds.
> >>>
> >>
> >>Just wondering if there is any further news or tests that I might be
> >>able to do on this?
> >
> >So usually the problem like this is to unpeel the layers and find out
> >which of them is at fault. You have a stacked block system - LVM on
> >top of RAID6 on top of block devices.
> >
> >To figure out who is interferring with the speeds I would recommend
> >you fault one of the RAID6 disks (so take it out of the RAID6). Pass
> >it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then
> >run 'fio'. Run 'fio' as well in dom0 on the /dev/sdX and check
> >whether the write performance is different.
> >
> >This is how I how do it:
> >
> >[/dev/xvdXXX]
> >rw=write
> >direct=1
> >size=4g
> >ioengine=libaio
> >iodepth=32
> >
> >Then progress up the stack. Try sticking the disk back in RAID6
> >and doing it on the RAID6. Then on the LVM and so on.
> 
> I did try to peel it back a single layer at a time. My test was
> simply using the same XFS filesystem in the Dom0 instead of the
> DomU.

Right, you are using a filesystem. That is another layer :-)

And depending on what version of QEMU you have you might be using
QEMU as the block PV backend instead of the kernel one. There
were versions of QEMU that had highly inferior performance.

Hence I was thinking of just using a raw disk to test that.

> 
> I tested the underlying LVM config by mounting
> /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as
> a benchmark:


So that's still going through a filesystem. fio can do it at the block level.

What do 'xenstore-ls' and 'losetup -a' show you? I am really
curious as to whether the volume you are providing to the guest as
a disk is being handled via 'loop' or via 'QEMU'.


* Re: 4.2.1: Poor write performance for DomU.
  2013-03-11 13:30             ` Konrad Rzeszutek Wilk
@ 2013-03-11 13:37               ` Steven Haigh
  2013-03-12 13:04                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-03-11 13:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Roger Pau Monné



On 12/03/2013 12:30 AM, Konrad Rzeszutek Wilk wrote:
> On Sat, Mar 09, 2013 at 09:30:54AM +1100, Steven Haigh wrote:
>> On 9/03/2013 7:49 AM, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Mar 08, 2013 at 07:54:31PM +1100, Steven Haigh wrote:
>>>> On 20/02/2013 8:49 PM, Steven Haigh wrote:
>>>>> On 20/02/2013 7:49 PM, Steven Haigh wrote:
>>>>>> On 20/02/2013 7:26 PM, Roger Pau Monné wrote:
>>>>>>> On 20/02/13 03:10, Steven Haigh wrote:
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> Firstly, please CC me in to any replies as I'm not a subscriber these
>>>>>>>> days.
>>>>>>>>
>>>>>>>> I've been trying to debug a problem with Xen 4.2.1 where I am unable to
>>>>>>>> achieve more than ~50Mb/sec sustained sequential write to a disk. The
>>>>>>>> DomU is configured as such:
>>>>>>>
>>>>>>> Since you mention 4.2.1 explicitly, is this a performance regression
>>>>>>> from previous versions? (4.2.0 or the 4.1 branch)
>>>>>>
>>>>>> This is actually a very good question. I've reinstalled my older
>>>>>> packages of Xen 4.1.3 back on the system. Rebooting into the new
>>>>>> hypervisor, then starting the single DomU again. Ran bonnie++ again on
>>>>>> the DomU:
>>>>>>
>>>>>> Still around 50Mb/sec - so this doesn't seem to be a regression, but
>>>>>> something else?
>>>>>
>>>>> I've actually done a bit of thinking about this... A recent thread on
>>>>> linux-raid kernel mailing list about Xen and DomU throughput made me
>>>>> revisit my setup. I know I used to be able to saturate GigE both ways
>>>>> (send and receive) to the samba share served by this DomU. This would
>>>>> mean I'd get at least 90-100Mbyte/sec. What exact config and kernel/xen
>>>>> versions this was as this point in time I cannot say.
>>>>>
>>>>> As such, I had a bit of a play and recreated my RAID6 with 64Kb chunk
>>>>> size. This seemed to make rebuild/resync speeds way worse - so I
>>>>> reverted to 128Kb chunk size.
>>>>>
>>>>> The benchmarks I am getting from the Dom0 is about what I'd expect - but
>>>>> I wouldn't expect to lose 130Mb/sec write speed to the phy:/ pass
>>>>> through of the LV.
>>>>>
>>>>>  From my known config where I could saturate the GigE connection, I have
>>>>> changed from kernel 2.6.32 (Jeremy's git repo) to the latest vanilla
>>>>> kernels - currently 3.7.9.
>>>>>
>>>>> My build of Xen 4.2.1 also has all of the recent security advisories
>>>>> patched as well. Although it is interesting to note that downgrading to
>>>>> Xen 4.1.2 made no difference to write speeds.
>>>>>
>>>>
>>>> Just wondering if there is any further news or tests that I might be
>>>> able to do on this?
>>>
>>> So usually the problem like this is to unpeel the layers and find out
>>> which of them is at fault. You have a stacked block system - LVM on
>>> top of RAID6 on top of block devices.
>>>
>>> To figure out who is interferring with the speeds I would recommend
>>> you fault one of the RAID6 disks (so take it out of the RAID6). Pass
>>> it to the guest as a raw disk (/dev/sdX as /dev/xvd) and then
>>> run 'fio'. Run 'fio' as well in dom0 on the /dev/sdX and check
>>> whether the write performance is different.
>>>
>>> This is how I how do it:
>>>
>>> [/dev/xvdXXX]
>>> rw=write
>>> direct=1
>>> size=4g
>>> ioengine=libaio
>>> iodepth=32
>>>
>>> Then progress up the stack. Try sticking the disk back in RAID6
>>> and doing it on the RAID6. Then on the LVM and so on.
>>
>> I did try to peel it back a single layer at a time. My test was
>> simply using the same XFS filesystem in the Dom0 instead of the
>> DomU.
>
> Right, you are using a filesystem. That is another layer :-)
>
> And depending on what version of QEMU you have you might be using
> QEMU as the block PV backend instead of the kernel one. There
> were versions of QEMU that had highly inferior performance.
>
> Hence I was thinking of just using a raw disk to test that.
>
>>
>> I tested the underlying LVM config by mounting
>> /dev/vg_raid6/fileshare from within the Dom0 and running bonnie++ as
>> a benchmark:
>
>
> So still filesystem. Fio can do it on a block level.
>
> What does 'xenstore-ls' show you and 'losetup -a'? I am really
> curious as to where that file you are providing to the guest as
> disk is being handled via 'loop' or via 'QEMU'.
>

I've picked out what I believe are the most relevant xenstore-ls 
entries for the DomU in question:

      1 = ""
       51712 = ""
        domain = "zeus.vm"
        frontend = "/local/domain/1/device/vbd/51712"
        uuid = "3aa72be1-0e83-1ee2-a346-8ccef71e9d34"
        bootable = "1"
        dev = "xvda"
        state = "4"
        params = "/dev/RAID1/zeus.vm"
        mode = "w"
        online = "1"
        frontend-id = "1"
        type = "phy"
        physical-device = "fd:6"
        hotplug-status = "connected"
        feature-flush-cache = "1"
        feature-discard = "0"
        feature-barrier = "1"
        feature-persistent = "1"
        sectors = "135397376"
        info = "0"
        sector-size = "512"
       51728 = ""
        domain = "zeus.vm"
        frontend = "/local/domain/1/device/vbd/51728"
        uuid = "28375672-321c-0e33-4549-d64ee4daadec"
        bootable = "0"
        dev = "xvdb"
        state = "4"
        params = "/dev/vg_raid6/fileshare"
        mode = "w"
        online = "1"
        frontend-id = "1"
        type = "phy"
        physical-device = "fd:5"
        hotplug-status = "connected"
        feature-flush-cache = "1"
        feature-discard = "0"
        feature-barrier = "1"
        feature-persistent = "1"
        sectors = "5368709120"
        info = "0"
        sector-size = "512"

losetup -a returns nothing.


-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299




* Re: 4.2.1: Poor write performance for DomU.
  2013-03-11 13:37               ` Steven Haigh
@ 2013-03-12 13:04                 ` Konrad Rzeszutek Wilk
  2013-03-12 14:08                   ` Steven Haigh
       [not found]                   ` <514EA337.7030303@crc.id.au>
  0 siblings, 2 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-12 13:04 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel, Roger Pau Monné

> >So still filesystem. Fio can do it on a block level.
> >
> >What does 'xenstore-ls' show you and 'losetup -a'? I am really
> >curious as to where that file you are providing to the guest as
> >disk is being handled via 'loop' or via 'QEMU'.
> >
> 
> I've picked out what I believe is the most relevant from xenstore-ls
> that belongs to the DomU in question:

Great.
.. snip..
>        params = "/dev/vg_raid6/fileshare"
>        mode = "w"
>        online = "1"
>        frontend-id = "1"
>        type = "phy"
>        physical-device = "fd:5"
>        hotplug-status = "connected"
>        feature-flush-cache = "1"
>        feature-discard = "0"
>        feature-barrier = "1"
>        feature-persistent = "1"
>        sectors = "5368709120"
>        info = "0"
>        sector-size = "512"

OK, so the flow of data from the guest is:
	bonnie++ -> FS -> xen-blkfront -> xen-blkback -> LVM -> RAID6 -> multiple disks.

Any way you can restructure this to be:

	fio -> xen-blkfront -> xen-blkback -> one disk from the raid.


to see if the issue is between "LVM -> RAID6" or the "bonnie++ -> FS" part?
Is the CPU load quite high when you do these writes?

What are the RAID6 disks you have? How many?


* Re: 4.2.1: Poor write performance for DomU.
  2013-03-12 13:04                 ` Konrad Rzeszutek Wilk
@ 2013-03-12 14:08                   ` Steven Haigh
       [not found]                   ` <514EA337.7030303@crc.id.au>
  1 sibling, 0 replies; 29+ messages in thread
From: Steven Haigh @ 2013-03-12 14:08 UTC (permalink / raw)
  To: xen-devel

On 13/03/13 00:04, Konrad Rzeszutek Wilk wrote:
>>> So still filesystem. Fio can do it on a block level.
>>>
>>> What does 'xenstore-ls' show you and 'losetup -a'? I am really
>>> curious as to where that file you are providing to the guest as
>>> disk is being handled via 'loop' or via 'QEMU'.
>>>
>>
>> I've picked out what I believe is the most relevant from xenstore-ls
>> that belongs to the DomU in question:
>
> Great.
> .. snip..
>>         params = "/dev/vg_raid6/fileshare"
>>         mode = "w"
>>         online = "1"
>>         frontend-id = "1"
>>         type = "phy"
>>         physical-device = "fd:5"
>>         hotplug-status = "connected"
>>         feature-flush-cache = "1"
>>         feature-discard = "0"
>>         feature-barrier = "1"
>>         feature-persistent = "1"
>>         sectors = "5368709120"
>>         info = "0"
>>         sector-size = "512"
>
> OK, so the flow of data from the guest is:
> 	bonnie++ -> FS -> xen-blkfront -> xen-blkback -> LVM -> RAID6 -> multiple disks.
>
> Any way you can restructure this to be:
>
> 	fio -> xen-blkfront -> xen-blkback -> one disk from the raid.
>
>
> to see if the issue is between "LVM -> RAID6" or the "bonnie++ -> FS" part?
> Is the cpu load quite high when you do these writes?

Maybe I'm missing something, but running this directly from the Dom0 
would give a result of:

         bonnie++ -> FS -> LVM -> RAID6

These figures were well over 200Mb/sec read and well over 100Mb/sec write.

This only takes out the xen-blkfront and xen-blkback - which I thought 
was the aim?

Or is the point of this to make sure that we can replicate it with a 
single disk and that it isn't some weird interaction between 
blkfront/blkback and the LVM/RAID6?

CPU Usage doesn't seem to be a limiting factor. I certainly don't see 
massive loads for writing.

>
> What are the RAID6 disks you have? How many?

The RAID6 is made up of 4 x 2TB 7200RPM Seagate SATA drives...

Model Family:     Seagate SV35
Device Model:     ST2000VX000-9YW164
Serial Number:    Z1E10QQJ
LU WWN Device Id: 5 000c50 04dd3a1f1
Firmware Version: CV13
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical

Then in /proc/mdstat:
md2 : active raid6 sdd[4] sdc[0] sdf[5] sde[1]
       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]

I decided to use whole disks so that I don't run into alignment issues.

The VG is using 4Mb extents, so that should be fine too:
# vgdisplay vg_raid6
   --- Volume group ---
   VG Name               vg_raid6
   System ID
   Format                lvm2
   Metadata Areas        1
   Metadata Sequence No  7
   VG Access             read/write
   VG Status             resizable
   MAX LV                0
   Cur LV                5
   Open LV               5
   Max PV                0
   Cur PV                1
   Act PV                1
   VG Size               3.64 TiB
   PE Size               4.00 MiB
   Total PE              953863
   Alloc PE / Size       688640 / 2.63 TiB
   Free  PE / Size       265223 / 1.01 TiB
   VG UUID               md7G8X-F2mT-JBQa-f5qm-TN4O-kOqs-KWHGR1

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299


* Re: 4.2.1: Poor write performance for DomU.
       [not found]                       ` <514EA741.7050403@crc.id.au>
@ 2013-03-24  9:10                         ` Steven Haigh
  2013-03-24  9:54                           ` Steven Haigh
  2013-03-25  2:21                           ` Steven Haigh
  0 siblings, 2 replies; 29+ messages in thread
From: Steven Haigh @ 2013-03-24  9:10 UTC (permalink / raw)
  To: konrad.wilk; +Cc: xen-devel, roger.pau

On 24/03/13 18:12, Steven Haigh wrote:
> In fact, I just thought of something else.... I have an eSATA caddy that
> connects to the same SATA controller. With this, I can slot any SATA
> drive into it - and I should easily be able to pass this to any DomU.
>
> I'll throw in a 1TB SATA drive so that I don't have to break the
> existing RAID6 array - the testing on this drive can be destructive,
> as the drive is otherwise blank.

Disk info:
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000528AS
Serial Number:    9VP3BE9W
LU WWN Device Id: 5 000c50 01a238fd0
Firmware Version: CC49
User Capacity:    1,000,203,804,160 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical

Results...

Dom0 (host machine):
# dd if=/dev/zero of=/dev/sdi bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 33.6909 s, 127 MB/s
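As a units sanity check (editor's arithmetic, not part of the original mail): dd's MB/s figure is bytes divided by elapsed seconds divided by 10^6, and the run above is self-consistent:

```shell
# 4096 MiB written in 33.6909 s, per the dd output above.
awk 'BEGIN { printf "%.0f\n", 4096 * 1048576 / 33.6909 / 1e6 }'    # prints 127
```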

Created an ext4 filesystem on /dev/sdi1...
# mkfs.ext4 -j /dev/sdi1

Run bonnie++ on the filesystem:
# mount /dev/sdi1 /mnt/esata
# cd /mnt/esata/
# bonnie++ -u 0:0
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
xenhost.lan.crc. 2G   433  95 119107  22 36723   7   960  95 145026  12 191.9   4
Latency             33231us   39824us     211ms   31466us   17459us 5073ms
Version  1.96       ------Sequential Create------ --------Random Create--------
xenhost.lan.crc.id. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               285us     642us     315us     217us     349us 127us

We get ~145MB/sec block read, ~119MB/sec block write.

Now, let's pass the whole device through to a DomU.
# xm block-attach zeus.vm phy:/dev/sdi xvdc w

From the DomU now:
Firstly, the same dd as above:
# dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 33.6708 s, 128 MB/s

Create the ext4 filesystem again:
# mkfs.ext4 -j /dev/xvdc1

Run bonnie++ on the DomU:
Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
zeus.crc.id.au   2G   387  99 121891  24 47759  14   992  98 141103  17 
248.9   7
Latency             40518us     126ms     152ms   47174us   30061us 
250ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
+++++ +++
Latency               174us     839us    2249us     113us      42us 
185us

Interesting. We're at almost full speed in the DomU: 121MB/sec write, 
141MB/sec read.

So what I wonder now is: why, when the disks are in a RAID6, do we get 
180MB/sec+ from the Dom0, but only 50MB/sec from the DomU on the same 
filesystem?

Any further testing that may indicate something?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-03-24  9:10                         ` Steven Haigh
@ 2013-03-24  9:54                           ` Steven Haigh
  2013-03-25  2:21                           ` Steven Haigh
  1 sibling, 0 replies; 29+ messages in thread
From: Steven Haigh @ 2013-03-24  9:54 UTC (permalink / raw)
  To: xen-devel

On 24/03/13 20:10, Steven Haigh wrote:
> So my wonder is now... Why when put in a RAID6 do we have a 180Mb/sec+
> from the Dom0, but only 50Mb/sec from the DomU of the same filesystem...

I should clarify that this is 180MB/sec write speed from the Dom0 and 
50MB/sec from the DomU. I'm really not sure why this is.

The filesystem in question is XFS - the tests in my previous post were 
on ext4.

>
> Any further testing that may indicate something?
>


-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-03-24  9:10                         ` Steven Haigh
  2013-03-24  9:54                           ` Steven Haigh
@ 2013-03-25  2:21                           ` Steven Haigh
  2013-08-20 16:48                             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-03-25  2:21 UTC (permalink / raw)
  To: konrad.wilk; +Cc: roger.pau, xen-devel

So, based on my tests yesterday, I decided to break the RAID6 and pull a 
drive out of it to test directly on the 2TB drives in question.

The array in question:
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2 
[4/4] [UUUU]

# mdadm /dev/md2 --fail /dev/sdf
mdadm: set /dev/sdf faulty in /dev/md2
# mdadm /dev/md2 --remove /dev/sdf
mdadm: hot removed /dev/sdf from /dev/md2

So, all tests are to be done on /dev/sdf.
Model Family:     Seagate SV35
Device Model:     ST2000VX000-9YW164
Serial Number:    Z1E17C3X
LU WWN Device Id: 5 000c50 04e1bc6f0
Firmware Version: CV13
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical

From the Dom0:
# dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s

Create a single partition on the drive, and format it with ext4:
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x98d8baaf

    Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048  3907029167  1953513560   83  Linux

Command (m for help): w

# mkfs.ext4 -j /dev/sdf1
......
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Mount it on the Dom0:
# mount /dev/sdf1 /mnt/esata/
# cd /mnt/esata/
# bonnie++ -d . -u 0:0
....
Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114  17 
296.4   6
Latency             70971us     190ms     221ms   40369us   17657us 
164ms

So from the Dom0: 133MB/sec write, 209MB/sec read.

Now, I'll attach the full disk to a DomU:
# xm block-attach zeus.vm phy:/dev/sdf xvdc w

And we'll test from the DomU.

# dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s

Partition the same as in the Dom0 and create an ext4 filesystem on it:

I notice something interesting here. In the Dom0, the device is seen as:
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

In the DomU, it is seen as:
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Not sure if this could be related - but continuing testing:
     Device Boot      Start         End      Blocks   Id  System
/dev/xvdc1            2048  3907029167  1953513560   83  Linux

# mkfs.ext4 -j /dev/xvdc1
....
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# mount /dev/xvdc1 /mnt/esata/
# cd /mnt/esata/
# bonnie++ -d . -u 0:0
....
Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407  23 
313.4   9
Latency             34615us     130ms     128ms   33316us   74401us 
130ms

So still... 116MB/sec write, 176MB/sec read to the physical device from 
the DomU. More than acceptable.

It leaves me to wonder... could the Dom0 seeing the drive as 4096-byte 
sectors, while the DomU sees it as 512-byte sectors, be causing an issue?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-03-25  2:21                           ` Steven Haigh
@ 2013-08-20 16:48                             ` Konrad Rzeszutek Wilk
  2013-08-20 18:25                               ` Steven Haigh
  2013-09-05  8:28                               ` Steven Haigh
  0 siblings, 2 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-20 16:48 UTC (permalink / raw)
  To: Steven Haigh; +Cc: roger.pau, xen-devel

On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
> So, based on my tests yesterday, I decided to break the RAID6 and
> pull a drive out of it to test directly on the 2Tb drives in
> question.
> 
> The array in question:
> # cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
> [4/4] [UUUU]
> 
> # mdadm /dev/md2 --fail /dev/sdf
> mdadm: set /dev/sdf faulty in /dev/md2
> # mdadm /dev/md2 --remove /dev/sdf
> mdadm: hot removed /dev/sdf from /dev/md2
> 
> So, all tests are to be done on /dev/sdf.
> Model Family:     Seagate SV35
> Device Model:     ST2000VX000-9YW164
> Serial Number:    Z1E17C3X
> LU WWN Device Id: 5 000c50 04e1bc6f0
> Firmware Version: CV13
> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> 
> From the Dom0:
> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
> 
> Create a single partition on the drive, and format it with ext4:
> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x98d8baaf
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdf1            2048  3907029167  1953513560   83  Linux
> 
> Command (m for help): w
> 
> # mkfs.ext4 -j /dev/sdf1
> ......
> Writing inode tables: done
> Creating journal (32768 blocks): done
> Writing superblocks and filesystem accounting information: done
> 
> Mount it on the Dom0:
> # mount /dev/sdf1 /mnt/esata/
> # cd /mnt/esata/
> # bonnie++ -d . -u 0:0
> ....
> Version  1.96       ------Sequential Output------ --Sequential
> Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
> --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
> %CP /sec %CP
> xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114
> 17 296.4   6
> Latency             70971us     190ms     221ms   40369us   17657us
> 164ms
> 
> So from the Dom0: 133Mb/sec write, 209Mb/sec read.
> 
> Now, I'll attach the full disk to a DomU:
> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
> 
> And we'll test from the DomU.
> 
> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
> 
> Partition the same as in the Dom0 and create an ext4 filesystem on it:
> 
> I notice something interesting here. In the Dom0, the device is seen as:
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> 
> In the DomU, it is seen as:
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> 
> Not sure if this could be related - but continuing testing:
>     Device Boot      Start         End      Blocks   Id  System
> /dev/xvdc1            2048  3907029167  1953513560   83  Linux
> 
> # mkfs.ext4 -j /dev/xvdc1
> ....
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (32768 blocks): done
> Writing superblocks and filesystem accounting information: done
> 
> # mount /dev/xvdc1 /mnt/esata/
> # cd /mnt/esata/
> # bonnie++ -d . -u 0:0
> ....
> Version  1.96       ------Sequential Output------ --Sequential
> Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
> --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
> %CP /sec %CP
> zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407
> 23 313.4   9
> Latency             34615us     130ms     128ms   33316us   74401us
> 130ms
> 
> So still... 116Mb/sec write, 176Mb/sec read to the physical device
> from the DomU. More than acceptable.
> 
> It leaves me to wonder.... Could there be something in the Dom0
> seeing the drives as 4096 byte sectors, but the DomU seeing it as
> 512 byte sectors cause an issue?

There is a certain overhead in it. I still have this in my mailbox,
so I am not sure whether this issue was ever resolved. I know that the
indirect descriptor patches in xen-blkback and xen-blkfront are meant to
resolve some of these issues - by being able to carry a bigger payload.

Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
> 
> -- 
> Steven Haigh
> 
> Email: netwiz@crc.id.au
> Web: https://www.crc.id.au
> Phone: (03) 9001 6090 - 0412 935 897
> Fax: (03) 8338 0299

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-08-20 16:48                             ` Konrad Rzeszutek Wilk
@ 2013-08-20 18:25                               ` Steven Haigh
  2013-09-05  8:28                               ` Steven Haigh
  1 sibling, 0 replies; 29+ messages in thread
From: Steven Haigh @ 2013-08-20 18:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: roger.pau, xen-devel

On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>> So, based on my tests yesterday, I decided to break the RAID6 and
>> pull a drive out of it to test directly on the 2Tb drives in
>> question.
>>
>> The array in question:
>> # cat /proc/mdstat
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>>        3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
>> [4/4] [UUUU]
>>
>> # mdadm /dev/md2 --fail /dev/sdf
>> mdadm: set /dev/sdf faulty in /dev/md2
>> # mdadm /dev/md2 --remove /dev/sdf
>> mdadm: hot removed /dev/sdf from /dev/md2
>>
>> So, all tests are to be done on /dev/sdf.
>> Model Family:     Seagate SV35
>> Device Model:     ST2000VX000-9YW164
>> Serial Number:    Z1E17C3X
>> LU WWN Device Id: 5 000c50 04e1bc6f0
>> Firmware Version: CV13
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>
>>  From the Dom0:
>> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
>>
>> Create a single partition on the drive, and format it with ext4:
>> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
>> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x98d8baaf
>>
>>     Device Boot      Start         End      Blocks   Id  System
>> /dev/sdf1            2048  3907029167  1953513560   83  Linux
>>
>> Command (m for help): w
>>
>> # mkfs.ext4 -j /dev/sdf1
>> ......
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> Mount it on the Dom0:
>> # mount /dev/sdf1 /mnt/esata/
>> # cd /mnt/esata/
>> # bonnie++ -d . -u 0:0
>> ....
>> Version  1.96       ------Sequential Output------ --Sequential
>> Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>> --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>> %CP /sec %CP
>> xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114
>> 17 296.4   6
>> Latency             70971us     190ms     221ms   40369us   17657us
>> 164ms
>>
>> So from the Dom0: 133Mb/sec write, 209Mb/sec read.
>>
>> Now, I'll attach the full disk to a DomU:
>> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
>>
>> And we'll test from the DomU.
>>
>> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
>>
>> Partition the same as in the Dom0 and create an ext4 filesystem on it:
>>
>> I notice something interesting here. In the Dom0, the device is seen as:
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>
>> In the DomU, it is seen as:
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>
>> Not sure if this could be related - but continuing testing:
>>      Device Boot      Start         End      Blocks   Id  System
>> /dev/xvdc1            2048  3907029167  1953513560   83  Linux
>>
>> # mkfs.ext4 -j /dev/xvdc1
>> ....
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> # mount /dev/xvdc1 /mnt/esata/
>> # cd /mnt/esata/
>> # bonnie++ -d . -u 0:0
>> ....
>> Version  1.96       ------Sequential Output------ --Sequential
>> Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>> --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>> %CP /sec %CP
>> zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407
>> 23 313.4   9
>> Latency             34615us     130ms     128ms   33316us   74401us
>> 130ms
>>
>> So still... 116Mb/sec write, 176Mb/sec read to the physical device
>> from the DomU. More than acceptable.
>>
>> It leaves me to wonder.... Could there be something in the Dom0
>> seeing the drives as 4096 byte sectors, but the DomU seeing it as
>> 512 byte sectors cause an issue?
>
> There is certain overhead in it. I still have this in my mailbox
> so I am not sure whether this issue got ever resolved? I know that the
> indirect patches in Xen blkback and xen blkfront are meant to resolve
> some of these issues - by being able to carry a bigger payload.
>
> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.

Hi Konrad,

I don't believe I ever fixed it - however I haven't tried kernel 3.11 in 
Dom0 OR DomU...

I'll keep this in my inbox and try to build a 3.11 kernel for both in 
the near future for testing...

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-08-20 16:48                             ` Konrad Rzeszutek Wilk
  2013-08-20 18:25                               ` Steven Haigh
@ 2013-09-05  8:28                               ` Steven Haigh
  2013-09-06 13:33                                 ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-09-05  8:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: roger.pau, xen-devel


On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>> So, based on my tests yesterday, I decided to break the RAID6 and
>> pull a drive out of it to test directly on the 2Tb drives in
>> question.
>>
>> The array in question:
>> # cat /proc/mdstat
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>>       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
>> [4/4] [UUUU]
>>
>> # mdadm /dev/md2 --fail /dev/sdf
>> mdadm: set /dev/sdf faulty in /dev/md2
>> # mdadm /dev/md2 --remove /dev/sdf
>> mdadm: hot removed /dev/sdf from /dev/md2
>>
>> So, all tests are to be done on /dev/sdf.
>> Model Family:     Seagate SV35
>> Device Model:     ST2000VX000-9YW164
>> Serial Number:    Z1E17C3X
>> LU WWN Device Id: 5 000c50 04e1bc6f0
>> Firmware Version: CV13
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>
>> From the Dom0:
>> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
>>
>> Create a single partition on the drive, and format it with ext4:
>> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
>> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x98d8baaf
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sdf1            2048  3907029167  1953513560   83  Linux
>>
>> Command (m for help): w
>>
>> # mkfs.ext4 -j /dev/sdf1
>> ......
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> Mount it on the Dom0:
>> # mount /dev/sdf1 /mnt/esata/
>> # cd /mnt/esata/
>> # bonnie++ -d . -u 0:0
>> ....
>> Version  1.96       ------Sequential Output------ --Sequential
>> Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>> --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>> %CP /sec %CP
>> xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114
>> 17 296.4   6
>> Latency             70971us     190ms     221ms   40369us   17657us
>> 164ms
>>
>> So from the Dom0: 133Mb/sec write, 209Mb/sec read.
>>
>> Now, I'll attach the full disk to a DomU:
>> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
>>
>> And we'll test from the DomU.
>>
>> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
>> 4096+0 records in
>> 4096+0 records out
>> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
>>
>> Partition the same as in the Dom0 and create an ext4 filesystem on it:
>>
>> I notice something interesting here. In the Dom0, the device is seen as:
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>
>> In the DomU, it is seen as:
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 512 bytes
>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>
>> Not sure if this could be related - but continuing testing:
>>     Device Boot      Start         End      Blocks   Id  System
>> /dev/xvdc1            2048  3907029167  1953513560   83  Linux
>>
>> # mkfs.ext4 -j /dev/xvdc1
>> ....
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> # mount /dev/xvdc1 /mnt/esata/
>> # cd /mnt/esata/
>> # bonnie++ -d . -u 0:0
>> ....
>> Version  1.96       ------Sequential Output------ --Sequential
>> Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>> --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>> %CP /sec %CP
>> zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407
>> 23 313.4   9
>> Latency             34615us     130ms     128ms   33316us   74401us
>> 130ms
>>
>> So still... 116Mb/sec write, 176Mb/sec read to the physical device
>> from the DomU. More than acceptable.
>>
>> It leaves me to wonder.... Could there be something in the Dom0
>> seeing the drives as 4096 byte sectors, but the DomU seeing it as
>> 512 byte sectors cause an issue?
> 
> There is certain overhead in it. I still have this in my mailbox
> so I am not sure whether this issue got ever resolved? I know that the 
> indirect patches in Xen blkback and xen blkfront are meant to resolve
> some of these issues - by being able to carry a bigger payload.
> 
> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.

Ok, so I finally got around to building kernel 3.11 RPMs today for
testing. I upgraded both the Dom0 and DomU to the same kernel:

DomU:
# dmesg | grep blkfront
blkfront: xvda: flush diskcache: enabled; persistent grants: enabled;
indirect descriptors: enabled;
blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled;
indirect descriptors: enabled;

Looks good.

Transfer tests using bonnie++ as per before:
# bonnie -d . -u 0:0
Version  1.96       ------Sequential Output------ --Sequential Input-
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au   2G   603  92 58250   9 62248  14   886  99 295757  30
492.3  13
Latency             27305us     124ms     158ms   34222us   16865us
374ms
Version  1.96       ------Sequential Create------ --------Random
Create--------
zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read---
-Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
/sec %CP
                 16 10048  22 +++++ +++ 17849  29 11109  25 +++++ +++
18389  31
Latency             17775us     154us     180us   16008us      38us
 58us

There still seems to be a massive discrepancy between Dom0 and DomU write
speeds. Interestingly, sequential block reads are nearly 300MB/sec,
yet sequential writes were only ~58MB/sec.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-09-05  8:28                               ` Steven Haigh
@ 2013-09-06 13:33                                 ` Konrad Rzeszutek Wilk
  2013-09-06 23:06                                   ` Steven Haigh
  0 siblings, 1 reply; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-06 13:33 UTC (permalink / raw)
  To: Steven Haigh; +Cc: roger.pau, xen-devel

On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote:
> On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
> > On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
> >> So, based on my tests yesterday, I decided to break the RAID6 and
> >> pull a drive out of it to test directly on the 2Tb drives in
> >> question.
> >>
> >> The array in question:
> >> # cat /proc/mdstat
> >> Personalities : [raid1] [raid6] [raid5] [raid4]
> >> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
> >>       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
> >> [4/4] [UUUU]
> >>
> >> # mdadm /dev/md2 --fail /dev/sdf
> >> mdadm: set /dev/sdf faulty in /dev/md2
> >> # mdadm /dev/md2 --remove /dev/sdf
> >> mdadm: hot removed /dev/sdf from /dev/md2
> >>
> >> So, all tests are to be done on /dev/sdf.
> >> Model Family:     Seagate SV35
> >> Device Model:     ST2000VX000-9YW164
> >> Serial Number:    Z1E17C3X
> >> LU WWN Device Id: 5 000c50 04e1bc6f0
> >> Firmware Version: CV13
> >> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
> >> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>
> >> From the Dom0:
> >> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
> >> 4096+0 records in
> >> 4096+0 records out
> >> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
> >>
> >> Create a single partition on the drive, and format it with ext4:
> >> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
> >> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
> >> Units = sectors of 1 * 512 = 512 bytes
> >> Sector size (logical/physical): 512 bytes / 4096 bytes
> >> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> >> Disk identifier: 0x98d8baaf
> >>
> >>    Device Boot      Start         End      Blocks   Id  System
> >> /dev/sdf1            2048  3907029167  1953513560   83  Linux
> >>
> >> Command (m for help): w
> >>
> >> # mkfs.ext4 -j /dev/sdf1
> >> ......
> >> Writing inode tables: done
> >> Creating journal (32768 blocks): done
> >> Writing superblocks and filesystem accounting information: done
> >>
> >> Mount it on the Dom0:
> >> # mount /dev/sdf1 /mnt/esata/
> >> # cd /mnt/esata/
> >> # bonnie++ -d . -u 0:0
> >> ....
> >> Version  1.96       ------Sequential Output------ --Sequential
> >> Input- --Random-
> >> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
> >> --Block-- --Seeks--
> >> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
> >> %CP /sec %CP
> >> xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114
> >> 17 296.4   6
> >> Latency             70971us     190ms     221ms   40369us   17657us
> >> 164ms
> >>
> >> So from the Dom0: 133Mb/sec write, 209Mb/sec read.
> >>
> >> Now, I'll attach the full disk to a DomU:
> >> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
> >>
> >> And we'll test from the DomU.
> >>
> >> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
> >> 4096+0 records in
> >> 4096+0 records out
> >> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
> >>
> >> Partition the same as in the Dom0 and create an ext4 filesystem on it:
> >>
> >> I notice something interesting here. In the Dom0, the device is seen as:
> >> Units = sectors of 1 * 512 = 512 bytes
> >> Sector size (logical/physical): 512 bytes / 4096 bytes
> >> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> >>
> >> In the DomU, it is seen as:
> >> Units = sectors of 1 * 512 = 512 bytes
> >> Sector size (logical/physical): 512 bytes / 512 bytes
> >> I/O size (minimum/optimal): 512 bytes / 512 bytes
> >>
> >> Not sure if this could be related - but continuing testing:
> >>     Device Boot      Start         End      Blocks   Id  System
> >> /dev/xvdc1            2048  3907029167  1953513560   83  Linux
> >>
> >> # mkfs.ext4 -j /dev/xvdc1
> >> ....
> >> Allocating group tables: done
> >> Writing inode tables: done
> >> Creating journal (32768 blocks): done
> >> Writing superblocks and filesystem accounting information: done
> >>
> >> # mount /dev/xvdc1 /mnt/esata/
> >> # cd /mnt/esata/
> >> # bonnie++ -d . -u 0:0
> >> ....
> >> Version  1.96       ------Sequential Output------ --Sequential
> >> Input- --Random-
> >> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
> >> --Block-- --Seeks--
> >> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
> >> %CP /sec %CP
> >> zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407
> >> 23 313.4   9
> >> Latency             34615us     130ms     128ms   33316us   74401us
> >> 130ms
> >>
> >> So still... 116Mb/sec write, 176Mb/sec read to the physical device
> >> from the DomU. More than acceptable.
> >>
> >> It leaves me to wonder.... Could there be something in the Dom0
> >> seeing the drives as 4096 byte sectors, but the DomU seeing it as
> >> 512 byte sectors cause an issue?
> > 
> > There is certain overhead in it. I still have this in my mailbox
> > so I am not sure whether this issue got ever resolved? I know that the 
> > indirect patches in Xen blkback and xen blkfront are meant to resolve
> > some of these issues - by being able to carry a bigger payload.
> > 
> > Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
> 
> Ok, so I finally got around to building kernel 3.11 RPMs today for
> testing. I upgraded both the Dom0 and DomU to the same kernel:

Woohoo!
> 
> DomU:
> # dmesg | grep blkfront
> blkfront: xvda: flush diskcache: enabled; persistent grants: enabled;
> indirect descriptors: enabled;
> blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled;
> indirect descriptors: enabled;
> 
> Looks good.
> 
> Transfer tests using bonnie++ as per before:
> # bonnie -d . -u 0:0
> Version  1.96       ------Sequential Output------ --Sequential Input-
> --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> /sec %CP
> zeus.crc.id.au   2G   603  92 58250   9 62248  14   886  99 295757  30
> 492.3  13
> Latency             27305us     124ms     158ms   34222us   16865us
> 374ms
> Version  1.96       ------Sequential Create------ --------Random
> Create--------
> zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> /sec %CP
>                  16 10048  22 +++++ +++ 17849  29 11109  25 +++++ +++
> 18389  31
> Latency             17775us     154us     180us   16008us      38us
>  58us
> 
> Still seems to be a massive discrepancy between Dom0 and DomU write
> speeds. Interestingly, sequential block reads are nearly 300MB/sec,
> yet sequential writes were only ~58MB/sec.

OK, so the other thing that people were pointing out is that you
can use the xen-blkfront.max parameter. By default it is 32, but try 8.
Or 64. Or 256.

The indirect descriptor allows us to put more I/Os on the ring - and
I am hoping that will:
 a) solve your problem
 b) not solve your problem, but demonstrate that the issue is not with
    the ring, but with something else making your writes slower.
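For context on why a bigger payload helps: each segment maps one 4 KiB page, so the max parameter bounds the size of a single request, while the ring bounds how many requests are in flight at once. A back-of-the-envelope sketch — the 11-segment legacy limit and the 32-slot ring depth are assumptions from the blkif protocol, not numbers taken from this thread:

```shell
#!/bin/sh
# Rough model: payload per request and per ring, at 4 KiB per segment.
# 11 segs/req is the legacy blkif limit (no indirect descriptors);
# 32 is the assumed depth of the single-page request ring.
PAGE=4096
RING_SLOTS=32

for SEGS in 11 32 64 128 256; do
    REQ_KIB=$(( SEGS * PAGE / 1024 ))
    RING_KIB=$(( REQ_KIB * RING_SLOTS ))
    echo "${SEGS} segs/req -> ${REQ_KIB} KiB per request, ${RING_KIB} KiB in flight"
done
```

With the legacy limit a single request tops out at 44 KiB; indirect descriptors at max=128 raise that to 512 KiB per request, which is why the ring itself may stop being the bottleneck.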

Hmm, are you by any chance using O_DIRECT when running bonnie++ in
dom0? The xen-blkback tacks on O_DIRECT to all write requests. This is
done to not use the dom0 page cache - otherwise you end up with
double buffering, where writes look insanely fast - but with absolutely
no safety.

If you want to try disabling that (so no O_DIRECT), I would do this
little change:

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index bf4b9d2..823b629 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
                break;
        case BLKIF_OP_WRITE:
                blkif->st_wr_req++;
-               operation = WRITE_ODIRECT;
+               operation = WRITE;
                break;
        case BLKIF_OP_WRITE_BARRIER:
                drain = true;

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-09-06 13:33                                 ` Konrad Rzeszutek Wilk
@ 2013-09-06 23:06                                   ` Steven Haigh
  2013-09-06 23:37                                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Haigh @ 2013-09-06 23:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: roger.pau, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 11785 bytes --]

On 06/09/13 23:33, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote:
>> On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>>>> So, based on my tests yesterday, I decided to break the RAID6 and
>>>> pull a drive out of it to test directly on the 2Tb drives in
>>>> question.
>>>>
>>>> The array in question:
>>>> # cat /proc/mdstat
>>>> Personalities : [raid1] [raid6] [raid5] [raid4]
>>>> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>>>>       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
>>>> [4/4] [UUUU]
>>>>
>>>> # mdadm /dev/md2 --fail /dev/sdf
>>>> mdadm: set /dev/sdf faulty in /dev/md2
>>>> # mdadm /dev/md2 --remove /dev/sdf
>>>> mdadm: hot removed /dev/sdf from /dev/md2
>>>>
>>>> So, all tests are to be done on /dev/sdf.
>>>> Model Family:     Seagate SV35
>>>> Device Model:     ST2000VX000-9YW164
>>>> Serial Number:    Z1E17C3X
>>>> LU WWN Device Id: 5 000c50 04e1bc6f0
>>>> Firmware Version: CV13
>>>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>
>>>> From the Dom0:
>>>> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
>>>>
>>>> Create a single partition on the drive, and format it with ext4:
>>>> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
>>>> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>> Disk identifier: 0x98d8baaf
>>>>
>>>>    Device Boot      Start         End      Blocks   Id  System
>>>> /dev/sdf1            2048  3907029167  1953513560   83  Linux
>>>>
>>>> Command (m for help): w
>>>>
>>>> # mkfs.ext4 -j /dev/sdf1
>>>> ......
>>>> Writing inode tables: done
>>>> Creating journal (32768 blocks): done
>>>> Writing superblocks and filesystem accounting information: done
>>>>
>>>> Mount it on the Dom0:
>>>> # mount /dev/sdf1 /mnt/esata/
>>>> # cd /mnt/esata/
>>>> # bonnie++ -d . -u 0:0
>>>> ....
>>>> Version  1.96       ------Sequential Output------ --Sequential
>>>> Input- --Random-
>>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>>>> --Block-- --Seeks--
>>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>>>> %CP /sec %CP
>>>> xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95 209114
>>>> 17 296.4   6
>>>> Latency             70971us     190ms     221ms   40369us   17657us
>>>> 164ms
>>>>
>>>> So from the Dom0: 133Mb/sec write, 209Mb/sec read.
>>>>
>>>> Now, I'll attach the full disk to a DomU:
>>>> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
>>>>
>>>> And we'll test from the DomU.
>>>>
>>>> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
>>>>
>>>> Partition the same as in the Dom0 and create an ext4 filesystem on it:
>>>>
>>>> I notice something interesting here. In the Dom0, the device is seen as:
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>>
>>>> In the DomU, it is seen as:
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 512 bytes
>>>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>>>
>>>> Not sure if this could be related - but continuing testing:
>>>>     Device Boot      Start         End      Blocks   Id  System
>>>> /dev/xvdc1            2048  3907029167  1953513560   83  Linux
>>>>
>>>> # mkfs.ext4 -j /dev/xvdc1
>>>> ....
>>>> Allocating group tables: done
>>>> Writing inode tables: done
>>>> Creating journal (32768 blocks): done
>>>> Writing superblocks and filesystem accounting information: done
>>>>
>>>> # mount /dev/xvdc1 /mnt/esata/
>>>> # cd /mnt/esata/
>>>> # bonnie++ -d . -u 0:0
>>>> ....
>>>> Version  1.96       ------Sequential Output------ --Sequential
>>>> Input- --Random-
>>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>>>> --Block-- --Seeks--
>>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>>>> %CP /sec %CP
>>>> zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99 176407
>>>> 23 313.4   9
>>>> Latency             34615us     130ms     128ms   33316us   74401us
>>>> 130ms
>>>>
>>>> So still... 116Mb/sec write, 176Mb/sec read to the physical device
>>>> from the DomU. More than acceptable.
>>>>
>>>> It leaves me to wonder.... Could there be something in the Dom0
>>>> seeing the drives as 4096 byte sectors, but the DomU seeing it as
>>>> 512 byte sectors cause an issue?
>>>
>>> There is certain overhead in it. I still have this in my mailbox
>>> so I am not sure whether this issue got ever resolved? I know that the 
>>> indirect patches in Xen blkback and xen blkfront are meant to resolve
>>> some of these issues - by being able to carry a bigger payload.
>>>
>>> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
>>
>> Ok, so I finally got around to building kernel 3.11 RPMs today for
>> testing. I upgraded both the Dom0 and DomU to the same kernel:
> 
> Woohoo!
>>
>> DomU:
>> # dmesg | grep blkfront
>> blkfront: xvda: flush diskcache: enabled; persistent grants: enabled;
>> indirect descriptors: enabled;
>> blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled;
>> indirect descriptors: enabled;
>>
>> Looks good.
>>
>> Transfer tests using bonnie++ as per before:
>> # bonnie -d . -u 0:0
>> Version  1.96       ------Sequential Output------ --Sequential Input-
>> --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>> --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>> /sec %CP
>> zeus.crc.id.au   2G   603  92 58250   9 62248  14   886  99 295757  30
>> 492.3  13
>> Latency             27305us     124ms     158ms   34222us   16865us
>> 374ms
>> Version  1.96       ------Sequential Create------ --------Random
>> Create--------
>> zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create-- --Read---
>> -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> /sec %CP
>>                  16 10048  22 +++++ +++ 17849  29 11109  25 +++++ +++
>> 18389  31
>> Latency             17775us     154us     180us   16008us      38us
>>  58us
>>
>> Still seems to be a massive discrepancy between Dom0 and DomU write
>> speeds. Interesting is that sequential block reads are nearly 300MB/sec,
>> yet sequential writes were only ~58MB/sec.
> 
> OK, so the other thing that people were pointing out is that you
> can use the xen-blkfront.max parameter. By default it is 32, but try 8.
> Or 64. Or 256.

Ahh - interesting.

I used the following:
Kernel command line: ro root=/dev/xvda rd_NO_LUKS rd_NO_DM
LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us
crashkernel=auto console=hvc0 xen-blkfront.max=X
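A quick way to confirm the setting actually took effect after boot is to read the module parameter back from sysfs — a sketch assuming the v3.11 xen-blkfront, which exposes the parameter as 'max' (the path may differ on other kernels):

```shell
#!/bin/sh
# Read back xen-blkfront's segment limit, if the module is present.
P=/sys/module/xen_blkfront/parameters/max
if [ -r "$P" ]; then
    MSG="xen-blkfront.max = $(cat "$P")"
else
    MSG="xen-blkfront not loaded (or no 'max' parameter on this kernel)"
fi
echo "$MSG"
```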

8:
Version  1.96       ------Sequential Output------ --Sequential Input-
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au   2G   696  92 50906   7 46102  11  1013  97 256784  27
496.5  10
Latency             24374us     199ms     117ms   30855us   38008us
85175us

16:
Version  1.96       ------Sequential Output------ --Sequential Input-
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au   2G   675  92 58078   8 57585  13  1005  97 262735  25
505.6  10
Latency             24412us     187ms     183ms   23661us   53850us
232ms

32:
Version  1.96       ------Sequential Output------ --Sequential Input-
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au   2G   698  92 57416   8 63328  13  1063  97 267154  24
498.2  12
Latency             24264us     199ms   81362us   33144us   22526us
237ms

64:
Version  1.96       ------Sequential Output------ --Sequential Input-
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au   2G   574  86 88447  13 68988  17   897  97 265128  27
493.7  13

128:
Version  1.96       ------Sequential Output------ --Sequential Input-
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au   2G   702  97 107638  14 70158  15  1045  97 255596  24
491.0  12
Latency             27279us   17553us     134ms   29771us   38392us
65761us

256:
Version  1.96       ------Sequential Output------ --Sequential Input-
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP
zeus.crc.id.au   2G   689  91 102554  14 67337  15  1012  97 262475  24
484.4  12
Latency             20642us     104ms     189ms   36624us   45286us
80023us

So, as a nice summary:
8: 50MB/sec
16: 58MB/sec
32: 57MB/sec
64: 88MB/sec
128: 107MB/sec
256: 102MB/sec

So, maybe it's coincidence, maybe it isn't - but the best (factoring in
the margin of error) seems to be 128 - which happens to match the 128k
chunk size of the underlying RAID6 array on the Dom0.

# cat /proc/mdstat
md2 : active raid6 sdd[5] sdc[4] sdf[1] sde[0]
      3906766592 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4]
[UUUU]
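The 128k chunk observation suggests a geometry explanation: on a 4-disk RAID6 (2 data + 2 parity disks per stripe), a full stripe holds 256 KiB of data, and only writes covering whole stripes avoid the RAID6 read-modify-write penalty. A rough sketch under those assumed numbers (4 KiB pages, geometry taken from the mdstat output):

```shell
#!/bin/sh
# Does one blkfront request cover a full RAID6 stripe of data?
PAGE=4096
CHUNK_KIB=128          # md chunk size from /proc/mdstat
DATA_DISKS=2           # 4-disk RAID6: 2 data + 2 parity per stripe
STRIPE_KIB=$(( CHUNK_KIB * DATA_DISKS ))   # 256 KiB of data per stripe

for SEGS in 8 16 32 64 128 256; do
    REQ_KIB=$(( SEGS * PAGE / 1024 ))
    if [ "$REQ_KIB" -ge "$STRIPE_KIB" ]; then
        NOTE="covers a full stripe"
    else
        NOTE="sub-stripe write (read-modify-write likely)"
    fi
    echo "max=${SEGS}: ${REQ_KIB} KiB/request - $NOTE"
done
```

Under this model the per-request payload first reaches a full stripe at max=64 - roughly where the measured write speed jumps from ~57MB/sec to ~88MB/sec.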

> The indirect descriptor allows us to put more I/Os on the ring - and
> I am hoping that will:
>  a) solve your problem

Well, it looks like this solves the issue - at least increasing the max
almost doubles the write speed - with no change to read speeds
(within the margin of error).

>  b) not solve your problem, but demonstrate that the issue is not with
>     the ring, but with something else making your writes slower.
> 
> Hmm, are you by any chance using O_DIRECT when running bonnie++ in
> dom0? The xen-blkback tacks on O_DIRECT to all write requests. This is
> done to not use the dom0 page cache - otherwise you end up with
> double buffering, where writes look insanely fast - but with absolutely
> no safety.
> 
> If you want to try disabling that (so no O_DIRECT), I would do this
> little change:
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index bf4b9d2..823b629 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>                 break;
>         case BLKIF_OP_WRITE:
>                 blkif->st_wr_req++;
> -               operation = WRITE_ODIRECT;
> +               operation = WRITE;
>                 break;
>         case BLKIF_OP_WRITE_BARRIER:
>                 drain = true;

With the above results, is this still useful?

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 4.2.1: Poor write performance for DomU.
  2013-09-06 23:06                                   ` Steven Haigh
@ 2013-09-06 23:37                                     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-06 23:37 UTC (permalink / raw)
  To: Steven Haigh; +Cc: roger.pau, xen-devel

Steven Haigh <netwiz@crc.id.au> wrote:
>On 06/09/13 23:33, Konrad Rzeszutek Wilk wrote:
>> On Thu, Sep 05, 2013 at 06:28:25PM +1000, Steven Haigh wrote:
>>> On 21/08/13 02:48, Konrad Rzeszutek Wilk wrote:
>>>> On Mon, Mar 25, 2013 at 01:21:09PM +1100, Steven Haigh wrote:
>>>>> So, based on my tests yesterday, I decided to break the RAID6 and
>>>>> pull a drive out of it to test directly on the 2Tb drives in
>>>>> question.
>>>>>
>>>>> The array in question:
>>>>> # cat /proc/mdstat
>>>>> Personalities : [raid1] [raid6] [raid5] [raid4]
>>>>> md2 : active raid6 sdd[4] sdc[0] sde[1] sdf[5]
>>>>>       3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
>>>>> [4/4] [UUUU]
>>>>>
>>>>> # mdadm /dev/md2 --fail /dev/sdf
>>>>> mdadm: set /dev/sdf faulty in /dev/md2
>>>>> # mdadm /dev/md2 --remove /dev/sdf
>>>>> mdadm: hot removed /dev/sdf from /dev/md2
>>>>>
>>>>> So, all tests are to be done on /dev/sdf.
>>>>> Model Family:     Seagate SV35
>>>>> Device Model:     ST2000VX000-9YW164
>>>>> Serial Number:    Z1E17C3X
>>>>> LU WWN Device Id: 5 000c50 04e1bc6f0
>>>>> Firmware Version: CV13
>>>>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>
>>>>> From the Dom0:
>>>>> # dd if=/dev/zero of=/dev/sdf bs=1M count=4096 oflag=direct
>>>>> 4096+0 records in
>>>>> 4096+0 records out
>>>>> 4294967296 bytes (4.3 GB) copied, 30.7691 s, 140 MB/s
>>>>>
>>>>> Create a single partition on the drive, and format it with ext4:
>>>>> Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
>>>>> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168
>sectors
>>>>> Units = sectors of 1 * 512 = 512 bytes
>>>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>>> Disk identifier: 0x98d8baaf
>>>>>
>>>>>    Device Boot      Start         End      Blocks   Id  System
>>>>> /dev/sdf1            2048  3907029167  1953513560   83  Linux
>>>>>
>>>>> Command (m for help): w
>>>>>
>>>>> # mkfs.ext4 -j /dev/sdf1
>>>>> ......
>>>>> Writing inode tables: done
>>>>> Creating journal (32768 blocks): done
>>>>> Writing superblocks and filesystem accounting information: done
>>>>>
>>>>> Mount it on the Dom0:
>>>>> # mount /dev/sdf1 /mnt/esata/
>>>>> # cd /mnt/esata/
>>>>> # bonnie++ -d . -u 0:0
>>>>> ....
>>>>> Version  1.96       ------Sequential Output------ --Sequential
>>>>> Input- --Random-
>>>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>>>>> --Block-- --Seeks--
>>>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>>>>> %CP /sec %CP
>>>>> xenhost.lan.crc. 2G   425  94 133607  24 60544  12   973  95
>209114
>>>>> 17 296.4   6
>>>>> Latency             70971us     190ms     221ms   40369us  
>17657us
>>>>> 164ms
>>>>>
>>>>> So from the Dom0: 133Mb/sec write, 209Mb/sec read.
>>>>>
>>>>> Now, I'll attach the full disk to a DomU:
>>>>> # xm block-attach zeus.vm phy:/dev/sdf xvdc w
>>>>>
>>>>> And we'll test from the DomU.
>>>>>
>>>>> # dd if=/dev/zero of=/dev/xvdc bs=1M count=4096 oflag=direct
>>>>> 4096+0 records in
>>>>> 4096+0 records out
>>>>> 4294967296 bytes (4.3 GB) copied, 32.318 s, 133 MB/s
>>>>>
>>>>> Partition the same as in the Dom0 and create an ext4 filesystem on
>it:
>>>>>
>>>>> I notice something interesting here. In the Dom0, the device is
>seen as:
>>>>> Units = sectors of 1 * 512 = 512 bytes
>>>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>>>
>>>>> In the DomU, it is seen as:
>>>>> Units = sectors of 1 * 512 = 512 bytes
>>>>> Sector size (logical/physical): 512 bytes / 512 bytes
>>>>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>>>>
>>>>> Not sure if this could be related - but continuing testing:
>>>>>     Device Boot      Start         End      Blocks   Id  System
>>>>> /dev/xvdc1            2048  3907029167  1953513560   83  Linux
>>>>>
>>>>> # mkfs.ext4 -j /dev/xvdc1
>>>>> ....
>>>>> Allocating group tables: done
>>>>> Writing inode tables: done
>>>>> Creating journal (32768 blocks): done
>>>>> Writing superblocks and filesystem accounting information: done
>>>>>
>>>>> # mount /dev/xvdc1 /mnt/esata/
>>>>> # cd /mnt/esata/
>>>>> # bonnie++ -d . -u 0:0
>>>>> ....
>>>>> Version  1.96       ------Sequential Output------ --Sequential
>>>>> Input- --Random-
>>>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>>>>> --Block-- --Seeks--
>>>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>>>>> %CP /sec %CP
>>>>> zeus.crc.id.au   2G   396  99 116530  23 50451  15  1035  99
>176407
>>>>> 23 313.4   9
>>>>> Latency             34615us     130ms     128ms   33316us  
>74401us
>>>>> 130ms
>>>>>
>>>>> So still... 116Mb/sec write, 176Mb/sec read to the physical device
>>>>> from the DomU. More than acceptable.
>>>>>
>>>>> It leaves me to wonder.... Could there be something in the Dom0
>>>>> seeing the drives as 4096 byte sectors, but the DomU seeing it as
>>>>> 512 byte sectors cause an issue?
>>>>
>>>> There is certain overhead in it. I still have this in my mailbox
>>>> so I am not sure whether this issue got ever resolved? I know that
>the 
>>>> indirect patches in Xen blkback and xen blkfront are meant to
>resolve
>>>> some of these issues - by being able to carry a bigger payload.
>>>>
>>>> Did you ever try v3.11 kernel in both dom0 and domU? Thanks.
>>>
>>> Ok, so I finally got around to building kernel 3.11 RPMs today for
>>> testing. I upgraded both the Dom0 and DomU to the same kernel:
>> 
>> Woohoo!
>>>
>>> DomU:
>>> # dmesg | grep blkfront
>>> blkfront: xvda: flush diskcache: enabled; persistent grants:
>enabled;
>>> indirect descriptors: enabled;
>>> blkfront: xvdb: flush diskcache: enabled; persistent grants:
>enabled;
>>> indirect descriptors: enabled;
>>>
>>> Looks good.
>>>
>>> Transfer tests using bonnie++ as per before:
>>> # bonnie -d . -u 0:0
>>> Version  1.96       ------Sequential Output------ --Sequential
>Input-
>>> --Random-
>>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
>--Block--
>>> --Seeks--
>>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
>%CP
>>> /sec %CP
>>> zeus.crc.id.au   2G   603  92 58250   9 62248  14   886  99 295757 
>30
>>> 492.3  13
>>> Latency             27305us     124ms     158ms   34222us   16865us
>>> 374ms
>>> Version  1.96       ------Sequential Create------ --------Random
>>> Create--------
>>> zeus.crc.id.au      -Create-- --Read--- -Delete-- -Create--
>--Read---
>>> -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec
>%CP
>>> /sec %CP
>>>                  16 10048  22 +++++ +++ 17849  29 11109  25 +++++
>+++
>>> 18389  31
>>> Latency             17775us     154us     180us   16008us      38us
>>>  58us
>>>
>>> Still seems to be a massive discrepancy between Dom0 and DomU write
>>> speeds. Interesting is that sequential block reads are nearly
>300MB/sec,
>>> yet sequential writes were only ~58MB/sec.
>> 
>> OK, so the other thing that people were pointing out is that you
>> can use the xen-blkfront.max parameter. By default it is 32, but try 8.
>> Or 64. Or 256.
>
>Ahh - interesting.
>
>I used the following:
>Kernel command line: ro root=/dev/xvda rd_NO_LUKS rd_NO_DM
>LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us
>crashkernel=auto console=hvc0 xen-blkfront.max=X
>
>8:
>Version  1.96       ------Sequential Output------ --Sequential Input-
>--Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>--Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>/sec %CP
>zeus.crc.id.au   2G   696  92 50906   7 46102  11  1013  97 256784  27
>496.5  10
>Latency             24374us     199ms     117ms   30855us   38008us
>85175us
>
>16:
>Version  1.96       ------Sequential Output------ --Sequential Input-
>--Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>--Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>/sec %CP
>zeus.crc.id.au   2G   675  92 58078   8 57585  13  1005  97 262735  25
>505.6  10
>Latency             24412us     187ms     183ms   23661us   53850us
>232ms
>
>32:
>Version  1.96       ------Sequential Output------ --Sequential Input-
>--Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>--Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>/sec %CP
>zeus.crc.id.au   2G   698  92 57416   8 63328  13  1063  97 267154  24
>498.2  12
>Latency             24264us     199ms   81362us   33144us   22526us
>237ms
>
>64:
>Version  1.96       ------Sequential Output------ --Sequential Input-
>--Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>--Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>/sec %CP
>zeus.crc.id.au   2G   574  86 88447  13 68988  17   897  97 265128  27
>493.7  13
>
>128:
>Version  1.96       ------Sequential Output------ --Sequential Input-
>--Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>--Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>/sec %CP
>zeus.crc.id.au   2G   702  97 107638  14 70158  15  1045  97 255596  24
>491.0  12
>Latency             27279us   17553us     134ms   29771us   38392us
>65761us
>
>256:
>Version  1.96       ------Sequential Output------ --Sequential Input-
>--Random-
>Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
>--Seeks--
>Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
>/sec %CP
>zeus.crc.id.au   2G   689  91 102554  14 67337  15  1012  97 262475  24
>484.4  12
>Latency             20642us     104ms     189ms   36624us   45286us
>80023us
>
>So, as a nice summary:
>8: 50MB/sec
>16: 58MB/sec
>32: 57MB/sec
>64: 88MB/sec
>128: 107MB/sec
>256: 102MB/sec
>
>So, maybe it's coincidence, maybe it isn't - but the best (factoring in
>the margin of error) seems to be 128 - which happens to match the 128k
>chunk size of the underlying RAID6 array on the Dom0.
>
># cat /proc/mdstat
>md2 : active raid6 sdd[5] sdc[4] sdf[1] sde[0]
>     3906766592 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4]
>[UUUU]
>
>> The indirect descriptor allows us to put more I/Os on the ring - and
>> I am hoping that will:
>>  a) solve your problem
>
>Well, it looks like this solves the issue - at least increasing the max
>almost doubles the write speed - with no change to read speeds
>(within the margin of error).
>
>>  b) not solve your problem, but demonstrate that the issue is not
>with
>>     the ring, but with something else making your writes slower.
>> 
>> Hmm, are you by any chance using O_DIRECT when running bonnie++ in
>> dom0? The xen-blkback tacks on O_DIRECT to all write requests. This is
>> done to not use the dom0 page cache - otherwise you end up with
>> double buffering, where writes look insanely fast - but with absolutely
>> no safety.
>> 
>> If you want to try disabling that (so no O_DIRECT), I would do this
>> little change:
>> 
>> diff --git a/drivers/block/xen-blkback/blkback.c
>b/drivers/block/xen-blkback/blkback.c
>> index bf4b9d2..823b629 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -1139,7 +1139,7 @@ static int dispatch_rw_block_io(struct
>xen_blkif *blkif,
>>                 break;
>>         case BLKIF_OP_WRITE:
>>                 blkif->st_wr_req++;
>> -               operation = WRITE_ODIRECT;
>> +               operation = WRITE;
>>                 break;
>>         case BLKIF_OP_WRITE_BARRIER:
>>                 drain = true;
>
>With the above results, is this still useful?

No.  There is no need.  Awesome that this fixed it.  Roger had mentioned that he had seen similar behavior. We should probably do a patch that interrogates the backend for the optimal segment size and informs the frontend - so it can set it automatically.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2013-09-06 23:37 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-20  2:10 4.2.1: Poor write performance for DomU Steven Haigh
2013-02-20  8:26 ` Roger Pau Monné
2013-02-20  8:49   ` Steven Haigh
2013-02-20  9:49     ` Steven Haigh
2013-02-20 10:12       ` Jan Beulich
2013-02-20 11:06         ` Andrew Cooper
2013-02-20 11:08           ` Steven Haigh
2013-02-20 12:48             ` Andrew Cooper
2013-02-20 13:18             ` Pasi Kärkkäinen
2013-03-08 20:42               ` Konrad Rzeszutek Wilk
2013-03-08  8:54       ` Steven Haigh
2013-03-08  9:43         ` Roger Pau Monné
2013-03-08  9:46           ` Steven Haigh
2013-03-08  9:54             ` Roger Pau Monné
2013-03-08 20:49         ` Konrad Rzeszutek Wilk
2013-03-08 22:30           ` Steven Haigh
2013-03-11 13:30             ` Konrad Rzeszutek Wilk
2013-03-11 13:37               ` Steven Haigh
2013-03-12 13:04                 ` Konrad Rzeszutek Wilk
2013-03-12 14:08                   ` Steven Haigh
     [not found]                   ` <514EA337.7030303@crc.id.au>
     [not found]                     ` <514EA6B0.8010504@crc.id.au>
     [not found]                       ` <514EA741.7050403@crc.id.au>
2013-03-24  9:10                         ` Steven Haigh
2013-03-24  9:54                           ` Steven Haigh
2013-03-25  2:21                           ` Steven Haigh
2013-08-20 16:48                             ` Konrad Rzeszutek Wilk
2013-08-20 18:25                               ` Steven Haigh
2013-09-05  8:28                               ` Steven Haigh
2013-09-06 13:33                                 ` Konrad Rzeszutek Wilk
2013-09-06 23:06                                   ` Steven Haigh
2013-09-06 23:37                                     ` Konrad Rzeszutek Wilk
