* Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
@ 2014-05-14 19:11 Konrad Rzeszutek Wilk
  2014-05-14 19:21 ` Josh Boyer
                   ` (6 more replies)
  0 siblings, 7 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-05-14 19:11 UTC (permalink / raw)
  To: gregkh, linux-kernel, stable, xen-devel, felipe.franciosi,
	roger.pau, jerry.snitselaar, axboe

Hey Greg

This email is in regards to backporting two patches to stable that
fall under the 'performance' rule:

 bfe11d6de1c416cea4f3f0f35f864162063ce3fa
 fbe363c476afe8ec992d3baf682670a4bd1b6ce6

I've copied Jerry - the maintainer of Oracle's kernel. I don't have
the emails of the other distro maintainers, but the bugs associated with this issue are:

https://bugzilla.redhat.com/show_bug.cgi?id=1096909
(RHEL7)
https://bugs.launchpad.net/ubuntu/+bug/1319003
(Ubuntu 13.10)

The following distros are affected:

(x) Ubuntu 13.04 and derivatives (3.8)
(v) Ubuntu 13.10 and derivatives (3.11), supported until 2014-07
(x) Fedora 17 (3.8 and 3.9 in updates)
(x) Fedora 18 (3.8, 3.9, 3.10, 3.11 in updates)
(v) Fedora 19 (3.9; 3.10, 3.11, 3.12 in updates; fixed with latest update to 3.13), supported until TBA
(v) Fedora 20 (3.11; 3.12 in updates; fixed with latest update to 3.13), supported until TBA
(v) RHEL 7 and derivatives (3.10), expected to be supported until about 2025
(v) openSUSE 13.1 (3.11), expected to be supported until at least 2016-08
(v) SLES 12 (3.12), expected to be supported until about 2024
(v) Mageia 3 (3.8), supported until 2014-11-19
(v) Mageia 4 (3.12), supported until 2015-08-01
(v) Oracle Enterprise Linux with Unbreakable Enterprise Kernel Release 3 (3.8), supported until TBA

Here is the analysis of the problem and what was put in the RHEL7 bug.
The Oracle bug does not exist (as I just backport the patches into the kernel and
send a GIT PULL to Jerry) - but if you would like, I can certainly furnish
you with one (it would be identical to what is mentioned below).

If you are OK with the backport, I am volunteering Roger and Felipe to assist
in jamming^H^H^H^Hbackporting the patches into earlier kernels.

Summary:
Storage performance regression when Xen backend lacks persistent-grants support

Description of problem:
When used as a Xen guest, RHEL 7 will be slower than older releases in terms
of storage performance. This is due to the persistent-grants feature introduced
in xen-blkfront in the Linux kernel 3.8 series. From 3.8 to 3.12 (inclusive),
xen-blkfront adds an extra set of memcpy() operations regardless of
persistent-grants support in the backend (e.g. xen-blkback, qemu, tapdisk).
This has been identified and fixed in the 3.13 kernel series, but was not
backported to previous LTS kernels due to the nature of the bug (performance only).

While persistent grants reduce the stress on the Xen grant table and allow
for much better aggregate throughput (at the cost of an extra set of memcpy
operations), adding the copy overhead when the feature is unsupported by
the backend combines the worst of both worlds. This is particularly noticeable
when intensive storage workloads are active from many guests.


How reproducible:
This is always reproducible when a RHEL 7 guest is running on Xen and the
storage backend (e.g. xen-blkback, qemu, tapdisk) does not have support for
persistent grants.


Steps to Reproduce:
1. Install a Xen dom0 running a kernel prior to 3.8 (without
persistent-grants support) - or run it under Amazon EC2.
2. Install a set of RHEL 7 guests (which use kernel 3.10).
3. Measure aggregate storage throughput from all guests.

NOTE: The storage infrastructure (e.g. local SSDs, network-attached storage)
must not itself be the bottleneck. If tested on a single SATA disk, for
example, the issue will probably be unnoticeable as the infrastructure will
be limiting response time and throughput.
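
A quick way to confirm which path a guest is actually using - the dmesg
message format is the one quoted later in this thread, while the xenstore
path is an assumption and may vary by toolstack, domain id and device:

  # In the guest: did xen-blkfront negotiate persistent grants?
  dmesg | grep -i 'persistent grants'

  # In dom0 (illustrative path for a blkback-served vbd):
  xenstore-read /local/domain/0/backend/vbd/<guest-domid>/<vbd-id>/feature-persistent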



Actual results:
Aggregate storage throughput will be lower than with xen-blkfront
versions prior to 3.8 or newer than 3.12.



Expected results:
Aggregate storage throughput should be at least as good as with pre-3.8
frontends, and better if the backend supports persistent grants.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-14 19:11 Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees) Konrad Rzeszutek Wilk
  2014-05-14 19:21 ` Josh Boyer
@ 2014-05-14 19:21 ` Josh Boyer
  2014-05-20  3:19   ` Greg KH
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Josh Boyer @ 2014-05-14 19:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Greg KH, linux-kernel@vger.kernel.org, stable, xen-devel,
	felipe.franciosi, roger.pau, jerry.snitselaar, Jens Axboe

On Wed, May 14, 2014 at 3:11 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> The following distros are affected:

> (x) Fedora 17 (3.8 and 3.9 in updates)
> (x) Fedora 18 (3.8, 3.9, 3.10, 3.11 in updates)
> (v) Fedora 19 (3.9; 3.10, 3.11, 3.12 in updates; fixed with latest update to 3.13), supported until TBA
> (v) Fedora 20 (3.11; 3.12 in updates; fixed with latest update to 3.13), supported until TBA

The net effect of all of the above is that all currently supported
Fedora versions already have the fixes.  That highlights one of the major
reasons we rebase the kernel in Fedora, even at the cost of fighting
the regressions that inevitably come with that.  Just an FYI for those
keeping track at home.

josh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-14 19:11 Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees) Konrad Rzeszutek Wilk
@ 2014-05-20  3:19   ` Greg KH
  2014-05-14 19:21 ` Josh Boyer
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Greg KH @ 2014-05-20  3:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, stable, xen-devel, felipe.franciosi, roger.pau,
	jerry.snitselaar, axboe

On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
> Hey Greg
> 
> This email is in regards to backporting two patches to stable that
> fall under the 'performance' rule:
> 
>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> 
> I've copied Jerry - the maintainer of the Oracle's kernel. I don't have
> the emails of the other distros maintainers but the bugs associated with it are:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1096909
> (RHEL7)
> https://bugs.launchpad.net/ubuntu/+bug/1319003
> (Ubuntu 13.10)
> 
> The following distros are affected:
> 
> (x) Ubuntu 13.04 and derivatives (3.8)
> (v) Ubuntu 13.10 and derivatives (3.11), supported until 2014-07
> (x) Fedora 17 (3.8 and 3.9 in updates)
> (x) Fedora 18 (3.8, 3.9, 3.10, 3.11 in updates)
> (v) Fedora 19 (3.9; 3.10, 3.11, 3.12 in updates; fixed with latest update to 3.13), supported until TBA
> (v) Fedora 20 (3.11; 3.12 in updates; fixed with latest update to 3.13), supported until TBA
> (v) RHEL 7 and derivatives (3.10), expected to be supported until about 2025
> (v) openSUSE 13.1 (3.11), expected to be supported until at least 2016-08
> (v) SLES 12 (3.12), expected to be supported until about 2024
> (v) Mageia 3 (3.8), supported until 2014-11-19
> (v) Mageia 4 (3.12), supported until 2015-08-01
> (v) Oracle Enterprise Linux with Unbreakable Enterprise Kernel Release 3 (3.8), supported until TBA
> 
> Here is the analysis of the problem and what was put in the RHEL7 bug.
> The Oracle bug does not exist (as I just backport them in the kernel and
> send a GIT PULL to Jerry) - but if you would like I can certainly furnish
> you with one (it would be identical to what is mentioned below).
> 
> If you are OK with the backport, I am volunteering Roger and Felipe to assist
> in jamming^H^H^H^Hbackporting the patches into earlier kernels.

Sure, can you provide backported patches?  As-is, they don't apply to
the 3.10-stable kernel.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-14 19:11 Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees) Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2014-05-20  3:19   ` Greg KH
@ 2014-05-20  9:32 ` Vitaly Kuznetsov
  2014-05-20  9:54   ` Vitaly Kuznetsov
  2014-05-20  9:54   ` [Xen-devel] " Vitaly Kuznetsov
  2014-05-20  9:32 ` Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2014-05-20  9:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: gregkh, linux-kernel, stable, xen-devel, felipe.franciosi,
	roger.pau, jerry.snitselaar, axboe

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> Hey Greg
>
> This email is in regards to backporting two patches to stable that
> fall under the 'performance' rule:
>
>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>
> I've copied Jerry - the maintainer of the Oracle's kernel. I don't have
> the emails of the other distros maintainers but the bugs associated with it are:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1096909
> (RHEL7)

I was doing tests with the RHEL7 kernel and these patches, and unfortunately
I see a huge performance degradation in some workloads.

I'm in the middle of my testing now, but here are some intermediate
results.
Test environment:
Fedora-20, xen-4.3.2-2.fc20.x86_64, 3.11.10-301.fc20.x86_64
I do testing with 1-9 RHEL7 PVHVM guests with:
1) Unmodified RHEL7 kernel
2) Only fbe363c476afe8ec992d3baf682670a4bd1b6ce6 applied (revoke foreign
access)
3) Both fbe363c476afe8ec992d3baf682670a4bd1b6ce6 and
bfe11d6de1c416cea4f3f0f35f864162063ce3fa
(actually 427bfe07e6744c058ce6fc4aa187cda96b635539 is required as well
to make the build happy; I suggest we backport that to stable too)

Storage devices are:
1) ramdisks (/dev/ram*) (persistent grants and indirect descriptors disabled)
2) /tmp/img*.img on tmpfs (persistent grants and indirect descriptors disabled)

The test itself: direct random read with bs=2048k (using fio). (Actually
'dd', 'read/write access', ... show the same results.)

fio test file:
[fio_read]
ioengine=libaio
blocksize=2048k
rw=randread
filename=/dev/xvdc
randrepeat=1
fallocate=none
direct=1
invalidate=0
runtime=20
time_based

I run fio simultaneously on all guests and sum up the results (a sketch of
such a run follows the links below). So, the results are:
1) ramdisks: http://hadoop.ru/pubfiles/b1096909_3.11.10_ramdisk.png
2) tmpfiles: http://hadoop.ru/pubfiles/b1096909_3.11.10_tmpfile.png
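
For reference, a minimal sketch of how such a parallel run could be driven
and the read bandwidth collected - the guest names and the job file name are
placeholders (this is not Vitaly's exact harness), and it assumes 'fio
--server' is running in each guest, matching the '--client' invocations
shown later in this thread:

  # run the job on each guest in parallel and grab the aggregate READ line
  for h in guest1 guest2 guest3; do
      fio --minimal --client "$h" fio_read.fio | grep READ &
  done
  wait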

In a few words: the patch series has (almost) no effect when persistent grants
are enabled (that was expected) and gives me a performance regression when
persistent grants are disabled (that wasn't expected).

My thoughts are: it seems fbe363c476afe8ec992d3baf682670a4bd1b6ce6
brings a performance regression in some cases (at least when persistent
grants are disabled). My guess atm is that gnttab_end_foreign_access()
(gnttab_end_foreign_access_ref_v1() is being used here) is the culprit;
for some reason it is looping for some time.
bfe11d6de1c416cea4f3f0f35f864162063ce3fa really brings a performance
improvement over fbe363c476afe8ec992d3baf682670a4bd1b6ce6, but the whole
series still brings a regression.

I would be glad to hear what could be wrong with my testing in case I'm
the only one who sees such behavior. Any other pointers are more than
welcome, and please feel free to ask me for any additional
info/testing/whatever.

> https://bugs.launchpad.net/ubuntu/+bug/1319003
> (Ubuntu 13.10)
>
> The following distros are affected:
>
> (x) Ubuntu 13.04 and derivatives (3.8)
> (v) Ubuntu 13.10 and derivatives (3.11), supported until 2014-07
> (x) Fedora 17 (3.8 and 3.9 in updates)
> (x) Fedora 18 (3.8, 3.9, 3.10, 3.11 in updates)
> (v) Fedora 19 (3.9; 3.10, 3.11, 3.12 in updates; fixed with latest update to 3.13), supported until TBA
> (v) Fedora 20 (3.11; 3.12 in updates; fixed with latest update to 3.13), supported until TBA
> (v) RHEL 7 and derivatives (3.10), expected to be supported until about 2025
> (v) openSUSE 13.1 (3.11), expected to be supported until at least 2016-08
> (v) SLES 12 (3.12), expected to be supported until about 2024
> (v) Mageia 3 (3.8), supported until 2014-11-19
> (v) Mageia 4 (3.12), supported until 2015-08-01
> (v) Oracle Enterprise Linux with Unbreakable Enterprise Kernel Release 3 (3.8), supported until TBA
>
> Here is the analysis of the problem and what was put in the RHEL7 bug.
> The Oracle bug does not exist (as I just backport them in the kernel and
> send a GIT PULL to Jerry) - but if you would like I can certainly furnish
> you with one (it would be identical to what is mentioned below).
>
> If you are OK with the backport, I am volunteering Roger and Felipe to assist
> in jamming^H^H^H^Hbackporting the patches into earlier kernels.
>
> Summary:
> Storage performance regression when Xen backend lacks persistent-grants support
>
> Description of problem:
> When used as a Xen guest, RHEL 7 will be slower than older releases in terms
> s of storage performance. This is due to the persistent-grants feature introduced
> in xen-blkfront on the Linux Kernel 3.8 series. From 3.8 to 3.12 (inclusive),
> xen-blkfront will add an extra set of memcpy() operations regardless of
> persistent-grants support in the backend (i.e. xen-blkback, qemu, tapdisk).
> This has been identified and fixed in the 3.13 kernel series, but was not
> backported to previous LTS kernels due to the nature of the bug (performance only).
>
> While persistent grants reduce the stress on the Xen grant table and allow
> for much better aggregate throughput (at the cost of an extra set of memcpy
> operations), adding the copy overhead when the feature is unsupported on
> the backend combines the worst of both worlds.   This is particularly noticeable
> when intensive storage workloads are active from many guests.
>
> How reproducible:
> This is always reproducible when a RHEL 7 guest is running on Xen and the
> storage backend (i.e. xen-blkback, qemu, tapdisk) does not have support for
> persistent grants.
>
> Steps to Reproduce:
> 1. Install a Xen dom0 running a kernel prior to 3.8 (without
> persistent-grants support) - or run it under Amazon EC2
> 2. Install a set of RHEL 7 guests (which uses kernel 3.10).
> 3. Measure aggregate storage throughput from all guests.
>
> NOTE: The storage infrastructure (e.g. local SSDs, network-attached storage)
> cannot be a bottleneck in itself. If tested on a single SATA disk, for
> example, the issue will probably be unnoticeable as the infrastructure will
> be limiting response time and throughput.
>
> Actual results:
> Aggregate storage throughput will be lower than with a xen-blkfront
> versions prior to 3.8 or newer than 3.12.
>
> Expected results:
> Aggregate storage throughput should be at least as good or better if the
> backend supports persistent grants.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-20  9:32 ` [Xen-devel] " Vitaly Kuznetsov
  2014-05-20  9:54   ` Vitaly Kuznetsov
@ 2014-05-20  9:54   ` Vitaly Kuznetsov
  2014-05-20 10:32     ` Roger Pau Monné
  2014-05-20 10:32     ` Roger Pau Monné
  1 sibling, 2 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2014-05-20  9:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: gregkh, linux-kernel, stable, xen-devel, felipe.franciosi,
	roger.pau, jerry.snitselaar, axboe

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> 1) ramdisks (/dev/ram*) (persistent grants and indirect descriptors
> disabled)

Sorry, there was a typo: persistent grants and indirect descriptors are
enabled with ramdisks; otherwise such testing wouldn't make any sense.

> 2) /tmp/img*.img on tmpfs (persistent grants and indirect descriptors disabled)
>

-- 
  Vitaly Kuznetsov

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-20  9:54   ` [Xen-devel] " Vitaly Kuznetsov
@ 2014-05-20 10:32     ` Roger Pau Monné
  2014-05-20 11:41       ` Vitaly Kuznetsov
  2014-05-20 11:41       ` [Xen-devel] " Vitaly Kuznetsov
  2014-05-20 10:32     ` Roger Pau Monné
  1 sibling, 2 replies; 41+ messages in thread
From: Roger Pau Monné @ 2014-05-20 10:32 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Konrad Rzeszutek Wilk
  Cc: gregkh, linux-kernel, stable, xen-devel, felipe.franciosi,
	jerry.snitselaar, axboe

On 20/05/14 11:54, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
>> 1) ramdisks (/dev/ram*) (persistent grants and indirect descriptors
>> disabled)
> 
> sorry, there was a typo. persistent grants and indirect descriptors are
> enabled with ramdisks, otherwise such testing won't make any sense.

I'm not sure how that is possible. From your description I gather that you
are using 3.11 on the Dom0, which means blkback has support for
persistent grants and indirect descriptors, but the guest is RHEL7,
which is using the 3.10 kernel AFAICT, and that kernel only has persistent
grants implemented.

Roger.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-20 10:32     ` Roger Pau Monné
  2014-05-20 11:41       ` Vitaly Kuznetsov
@ 2014-05-20 11:41       ` Vitaly Kuznetsov
  2014-05-20 13:59           ` [Xen-devel] " Felipe Franciosi
  1 sibling, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2014-05-20 11:41 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Konrad Rzeszutek Wilk, axboe, felipe.franciosi, gregkh,
	linux-kernel, stable, jerry.snitselaar, xen-devel

Roger Pau Monné <roger.pau@citrix.com> writes:

> On 20/05/14 11:54, Vitaly Kuznetsov wrote:
>> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>> 
>>> 1) ramdisks (/dev/ram*) (persistent grants and indirect descriptors
>>> disabled)
>> 
>> sorry, there was a typo. persistent grants and indirect descriptors are
>> enabled with ramdisks, otherwise such testing won't make any sense.
>
> I'm not sure how is that possible, from your description I get that you
> are using 3.11 on the Dom0, which means blkback has support for
> persistent grants and indirect descriptors, but the guest is RHEL7,
> that's using the 3.10 kernel AFAICT, and this kernel only has persistent
> grants implemented.

The RHEL7 kernel is mostly merged up to 3.11 in its Xen part; we have
indirect descriptors backported.

Actually I tried my tests with the upstream (Fedora) kernel and the results
were similar. I can try comparing e.g. 3.11.10 with 3.12.0 and provide exact
measurements.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-20 11:41       ` [Xen-devel] " Vitaly Kuznetsov
  2014-05-20 13:59           ` [Xen-devel] " Felipe Franciosi
@ 2014-05-20 13:59           ` Felipe Franciosi
  0 siblings, 0 replies; 41+ messages in thread
From: Felipe Franciosi @ 2014-05-20 13:59 UTC (permalink / raw)
  To: 'Vitaly Kuznetsov'
  Cc: Konrad Rzeszutek Wilk, axboe, gregkh, linux-kernel, stable,
	jerry.snitselaar, xen-devel, Roger Pau Monne

I had a small side-bar thread with Vitaly discussing the comprehensiveness of his measurements and how his tests are being conducted. He will report new results as they become available.

In the meantime, I stand by my view that the patches need to be backported and that there is a regression if we don't do that.
Ubuntu has already provided a test kernel with the patches pulled in.  I will test it as soon as I get the chance (hopefully by the end of the week).
See: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1319003

Felipe

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: 20 May 2014 12:41
> To: Roger Pau Monne
> Cc: Konrad Rzeszutek Wilk; axboe@kernel.dk; Felipe Franciosi;
> gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> stable@vger.kernel.org; jerry.snitselaar@oracle.com; xen-
> devel@lists.xenproject.org
> Subject: Re: [Xen-devel] Backport request to stable of two performance
> related fixes for xen-blkfront (3.13 fixes to earlier trees)
> 
> Roger Pau Monné <roger.pau@citrix.com> writes:
> 
> > On 20/05/14 11:54, Vitaly Kuznetsov wrote:
> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >>
> >>> 1) ramdisks (/dev/ram*) (persistent grants and indirect descriptors
> >>> disabled)
> >>
> >> sorry, there was a typo. persistent grants and indirect descriptors
> >> are enabled with ramdisks, otherwise such testing won't make any sense.
> >
> > I'm not sure how is that possible, from your description I get that
> > you are using 3.11 on the Dom0, which means blkback has support for
> > persistent grants and indirect descriptors, but the guest is RHEL7,
> > that's using the 3.10 kernel AFAICT, and this kernel only has
> > persistent grants implemented.
> 
> RHEL7 kernel is mostly merged with 3.11 in its Xen part, we have indirect
> descriptors backported.
> 
> Actually I tried my tests with upstream (Fedora) kernel and results were
> similar. I can try comparing e.g. 3.11.10 with 3.12.0 and provide exact
> measurements.
> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-20 13:59           ` [Xen-devel] " Felipe Franciosi
  (?)
  (?)
@ 2014-05-22  8:52           ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2014-05-22  8:52 UTC (permalink / raw)
  To: xen-devel
  Cc: axboe, gregkh, linux-kernel, stable, jerry.snitselaar,
	Roger Pau Monne, Felipe Franciosi, Andrew Jones, Ronen Hod

Felipe Franciosi <felipe.franciosi@citrix.com> writes:

> I had a small side-bar thread with Vitaly discussing the
> comprehensiveness of his measurements and how his tests are being
> conducted. He will report new results as they become available.

I'm back ;-)

In short: I think I was able to find a very 'special' case where this
patch series leads to an I/O performance regression. It is clearly visible
with the upstream kernel; no RHEL specifics are involved. Now the details.

I compare IO performance with 3 kernels:
1) "unpatched_upstream" means Linus's unmodified
  60b5f90d0fac7585f1a43ccdad06787b97eda0ab build

2) "revertall_upstream" means upstream with all patches reverted
  (60b5f90d0fac7585f1a43ccdad06787b97eda0ab + 3 commits reverted:
  427bfe07e6744c058ce6fc4aa187cda96b635539,
  bfe11d6de1c416cea4f3f0f35f864162063ce3fa,
  fbe363c476afe8ec992d3baf682670a4bd1b6ce6)

3) "revokefaonly_upstream" means upstream with "revoke foreign access"
  patch only (60b5f90d0fac7585f1a43ccdad06787b97eda0ab + 2 commits
  reverted:
  427bfe07e6744c058ce6fc4aa187cda96b635539,
  bfe11d6de1c416cea4f3f0f35f864162063ce3fa)

I have the following setup:
1) Single-cpu "Intel(R) Xeon(R)CPU W3550 @ 3.07GHz" system, 4 cores
2) No hyper-threading, turbo boost, etc.
3) Dom0 is running 3.11.10-301.fc20.x86_64 kernel, xen-4.3.2-3.fc20.x86_64
4) Dom0 is pinned to Core0
5) I create 9 clients pinned to (Core1, Core2, Core3, Core1, Core2,
Core3, Core1, Core2, Core3)
6) Clients are identical, the only thing which differs is kernel.

Now the most important part.
1) For each client I create a 1G file on tmpfs in dom0
2) I attach these files to the clients as e.g. "file:/tmp/img10.img,xvdc,rw"
(no blkback) - see the setup sketch after this list
3) In clients I see these devices as "blkfront: xvdc: barrier: enabled;
persistent grants: disabled; indirect descriptors: disabled;"
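
A minimal sketch of that per-guest setup - the backing file name is the one
quoted above, but the dd creation and the xl disk line are assumptions about
how it was set up, since only the "file:..." disk spec is given:

  # 1G backing file on dom0 tmpfs
  dd if=/dev/zero of=/tmp/img10.img bs=1M count=1024
  # in the guest's xl config (or an equivalent xl block-attach call):
  #   disk = [ 'file:/tmp/img10.img,xvdc,rw' ]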

Tests:
I run fio simultaneously on 1 - 9 clients and measure aggregate
throughput. Each test is run 3 times and the average is taken. I
run tests with different block sizes (4k, 64k, 512k, 2048k) and different RW
modes (randread, randrw).
Fio job:
[fio_jobname]
ioengine=libaio
blocksize=<BS>
filename=/dev/xvdc
randrepeat=1
fallocate=none
direct=1
invalidate=0
runtime=20
time_based
rw=<randread|randrw>

Now the results:

rw=randread:
1) 4k (strange): http://hadoop.ru/pubfiles/bug1096909/4k_r.png
2) 64k: http://hadoop.ru/pubfiles/bug1096909/64k_r.png
3) 512k: http://hadoop.ru/pubfiles/bug1096909/512k_r.png
4) 2048k: http://hadoop.ru/pubfiles/bug1096909/2048k_r.png

rw=randrw:
1) 4k (strange): http://hadoop.ru/pubfiles/bug1096909/4k_rw.png
2) 64k: http://hadoop.ru/pubfiles/bug1096909/64k_rw.png
3) 512k: http://hadoop.ru/pubfiles/bug1096909/512k_rw.png
4) 2048k: http://hadoop.ru/pubfiles/bug1096909/2048k_rw.png

In short, 'revertall_upstream' wins everywhere, with a significant
difference.

P.S. Such a complicated setup is not required to see the regression; it is
clearly visible even with 1 client. E.g.:

# for q in `seq 1 10`; do fio --minimal --client <client_with_"revertall_upstream"_kernel> test_r.fio | grep READ; done
   READ: io=8674.0MB, aggrb=444086KB/s, minb=444086KB/s, maxb=444086KB/s, mint=20001msec, maxt=20001msec
   READ: io=8626.0MB, aggrb=441607KB/s, minb=441607KB/s, maxb=441607KB/s, mint=20002msec, maxt=20002msec
   READ: io=8620.0MB, aggrb=441277KB/s, minb=441277KB/s, maxb=441277KB/s, mint=20003msec, maxt=20003msec
   READ: io=8522.0MB, aggrb=436304KB/s, minb=436304KB/s, maxb=436304KB/s, mint=20001msec, maxt=20001msec
   READ: io=8218.0MB, aggrb=420698KB/s, minb=420698KB/s, maxb=420698KB/s, mint=20003msec, maxt=20003msec
   READ: io=8374.0MB, aggrb=428705KB/s, minb=428705KB/s, maxb=428705KB/s, mint=20002msec, maxt=20002msec
   READ: io=8198.0MB, aggrb=419653KB/s, minb=419653KB/s, maxb=419653KB/s, mint=20004msec, maxt=20004msec
   READ: io=7586.0MB, aggrb=388306KB/s, minb=388306KB/s, maxb=388306KB/s, mint=20005msec, maxt=20005msec
   READ: io=8512.0MB, aggrb=435749KB/s, minb=435749KB/s, maxb=435749KB/s, mint=20003msec, maxt=20003msec
   READ: io=8524.0MB, aggrb=436319KB/s, minb=436319KB/s, maxb=436319KB/s, mint=20005msec, maxt=20005msec

# for q in `seq 1 10`; do fio --minimal --client <client_with_"unpatched_upstream"_kernel> test_r.fio | grep READ; done
   READ: io=7236.0MB, aggrb=370464KB/s, minb=370464KB/s, maxb=370464KB/s, mint=20001msec, maxt=20001msec
   READ: io=6506.0MB, aggrb=333090KB/s, minb=333090KB/s, maxb=333090KB/s, mint=20001msec, maxt=20001msec
   READ: io=6584.0MB, aggrb=337050KB/s, minb=337050KB/s, maxb=337050KB/s, mint=20003msec, maxt=20003msec
   READ: io=7120.0MB, aggrb=364489KB/s, minb=364489KB/s, maxb=364489KB/s, mint=20003msec, maxt=20003msec
   READ: io=6610.0MB, aggrb=338347KB/s, minb=338347KB/s, maxb=338347KB/s, mint=20005msec, maxt=20005msec
   READ: io=7024.0MB, aggrb=359556KB/s, minb=359556KB/s, maxb=359556KB/s, mint=20004msec, maxt=20004msec
   READ: io=7320.0MB, aggrb=374765KB/s, minb=374765KB/s, maxb=374765KB/s, mint=20001msec, maxt=20001msec
   READ: io=6540.0MB, aggrb=334814KB/s, minb=334814KB/s, maxb=334814KB/s, mint=20002msec, maxt=20002msec
   READ: io=6636.0MB, aggrb=339661KB/s, minb=339661KB/s, maxb=339661KB/s, mint=20006msec, maxt=20006msec
   READ: io=6594.0MB, aggrb=337595KB/s, minb=337595KB/s, maxb=337595KB/s, mint=20001msec, maxt=20001msec

Dumb 'dd' test shows the same:
"revertall_upstream" client:
# time for ntry in `seq 1 100`; do dd if=/dev/xvdc of=/dev/null bs=2048k 2> /dev/null; done

real	0m16.262s
user	0m0.189s
sys	0m7.021s

"unpatched_upstream"
# time for ntry in `seq 1 100`; do dd if=/dev/xvdc of=/dev/null bs=2048k 2> /dev/null; done

real	0m19.938s
user	0m0.174s
sys	0m9.489s

I tried running a newer Dom0 (3.14.4-200.fc20.x86_64) but that made no
difference.

P.P.S. I understand this test differs a lot from what these patches were
supposed to fix, and I'm not trying to say 'no' to the stable backport, but
I also think this test data can be interesting.

And thanks, Felipe, for all your hardware hints!

>
> In the meantime, I stand behind that the patches need to be backported and there is a regression if we don't do that.
> Ubuntu has already provided a test kernel with the patches pulled in.  I will test those as soon as I get the chance (hopefully by the end of the week).
> See: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1319003
>
> Felipe
>
>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: 20 May 2014 12:41
>> To: Roger Pau Monne
>> Cc: Konrad Rzeszutek Wilk; axboe@kernel.dk; Felipe Franciosi;
>> gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
>> stable@vger.kernel.org; jerry.snitselaar@oracle.com; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [Xen-devel] Backport request to stable of two performance
>> related fixes for xen-blkfront (3.13 fixes to earlier trees)
>> 
>> Roger Pau Monné <roger.pau@citrix.com> writes:
>> 
>> > On 20/05/14 11:54, Vitaly Kuznetsov wrote:
>> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>> >>
>> >>> 1) ramdisks (/dev/ram*) (persistent grants and indirect descriptors
>> >>> disabled)
>> >>
>> >> sorry, there was a typo. persistent grants and indirect descriptors
>> >> are enabled with ramdisks, otherwise such testing won't make any sense.
>> >
>> > I'm not sure how is that possible, from your description I get that
>> > you are using 3.11 on the Dom0, which means blkback has support for
>> > persistent grants and indirect descriptors, but the guest is RHEL7,
>> > that's using the 3.10 kernel AFAICT, and this kernel only has
>> > persistent grants implemented.
>> 
>> RHEL7 kernel is mostly merged with 3.11 in its Xen part, we have indirect
>> descriptors backported.
>> 
>> Actually I tried my tests with upstream (Fedora) kernel and results were
>> similar. I can try comparing e.g. 3.11.10 with 3.12.0 and provide exact
>> measurements.
>> 
>> --
>>   Vitaly
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-20  3:19   ` Greg KH
  (?)
@ 2014-05-22 12:54   ` Roger Pau Monné
  2014-06-02  7:13     ` Felipe Franciosi
  -1 siblings, 1 reply; 41+ messages in thread
From: Roger Pau Monné @ 2014-05-22 12:54 UTC (permalink / raw)
  To: Greg KH, Konrad Rzeszutek Wilk
  Cc: linux-kernel, stable, xen-devel, felipe.franciosi,
	jerry.snitselaar, axboe

[-- Attachment #1: Type: text/plain, Size: 2263 bytes --]

On 20/05/14 05:19, Greg KH wrote:
> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
>> Hey Greg
>>
>> This email is in regards to backporting two patches to stable that
>> fall under the 'performance' rule:
>>
>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>>
>> I've copied Jerry - the maintainer of the Oracle's kernel. I don't have
>> the emails of the other distros maintainers but the bugs associated with it are:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1096909
>> (RHEL7)
>> https://bugs.launchpad.net/ubuntu/+bug/1319003
>> (Ubuntu 13.10)
>>
>> The following distros are affected:
>>
>> (x) Ubuntu 13.04 and derivatives (3.8)
>> (v) Ubuntu 13.10 and derivatives (3.11), supported until 2014-07
>> (x) Fedora 17 (3.8 and 3.9 in updates)
>> (x) Fedora 18 (3.8, 3.9, 3.10, 3.11 in updates)
>> (v) Fedora 19 (3.9; 3.10, 3.11, 3.12 in updates; fixed with latest update to 3.13), supported until TBA
>> (v) Fedora 20 (3.11; 3.12 in updates; fixed with latest update to 3.13), supported until TBA
>> (v) RHEL 7 and derivatives (3.10), expected to be supported until about 2025
>> (v) openSUSE 13.1 (3.11), expected to be supported until at least 2016-08
>> (v) SLES 12 (3.12), expected to be supported until about 2024
>> (v) Mageia 3 (3.8), supported until 2014-11-19
>> (v) Mageia 4 (3.12), supported until 2015-08-01
>> (v) Oracle Enterprise Linux with Unbreakable Enterprise Kernel Release 3 (3.8), supported until TBA
>>
>> Here is the analysis of the problem and what was put in the RHEL7 bug.
>> The Oracle bug does not exist (as I just backport them in the kernel and
>> send a GIT PULL to Jerry) - but if you would like I can certainly furnish
>> you with one (it would be identical to what is mentioned below).
>>
>> If you are OK with the backport, I am volunteering Roger and Felipe to assist
>> in jamming^H^H^H^Hbackporting the patches into earlier kernels.
> 
> Sure, can you provide backported patches?  As-is, they don't apply to
> the 3.10-stable kernel.

Here are the backported patches for the 3.10 stable tree. However, I would
like to get some testing/benchmarking on them before they are applied, since
it's not a trivial backport. Could you please give them a spin, Felipe?

Roger.


[-- Attachment #2: 0001-backport-of-fbe363-to-stable-3.10.y-branch.patch --]
[-- Type: text/plain, Size: 2702 bytes --]

From 8db2b1bff4b54dc01592f9e14b9dae32c341f435 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Wed, 21 May 2014 16:51:47 +0200
Subject: [PATCH 1/2] backport of fbe363 to stable 3.10.y branch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

xen-blkfront: revoke foreign access for grants not mapped by the backend

There's no need to keep the foreign access in a grant if it is not
persistently mapped by the backend. This allows us to free grants that
are not mapped by the backend, thus preventing blkfront from hoarding
all grants.

The main effect of this is that blkfront will only persistently map
the same grants as the backend, and it will always try to use grants
that are already mapped by the backend. Also the number of persistent
grants in blkfront is the same as in blkback (and is controlled by the
value in blkback).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Matt Wilson <msw@amazon.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/block/xen-blkfront.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 1735b0d..cbd9f6b 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -894,9 +894,27 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 		}
 	}
 	/* Add the persistent grant into the list of free grants */
-	for (i = 0; i < s->req.u.rw.nr_segments; i++) {
-		list_add(&s->grants_used[i]->node, &info->persistent_gnts);
-		info->persistent_gnts_c++;
+	for (i = 0; i < nseg; i++) {
+		if (gnttab_query_foreign_access(s->grants_used[i]->gref)) {
+			/*
+			 * If the grant is still mapped by the backend (the
+			 * backend has chosen to make this grant persistent)
+			 * we add it at the head of the list, so it will be
+			 * reused first.
+			 */
+			list_add(&s->grants_used[i]->node, &info->persistent_gnts);
+			info->persistent_gnts_c++;
+		} else {
+			/*
+			 * If the grant is not mapped by the backend we end the
+			 * foreign access and add it to the tail of the list,
+			 * so it will not be picked again unless we run out of
+			 * persistent grants.
+			 */
+			gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 0UL);
+			s->grants_used[i]->gref = GRANT_INVALID_REF;
+			list_add_tail(&s->grants_used[i]->node, &info->persistent_gnts);
+		}
 	}
 }
 
-- 
1.7.7.5 (Apple Git-26)


[-- Attachment #3: 0002-backport-of-bfe11d-to-stable-3.10.y-branch.patch --]
[-- Type: text/plain, Size: 9845 bytes --]

From 0cb3ce2b6838aa9a28b81dacf497a4ae11610826 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Wed, 21 May 2014 16:57:58 +0200
Subject: [PATCH 2/2] backport of bfe11d to stable 3.10.y branch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

xen-blkfront: restore the non-persistent data path

When persistent grants were added they were always used, even if the
backend doesn't have this feature (there's no harm in always using the
same set of pages). This restores the old data path when the backend
doesn't have persistent grants, removing the burden of doing a memcpy
when it is not actually needed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fix up whitespace issues]
---
 drivers/block/xen-blkfront.c |  105 +++++++++++++++++++++++++++++------------
 1 files changed, 74 insertions(+), 31 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index cbd9f6b..ddd9a09 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -104,7 +104,7 @@ struct blkfront_info
 	struct work_struct work;
 	struct gnttab_free_callback callback;
 	struct blk_shadow shadow[BLK_RING_SIZE];
-	struct list_head persistent_gnts;
+	struct list_head grants;
 	unsigned int persistent_gnts_c;
 	unsigned long shadow_free;
 	unsigned int feature_flush;
@@ -175,15 +175,17 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 		if (!gnt_list_entry)
 			goto out_of_memory;
 
-		granted_page = alloc_page(GFP_NOIO);
-		if (!granted_page) {
-			kfree(gnt_list_entry);
-			goto out_of_memory;
+		if (info->feature_persistent) {
+			granted_page = alloc_page(GFP_NOIO);
+			if (!granted_page) {
+				kfree(gnt_list_entry);
+				goto out_of_memory;
+			}
+			gnt_list_entry->pfn = page_to_pfn(granted_page);
 		}
 
-		gnt_list_entry->pfn = page_to_pfn(granted_page);
 		gnt_list_entry->gref = GRANT_INVALID_REF;
-		list_add(&gnt_list_entry->node, &info->persistent_gnts);
+		list_add(&gnt_list_entry->node, &info->grants);
 		i++;
 	}
 
@@ -191,9 +193,10 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 
 out_of_memory:
 	list_for_each_entry_safe(gnt_list_entry, n,
-	                         &info->persistent_gnts, node) {
+	                         &info->grants, node) {
 		list_del(&gnt_list_entry->node);
-		__free_page(pfn_to_page(gnt_list_entry->pfn));
+		if (info->feature_persistent)
+			__free_page(pfn_to_page(gnt_list_entry->pfn));
 		kfree(gnt_list_entry);
 		i--;
 	}
@@ -202,14 +205,14 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
+                               unsigned long pfn,
                                struct blkfront_info *info)
 {
 	struct grant *gnt_list_entry;
 	unsigned long buffer_mfn;
 
-	BUG_ON(list_empty(&info->persistent_gnts));
-	gnt_list_entry = list_first_entry(&info->persistent_gnts, struct grant,
-	                                  node);
+	BUG_ON(list_empty(&info->grants));
+	gnt_list_entry = list_first_entry(&info->grants, struct grant, node);
 	list_del(&gnt_list_entry->node);
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF) {
@@ -220,6 +223,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
 	/* Assign a gref to this page */
 	gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
 	BUG_ON(gnt_list_entry->gref == -ENOSPC);
+	if (!info->feature_persistent) {
+		BUG_ON(!pfn);
+		gnt_list_entry->pfn = pfn;
+	}
 	buffer_mfn = pfn_to_mfn(gnt_list_entry->pfn);
 	gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
 	                                info->xbdev->otherend_id,
@@ -430,12 +437,12 @@ static int blkif_queue_request(struct request *req)
 			fsect = sg->offset >> 9;
 			lsect = fsect + (sg->length >> 9) - 1;
 
-			gnt_list_entry = get_grant(&gref_head, info);
+			gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
 			ref = gnt_list_entry->gref;
 
 			info->shadow[id].grants_used[i] = gnt_list_entry;
 
-			if (rq_data_dir(req)) {
+			if (rq_data_dir(req) && info->feature_persistent) {
 				char *bvec_data;
 				void *shared_data;
 
@@ -828,16 +835,17 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		blk_stop_queue(info->rq);
 
 	/* Remove all persistent grants */
-	if (!list_empty(&info->persistent_gnts)) {
+	if (!list_empty(&info->grants)) {
 		list_for_each_entry_safe(persistent_gnt, n,
-		                         &info->persistent_gnts, node) {
+		                         &info->grants, node) {
 			list_del(&persistent_gnt->node);
 			if (persistent_gnt->gref != GRANT_INVALID_REF) {
 				gnttab_end_foreign_access(persistent_gnt->gref,
 				                          0, 0UL);
 				info->persistent_gnts_c--;
 			}
-			__free_page(pfn_to_page(persistent_gnt->pfn));
+			if (info->feature_persistent)
+				__free_page(pfn_to_page(persistent_gnt->pfn));
 			kfree(persistent_gnt);
 		}
 	}
@@ -874,7 +882,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 
 	nseg = s->req.u.rw.nr_segments;
 
-	if (bret->operation == BLKIF_OP_READ) {
+	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
 		/*
 		 * Copy the data received from the backend into the bvec.
 		 * Since bv_offset can be different than 0, and bv_len different
@@ -902,7 +910,10 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			 * we add it at the head of the list, so it will be
 			 * reused first.
 			 */
-			list_add(&s->grants_used[i]->node, &info->persistent_gnts);
+			if (!info->feature_persistent)
+				pr_alert_ratelimited("backed has not unmapped grant: %u\n",
+						     s->grants_used[i]->gref);
+			list_add(&s->grants_used[i]->node, &info->grants);
 			info->persistent_gnts_c++;
 		} else {
 			/*
@@ -913,7 +924,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			 */
 			gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 0UL);
 			s->grants_used[i]->gref = GRANT_INVALID_REF;
-			list_add_tail(&s->grants_used[i]->node, &info->persistent_gnts);
+			list_add_tail(&s->grants_used[i]->node, &info->grants);
 		}
 	}
 }
@@ -1052,12 +1063,6 @@ static int setup_blkring(struct xenbus_device *dev,
 	for (i = 0; i < BLK_RING_SIZE; i++)
 		sg_init_table(info->shadow[i].sg, BLKIF_MAX_SEGMENTS_PER_REQUEST);
 
-	/* Allocate memory for grants */
-	err = fill_grant_buffer(info, BLK_RING_SIZE *
-	                              BLKIF_MAX_SEGMENTS_PER_REQUEST);
-	if (err)
-		goto fail;
-
 	err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
 	if (err < 0) {
 		free_page((unsigned long)sring);
@@ -1216,7 +1221,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 	spin_lock_init(&info->io_lock);
 	info->xbdev = dev;
 	info->vdevice = vdevice;
-	INIT_LIST_HEAD(&info->persistent_gnts);
+	INIT_LIST_HEAD(&info->grants);
 	info->persistent_gnts_c = 0;
 	info->connected = BLKIF_STATE_DISCONNECTED;
 	INIT_WORK(&info->work, blkif_restart_queue);
@@ -1245,7 +1250,8 @@ static int blkif_recover(struct blkfront_info *info)
 	int i;
 	struct blkif_request *req;
 	struct blk_shadow *copy;
-	int j;
+	unsigned int persistent;
+	int j, rc;
 
 	/* Stage 1: Make a safe copy of the shadow state. */
 	copy = kmemdup(info->shadow, sizeof(info->shadow),
@@ -1260,6 +1266,24 @@ static int blkif_recover(struct blkfront_info *info)
 	info->shadow_free = info->ring.req_prod_pvt;
 	info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff;
 
+	/* Check if the backend supports persistent grants */
+	rc = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			   "feature-persistent", "%u", &persistent,
+			   NULL);
+	if (rc)
+		info->feature_persistent = 0;
+	else
+		info->feature_persistent = persistent;
+
+	/* Allocate memory for grants */
+	rc = fill_grant_buffer(info, BLK_RING_SIZE *
+	                             BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	if (rc) {
+		xenbus_dev_fatal(info->xbdev, rc, "setting grant buffer failed");
+		kfree(copy);
+		return rc;
+	}
+
 	/* Stage 3: Find pending requests and requeue them. */
 	for (i = 0; i < BLK_RING_SIZE; i++) {
 		/* Not in use? */
@@ -1324,8 +1348,12 @@ static int blkfront_resume(struct xenbus_device *dev)
 	blkif_free(info, info->connected == BLKIF_STATE_CONNECTED);
 
 	err = talk_to_blkback(dev, info);
-	if (info->connected == BLKIF_STATE_SUSPENDED && !err)
-		err = blkif_recover(info);
+
+	/*
+	 * We have to wait for the backend to switch to
+	 * connected state, since we want to read which
+	 * features it supports.
+	 */
 
 	return err;
 }
@@ -1429,9 +1457,16 @@ static void blkfront_connect(struct blkfront_info *info)
 		       sectors);
 		set_capacity(info->gd, sectors);
 		revalidate_disk(info->gd);
+		return;
 
-		/* fall through */
 	case BLKIF_STATE_SUSPENDED:
+		/*
+		 * If we are recovering from suspension, we need to wait
+		 * for the backend to announce it's features before
+		 * reconnecting, we need to know if the backend supports
+		 * persistent grants.
+		 */
+		blkif_recover(info);
 		return;
 
 	default:
@@ -1499,6 +1534,14 @@ static void blkfront_connect(struct blkfront_info *info)
 	else
 		info->feature_persistent = persistent;
 
+	/* Allocate memory for grants */
+	err = fill_grant_buffer(info, BLK_RING_SIZE *
+	                              BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	if (err) {
+		xenbus_dev_fatal(info->xbdev, err, "setting grant buffer failed");
+		return;
+	}
+
 	err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size);
 	if (err) {
 		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
-- 
1.7.7.5 (Apple Git-26)


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-22 12:54   ` Roger Pau Monné
@ 2014-06-02  7:13     ` Felipe Franciosi
  0 siblings, 0 replies; 41+ messages in thread
From: Felipe Franciosi @ 2014-06-02  7:13 UTC (permalink / raw)
  To: Roger Pau Monne, Greg KH, Konrad Rzeszutek Wilk
  Cc: axboe, xen-devel, jerry.snitselaar, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 4048 bytes --]

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 22 May 2014 13:54
> To: Greg KH; Konrad Rzeszutek Wilk
> Cc: linux-kernel@vger.kernel.org; stable@vger.kernel.org; xen-
> devel@lists.xenproject.org; Felipe Franciosi; jerry.snitselaar@oracle.com;
> axboe@kernel.dk
> Subject: Re: Backport request to stable of two performance related fixes for
> xen-blkfront (3.13 fixes to earlier trees)
> 
> On 20/05/14 05:19, Greg KH wrote:
> > On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
> >> Hey Greg
> >>
> >> This email is in regards to backporting two patches to stable that
> >> fall under the 'performance' rule:
> >>
> >>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> >>
> >> I've copied Jerry - the maintainer of the Oracle's kernel. I don't
> >> have the emails of the other distros maintainers but the bugs associated
> with it are:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1096909
> >> (RHEL7)
> >> https://bugs.launchpad.net/ubuntu/+bug/1319003
> >> (Ubuntu 13.10)
> >>
> >> The following distros are affected:
> >>
> >> (x) Ubuntu 13.04 and derivatives (3.8)
> >> (v) Ubuntu 13.10 and derivatives (3.11), supported until 2014-07
> >> (x) Fedora 17 (3.8 and 3.9 in updates)
> >> (x) Fedora 18 (3.8, 3.9, 3.10, 3.11 in updates)
> >> (v) Fedora 19 (3.9; 3.10, 3.11, 3.12 in updates; fixed with latest
> >> update to 3.13), supported until TBA
> >> (v) Fedora 20 (3.11; 3.12 in updates; fixed with latest update to
> >> 3.13), supported until TBA
> >> (v) RHEL 7 and derivatives (3.10), expected to be supported until
> >> about 2025
> >> (v) openSUSE 13.1 (3.11), expected to be supported until at least
> >> 2016-08
> >> (v) SLES 12 (3.12), expected to be supported until about 2024
> >> (v) Mageia 3 (3.8), supported until 2014-11-19
> >> (v) Mageia 4 (3.12), supported until 2015-08-01
> >> (v) Oracle Enterprise Linux with Unbreakable Enterprise Kernel
> >> Release 3 (3.8), supported until TBA
> >>
> >> Here is the analysis of the problem and what was put in the RHEL7 bug.
> >> The Oracle bug does not exist (as I just backport them in the kernel
> >> and send a GIT PULL to Jerry) - but if you would like I can certainly
> >> furnish you with one (it would be identical to what is mentioned below).
> >>
> >> If you are OK with the backport, I am volunteering Roger and Felipe
> >> to assist in jamming^H^H^H^Hbackporting the patches into earlier kernels.
> >
> > Sure, can you provide backported patches?  As-is, they don't apply to
> > the 3.10-stable kernel.
> 
> Here are the backported patches to 3.10 stable, I would like however to get
> some testing/benchmarking on them before applying, since it's not a trivial
> backport. Could you please give them a spin Felipe?

Apologies for the delay in remeasuring this.

I can confirm that the backport drastically improves aggregate throughput, at least when the backend does not support persistent grants. This is very visible with moderate-sized requests and more than one VM (on my graphs, above four guests). Naturally, the magnitude of the regression will vary depending on the host's characteristics.

The attached graphs were measured on XenServer Creedence #85466, which uses tapdisk3 (grant copy). They are the result of a single run, so some variation is expected.

I used Ubuntu 13.10 64 bit guests and installed 3.10.40. I used 5 SSDs (4 Microns and 1 Fusion-io). Each SSD had 10 LVM Logical Volumes, one assigned to each VM (therefore each VM had 5 extra virtual disks). The benchmark was sequential reads using the specified block size directly on the virtual block device with O_DIRECT for a duration of 10s each, reporting the total throughput. It was synchronised so that the specified number of guests executed the IO operations at the same time.
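
Each guest ran something along these lines per virtual disk (a simplified
sketch only; the device name and block size below are placeholders, and the
actual harness synchronised the start of the runs across guests):

  fio --name=seqread --filename=/dev/xvdb --rw=read --bs=64k \
      --direct=1 --runtime=10 --time_based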

The only difference between the two tests is Roger's patches. Please see the attached graphs.

> 
> Roger.

Cheers,
Felipe

[-- Attachment #2: backport_eval-3.10.40.png --]
[-- Type: image/png, Size: 281535 bytes --]

[-- Attachment #3: backport_eval-3.10.40-roger.png --]
[-- Type: image/png, Size: 282454 bytes --]

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-05-14 19:11 Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees) Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2014-06-04  5:48 ` Greg KH
@ 2014-06-04  5:48 ` Greg KH
  2014-06-06 10:47   ` Jiri Slaby
  2014-06-06 10:47   ` Jiri Slaby
  6 siblings, 2 replies; 41+ messages in thread
From: Greg KH @ 2014-06-04  5:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, stable, xen-devel, felipe.franciosi, roger.pau,
	jerry.snitselaar, axboe

On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
> Hey Greg
> 
> This email is in regards to backporting two patches to stable that
> fall under the 'performance' rule:
> 
>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6

Now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-04  5:48 ` Greg KH
@ 2014-06-06 10:47   ` Jiri Slaby
  2014-06-06 10:58     ` Vitaly Kuznetsov
                       ` (3 more replies)
  2014-06-06 10:47   ` Jiri Slaby
  1 sibling, 4 replies; 41+ messages in thread
From: Jiri Slaby @ 2014-06-06 10:47 UTC (permalink / raw)
  To: Greg KH, Konrad Rzeszutek Wilk
  Cc: linux-kernel, stable, xen-devel, felipe.franciosi, roger.pau,
	jerry.snitselaar, axboe, vkuznets

On 06/04/2014 07:48 AM, Greg KH wrote:
> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
>> Hey Greg
>>
>> This email is in regards to backporting two patches to stable that
>> fall under the 'performance' rule:
>>
>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> 
> Now queued up, thanks.

AFAIU, they introduce a performance regression.

Vitaly?

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-06 10:47   ` Jiri Slaby
@ 2014-06-06 10:58     ` Vitaly Kuznetsov
  2014-06-10 13:19       ` [Xen-devel] " Vitaly Kuznetsov
  2014-06-06 10:58     ` Vitaly Kuznetsov
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2014-06-06 10:58 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Greg KH, Konrad Rzeszutek Wilk, linux-kernel, stable, xen-devel,
	felipe.franciosi, roger.pau, jerry.snitselaar, axboe

Jiri Slaby <jslaby@suse.cz> writes:

> On 06/04/2014 07:48 AM, Greg KH wrote:
>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
>>> Hey Greg
>>>
>>> This email is in regards to backporting two patches to stable that
>>> fall under the 'performance' rule:
>>>
>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>> 
>> Now queued up, thanks.
>
> AFAIU, they introduce a performance regression.
>
> Vitaly?

I'm aware of a performance regression in a 'very special' case when
ramdisks or files on tmpfs are being used as storage; I posted my results
a while ago:
https://lkml.org/lkml/2014/5/22/164
I'm not sure if that 'special' case requires investigation and/or should
prevent us from doing the stable backport, but it would be nice if someone
at least tried to reproduce it.

I'm going to run a set of tests with FusionIO drives and sequential
reads to replicate the same test Felipe did; I'll report as soon as I have
data (beginning of next week, hopefully).

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-06 10:47   ` Jiri Slaby
                       ` (2 preceding siblings ...)
  2014-06-06 13:56     ` Greg KH
@ 2014-06-06 13:56     ` Greg KH
  2014-06-06 14:02       ` Konrad Rzeszutek Wilk
  2014-06-06 14:02       ` Konrad Rzeszutek Wilk
  3 siblings, 2 replies; 41+ messages in thread
From: Greg KH @ 2014-06-06 13:56 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Konrad Rzeszutek Wilk, linux-kernel, stable, xen-devel,
	felipe.franciosi, roger.pau, jerry.snitselaar, axboe, vkuznets

On Fri, Jun 06, 2014 at 12:47:07PM +0200, Jiri Slaby wrote:
> On 06/04/2014 07:48 AM, Greg KH wrote:
> > On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
> >> Hey Greg
> >>
> >> This email is in regards to backporting two patches to stable that
> >> fall under the 'performance' rule:
> >>
> >>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> > 
> > Now queued up, thanks.
> 
> AFAIU, they introduce a performance regression.

That "regression" is also in mainline, right?  As Konrad doesn't seem to
think it matters, I'm deferring to the maintainer here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-06 13:56     ` Greg KH
  2014-06-06 14:02       ` Konrad Rzeszutek Wilk
@ 2014-06-06 14:02       ` Konrad Rzeszutek Wilk
  2014-06-06 14:03         ` Jiri Slaby
  2014-06-06 14:03         ` Jiri Slaby
  1 sibling, 2 replies; 41+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-06 14:02 UTC (permalink / raw)
  To: Greg KH
  Cc: Jiri Slaby, linux-kernel, stable, xen-devel, felipe.franciosi,
	roger.pau, jerry.snitselaar, axboe, vkuznets

On Fri, Jun 06, 2014 at 06:56:57AM -0700, Greg KH wrote:
> On Fri, Jun 06, 2014 at 12:47:07PM +0200, Jiri Slaby wrote:
> > On 06/04/2014 07:48 AM, Greg KH wrote:
> > > On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
> > >> Hey Greg
> > >>
> > >> This email is in regards to backporting two patches to stable that
> > >> fall under the 'performance' rule:
> > >>
> > >>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> > >>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> > > 
> > > Now queued up, thanks.
> > 
> > AFAIU, they introduce a performance regression.
> 
> That "regression" is also in mainline, right?  As Konrad doesn't seem to
> think it matters, I'm deferring to the maintainer here.

Hehe.

Greg is correct - the performance regression with tmpfs/ramfs does exist
upstream and will be dealt with once a fix has been established. Right now we
are focusing on the 99% usage models, which are solid state, rotational,
and flash (just got one of those), and the two patches outlined above are
needed for the stable trees.

Thank you.

Hopefully I haven't confused the issue here.
> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-06 14:02       ` Konrad Rzeszutek Wilk
  2014-06-06 14:03         ` Jiri Slaby
@ 2014-06-06 14:03         ` Jiri Slaby
  1 sibling, 0 replies; 41+ messages in thread
From: Jiri Slaby @ 2014-06-06 14:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Greg KH
  Cc: linux-kernel, stable, xen-devel, felipe.franciosi, roger.pau,
	jerry.snitselaar, axboe, vkuznets

On 06/06/2014 04:02 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 06, 2014 at 06:56:57AM -0700, Greg KH wrote:
>> On Fri, Jun 06, 2014 at 12:47:07PM +0200, Jiri Slaby wrote:
>>> On 06/04/2014 07:48 AM, Greg KH wrote:
>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
>>>>> Hey Greg
>>>>>
>>>>> This email is in regards to backporting two patches to stable that
>>>>> fall under the 'performance' rule:
>>>>>
>>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>>>>
>>>> Now queued up, thanks.
>>>
>>> AFAIU, they introduce a performance regression.
>>
>> That "regression" is also in mainline, right?  As Konrad doesn't seem to
>> think it matters, I'm deferring to the maintainer here.
> 
> Hehe.
> 
> Greg is correct - the performance regression with tmpfs/ramfs does exist
> upstream and once a fix has been established will be dealt with. Right now we
> are fousing on the 99% usage models which is solid state, rotational,
> and flash (just got one of those) and the two patches outlined above are
> needed for the stable trees.

Ok, I wanted to be sure before I take these to 3.12.

Thanks.

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-06 10:58     ` Vitaly Kuznetsov
@ 2014-06-10 13:19       ` Vitaly Kuznetsov
  2014-06-10 16:55         ` Roger Pau Monné
  2014-06-10 17:26         ` Felipe Franciosi
  0 siblings, 2 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2014-06-10 13:19 UTC (permalink / raw)
  To: xen-devel
  Cc: axboe, felipe.franciosi, Greg KH, linux-kernel, stable,
	jerry.snitselaar, roger.pau, Jiri Slaby, Ronen Hod, Andrew Jones

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Jiri Slaby <jslaby@suse.cz> writes:
>
>> On 06/04/2014 07:48 AM, Greg KH wrote:
>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
>>>> Hey Greg
>>>>
>>>> This email is in regards to backporting two patches to stable that
>>>> fall under the 'performance' rule:
>>>>
>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>>> 
>>> Now queued up, thanks.
>>
>> AFAIU, they introduce a performance regression.
>>
>> Vitaly?
>
> I'm aware of a performance regression in a 'very special' case when
> ramdisks or files on tmpfs are being used as storage, I post my results
> a while ago:
> https://lkml.org/lkml/2014/5/22/164
> I'm not sure if that 'special' case requires investigation and/or should
> prevent us from doing stable backport but it would be nice if someone
> tries to reproduce it at least.
>
> I'm going to make a bunch of tests with FusionIO drives and sequential
> read to replicate same test Felipe did, I'll report as soon as I have
> data (beginning of next week hopefuly).

Turns out the regression I'm observing with these patches is not
restricted to tmpfs/ramdisk usage.

I was doing tests with a Fusion-io ioDrive Duo 320GB (Dual Adapter) on an HP
ProLiant DL380 G6 (2x E5540, 8G RAM). Hyperthreading is disabled and Dom0 is
pinned to CPU0 (cores 0,1,2,3). I run up to 8 guests with 1 vCPU each,
pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I tried different pinning
(Dom0 to 0,1,4,5 and DomUs to 2,3,6,7,2,3,6,7 to balance NUMA); that
doesn't make any difference to the results. I was testing on top of
Xen-4.3.2.
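
Roughly, the pinning was done along these lines (a sketch only; the guest
domain names are made up for the example):

  xl vcpu-pin Domain-0 all 0-3                      # dom0 on CPU0's cores
  for i in $(seq 1 8); do
      xl vcpu-pin guest$i 0 $(( 4 + (i - 1) % 4 ))  # guests on cores 4-7
  done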

I was testing two storage configurations:
1) Plain 10G partitions from one Fusion drive (/dev/fioa) attached
to the guests
2) An LVM group created on top of both drives (/dev/fioa, /dev/fiob),
with 10G logical volumes created with striping (lvcreate -i2 ...); a rough
sketch of this layout follows below
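
For the striped case the volumes were created with something like the
following (names and sizes here are illustrative, not the exact commands):

  pvcreate /dev/fioa /dev/fiob
  vgcreate fusion /dev/fioa /dev/fiob
  # one 10G volume per guest, striped across both drives
  lvcreate -i2 -L 10G -n guest1-disk fusion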

The test is a simultaneous fio run in the guests (rw=read, direct=1) for
10 seconds. Each test was performed 3 times and the average was taken.
The kernels I compare are:
1) v3.15-rc5-157-g60b5f90 unmodified
2) v3.15-rc5-157-g60b5f90 with 427bfe07e6744c058ce6fc4aa187cda96b635539,
   bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
   fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.

The first test was done with a Dom0 with persistent-grant support (Fedora's
3.14.4-200.fc20.x86_64):
1) Partitions:
http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions.png
(same markers mean same bs, we get 860 MB/s here, patches make no
difference, result matches expectation)

2) LVM Stripe:
http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
(1715 MB/s, patches make no difference, result matches expectation)

The second test was performed with a Dom0 without persistent-grant support
(Fedora's 3.7.9-205.fc18.x86_64):
1) Partitions:
http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.png
(860 MB/s again; the patches slightly worsen overall throughput with 1-3
clients)

2) LVM Stripe:
http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.png
(Here we see the same regression I observed with ramdisks and tmpfs
files: unmodified kernel 1550 MB/s, with the patches reverted 1715 MB/s).

The only major difference with Felipe's test is that he was using
blktap3 with XenServer and I'm using standard blktap2.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-10 13:19       ` [Xen-devel] " Vitaly Kuznetsov
@ 2014-06-10 16:55         ` Roger Pau Monné
  2014-06-12 12:00           ` Vitaly Kuznetsov
  2014-06-10 17:26         ` Felipe Franciosi
  1 sibling, 1 reply; 41+ messages in thread
From: Roger Pau Monné @ 2014-06-10 16:55 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: axboe, felipe.franciosi, Greg KH, linux-kernel, stable,
	jerry.snitselaar, Jiri Slaby, Ronen Hod, Andrew Jones

On 10/06/14 15:19, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
>> Jiri Slaby <jslaby@suse.cz> writes:
>>
>>> On 06/04/2014 07:48 AM, Greg KH wrote:
>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
>>>>> Hey Greg
>>>>>
>>>>> This email is in regards to backporting two patches to stable that
>>>>> fall under the 'performance' rule:
>>>>>
>>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>>>>
>>>> Now queued up, thanks.
>>>
>>> AFAIU, they introduce a performance regression.
>>>
>>> Vitaly?
>>
>> I'm aware of a performance regression in a 'very special' case when
>> ramdisks or files on tmpfs are being used as storage, I post my results
>> a while ago:
>> https://lkml.org/lkml/2014/5/22/164
>> I'm not sure if that 'special' case requires investigation and/or should
>> prevent us from doing stable backport but it would be nice if someone
>> tries to reproduce it at least.
>>
>> I'm going to make a bunch of tests with FusionIO drives and sequential
>> read to replicate same test Felipe did, I'll report as soon as I have
>> data (beginning of next week hopefuly).
> 
> Turns out the regression I'm observing with these patches is not
> restricted to tmpfs/ramdisk usage.
> 
> I was doing tests with Fusion-io ioDrive Duo 320GB (Dual Adapter) on HP
> ProLiant DL380 G6 (2xE5540, 8G RAM). Hyperthreading is disabled, Dom0 is
> pinned to CPU0 (cores 0,1,2,3) I run up to 8 guests with 1 vCPU each,
> they are pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I tried differed
> pinning (Dom0 to 0,1,4,5, DomUs to 2,3,6,7,2,3,6,7 to balance NUMA, that
> doesn't make any difference to the results). I was testing on top of
> Xen-4.3.2.
> 
> I was testing two storage configurations:
> 1) Plain 10G partitions from one Fusion drive (/dev/fioa) are attached
> to guests
> 2) LVM group is created on top of both drives (/dev/fioa, /dev/fiob),
> 10G logical volumes are created with striping (lvcreate -i2 ...)
> 
> Test is done by simultaneous fio run in guests (rw=read, direct=1) for
> 10 second. Each test was performed 3 times and the average was taken. 
> Kernels I compare are:
> 1) v3.15-rc5-157-g60b5f90 unmodified
> 2) v3.15-rc5-157-g60b5f90 with 427bfe07e6744c058ce6fc4aa187cda96b635539,
>    bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
>    fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.
> 
> First test was done with Dom0 with persistent grant support (Fedora's
> 3.14.4-200.fc20.x86_64):
> 1) Partitions:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions.png
> (same markers mean same bs, we get 860 MB/s here, patches make no
> difference, result matches expectation)
> 
> 2) LVM Stripe:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
> (1715 MB/s, patches make no difference, result matches expectation)
> 
> Second test was performed with Dom0 without persistent grants support
> (Fedora's 3.7.9-205.fc18.x86_64)
> 1) Partitions:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.png
> (860 MB/sec again, patches worsen a bit overall throughput with 1-3
> clients)
> 
> 2) LVM Stripe:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.png
> (Here we see the same regression I observed with ramdisks and tmpfs
> files, unmodified kernel: 1550MB/s, with patches reverted: 1715MB/s).
> 
> The only major difference with Felipe's test is that he was using
> blktap3 with XenServer and I'm using standard blktap2.

Hello,

I don't think you are using blktap2; I guess you are using blkback.
Also, running the test for only 10s with 3 repetitions seems too low; I
would try running the tests for a longer time, doing more repetitions,
and including the standard deviation as well.

Could you try to revert the patches independently to see if it's a
specific commit that introduces the regression?
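
Something along these lines, on top of the same v3.15-rc5 base (just a
sketch; reverting one commit while the other is still applied may need a
small conflict fix-up since they touch the same functions):

  # revert one commit at a time and re-run the benchmark
  git revert --no-edit bfe11d6de1c416cea4f3f0f35f864162063ce3fa
  # ... benchmark, then undo the revert and try the other commit:
  git reset --hard HEAD~1
  git revert --no-edit fbe363c476afe8ec992d3baf682670a4bd1b6ce6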

Thanks, Roger.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-10 13:19       ` [Xen-devel] " Vitaly Kuznetsov
  2014-06-10 16:55         ` Roger Pau Monné
@ 2014-06-10 17:26         ` Felipe Franciosi
  1 sibling, 0 replies; 41+ messages in thread
From: Felipe Franciosi @ 2014-06-10 17:26 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, axboe, Greg KH, linux-kernel, stable,
	jerry.snitselaar, Roger Pau Monne, Jiri Slaby, Ronen Hod,
	Andrew Jones



> On 10 Jun 2014, at 14:20, "Vitaly Kuznetsov" <vkuznets@redhat.com> wrote:
> 
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
>> Jiri Slaby <jslaby@suse.cz> writes:
>> 
>>>> On 06/04/2014 07:48 AM, Greg KH wrote:
>>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
>>>>> Hey Greg
>>>>> 
>>>>> This email is in regards to backporting two patches to stable that
>>>>> fall under the 'performance' rule:
>>>>> 
>>>>> bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>>>>> fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>>>> 
>>>> Now queued up, thanks.
>>> 
>>> AFAIU, they introduce a performance regression.
>>> 
>>> Vitaly?
>> 
>> I'm aware of a performance regression in a 'very special' case when
>> ramdisks or files on tmpfs are being used as storage, I post my results
>> a while ago:
>> https://lkml.org/lkml/2014/5/22/164
>> I'm not sure if that 'special' case requires investigation and/or should
>> prevent us from doing stable backport but it would be nice if someone
>> tries to reproduce it at least.
>> 
>> I'm going to make a bunch of tests with FusionIO drives and sequential
>> read to replicate same test Felipe did, I'll report as soon as I have
>> data (beginning of next week hopefuly).
> 
> Turns out the regression I'm observing with these patches is not
> restricted to tmpfs/ramdisk usage.
> 
> I was doing tests with Fusion-io ioDrive Duo 320GB (Dual Adapter) on HP
> ProLiant DL380 G6 (2xE5540, 8G RAM). Hyperthreading is disabled, Dom0 is
> pinned to CPU0 (cores 0,1,2,3) I run up to 8 guests with 1 vCPU each,
> they are pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I tried differed
> pinning (Dom0 to 0,1,4,5, DomUs to 2,3,6,7,2,3,6,7 to balance NUMA, that
> doesn't make any difference to the results). I was testing on top of
> Xen-4.3.2.
> 
> I was testing two storage configurations:
> 1) Plain 10G partitions from one Fusion drive (/dev/fioa) are attached
> to guests
> 2) LVM group is created on top of both drives (/dev/fioa, /dev/fiob),
> 10G logical volumes are created with striping (lvcreate -i2 ...)
> 
> Test is done by simultaneous fio run in guests (rw=read, direct=1) for
> 10 second. Each test was performed 3 times and the average was taken. 
> Kernels I compare are:
> 1) v3.15-rc5-157-g60b5f90 unmodified
> 2) v3.15-rc5-157-g60b5f90 with 427bfe07e6744c058ce6fc4aa187cda96b635539,
>   bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
>   fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.
> 
> First test was done with Dom0 with persistent grant support (Fedora's
> 3.14.4-200.fc20.x86_64):
> 1) Partitions:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions.png
> (same markers mean same bs, we get 860 MB/s here, patches make no
> difference, result matches expectation)
> 
> 2) LVM Stripe:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
> (1715 MB/s, patches make no difference, result matches expectation)
> 
> Second test was performed with Dom0 without persistent grants support
> (Fedora's 3.7.9-205.fc18.x86_64)
> 1) Partitions:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.png
> (860 MB/sec again, patches worsen a bit overall throughput with 1-3
> clients)
> 
> 2) LVM Stripe:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.png
> (Here we see the same regression I observed with ramdisks and tmpfs
> files, unmodified kernel: 1550MB/s, with patches reverted: 1715MB/s).
> 
> The only major difference with Felipe's test is that he was using
> blktap3 with XenServer and I'm using standard blktap2.

Another major difference is that I took older kernels plus the patches instead of taking a 3.15 and reverting the patches.

I'll have a look at your data later in the week. A bit flooded at the moment.

F.


> 
> -- 
>  Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-10 16:55         ` Roger Pau Monné
@ 2014-06-12 12:00           ` Vitaly Kuznetsov
  2014-06-12 12:32               ` Felipe Franciosi
  0 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2014-06-12 12:00 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, axboe, felipe.franciosi, Greg KH, linux-kernel,
	stable, jerry.snitselaar, Jiri Slaby, Ronen Hod, Andrew Jones

Roger Pau Monné <roger.pau@citrix.com> writes:

> On 10/06/14 15:19, Vitaly Kuznetsov wrote:
>> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>> 
>>> Jiri Slaby <jslaby@suse.cz> writes:
>>>
>>>> On 06/04/2014 07:48 AM, Greg KH wrote:
>>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk wrote:
>>>>>> Hey Greg
>>>>>>
>>>>>> This email is in regards to backporting two patches to stable that
>>>>>> fall under the 'performance' rule:
>>>>>>
>>>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>>>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>>>>>
>>>>> Now queued up, thanks.
>>>>
>>>> AFAIU, they introduce a performance regression.
>>>>
>>>> Vitaly?
>>>
>>> I'm aware of a performance regression in a 'very special' case when
>>> ramdisks or files on tmpfs are being used as storage, I post my results
>>> a while ago:
>>> https://lkml.org/lkml/2014/5/22/164
>>> I'm not sure if that 'special' case requires investigation and/or should
>>> prevent us from doing stable backport but it would be nice if someone
>>> tries to reproduce it at least.
>>>
>>> I'm going to make a bunch of tests with FusionIO drives and sequential
>>> read to replicate same test Felipe did, I'll report as soon as I have
>>> data (beginning of next week hopefuly).
>> 
>> Turns out the regression I'm observing with these patches is not
>> restricted to tmpfs/ramdisk usage.
>> 
>> I was doing tests with Fusion-io ioDrive Duo 320GB (Dual Adapter) on HP
>> ProLiant DL380 G6 (2xE5540, 8G RAM). Hyperthreading is disabled, Dom0 is
>> pinned to CPU0 (cores 0,1,2,3) I run up to 8 guests with 1 vCPU each,
>> they are pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I tried differed
>> pinning (Dom0 to 0,1,4,5, DomUs to 2,3,6,7,2,3,6,7 to balance NUMA, that
>> doesn't make any difference to the results). I was testing on top of
>> Xen-4.3.2.
>> 
>> I was testing two storage configurations:
>> 1) Plain 10G partitions from one Fusion drive (/dev/fioa) are attached
>> to guests
>> 2) LVM group is created on top of both drives (/dev/fioa, /dev/fiob),
>> 10G logical volumes are created with striping (lvcreate -i2 ...)
>> 
>> Test is done by simultaneous fio run in guests (rw=read, direct=1) for
>> 10 second. Each test was performed 3 times and the average was taken. 
>> Kernels I compare are:
>> 1) v3.15-rc5-157-g60b5f90 unmodified
>> 2) v3.15-rc5-157-g60b5f90 with 427bfe07e6744c058ce6fc4aa187cda96b635539,
>>    bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
>>    fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.
>> 
>> First test was done with Dom0 with persistent grant support (Fedora's
>> 3.14.4-200.fc20.x86_64):
>> 1) Partitions:
>> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions.png
>> (same markers mean same bs, we get 860 MB/s here, patches make no
>> difference, result matches expectation)
>> 
>> 2) LVM Stripe:
>> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
>> (1715 MB/s, patches make no difference, result matches expectation)
>> 
>> Second test was performed with Dom0 without persistent grants support
>> (Fedora's 3.7.9-205.fc18.x86_64)
>> 1) Partitions:
>> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.png
>> (860 MB/sec again, patches worsen a bit overall throughput with 1-3
>> clients)
>> 
>> 2) LVM Stripe:
>> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.png
>> (Here we see the same regression I observed with ramdisks and tmpfs
>> files, unmodified kernel: 1550MB/s, with patches reverted: 1715MB/s).
>> 
>> The only major difference with Felipe's test is that he was using
>> blktap3 with XenServer and I'm using standard blktap2.
>
> Hello,
>
> I don't think you are using blktap2, I guess you are using blkback.

Right, sorry for the confusion.

> Also, running the test only for 10s and 3 repetitions seems too low, I
> would probably try to run the tests for a longer time and do more
> repetitions, and include the standard deviation also.
>
> Could you try to revert the patches independently to see if it's a
> specific commit that introduces the regression?

I did additional test runs. Now I'm comparing 3 kernels:
1) Unmodified v3.15-rc5-157-g60b5f90 - green color on chart

2) v3.15-rc5-157-g60b5f90 with bfe11d6de1c416cea4f3f0f35f864162063ce3fa
and 427bfe07e6744c058ce6fc4aa187cda96b635539 reverted (so only
fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke foreign
access for grants not mapped by the backend" left) - blue color on chart

3) v3.15-rc5-157-g60b5f90 with all
(bfe11d6de1c416cea4f3f0f35f864162063ce3fa,
427bfe07e6744c058ce6fc4aa187cda96b635539,
fbe363c476afe8ec992d3baf682670a4bd1b6ce6) patches reverted - red color
on chart.

I test on top of striped LVM on 2 FusionIO drives, doing 3 repetitions of
30 seconds each.
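
For reference, a minimal sketch of the per-guest fio invocation (the device
name, ioengine and iodepth are my assumptions; only rw, direct, bs and the
runtime come from the description above):

  # run inside each guest against its attached xvd device; bs varied per run
  fio --name=seqread --filename=/dev/xvdb --rw=read --direct=1 \
      --bs=1M --ioengine=libaio --iodepth=32 --time_based --runtime=30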

The result is here:
http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_20140612.png

It is consistent with what I've measured with ramdrives and tmpfs files:

1) fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke
foreign access for grants not mapped by the backend" introduces the
regression. The bigger the block size, the bigger the difference, but
the regression is observed with all block sizes > 8k.

2) bfe11d6de1c416cea4f3f0f35f864162063ce3fa "xen-blkfront: restore the
non-persistent data path" brings a performance improvement, but in
conjunction with fbe363c476afe8ec992d3baf682670a4bd1b6ce6 the result is
still worse than the kernel without both patches.

My Dom0 is Fedora's 3.7.9-205.fc18.x86_64. I can test on a newer blkback;
however, I'm not aware of any way to disable persistent grants there
(there is no regression when they're used).
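
For what it's worth, whether persistent grants were actually negotiated can
be checked from Dom0 by looking for the feature-persistent keys in xenstore
(the exact backend path below is an assumption):

  # a missing key or a value of 0 means persistent grants are not in use
  xenstore-ls -f /local/domain/0/backend/vbd | grep feature-persistent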

>
> Thanks, Roger.

Thanks,

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-12 12:00           ` Vitaly Kuznetsov
@ 2014-06-12 12:32               ` Felipe Franciosi
  0 siblings, 0 replies; 41+ messages in thread
From: Felipe Franciosi @ 2014-06-12 12:32 UTC (permalink / raw)
  To: 'Vitaly Kuznetsov', Roger Pau Monne
  Cc: xen-devel, axboe, Greg KH, linux-kernel, stable,
	jerry.snitselaar, Jiri Slaby, Ronen Hod, Andrew Jones

Hi Vitaly,

Are you able to test a 3.10 guest with and without the backport that Roger sent? This patch is attached to an e-mail Roger sent on "22 May 2014 13:54".

Because your results contradict what these patches are meant to do, I would like to make sure that this isn't related to something else that happened after 3.10.

You could also test Ubuntu Saucy guests with and without the patched kernels provided by Joseph Salisbury on launchpad: https://bugs.launchpad.net/bugs/1319003

Thanks,
Felipe

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: 12 June 2014 13:01
> To: Roger Pau Monne
> Cc: xen-devel@lists.xenproject.org; axboe@kernel.dk; Felipe Franciosi; Greg
> KH; linux-kernel@vger.kernel.org; stable@vger.kernel.org;
> jerry.snitselaar@oracle.com; Jiri Slaby; Ronen Hod; Andrew Jones
> Subject: Re: [Xen-devel] Backport request to stable of two performance
> related fixes for xen-blkfront (3.13 fixes to earlier trees)
> 
> Roger Pau Monné <roger.pau@citrix.com> writes:
> 
> > On 10/06/14 15:19, Vitaly Kuznetsov wrote:
> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >>
> >>> Jiri Slaby <jslaby@suse.cz> writes:
> >>>
> >>>> On 06/04/2014 07:48 AM, Greg KH wrote:
> >>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk
> wrote:
> >>>>>> Hey Greg
> >>>>>>
> >>>>>> This email is in regards to backporting two patches to stable
> >>>>>> that fall under the 'performance' rule:
> >>>>>>
> >>>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >>>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> >>>>>
> >>>>> Now queued up, thanks.
> >>>>
> >>>> AFAIU, they introduce a performance regression.
> >>>>
> >>>> Vitaly?
> >>>
> >>> I'm aware of a performance regression in a 'very special' case when
> >>> ramdisks or files on tmpfs are being used as storage, I post my
> >>> results a while ago:
> >>> https://lkml.org/lkml/2014/5/22/164
> >>> I'm not sure if that 'special' case requires investigation and/or
> >>> should prevent us from doing stable backport but it would be nice if
> >>> someone tries to reproduce it at least.
> >>>
> >>> I'm going to make a bunch of tests with FusionIO drives and
> >>> sequential read to replicate same test Felipe did, I'll report as
> >>> soon as I have data (beginning of next week hopefuly).
> >>
> >> Turns out the regression I'm observing with these patches is not
> >> restricted to tmpfs/ramdisk usage.
> >>
> >> I was doing tests with Fusion-io ioDrive Duo 320GB (Dual Adapter) on
> >> HP ProLiant DL380 G6 (2xE5540, 8G RAM). Hyperthreading is disabled,
> >> Dom0 is pinned to CPU0 (cores 0,1,2,3) I run up to 8 guests with 1
> >> vCPU each, they are pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I tried
> >> differed pinning (Dom0 to 0,1,4,5, DomUs to 2,3,6,7,2,3,6,7 to
> >> balance NUMA, that doesn't make any difference to the results). I was
> >> testing on top of Xen-4.3.2.
> >>
> >> I was testing two storage configurations:
> >> 1) Plain 10G partitions from one Fusion drive (/dev/fioa) are
> >> attached to guests
> >> 2) LVM group is created on top of both drives (/dev/fioa, /dev/fiob),
> >> 10G logical volumes are created with striping (lvcreate -i2 ...)
> >>
> >> Test is done by simultaneous fio run in guests (rw=read, direct=1)
> >> for
> >> 10 second. Each test was performed 3 times and the average was taken.
> >> Kernels I compare are:
> >> 1) v3.15-rc5-157-g60b5f90 unmodified
> >> 2) v3.15-rc5-157-g60b5f90 with
> 427bfe07e6744c058ce6fc4aa187cda96b635539,
> >>    bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
> >>    fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.
> >>
> >> First test was done with Dom0 with persistent grant support (Fedora's
> >> 3.14.4-200.fc20.x86_64):
> >> 1) Partitions:
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions.pn
> >> g (same markers mean same bs, we get 860 MB/s here, patches make no
> >> difference, result matches expectation)
> >>
> >> 2) LVM Stripe:
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
> >> (1715 MB/s, patches make no difference, result matches expectation)
> >>
> >> Second test was performed with Dom0 without persistent grants support
> >> (Fedora's 3.7.9-205.fc18.x86_64)
> >> 1) Partitions:
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.
> >> png
> >> (860 MB/sec again, patches worsen a bit overall throughput with 1-3
> >> clients)
> >>
> >> 2) LVM Stripe:
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.png
> >> (Here we see the same regression I observed with ramdisks and tmpfs
> >> files, unmodified kernel: 1550MB/s, with patches reverted: 1715MB/s).
> >>
> >> The only major difference with Felipe's test is that he was using
> >> blktap3 with XenServer and I'm using standard blktap2.
> >
> > Hello,
> >
> > I don't think you are using blktap2, I guess you are using blkback.
> 
> Right, sorry for the confusion.
> 
> > Also, running the test only for 10s and 3 repetitions seems too low, I
> > would probably try to run the tests for a longer time and do more
> > repetitions, and include the standard deviation also.
> >
> > Could you try to revert the patches independently to see if it's a
> > specific commit that introduces the regression?
> 
> I did additional test runs. Now I'm comparing 3 kernels:
> 1) Unmodified v3.15-rc5-157-g60b5f90 - green color on chart
> 
> 2) v3.15-rc5-157-g60b5f90 with bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> and 427bfe07e6744c058ce6fc4aa187cda96b635539 reverted (so only
> fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke foreign
> access for grants not mapped by the backend" left) - blue color on chart
> 
> 3) v3.15-rc5-157-g60b5f90 with all
> (bfe11d6de1c416cea4f3f0f35f864162063ce3fa,
> 427bfe07e6744c058ce6fc4aa187cda96b635539,
> fbe363c476afe8ec992d3baf682670a4bd1b6ce6) patches reverted - red color
> on chart.
> 
> I test on top of striped LVM on 2 FusionIO drives, I do 3 repetitions for
> 30 seconds each.
> 
> The result is here:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_20140612.pn
> g
> 
> It is consistent with what I've measured with ramdrives and tmpfs files:
> 
> 1) fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke
> foreign access for grants not mapped by the backend" brings us the
> regression. Bigger block size is - bigger the difference but the regression is
> observed with all block sizes > 8k.
> 
> 2) bfe11d6de1c416cea4f3f0f35f864162063ce3fa "xen-blkfront: restore the
> non-persistent data path" brings us performance improvement but with
> conjunction with fbe363c476afe8ec992d3baf682670a4bd1b6ce6 it is still
> worse than the kernel without both patches.
> 
> My Dom0 is Fedora's 3.7.9-205.fc18.x86_64. I can test on newer blkback,
> however I'm not aware of any way to disable persistent grants there (there is
> no regression when they're used).
> 
> >
> > Thanks, Roger.
> 
> Thanks,
> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
@ 2014-06-12 12:32               ` Felipe Franciosi
  0 siblings, 0 replies; 41+ messages in thread
From: Felipe Franciosi @ 2014-06-12 12:32 UTC (permalink / raw)
  To: 'Vitaly Kuznetsov', Roger Pau Monne
  Cc: xen-devel, axboe, Greg KH, linux-kernel, stable,
	jerry.snitselaar, Jiri Slaby, Ronen Hod, Andrew Jones

Hi Vitaly,

Are you able to test a 3.10 guest with and without the backport that Roger sent? This patch is attached to an e-mail Roger sent on "22 May 2014 13:54".

Because your results contradict what these patches are meant to do, I would like to make sure that this isn't related to something else that happened after 3.10.

You could also test Ubuntu Saucy guests with and without the patched kernels provided by Joseph Salisbury on launchpad: https://bugs.launchpad.net/bugs/1319003

Thanks,
Felipe

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: 12 June 2014 13:01
> To: Roger Pau Monne
> Cc: xen-devel@lists.xenproject.org; axboe@kernel.dk; Felipe Franciosi; Greg
> KH; linux-kernel@vger.kernel.org; stable@vger.kernel.org;
> jerry.snitselaar@oracle.com; Jiri Slaby; Ronen Hod; Andrew Jones
> Subject: Re: [Xen-devel] Backport request to stable of two performance
> related fixes for xen-blkfront (3.13 fixes to earlier trees)
> 
> Roger Pau Monné <roger.pau@citrix.com> writes:
> 
> > On 10/06/14 15:19, Vitaly Kuznetsov wrote:
> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >>
> >>> Jiri Slaby <jslaby@suse.cz> writes:
> >>>
> >>>> On 06/04/2014 07:48 AM, Greg KH wrote:
> >>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk
> wrote:
> >>>>>> Hey Greg
> >>>>>>
> >>>>>> This email is in regards to backporting two patches to stable
> >>>>>> that fall under the 'performance' rule:
> >>>>>>
> >>>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >>>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> >>>>>
> >>>>> Now queued up, thanks.
> >>>>
> >>>> AFAIU, they introduce a performance regression.
> >>>>
> >>>> Vitaly?
> >>>
> >>> I'm aware of a performance regression in a 'very special' case when
> >>> ramdisks or files on tmpfs are being used as storage, I post my
> >>> results a while ago:
> >>> https://lkml.org/lkml/2014/5/22/164
> >>> I'm not sure if that 'special' case requires investigation and/or
> >>> should prevent us from doing stable backport but it would be nice if
> >>> someone tries to reproduce it at least.
> >>>
> >>> I'm going to make a bunch of tests with FusionIO drives and
> >>> sequential read to replicate same test Felipe did, I'll report as
> >>> soon as I have data (beginning of next week hopefuly).
> >>
> >> Turns out the regression I'm observing with these patches is not
> >> restricted to tmpfs/ramdisk usage.
> >>
> >> I was doing tests with Fusion-io ioDrive Duo 320GB (Dual Adapter) on
> >> HP ProLiant DL380 G6 (2xE5540, 8G RAM). Hyperthreading is disabled,
> >> Dom0 is pinned to CPU0 (cores 0,1,2,3) I run up to 8 guests with 1
> >> vCPU each, they are pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I tried
> >> differed pinning (Dom0 to 0,1,4,5, DomUs to 2,3,6,7,2,3,6,7 to
> >> balance NUMA, that doesn't make any difference to the results). I was
> >> testing on top of Xen-4.3.2.
> >>
> >> I was testing two storage configurations:
> >> 1) Plain 10G partitions from one Fusion drive (/dev/fioa) are
> >> attached to guests
> >> 2) LVM group is created on top of both drives (/dev/fioa, /dev/fiob),
> >> 10G logical volumes are created with striping (lvcreate -i2 ...)
> >>
> >> Test is done by simultaneous fio run in guests (rw=read, direct=1)
> >> for
> >> 10 second. Each test was performed 3 times and the average was taken.
> >> Kernels I compare are:
> >> 1) v3.15-rc5-157-g60b5f90 unmodified
> >> 2) v3.15-rc5-157-g60b5f90 with
> 427bfe07e6744c058ce6fc4aa187cda96b635539,
> >>    bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
> >>    fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.
> >>
> >> First test was done with Dom0 with persistent grant support (Fedora's
> >> 3.14.4-200.fc20.x86_64):
> >> 1) Partitions:
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions.pn
> >> g (same markers mean same bs, we get 860 MB/s here, patches make no
> >> difference, result matches expectation)
> >>
> >> 2) LVM Stripe:
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
> >> (1715 MB/s, patches make no difference, result matches expectation)
> >>
> >> Second test was performed with Dom0 without persistent grants support
> >> (Fedora's 3.7.9-205.fc18.x86_64)
> >> 1) Partitions:
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.
> >> png
> >> (860 MB/sec again, patches worsen a bit overall throughput with 1-3
> >> clients)
> >>
> >> 2) LVM Stripe:
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.png
> >> (Here we see the same regression I observed with ramdisks and tmpfs
> >> files, unmodified kernel: 1550MB/s, with patches reverted: 1715MB/s).
> >>
> >> The only major difference with Felipe's test is that he was using
> >> blktap3 with XenServer and I'm using standard blktap2.
> >
> > Hello,
> >
> > I don't think you are using blktap2, I guess you are using blkback.
> 
> Right, sorry for the confusion.
> 
> > Also, running the test only for 10s and 3 repetitions seems too low, I
> > would probably try to run the tests for a longer time and do more
> > repetitions, and include the standard deviation also.
> >
> > Could you try to revert the patches independently to see if it's a
> > specific commit that introduces the regression?
> 
> I did additional test runs. Now I'm comparing 3 kernels:
> 1) Unmodified v3.15-rc5-157-g60b5f90 - green color on chart
> 
> 2) v3.15-rc5-157-g60b5f90 with bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> and 427bfe07e6744c058ce6fc4aa187cda96b635539 reverted (so only
> fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke foreign
> access for grants not mapped by the backend" left) - blue color on chart
> 
> 3) v3.15-rc5-157-g60b5f90 with all
> (bfe11d6de1c416cea4f3f0f35f864162063ce3fa,
> 427bfe07e6744c058ce6fc4aa187cda96b635539,
> fbe363c476afe8ec992d3baf682670a4bd1b6ce6) patches reverted - red color
> on chart.
> 
> I test on top of striped LVM on 2 FusionIO drives, I do 3 repetitions for
> 30 seconds each.
> 
> The result is here:
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_20140612.pn
> g
> 
> It is consistent with what I've measured with ramdrives and tmpfs files:
> 
> 1) fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke
> foreign access for grants not mapped by the backend" brings us the
> regression. Bigger block size is - bigger the difference but the regression is
> observed with all block sizes > 8k.
> 
> 2) bfe11d6de1c416cea4f3f0f35f864162063ce3fa "xen-blkfront: restore the
> non-persistent data path" brings us performance improvement but with
> conjunction with fbe363c476afe8ec992d3baf682670a4bd1b6ce6 it is still
> worse than the kernel without both patches.
> 
> My Dom0 is Fedora's 3.7.9-205.fc18.x86_64. I can test on newer blkback,
> however I'm not aware of any way to disable persistent grants there (there is
> no regression when they're used).
> 
> >
> > Thanks, Roger.
> 
> Thanks,
> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-12 12:32               ` Felipe Franciosi
  (?)
@ 2014-06-12 15:32               ` Vitaly Kuznetsov
  2014-06-20 17:06                   ` Felipe Franciosi
  -1 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2014-06-12 15:32 UTC (permalink / raw)
  To: Felipe Franciosi
  Cc: Roger Pau Monne, xen-devel, axboe, Greg KH, linux-kernel, stable,
	jerry.snitselaar, Jiri Slaby, Ronen Hod, Andrew Jones

Felipe Franciosi <felipe.franciosi@citrix.com> writes:

> Hi Vitaly,
>
> Are you able to test a 3.10 guest with and without the backport that
> Roger sent? This patch is attached to an e-mail Roger sent on "22 May
> 2014 13:54".

Sure,

Now I'm comparing d642daf637d02dacf216d7fd9da7532a4681cfd3 and
46c0326164c98e556c35c3eb240273595d43425d commits from
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
(with and without the two commits in question). The test is exactly the same
as described before.
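
In case anyone wants to reproduce the comparison, this is roughly how the
two guest kernels were selected (build and config steps omitted):

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  cd linux-stable
  git checkout d642daf637d02dacf216d7fd9da7532a4681cfd3  # with the two commits
  # build, boot the guest, run the test, then repeat with:
  git checkout 46c0326164c98e556c35c3eb240273595d43425d  # without them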

The result is here:
http://hadoop.ru/pubfiles/bug1096909/fusion/310_nopgrants_stripe.png

As you can see, 46c03261 (without the patches) wins everywhere.

>
> Because your results are contradicting with what these patches are
> meant to do, I would like to make sure that this isn't related to
> something else that happened after 3.10.

I still think the Dom0 kernel and blkback vs. blktap3 are what make the
difference between our test environments.

>
> You could also test Ubuntu Sancy guests with and without the patched kernels provided by Joseph Salisbury on launchpad: https://bugs.launchpad.net/bugs/1319003
>
> Thanks,
> Felipe
>
>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: 12 June 2014 13:01
>> To: Roger Pau Monne
>> Cc: xen-devel@lists.xenproject.org; axboe@kernel.dk; Felipe Franciosi; Greg
>> KH; linux-kernel@vger.kernel.org; stable@vger.kernel.org;
>> jerry.snitselaar@oracle.com; Jiri Slaby; Ronen Hod; Andrew Jones
>> Subject: Re: [Xen-devel] Backport request to stable of two performance
>> related fixes for xen-blkfront (3.13 fixes to earlier trees)
>> 
>> Roger Pau Monné <roger.pau@citrix.com> writes:
>> 
>> > On 10/06/14 15:19, Vitaly Kuznetsov wrote:
>> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>> >>
>> >>> Jiri Slaby <jslaby@suse.cz> writes:
>> >>>
>> >>>> On 06/04/2014 07:48 AM, Greg KH wrote:
>> >>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk
>> wrote:
>> >>>>>> Hey Greg
>> >>>>>>
>> >>>>>> This email is in regards to backporting two patches to stable
>> >>>>>> that fall under the 'performance' rule:
>> >>>>>>
>> >>>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>> >>>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
>> >>>>>
>> >>>>> Now queued up, thanks.
>> >>>>
>> >>>> AFAIU, they introduce a performance regression.
>> >>>>
>> >>>> Vitaly?
>> >>>
>> >>> I'm aware of a performance regression in a 'very special' case when
>> >>> ramdisks or files on tmpfs are being used as storage, I post my
>> >>> results a while ago:
>> >>> https://lkml.org/lkml/2014/5/22/164
>> >>> I'm not sure if that 'special' case requires investigation and/or
>> >>> should prevent us from doing stable backport but it would be nice if
>> >>> someone tries to reproduce it at least.
>> >>>
>> >>> I'm going to make a bunch of tests with FusionIO drives and
>> >>> sequential read to replicate same test Felipe did, I'll report as
>> >>> soon as I have data (beginning of next week hopefuly).
>> >>
>> >> Turns out the regression I'm observing with these patches is not
>> >> restricted to tmpfs/ramdisk usage.
>> >>
>> >> I was doing tests with Fusion-io ioDrive Duo 320GB (Dual Adapter) on
>> >> HP ProLiant DL380 G6 (2xE5540, 8G RAM). Hyperthreading is disabled,
>> >> Dom0 is pinned to CPU0 (cores 0,1,2,3) I run up to 8 guests with 1
>> >> vCPU each, they are pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I tried
>> >> differed pinning (Dom0 to 0,1,4,5, DomUs to 2,3,6,7,2,3,6,7 to
>> >> balance NUMA, that doesn't make any difference to the results). I was
>> >> testing on top of Xen-4.3.2.
>> >>
>> >> I was testing two storage configurations:
>> >> 1) Plain 10G partitions from one Fusion drive (/dev/fioa) are
>> >> attached to guests
>> >> 2) LVM group is created on top of both drives (/dev/fioa, /dev/fiob),
>> >> 10G logical volumes are created with striping (lvcreate -i2 ...)
>> >>
>> >> Test is done by simultaneous fio run in guests (rw=read, direct=1)
>> >> for
>> >> 10 second. Each test was performed 3 times and the average was taken.
>> >> Kernels I compare are:
>> >> 1) v3.15-rc5-157-g60b5f90 unmodified
>> >> 2) v3.15-rc5-157-g60b5f90 with
>> 427bfe07e6744c058ce6fc4aa187cda96b635539,
>> >>    bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
>> >>    fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.
>> >>
>> >> First test was done with Dom0 with persistent grant support (Fedora's
>> >> 3.14.4-200.fc20.x86_64):
>> >> 1) Partitions:
>> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions.pn
>> >> g (same markers mean same bs, we get 860 MB/s here, patches make no
>> >> difference, result matches expectation)
>> >>
>> >> 2) LVM Stripe:
>> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
>> >> (1715 MB/s, patches make no difference, result matches expectation)
>> >>
>> >> Second test was performed with Dom0 without persistent grants support
>> >> (Fedora's 3.7.9-205.fc18.x86_64)
>> >> 1) Partitions:
>> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.
>> >> png
>> >> (860 MB/sec again, patches worsen a bit overall throughput with 1-3
>> >> clients)
>> >>
>> >> 2) LVM Stripe:
>> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.png
>> >> (Here we see the same regression I observed with ramdisks and tmpfs
>> >> files, unmodified kernel: 1550MB/s, with patches reverted: 1715MB/s).
>> >>
>> >> The only major difference with Felipe's test is that he was using
>> >> blktap3 with XenServer and I'm using standard blktap2.
>> >
>> > Hello,
>> >
>> > I don't think you are using blktap2, I guess you are using blkback.
>> 
>> Right, sorry for the confusion.
>> 
>> > Also, running the test only for 10s and 3 repetitions seems too low, I
>> > would probably try to run the tests for a longer time and do more
>> > repetitions, and include the standard deviation also.
>> >
>> > Could you try to revert the patches independently to see if it's a
>> > specific commit that introduces the regression?
>> 
>> I did additional test runs. Now I'm comparing 3 kernels:
>> 1) Unmodified v3.15-rc5-157-g60b5f90 - green color on chart
>> 
>> 2) v3.15-rc5-157-g60b5f90 with bfe11d6de1c416cea4f3f0f35f864162063ce3fa
>> and 427bfe07e6744c058ce6fc4aa187cda96b635539 reverted (so only
>> fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke foreign
>> access for grants not mapped by the backend" left) - blue color on chart
>> 
>> 3) v3.15-rc5-157-g60b5f90 with all
>> (bfe11d6de1c416cea4f3f0f35f864162063ce3fa,
>> 427bfe07e6744c058ce6fc4aa187cda96b635539,
>> fbe363c476afe8ec992d3baf682670a4bd1b6ce6) patches reverted - red color
>> on chart.
>> 
>> I test on top of striped LVM on 2 FusionIO drives, I do 3 repetitions for
>> 30 seconds each.
>> 
>> The result is here:
>> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_20140612.pn
>> g
>> 
>> It is consistent with what I've measured with ramdrives and tmpfs files:
>> 
>> 1) fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke
>> foreign access for grants not mapped by the backend" brings us the
>> regression. Bigger block size is - bigger the difference but the regression is
>> observed with all block sizes > 8k.
>> 
>> 2) bfe11d6de1c416cea4f3f0f35f864162063ce3fa "xen-blkfront: restore the
>> non-persistent data path" brings us performance improvement but with
>> conjunction with fbe363c476afe8ec992d3baf682670a4bd1b6ce6 it is still
>> worse than the kernel without both patches.
>> 
>> My Dom0 is Fedora's 3.7.9-205.fc18.x86_64. I can test on newer blkback,
>> however I'm not aware of any way to disable persistent grants there (there is
>> no regression when they're used).
>> 
>> >
>> > Thanks, Roger.
>> 
>> Thanks,
>> 
>> --
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
  2014-06-12 15:32               ` Vitaly Kuznetsov
@ 2014-06-20 17:06                   ` Felipe Franciosi
  0 siblings, 0 replies; 41+ messages in thread
From: Felipe Franciosi @ 2014-06-20 17:06 UTC (permalink / raw)
  To: 'Vitaly Kuznetsov'
  Cc: Roger Pau Monne, xen-devel, axboe, Greg KH, linux-kernel, stable,
	jerry.snitselaar, Jiri Slaby, Ronen Hod, Andrew Jones

Hi all,

Vitaly and I just spent some hours looking at his environment together and examining what was going on. Kudos to Vitaly for putting together the virtual meeting and preparing the whole environment for this!

The short version is that we don't believe the patches introduce a regression. There is one particular case that we measured where the throughput was a bit lower with the patches backported, but that scenario had several other factors affecting the workload as you can read below. For all other cases the throughput is either equivalent or higher when the patches are applied.

The patches should be taken for guests on the affected kernel versions.


The longer version:

We started by ensuring the system was stable with little variation between measurements by:
* Disabling hyper threading, turbo and C-States on the BIOS.
* Using Xen's performance governor (cores fixed to P0).
* Setting the maximum C-State to 1 (cores fixed to C0 and C1).
* Identity pinning dom0's vCPUs to NUMA node 0.
** vcpu0 -> pcpu0, vcpu1->pcpu1, vcpu2->pcpu2, vcpu3->pcpu3.
* Pinning guest's vCPUs to NUMA node 1.
** domU1vcpu0->pcpus4-7, domU2vcpu0->pcpus4-7, ...
* We used the NOOP scheduler within the guest (the Fusion-io and the device mapper block devices in dom0 don't register schedulers).
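
A rough sketch of the corresponding Dom0-side tuning (domain names and the
exact syntax for this Xen/tools version are assumptions):

  xenpm set-scaling-governor performance      # keep cores at P0
  xenpm set-max-cstate 1                      # restrict cores to C0/C1
  for v in 0 1 2 3; do xl vcpu-pin Domain-0 $v $v; done   # identity pinning
  xl vcpu-pin guest1 0 4-7                    # guest vCPUs on NUMA node 1
  # inside each guest:
  echo noop > /sys/block/xvdb/queue/scheduler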

We also tried to identify all meaningful differences between Vitaly's setup and tests that I've been running. The main ones were:
* Vitaly is using HVM guests. (I've been testing on PV guests.)
* Vitaly's LVM configuration was different for his test disks:
** Vitaly's Fusion-io presented 2 SCSI devices which were presented to LVM as two physical volumes (PV).
** Both PVs were added to a single volume group (VG).
** Striped logical volumes (using 4K stripes) were created on the VG to utilise both disks.
** I am using several SSDs on my test and treating them independently.
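
For clarity, the striped layout described above corresponds roughly to the
following (VG/LV names and sizes are assumptions; -i2 and the 4K stripe size
come from the thread):

  pvcreate /dev/fioa /dev/fiob
  vgcreate vgfio /dev/fioa /dev/fiob
  lvcreate -i2 -I4 -L10G -n guest1 vgfio      # 2-way stripe, 4K stripe size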

When we repeated his experiments on this configuration for a full sequential read workload with 1MB requests, we got:
Guests running 3.10 stock: Aggregated 1.7 GB/s
Guests running 3.10 +backports: Aggregated 1.6 GB/s

This is the only case where we identified the backports causing a regression (keep reading).

We made a few important observations:
* The dom0 CPU utilisation was high-ish for both scenarios:
** Cores 1-3 were at 70%, mostly in system time (blkback + fusion-io workers)
** Core 0 was at 100% and receiving all hardware interrupts
* The device mapper block devices receiving IO from blkback had an average request size of 4K.
** They were subsequently being merged back to the Fusion-io block devices.
** We identified this to be an effect of the 4K stripes at the logical volume level.
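
A simple way to watch the request sizes reaching the device mapper volumes
(assuming sysstat's iostat is available in Dom0):

  iostat -x 1 | grep dm-      # avgrq-sz is reported in 512-byte sectors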

Next, we addressed the first difference between our environments and repeated the experiment with guests in PV mode, also trying 4M and 64K requests. In this experiment, there was no noticeable difference between the two kernels. As expected, with 64K requests the throughput was a bit lower, but still comparable between kernels. The throughput in the 1M and 4M cases was identical.

Next, we addressed the second difference between our environments and recreated the LVM configuration as follows:
* One physical volume per Fusion-io device.
* One volume group per physical volume.
* Four logical volumes per volume group (without striping, just using a linear table).
* Assigned one logical volume to each guest, for a total of 8 guests.
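
A sketch of this reworked layout (device, VG and LV names as well as sizes
are assumptions):

  pvcreate /dev/fioa
  vgcreate vgfioa /dev/fioa                   # one VG per Fusion-io device
  for n in 1 2 3 4; do lvcreate -L10G -n guest$n vgfioa; done  # linear LVs
  # repeat for /dev/fiob -> vgfiob and guests 5-8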

When repeating the experiment one last time, we noticed the following:
* The CPU utilisation was lower for both HVM and PV guests.
** We believe this is due to the reduced stress caused by breaking up and merging requests at the device mapper level.
* The requests reaching each logical volume were now 44K in size.
** We were not using indirect IO for this test.

On this modified LVM configuration, the kernel with the backports was actually faster than the stock 3.10. This was both for PV and HVM guests:
Guests running 3.10 stock: Aggregated 1.6 GB/s
Guests running 3.10 +backports: Aggregated 1.7 GB/s

In conclusion:

I believe the environment with only one SSD does not provide enough storage power to conclusively show the regression that Roger's patches are addressing. When looking at the measurements I did for the Ubuntu report, it is possible to observe that the regression becomes noticeable for throughputs that are much higher than 1.7 GB/s:
https://launchpadlibrarian.net/176700111/saucy64.png
https://launchpadlibrarian.net/176700099/saucy64-backports.png

There is the weird case where the striped LVM configuration forced some CPU contention in dom0 and, for HVM guests, the patches resulted in slightly lower throughput. We are open to ideas on this one.

All in all, the patches are important and can drastically improve the throughput of guests in the affected kernel range when the backend does not support persistent grants.

Thanks,
Felipe

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: 12 June 2014 16:33
> To: Felipe Franciosi
> Cc: Roger Pau Monne; xen-devel@lists.xenproject.org; axboe@kernel.dk;
> Greg KH; linux-kernel@vger.kernel.org; stable@vger.kernel.org;
> jerry.snitselaar@oracle.com; Jiri Slaby; Ronen Hod; Andrew Jones
> Subject: Re: [Xen-devel] Backport request to stable of two performance
> related fixes for xen-blkfront (3.13 fixes to earlier trees)
> 
> Felipe Franciosi <felipe.franciosi@citrix.com> writes:
> 
> > Hi Vitaly,
> >
> > Are you able to test a 3.10 guest with and without the backport that
> > Roger sent? This patch is attached to an e-mail Roger sent on "22 May
> > 2014 13:54".
> 
> Sure,
> 
> Now I'm comparing d642daf637d02dacf216d7fd9da7532a4681cfd3 and
> 46c0326164c98e556c35c3eb240273595d43425d commits from
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> (with and without two commits in question). The test is exactly the same as
> described before.
> 
> The result is here:
> http://hadoop.ru/pubfiles/bug1096909/fusion/310_nopgrants_stripe.png
> 
> as you can see 46c03261 (without patches) wins everywhere.
> 
> >
> > Because your results are contradicting with what these patches are
> > meant to do, I would like to make sure that this isn't related to
> > something else that happened after 3.10.
> 
> I still think Dom0 kernel and blktap/blktap3 is what make a difference
> between our test environments.
> 
> >
> > You could also test Ubuntu Sancy guests with and without the patched
> > kernels provided by Joseph Salisbury on launchpad:
> > https://bugs.launchpad.net/bugs/1319003
> >
> > Thanks,
> > Felipe
> >
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: 12 June 2014 13:01
> >> To: Roger Pau Monne
> >> Cc: xen-devel@lists.xenproject.org; axboe@kernel.dk; Felipe
> >> Franciosi; Greg KH; linux-kernel@vger.kernel.org;
> >> stable@vger.kernel.org; jerry.snitselaar@oracle.com; Jiri Slaby;
> >> Ronen Hod; Andrew Jones
> >> Subject: Re: [Xen-devel] Backport request to stable of two
> >> performance related fixes for xen-blkfront (3.13 fixes to earlier
> >> trees)
> >>
> >> Roger Pau Monné <roger.pau@citrix.com> writes:
> >>
> >> > On 10/06/14 15:19, Vitaly Kuznetsov wrote:
> >> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >> >>
> >> >>> Jiri Slaby <jslaby@suse.cz> writes:
> >> >>>
> >> >>>> On 06/04/2014 07:48 AM, Greg KH wrote:
> >> >>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk
> >> wrote:
> >> >>>>>> Hey Greg
> >> >>>>>>
> >> >>>>>> This email is in regards to backporting two patches to stable
> >> >>>>>> that fall under the 'performance' rule:
> >> >>>>>>
> >> >>>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >> >>>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> >> >>>>>
> >> >>>>> Now queued up, thanks.
> >> >>>>
> >> >>>> AFAIU, they introduce a performance regression.
> >> >>>>
> >> >>>> Vitaly?
> >> >>>
> >> >>> I'm aware of a performance regression in a 'very special' case
> >> >>> when ramdisks or files on tmpfs are being used as storage, I post
> >> >>> my results a while ago:
> >> >>> https://lkml.org/lkml/2014/5/22/164
> >> >>> I'm not sure if that 'special' case requires investigation and/or
> >> >>> should prevent us from doing stable backport but it would be nice
> >> >>> if someone tries to reproduce it at least.
> >> >>>
> >> >>> I'm going to make a bunch of tests with FusionIO drives and
> >> >>> sequential read to replicate same test Felipe did, I'll report as
> >> >>> soon as I have data (beginning of next week hopefuly).
> >> >>
> >> >> Turns out the regression I'm observing with these patches is not
> >> >> restricted to tmpfs/ramdisk usage.
> >> >>
> >> >> I was doing tests with Fusion-io ioDrive Duo 320GB (Dual Adapter)
> >> >> on HP ProLiant DL380 G6 (2xE5540, 8G RAM). Hyperthreading is
> >> >> disabled,
> >> >> Dom0 is pinned to CPU0 (cores 0,1,2,3) I run up to 8 guests with 1
> >> >> vCPU each, they are pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I
> >> >> tried differed pinning (Dom0 to 0,1,4,5, DomUs to 2,3,6,7,2,3,6,7
> >> >> to balance NUMA, that doesn't make any difference to the results).
> >> >> I was testing on top of Xen-4.3.2.
> >> >>
> >> >> I was testing two storage configurations:
> >> >> 1) Plain 10G partitions from one Fusion drive (/dev/fioa) are
> >> >> attached to guests
> >> >> 2) LVM group is created on top of both drives (/dev/fioa,
> >> >> /dev/fiob), 10G logical volumes are created with striping
> >> >> (lvcreate -i2 ...)
> >> >>
> >> >> Test is done by simultaneous fio run in guests (rw=read, direct=1)
> >> >> for
> >> >> 10 second. Each test was performed 3 times and the average was
> taken.
> >> >> Kernels I compare are:
> >> >> 1) v3.15-rc5-157-g60b5f90 unmodified
> >> >> 2) v3.15-rc5-157-g60b5f90 with
> >> 427bfe07e6744c058ce6fc4aa187cda96b635539,
> >> >>    bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
> >> >>    fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.
> >> >>
> >> >> First test was done with Dom0 with persistent grant support
> >> >> (Fedora's
> >> >> 3.14.4-200.fc20.x86_64):
> >> >> 1) Partitions:
> >> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions
> >> >> .pn g (same markers mean same bs, we get 860 MB/s here, patches
> >> >> make no difference, result matches expectation)
> >> >>
> >> >> 2) LVM Stripe:
> >> >>
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
> >> >> (1715 MB/s, patches make no difference, result matches
> >> >> expectation)
> >> >>
> >> >> Second test was performed with Dom0 without persistent grants
> >> >> support (Fedora's 3.7.9-205.fc18.x86_64)
> >> >> 1) Partitions:
> >> >>
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.
> >> >> png
> >> >> (860 MB/sec again, patches worsen a bit overall throughput with
> >> >> 1-3
> >> >> clients)
> >> >>
> >> >> 2) LVM Stripe:
> >> >>
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.p
> >> >> ng (Here we see the same regression I observed with ramdisks and
> >> >> tmpfs files, unmodified kernel: 1550MB/s, with patches reverted:
> >> >> 1715MB/s).
> >> >>
> >> >> The only major difference with Felipe's test is that he was using
> >> >> blktap3 with XenServer and I'm using standard blktap2.
> >> >
> >> > Hello,
> >> >
> >> > I don't think you are using blktap2, I guess you are using blkback.
> >>
> >> Right, sorry for the confusion.
> >>
> >> > Also, running the test only for 10s and 3 repetitions seems too
> >> > low, I would probably try to run the tests for a longer time and do
> >> > more repetitions, and include the standard deviation also.
> >> >
> >> > Could you try to revert the patches independently to see if it's a
> >> > specific commit that introduces the regression?
> >>
> >> I did additional test runs. Now I'm comparing 3 kernels:
> >> 1) Unmodified v3.15-rc5-157-g60b5f90 - green color on chart
> >>
> >> 2) v3.15-rc5-157-g60b5f90 with
> >> bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >> and 427bfe07e6744c058ce6fc4aa187cda96b635539 reverted (so only
> >> fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke
> >> foreign access for grants not mapped by the backend" left) - blue
> >> color on chart
> >>
> >> 3) v3.15-rc5-157-g60b5f90 with all
> >> (bfe11d6de1c416cea4f3f0f35f864162063ce3fa,
> >> 427bfe07e6744c058ce6fc4aa187cda96b635539,
> >> fbe363c476afe8ec992d3baf682670a4bd1b6ce6) patches reverted - red
> >> color on chart.
> >>
> >> I test on top of striped LVM on 2 FusionIO drives, I do 3 repetitions
> >> for
> >> 30 seconds each.
> >>
> >> The result is here:
> >>
> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_20140612.pn
> >> g
> >>
> >> It is consistent with what I've measured with ramdrives and tmpfs files:
> >>
> >> 1) fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke
> >> foreign access for grants not mapped by the backend" brings us the
> >> regression. Bigger block size is - bigger the difference but the
> >> regression is observed with all block sizes > 8k.
> >>
> >> 2) bfe11d6de1c416cea4f3f0f35f864162063ce3fa "xen-blkfront: restore
> >> the non-persistent data path" brings us performance improvement but
> >> with conjunction with fbe363c476afe8ec992d3baf682670a4bd1b6ce6 it is
> >> still worse than the kernel without both patches.
> >>
> >> My Dom0 is Fedora's 3.7.9-205.fc18.x86_64. I can test on newer
> >> blkback, however I'm not aware of any way to disable persistent
> >> grants there (there is no regression when they're used).
> >>
> >> >
> >> > Thanks, Roger.
> >>
> >> Thanks,
> >>
> >> --
> >>   Vitaly
> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Xen-devel] Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees)
@ 2014-06-20 17:06                   ` Felipe Franciosi
  0 siblings, 0 replies; 41+ messages in thread
From: Felipe Franciosi @ 2014-06-20 17:06 UTC (permalink / raw)
  To: 'Vitaly Kuznetsov'
  Cc: Roger Pau Monne, xen-devel, axboe, Greg KH, linux-kernel, stable,
	jerry.snitselaar, Jiri Slaby, Ronen Hod, Andrew Jones

Hi all,

Vitaly and I just spent some hours looking at his environment together and examining what was going on. Kudos to Vitaly for putting together the virtual meeting and preparing the whole environment for this!

The short version is that we don't believe the patches introduce a regression. There is one particular case that we measured where the throughput was a bit lower with the patches backported, but that scenario had several other factors affecting the workload as you can read below. For all other cases the throughput is either equivalent or higher when the patches are applied.

The patches should be taken for guests on the affected kernel versions.


The longer version:

We started by ensuring the system was stable with little variation between measurements by:
* Disabling hyper threading, turbo and C-States on the BIOS.
* Using Xen's performance governor (cores fixed to P0).
* Setting the maximum C-State to 1 (cores fixed to C0 and C1).
* Identity pinning dom0's vCPUs to NUMA node 0.
** vcpu0 -> pcpu0, vcpu1->pcpu1, vcpu2->pcpu2, vcpu3->pcpu3.
* Pinning guest's vCPUs to NUMA node 1.
** domU1vcpu0->pcpus4-7, domU2vcpu0->pcpus4-7, ...
* We used the NOOP scheduler within the guest (the Fusion-io and the device mapper block devices in dom0 don't register schedulers).

We also tried to identify all meaningful differences between Vitaly's setup and tests that I've been running. The main ones were:
* Vitaly is using HVM guests. (I've been testing on PV guests.)
* Vitaly's LVM configuration was different for his test disks:
** Vitaly's Fusion-io presented 2 SCSI devices which were presented to LVM as two physical volumes (PV).
** Both PVs were added to a single volume group (VG).
** Striped logical volumes (using 4K stripes) were created on the VG to utilise both disks.
** I am using several SSDs on my test and treating them independently.

When we repeated his experiments on this configuration for a full sequential read workload with 1MB requests, we got:
Guests running 3.10 stock: Aggregated 1.7 GB/s
Guests running 3.10 +backports: Aggregated 1.6 GB/s

This is the only case where we identified the backports causing a regression (keep reading).

We made a few important observations:
* The dom0 CPU utilisation was high-ish for both scenarios:
** Cores 1-3 were at 70%, mostly in system time (blkback + fusion-io workers)
** Core 0 was at 100% and receiving all hardware interrupts
* The device mapper block devices receiving IO from blkback had an average request size of 4K.
** They were subsequently being merged back to the Fusion-io block devices.
** We identified this to be an effect of the 4K stripes at the logical volume level.

Next, we addressed the first difference between our environments and repeated the experiment with guests in PV mode, also trying 4M and 64K requests. In this experiment, there was no noticeable difference between the two kernels. As expected, with 64K requests the throughput was a bit lower, but still comparable between kernels. The throughput in the 1M and 4M cases was identical.

Next, we addressed the second difference between our environments and recreated the LVM configuration as follows:
* One physical volume per Fusion-io device.
* One volume group per physical volume.
* Four logical volumes per volume group (without striping, just using a linear table).
* Assigned one logical volume to each guest, for a total of 8 guests.

When repeating the experiment one last time, we noticed the following:
* The CPU utilisation was lower for both HVM and PV guests.
** We believe this is due to the reduced stress caused by breaking up and merging requests at the device mapper level.
* The requests reaching each logical volume were now 44K in size.
** We were not using indirect IO for this test.

On this modified LVM configuration, the kernel with the backports was actually faster than the stock 3.10. This was both for PV and HVM guests:
Guests running 3.10 stock: Aggregated 1.6 GB/s
Guests running 3.10 +backports: Aggregated 1.7 GB/s

In conclusion:

I believe the environment with only one SSD does not provide enough storage power to conclusively show the regression that Roger's patches are addressing. When looking at the measurements I did for the Ubuntu report, it is possible to observe that the regression becomes noticeable for throughputs that are much higher than 1.7 GB/s:
https://launchpadlibrarian.net/176700111/saucy64.png
https://launchpadlibrarian.net/176700099/saucy64-backports.png

There is the weird case where the striped LVM configuration forced some CPU contention in dom0 and, for HVM guests, the patches resulted in slightly lower throughput. We are open to ideas on this one.

All in all, the patches are important and can drastically improve the throughput of guests in the affected kernel range when the backend does not support persistent grants.

Thanks,
Felipe

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: 12 June 2014 16:33
> To: Felipe Franciosi
> Cc: Roger Pau Monne; xen-devel@lists.xenproject.org; axboe@kernel.dk;
> Greg KH; linux-kernel@vger.kernel.org; stable@vger.kernel.org;
> jerry.snitselaar@oracle.com; Jiri Slaby; Ronen Hod; Andrew Jones
> Subject: Re: [Xen-devel] Backport request to stable of two performance
> related fixes for xen-blkfront (3.13 fixes to earlier trees)
> 
> Felipe Franciosi <felipe.franciosi@citrix.com> writes:
> 
> > Hi Vitaly,
> >
> > Are you able to test a 3.10 guest with and without the backport that
> > Roger sent? This patch is attached to an e-mail Roger sent on "22 May
> > 2014 13:54".
> 
> Sure,
> 
> Now I'm comparing d642daf637d02dacf216d7fd9da7532a4681cfd3 and
> 46c0326164c98e556c35c3eb240273595d43425d commits from
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> (with and without two commits in question). The test is exactly the same as
> described before.
> 
> The result is here:
> http://hadoop.ru/pubfiles/bug1096909/fusion/310_nopgrants_stripe.png
> 
> as you can see 46c03261 (without patches) wins everywhere.
> 
> >
> > Because your results are contradicting with what these patches are
> > meant to do, I would like to make sure that this isn't related to
> > something else that happened after 3.10.
> 
> I still think Dom0 kernel and blktap/blktap3 is what make a difference
> between our test environments.
> 
> >
> > You could also test Ubuntu Sancy guests with and without the patched
> > kernels provided by Joseph Salisbury on launchpad:
> > https://bugs.launchpad.net/bugs/1319003
> >
> > Thanks,
> > Felipe
> >
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: 12 June 2014 13:01
> >> To: Roger Pau Monne
> >> Cc: xen-devel@lists.xenproject.org; axboe@kernel.dk; Felipe
> >> Franciosi; Greg KH; linux-kernel@vger.kernel.org;
> >> stable@vger.kernel.org; jerry.snitselaar@oracle.com; Jiri Slaby;
> >> Ronen Hod; Andrew Jones
> >> Subject: Re: [Xen-devel] Backport request to stable of two
> >> performance related fixes for xen-blkfront (3.13 fixes to earlier
> >> trees)
> >>
> >> Roger Pau Monné <roger.pau@citrix.com> writes:
> >>
> >> > On 10/06/14 15:19, Vitaly Kuznetsov wrote:
> >> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >> >>
> >> >>> Jiri Slaby <jslaby@suse.cz> writes:
> >> >>>
> >> >>>> On 06/04/2014 07:48 AM, Greg KH wrote:
> >> >>>>> On Wed, May 14, 2014 at 03:11:22PM -0400, Konrad Rzeszutek Wilk
> >> wrote:
> >> >>>>>> Hey Greg
> >> >>>>>>
> >> >>>>>> This email is in regards to backporting two patches to stable
> >> >>>>>> that fall under the 'performance' rule:
> >> >>>>>>
> >> >>>>>>  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >> >>>>>>  fbe363c476afe8ec992d3baf682670a4bd1b6ce6
> >> >>>>>
> >> >>>>> Now queued up, thanks.
> >> >>>>
> >> >>>> AFAIU, they introduce a performance regression.
> >> >>>>
> >> >>>> Vitaly?
> >> >>>
> >> >>> I'm aware of a performance regression in a 'very special' case
> >> >>> when ramdisks or files on tmpfs are being used as storage, I post
> >> >>> my results a while ago:
> >> >>> https://lkml.org/lkml/2014/5/22/164
> >> >>> I'm not sure if that 'special' case requires investigation and/or
> >> >>> should prevent us from doing stable backport but it would be nice
> >> >>> if someone tries to reproduce it at least.
> >> >>>
> >> >>> I'm going to make a bunch of tests with FusionIO drives and
> >> >>> sequential read to replicate same test Felipe did, I'll report as
> >> >>> soon as I have data (beginning of next week hopefuly).
> >> >>
> >> >> Turns out the regression I'm observing with these patches is not
> >> >> restricted to tmpfs/ramdisk usage.
> >> >>
> >> >> I was doing tests with a Fusion-io ioDrive Duo 320GB (Dual Adapter)
> >> >> on an HP ProLiant DL380 G6 (2x E5540, 8G RAM). Hyperthreading is
> >> >> disabled, Dom0 is pinned to CPU0 (cores 0,1,2,3), and I run up to 8
> >> >> guests with 1 vCPU each, pinned to CPU1 (cores 4,5,6,7,4,5,6,7). I
> >> >> tried different pinning (Dom0 to 0,1,4,5, DomUs to 2,3,6,7,2,3,6,7
> >> >> to balance NUMA); that doesn't make any difference to the results.
> >> >> I was testing on top of Xen-4.3.2.
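> >> >>
> >> >> For reference, pinning like that can be done with xl roughly as
> >> >> follows (the guest domain names are placeholders and the exact
> >> >> commands I used may have differed):
> >> >>
> >> >>   xl vcpu-pin Domain-0 all 0-3    # keep Dom0's vCPUs on cores 0-3
> >> >>   xl vcpu-pin guest1 0 4          # pin guest1's single vCPU to core 4
> >> >>   xl vcpu-pin guest2 0 5          # ...and so on for the other guests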
> >> >>
> >> >> I was testing two storage configurations:
> >> >> 1) Plain 10G partitions from one Fusion drive (/dev/fioa) are
> >> >> attached to guests
> >> >> 2) An LVM group is created on top of both drives (/dev/fioa,
> >> >> /dev/fiob) and 10G logical volumes are created with striping
> >> >> (lvcreate -i2 ...; a rough sketch follows below)
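> >> >>
> >> >> The striped volumes were created with something along these lines
> >> >> (volume group and LV names are made up for illustration, and the
> >> >> 64k stripe size is only an assumption):
> >> >>
> >> >>   pvcreate /dev/fioa /dev/fiob
> >> >>   vgcreate vg_fio /dev/fioa /dev/fiob
> >> >>   lvcreate -i2 -I64 -L10G -n guest1_lv vg_fio   # one 10G striped LV per guest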
> >> >>
> >> >> The test is done by a simultaneous fio run in the guests (rw=read,
> >> >> direct=1) for 10 seconds. Each test was performed 3 times and the
> >> >> average was taken.
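> >> >>
> >> >> The per-guest fio invocation is roughly the following (the job name,
> >> >> target device, block size, ioengine and iodepth are not spelled out
> >> >> here and are only assumptions):
> >> >>
> >> >>   fio --name=seqread --filename=/dev/xvdb --rw=read --direct=1 \
> >> >>       --bs=64k --ioengine=libaio --iodepth=32 \
> >> >>       --runtime=10 --time_based
> >> >>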
> >> >> Kernels I compare are:
> >> >> 1) v3.15-rc5-157-g60b5f90 unmodified
> >> >> 2) v3.15-rc5-157-g60b5f90 with 427bfe07e6744c058ce6fc4aa187cda96b635539,
> >> >>    bfe11d6de1c416cea4f3f0f35f864162063ce3fa, and
> >> >>    fbe363c476afe8ec992d3baf682670a4bd1b6ce6 reverted.
> >> >>
> >> >> The first test was done with a Dom0 with persistent grant support
> >> >> (Fedora's 3.14.4-200.fc20.x86_64):
> >> >> 1) Partitions:
> >> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_partitions.png
> >> >> (same markers mean the same bs; we get 860 MB/s here, the patches
> >> >> make no difference, the result matches expectations)
> >> >>
> >> >> 2) LVM Stripe:
> >> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_pgrants_stripe.png
> >> >> (1715 MB/s, the patches make no difference, the result matches
> >> >> expectations)
> >> >>
> >> >> The second test was performed with a Dom0 without persistent grants
> >> >> support (Fedora's 3.7.9-205.fc18.x86_64):
> >> >> 1) Partitions:
> >> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_partitions.png
> >> >> (860 MB/sec again; the patches worsen overall throughput a bit with
> >> >> 1-3 clients)
> >> >>
> >> >> 2) LVM Stripe:
> >> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_stripe.png
> >> >> (here we see the same regression I observed with ramdisks and
> >> >> tmpfs files; unmodified kernel: 1550 MB/s, with the patches reverted:
> >> >> 1715 MB/s).
> >> >>
> >> >> The only major difference from Felipe's test is that he was using
> >> >> blktap3 with XenServer and I'm using standard blktap2.
> >> >
> >> > Hello,
> >> >
> >> > I don't think you are using blktap2; I guess you are using blkback.
> >>
> >> Right, sorry for the confusion.
> >>
> >> > Also, running the test for only 10s and 3 repetitions seems too
> >> > low; I would probably run the tests for a longer time, do more
> >> > repetitions, and include the standard deviation as well.
> >> >
> >> > Could you try to revert the patches independently to see if it's a
> >> > specific commit that introduces the regression?
> >>
> >> I did additional test runs. Now I'm comparing 3 kernels:
> >> 1) Unmodified v3.15-rc5-157-g60b5f90 - green color on chart
> >>
> >> 2) v3.15-rc5-157-g60b5f90 with
> >> bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >> and 427bfe07e6744c058ce6fc4aa187cda96b635539 reverted (so only
> >> fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke
> >> foreign access for grants not mapped by the backend" left) - blue
> >> color on chart
> >>
> >> 3) v3.15-rc5-157-g60b5f90 with all
> >> (bfe11d6de1c416cea4f3f0f35f864162063ce3fa,
> >> 427bfe07e6744c058ce6fc4aa187cda96b635539,
> >> fbe363c476afe8ec992d3baf682670a4bd1b6ce6) patches reverted - red
> >> color on chart.
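> >>
> >> The reverted kernels are just v3.15-rc5-157-g60b5f90 with the commits
> >> reverted on top, e.g. for kernel 3) something like the following (the
> >> revert order here is only how I would do it, newest first):
> >>
> >>   git revert --no-edit 427bfe07e6744c058ce6fc4aa187cda96b635539
> >>   git revert --no-edit bfe11d6de1c416cea4f3f0f35f864162063ce3fa
> >>   git revert --no-edit fbe363c476afe8ec992d3baf682670a4bd1b6ce6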
> >>
> >> I test on top of a striped LVM volume on 2 FusionIO drives; I do 3
> >> repetitions of 30 seconds each.
> >>
> >> The result is here:
> >>
> >> http://hadoop.ru/pubfiles/bug1096909/fusion/315_nopgrants_20140612.png
> >>
> >> It is consistent with what I've measured with ramdisks and tmpfs files:
> >>
> >> 1) fbe363c476afe8ec992d3baf682670a4bd1b6ce6 "xen-blkfront: revoke
> >> foreign access for grants not mapped by the backend" brings us the
> >> regression. The bigger the block size, the bigger the difference, but
> >> the regression is observed with all block sizes > 8k.
> >>
> >> 2) bfe11d6de1c416cea4f3f0f35f864162063ce3fa "xen-blkfront: restore
> >> the non-persistent data path" brings us a performance improvement, but
> >> in conjunction with fbe363c476afe8ec992d3baf682670a4bd1b6ce6 it is
> >> still worse than the kernel without either patch.
> >>
> >> My Dom0 is Fedora's 3.7.9-205.fc18.x86_64. I can test on a newer
> >> blkback; however, I'm not aware of any way to disable persistent
> >> grants there (there is no regression when they're used).
> >>
> >> >
> >> > Thanks, Roger.
> >>
> >> Thanks,
> >>
> >> --
> >>   Vitaly
> 
> --
>   Vitaly


end of thread

Thread overview: 41+ messages
2014-05-14 19:11 Backport request to stable of two performance related fixes for xen-blkfront (3.13 fixes to earlier trees) Konrad Rzeszutek Wilk
2014-05-14 19:21 ` Josh Boyer
2014-05-14 19:21 ` Josh Boyer
2014-05-20  3:19 ` Greg KH
2014-05-20  3:19   ` Greg KH
2014-05-22 12:54   ` Roger Pau Monné
2014-06-02  7:13     ` Felipe Franciosi
2014-05-22 12:54   ` Roger Pau Monné
2014-05-20  9:32 ` [Xen-devel] " Vitaly Kuznetsov
2014-05-20  9:54   ` Vitaly Kuznetsov
2014-05-20  9:54   ` [Xen-devel] " Vitaly Kuznetsov
2014-05-20 10:32     ` Roger Pau Monné
2014-05-20 11:41       ` Vitaly Kuznetsov
2014-05-20 11:41       ` [Xen-devel] " Vitaly Kuznetsov
2014-05-20 13:59         ` Felipe Franciosi
2014-05-20 13:59           ` Felipe Franciosi
2014-05-20 13:59           ` [Xen-devel] " Felipe Franciosi
2014-05-22  8:52           ` Vitaly Kuznetsov
2014-05-20 10:32     ` Roger Pau Monné
2014-05-20  9:32 ` Vitaly Kuznetsov
2014-06-04  5:48 ` Greg KH
2014-06-04  5:48 ` Greg KH
2014-06-06 10:47   ` Jiri Slaby
2014-06-06 10:58     ` Vitaly Kuznetsov
2014-06-10 13:19       ` [Xen-devel] " Vitaly Kuznetsov
2014-06-10 16:55         ` Roger Pau Monné
2014-06-12 12:00           ` Vitaly Kuznetsov
2014-06-12 12:32             ` Felipe Franciosi
2014-06-12 12:32               ` Felipe Franciosi
2014-06-12 15:32               ` Vitaly Kuznetsov
2014-06-20 17:06                 ` Felipe Franciosi
2014-06-20 17:06                   ` Felipe Franciosi
2014-06-10 17:26         ` Felipe Franciosi
2014-06-06 10:58     ` Vitaly Kuznetsov
2014-06-06 13:56     ` Greg KH
2014-06-06 13:56     ` Greg KH
2014-06-06 14:02       ` Konrad Rzeszutek Wilk
2014-06-06 14:02       ` Konrad Rzeszutek Wilk
2014-06-06 14:03         ` Jiri Slaby
2014-06-06 14:03         ` Jiri Slaby
2014-06-06 10:47   ` Jiri Slaby
