From: Ming Lei
To: Paolo Bonzini
Cc: Kevin Wolf, Peter Maydell, Fam Zheng, "Michael S. Tsirkin",
 qemu-devel, Stefan Hajnoczi
Date: Wed, 6 Aug 2014 21:53:04 +0800
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi
 virtqueue support
In-Reply-To: <53E1EC6D.9020308@redhat.com>
References: <1407209598-2572-1-git-send-email-ming.lei@canonical.com>
 <20140805094844.GF4391@noname.str.redhat.com>
 <20140805134815.GD12251@stefanha-thinkpad.redhat.com>
 <20140805144728.GH4391@noname.str.redhat.com>
 <53E1DD0E.8080202@redhat.com>
 <53E1EC6D.9020308@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8

On Wed, Aug 6, 2014 at 4:50 PM, Paolo Bonzini wrote:
> On 06/08/2014 10:38, Ming Lei wrote:
>> On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini wrote:
>>> On 06/08/2014 07:33, Ming Lei wrote:
>>>>>> I played a bit with the following, I hope it's not too naive. I couldn't
>>>>>> see a difference with your patches, but at least one reason for this is
>>>>>> probably that my laptop SSD isn't fast enough to make the CPU the
>>>>>> bottleneck. Haven't tried ramdisk yet, that would probably be the next
>>>>>> thing. (I actually wrote the patch up just for some profiling on my own,
>>>>>> not for comparing throughput, but it should be usable for that as well.)
>>>>
>>>> This might not be good for the test, since it is basically a sequential
>>>> read test, which can be optimized a lot by the kernel. And I always use
>>>> a randread benchmark.
>>>
>>> A microbenchmark already exists in tests/test-coroutine.c, and doesn't
>>> really tell us much; it's obvious that coroutines execute more code, the
>>> question is why it affects the iops performance.
>>
>> Could you take a look at the coroutine benchmark I wrote? The results
>> show that coroutines do decrease performance a lot compared with
>> bypassing coroutines the way this patchset does.
>
> Your benchmark is synchronous, while disk I/O is asynchronous.

It can be thought of as asynchronous too, since it doesn't sleep the way
synchronous I/O does.

Basically the I/O thread is CPU bound in the linux-aio case, since
neither submission nor completion blocks the CPU most of the time, so my
benchmark still fits if we treat the completion as a nop.

The current problem is that the single-coroutine benchmark suggests the
stack switch doesn't hurt performance, yet in Kevin's block AIO benchmark
bypassing the coroutine still gives an observable improvement.
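To make the comparison concrete, the kind of microbenchmark I mean can be
sketched roughly as below. This is only an illustration, not the actual
benchmark from this thread: it is written against the 2014-era
qemu_coroutine_* API in block/coroutine.h (where qemu_coroutine_create()
takes just the entry point and qemu_coroutine_enter() takes an opaque) and
glib's test timer, and nop_read(), the test paths and the iteration count
are all made up here. The nop stands in for a linux-aio request whose
completion is treated as a nop.

/* Illustrative sketch only -- not the benchmark posted in this thread. */
#include <glib.h>
#include "block/coroutine.h"

#define NR_OPS 10000000U

/* volatile so the compiler cannot optimize the nop "completion" away */
static volatile unsigned long completed;

/* Stands in for a linux-aio request whose completion is treated as a nop. */
static void nop_read(void)
{
    completed++;
}

static void coroutine_fn co_read_entry(void *opaque)
{
    nop_read();
}

/* "Bypass" path: one plain function call per request. */
static void perf_nop_read_bypass(void)
{
    unsigned int i;
    double duration;

    g_test_timer_start();
    for (i = 0; i < NR_OPS; i++) {
        nop_read();
    }
    duration = g_test_timer_elapsed();
    g_test_message("bypass: %u nop reads in %f s", NR_OPS, duration);
}

/* Coroutine path: create, enter and terminate one coroutine per request. */
static void perf_nop_read_coroutine(void)
{
    unsigned int i;
    double duration;

    g_test_timer_start();
    for (i = 0; i < NR_OPS; i++) {
        Coroutine *co = qemu_coroutine_create(co_read_entry);
        qemu_coroutine_enter(co, NULL);
    }
    duration = g_test_timer_elapsed();
    g_test_message("coroutine: %u nop reads in %f s", NR_OPS, duration);
}

int main(int argc, char **argv)
{
    g_test_init(&argc, &argv, NULL);
    g_test_add_func("/perf/nop-read/bypass", perf_nop_read_bypass);
    g_test_add_func("/perf/nop-read/coroutine", perf_nop_read_coroutine);
    return g_test_run();
}

With something like this, the only cost being measured is the per-request
coroutine create/switch/terminate overhead, which is exactly the part the
bypass patches avoid.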
>
> Your benchmark doesn't add much compared to "time tests/test-coroutine
> -m perf -p /perf/yield". It takes 8 seconds on my machine, and 10^8
> function calls obviously take less than 8 seconds. I've sent a patch to
> add a "baseline" function call benchmark to test-coroutine.
>
>>> The sequential read should be the right workload. For fio, you want to
>>> get as many iops as possible to QEMU and so you need randread. But
>>> qemu-img is not run in a guest, and if the kernel optimizes sequential
>>> reads then the bypass should have even more benefits, because it makes
>>> userspace proportionally more expensive.
>
> Do you agree with this?

Yes, I have posted the result of the benchmark, and it looks like the
result is basically similar to my previous test on dataplane.

Thanks,
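For reference, the randread workload mentioned above is the kind of fio
job shown below. Every value here, including the job name and the device
path, is illustrative rather than copied from the actual runs.

; illustrative fio job only -- parameters and device path are made up
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=64
runtime=30
time_based

[vdb-randread]
filename=/dev/vdb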