From: Ming Lei <ming.lei@canonical.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Fam Zheng <famz@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	tom.leiming@gmail.com, qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Sun, 10 Aug 2014 11:46:24 +0800
Message-ID: <20140810114624.0305b7af@tom-ThinkPad-T410>
In-Reply-To: <20140806084855.GA4090@noname.str.redhat.com>

Hi Kevin, Paolo, Stefan and all,


On Wed, 6 Aug 2014 10:48:55 +0200
Kevin Wolf <kwolf@redhat.com> wrote:

> On 06.08.2014 at 07:33, Ming Lei wrote:

> 
> Anyhow, the coroutine version of your benchmark is buggy, it leaks all
> coroutines instead of exiting them, so it can't make any use of the
> coroutine pool. On my laptop, I get this (where fixed coroutine is a
> version that simply removes the yield at the end):
> 
>                 | bypass        | fixed coro    | buggy coro
> ----------------+---------------+---------------+--------------
> time            | 1.09s         | 1.10s         | 1.62s
> L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> insns per cycle | 2.39          | 2.39          | 1.90
> 
> Begs the question whether you see a similar effect on a real qemu and
> the coroutine pool is still not big enough? With correct use of
> coroutines, the difference seems to be barely measurable even without
> any I/O involved.

I have now fixed the coroutine leak bug. The previous crypt benchmark was a bit
too heavy and kept operations per second very low (~40K/sec), so I wrote a new,
simpler one that can generate hundreds of thousands of operations per second,
which should match what some fast storage devices deliver. It does show that
the effect of coroutines is not small.

In the extreme case where each iteration only runs the getppid() syscall, only
about 3M operations/sec can be reached with coroutines, while without coroutines
the number reaches 16M/sec, a difference of more than 4 times!
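
As a rough back-of-the-envelope check, the two throughput figures above can be
turned into a per-iteration cost. The sketch below is only arithmetic on the
quoted 3M/sec and 16M/sec numbers; the helper names are made up for
illustration and are not part of the patch. Taken at face value, the numbers
suggest about 270 ns of extra time per iteration, which in this benchmark
covers creating a coroutine, entering it, yielding, and re-entering it so it
can terminate.

```c
#include <stdio.h>

/* Cost of one operation in nanoseconds; ops_per_sec must be > 0. */
double ns_per_op(double ops_per_sec)
{
    return 1e9 / ops_per_sec;
}

/* Extra nanoseconds per operation on the slower path (hypothetical helper). */
double extra_ns_per_op(double slow_ops_per_sec, double fast_ops_per_sec)
{
    return ns_per_op(slow_ops_per_sec) - ns_per_op(fast_ops_per_sec);
}

/* Prints the estimate for the getppid() numbers quoted above. */
void print_getppid_overhead(void)
{
    printf("coroutine: %.1f ns/op, bypass: %.1f ns/op, extra: %.1f ns/op\n",
           ns_per_op(3e6), ns_per_op(16e6), extra_ns_per_op(3e6, 16e6));
}
```

Calling print_getppid_overhead() from main() prints the three values. The
estimate is only as good as the two rounded throughput numbers it starts from.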

The other benchmark, a file read which is the default one, does the following
in each iteration:

      open(file), read(fd, buf on stack, 512), sum the bytes and close()

Without coroutines, operations per second increase by ~20% compared with
using coroutines. When reading 1024 bytes each time, the number still
increases by ~10%. The throughput is between 200K and 400K operations per
second, which should match the IOPS in the dataplane test. The tests were
run on my Lenovo T410 laptop (CPU: 2.6GHz, dual core, four threads).

When reading 8192 or more bytes each time, no clear difference between using
coroutines and not using them can be observed.

Of course, the test result depends on how fast the machine is, but even on a
fast machine I guess a similar result can still be observed by decreasing the
number of bytes read each time.
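
To try the same comparison on another machine without building QEMU, the loop
structure can be sketched standalone. This is a minimal sketch and an
assumption throughout: it uses ucontext (makecontext/swapcontext) instead of
QEMU's coroutine API, and run_bench(), bench_kops(), do_op(), worker() and
STACK_SIZE are hypothetical names. On glibc, swapcontext() typically also saves
and restores the signal mask, so the absolute numbers are not comparable with
QEMU's coroutines; only the shape of the test (one getppid() per iteration,
with or without a context switch around it) is the same.

```c
#include <stdio.h>
#include <sys/time.h>
#include <ucontext.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)   /* assumption: plenty for this tiny worker */

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[STACK_SIZE];
static unsigned long ops_done;
static volatile unsigned long sink;

/* One benchmark operation: a cheap syscall, as in the getppid() case. */
static void do_op(void)
{
    sink += (unsigned long)getppid();
    ops_done++;
}

/* Context body: do one operation, yield to the caller, then finish. */
static void worker(void)
{
    do_op();
    swapcontext(&worker_ctx, &main_ctx);   /* "yield" back to run_bench() */
    /* returning from here resumes uc_link, i.e. main_ctx */
}

/* Runs the loop and returns the number of operations actually performed. */
unsigned long run_bench(unsigned long iterations, int bypass)
{
    unsigned long i;

    ops_done = 0;
    for (i = 0; i < iterations; i++) {
        if (bypass) {
            do_op();
            continue;
        }
        /* Set up a fresh context on the reused stack for each iteration. */
        getcontext(&worker_ctx);
        worker_ctx.uc_stack.ss_sp = worker_stack;
        worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
        worker_ctx.uc_link = &main_ctx;
        makecontext(&worker_ctx, worker, 0);
        swapcontext(&main_ctx, &worker_ctx);   /* enter: runs do_op() */
        swapcontext(&main_ctx, &worker_ctx);   /* re-enter: worker returns */
    }
    return ops_done;
}

/* Returns thousands of operations per second for one run. */
double bench_kops(unsigned long iterations, int bypass)
{
    struct timeval t1, t2;
    double usec;

    gettimeofday(&t1, NULL);
    run_bench(iterations, bypass);
    gettimeofday(&t2, NULL);
    usec = (t2.tv_sec - t1.tv_sec) * 1e6 + (t2.tv_usec - t1.tv_usec);
    return usec > 0 ? iterations * 1000.0 / usec : 0.0;
}
```

From main(), printing bench_kops(1000000, 0) and bench_kops(1000000, 1) gives
the two numbers to compare. The sketch is single-threaded because it keeps its
contexts in static variables.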


diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index ae64b3d..78c3b60 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -15,6 +15,12 @@ STEXI
 @item bench [-q] [-f @var{fmt]} [-n] [-t @var{cache}] filename
 ETEXI
 
+DEF("co_bench", co_bench,
+    "co_bench -c count -f read_file_name -s read_size -b")
+STEXI
+@item co_bench [-c @var{count}] [-f @var{filename}] [-s @var{read_size}] [-b]
+ETEXI
+
 DEF("check", img_check,
     "check [-q] [-f fmt] [--output=ofmt]  [-r [leaks | all]] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index 3e1b7c4..c9c7ac3 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -366,6 +366,138 @@ static int add_old_style_options(const char *fmt, QemuOpts *opts,
     return 0;
 }
 
+struct co_data {
+    const char *file_name;
+    unsigned long sum;
+    int read_size;
+    bool bypass;
+};
+
+static unsigned long file_bench(struct co_data *co)
+{
+    const int size = co->read_size;
+    int fd = open(co->file_name, O_RDONLY);
+    char buf[size];
+    int len, i;
+    unsigned long sum = 0;
+
+    if (fd < 0) {
+        perror("open file failed");
+        exit(-1);
+    }
+
+    /* the 1st page should be in the page cache, so read() should not block */
+    len = read(fd, buf, size);
+    if (len != size) {
+        perror("read file failed");
+        exit(-1);
+    }
+    close(fd);
+
+    for (i = 0; i < len; i++) {
+        sum += buf[i];
+    }
+
+    return sum;
+}
+
+static void syscall_bench(void *opaque)
+{
+    struct co_data *data = opaque;
+
+#if 0
+    /*
+     * Doing only getppid() shows operations per sec increasing about 5
+     * times on my T410 laptop when the coroutine is bypassed.
+     */
+    data->sum += getppid();
+#else
+    /*
+     * open, read 1024 bytes, and close shows a ~10% increase on my
+     * T410 laptop when the coroutine is bypassed.
+     *
+     * open, read 512 bytes, and close shows a ~20% increase on my
+     * T410 laptop when the coroutine is bypassed.
+     *
+     * The link below provides 'perf stat' output for several hw events:
+     *
+     *       http://pastebin.com/5s750m8C
+     *
+     * With the coroutine bypassed, dcache loads decrease, insns per
+     * cycle increase by 0.7, the branch-misses ratio decreases by 0.4%,
+     * and dTLB-loads decrease too.
+     */
+    data->sum += file_bench(data);
+#endif
+
+    if (!data->bypass) {
+        qemu_coroutine_yield();
+    }
+}
+
+static int co_bench(int argc, char **argv)
+{
+    int c;
+    unsigned long cnt = 1;
+    int num = 1;
+    unsigned long i;
+    struct co_data data = {
+        .file_name = argv[-1],
+        .sum = 0,
+        .read_size = 1024,
+        .bypass = false,
+    };
+    Coroutine *co, *last_co = NULL;
+    struct timeval t1, t2;
+    unsigned long tv = 0;
+
+    for (;;) {
+        c = getopt(argc, argv, "bc:s:f:");
+        if (c == -1) {
+            break;
+        }
+        switch (c) {
+        case 'b':
+            data.bypass = true;
+            break;
+        case 'c':
+            num = atoi(optarg);
+            break;
+        case 's':
+            data.read_size = atoi(optarg);
+            break;
+        case 'f':
+            data.file_name = optarg;
+            break;
+        }
+    }
+
+    printf("%s: iterations %d, bypass: %s, file %s, read_size: %d\n",
+           __func__, num,
+           data.bypass ? "yes" : "no",
+           data.file_name, data.read_size);
+    gettimeofday(&t1, NULL);
+    for (i = 0; i < num * cnt; i++) {
+        if (!data.bypass) {
+            if (last_co) {
+                qemu_coroutine_enter(last_co, NULL);
+            }
+            co = qemu_coroutine_create(syscall_bench);
+            last_co = co;
+            qemu_coroutine_enter(co, &data);
+        } else {
+            syscall_bench(&data);
+        }
+    }
+    gettimeofday(&t2, NULL);
+    tv = (t2.tv_sec - t1.tv_sec) * 1000000 +
+        (t2.tv_usec - t1.tv_usec);
+    printf("\ttotal time: %lums, %5.0fK ops per sec\n", tv / 1000,
+           (double)(cnt * num) * 1000 / tv);
+
+    return (int)data.sum;
+}
+
 static int img_create(int argc, char **argv)
 {
     int c;


Thanks,
-- 
Ming Lei
