From: Kevin Wolf <kwolf@redhat.com>
To: anthony@codemonkey.ws
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: [Qemu-devel] [PATCH 11/30] posix-aio-compat: fix latency issues
Date: Mon, 29 Aug 2011 16:53:19 +0200
Message-ID: <1314629618-8308-12-git-send-email-kwolf@redhat.com>
In-Reply-To: <1314629618-8308-1-git-send-email-kwolf@redhat.com>

From: Avi Kivity <avi@redhat.com>

In certain circumstances, posix-aio-compat can incur a lot of latency:
 - threads are created by vcpu threads, so if vcpu affinity is set,
   aio threads inherit vcpu affinity.  This can cause many aio threads
   to compete for one cpu.
 - we can create up to max_threads (64) aio threads in one go; since a
   pthread_create can take around 30μs, we have up to 2ms of cpu time
   under a global lock.
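
The 2ms figure follows from multiplying the thread cap by the cost of one
pthread_create call. A minimal sketch of that arithmetic in C, where
worst_case_spawn_us is a hypothetical helper for illustration and not part
of posix-aio-compat.c:

```c
#include <assert.h>

/* Hypothetical helper (not in QEMU): worst-case CPU time, in microseconds,
 * spent creating max_threads threads back to back when each
 * pthread_create call costs roughly create_us microseconds. */
static long worst_case_spawn_us(int max_threads, long create_us)
{
    return (long)max_threads * create_us;
}
```

With the values quoted above, worst_case_spawn_us(64, 30) gives 1920μs,
which is the "up to 2ms" in the text.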

Fix by:
 - moving thread creation to the main thread, so we inherit the main
   thread's affinity instead of the vcpu thread's affinity.
 - if a thread is currently being created, and we need to create yet
   another thread, let the thread being born create the new thread, reducing
   the amount of time we spend in the main thread.
 - drop the local lock while creating a thread (we may still hold the
   global mutex, though)
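
The chained creation described in these points can be sketched outside QEMU
with plain pthreads. This is only an illustration under stated assumptions:
the counter names mirror the patch, but spawn_chain, started_threads,
wanted_threads, chained_creates and the from_worker flag are hypothetical,
and a direct call stands in for the bottom half that QEMU schedules on the
main thread. It is a one-shot sketch, not a reusable pool.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_started = PTHREAD_COND_INITIALIZER;
static int new_threads;      /* backlog of threads still to create */
static int pending_threads;  /* threads created but not running yet */
static int started_threads;  /* hypothetical: workers that began running */
static int wanted_threads;   /* hypothetical: size of the request */
static int chained_creates;  /* hypothetical: creations done by workers */

static void *worker(void *unused);

/* Create at most one thread from the backlog. The lock is dropped
 * before pthread_create(), as in the patch. */
static void do_spawn_thread(int from_worker)
{
    pthread_t tid;
    pthread_attr_t attr;
    int rc;

    pthread_mutex_lock(&lock);
    if (!new_threads) {
        pthread_mutex_unlock(&lock);
        return;
    }
    new_threads--;
    pending_threads++;
    chained_creates += from_worker;
    pthread_mutex_unlock(&lock);

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&tid, &attr, worker, NULL);
    assert(rc == 0);
    pthread_attr_destroy(&attr);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    pending_threads--;
    started_threads++;
    if (started_threads == wanted_threads) {
        pthread_cond_signal(&all_started);
    }
    pthread_mutex_unlock(&lock);

    do_spawn_thread(1);  /* the newborn thread creates the next one */
    return NULL;         /* a real worker would enter its request loop */
}

/* Queue n workers, create only the first one from the calling thread,
 * and wait until all of them run. Returns the number that started. */
static int spawn_chain(int n)
{
    int kick, started;

    pthread_mutex_lock(&lock);
    wanted_threads = n;
    started_threads = 0;
    chained_creates = 0;
    new_threads = n;
    kick = (pending_threads == 0);
    pthread_mutex_unlock(&lock);

    if (kick) {
        do_spawn_thread(0);  /* stands in for the bottom half */
    }

    pthread_mutex_lock(&lock);
    while (started_threads < wanted_threads) {
        pthread_cond_wait(&all_started, &lock);
    }
    started = started_threads;
    pthread_mutex_unlock(&lock);
    return started;
}
```

Calling spawn_chain(8) from one thread starts eight workers while the caller
itself pays for a single pthread_create; the remaining seven are created by
the workers as they come up.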

Note this doesn't eliminate latency completely; scheduler artifacts or
lack of host cpu resources can still cause it.  We may want pre-allocated
threads when this cannot be tolerated.

Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 posix-aio-compat.c |   44 ++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index babb094..3193dbf 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -30,6 +30,7 @@
 
 #include "block/raw-posix-aio.h"
 
+static void do_spawn_thread(void);
 
 struct qemu_paiocb {
     BlockDriverAIOCB common;
@@ -64,6 +65,9 @@ static pthread_attr_t attr;
 static int max_threads = 64;
 static int cur_threads = 0;
 static int idle_threads = 0;
+static int new_threads = 0;     /* backlog of threads we need to create */
+static int pending_threads = 0; /* threads created but not running yet */
+static QEMUBH *new_thread_bh;
 static QTAILQ_HEAD(, qemu_paiocb) request_list;
 
 #ifdef CONFIG_PREADV
@@ -311,6 +315,11 @@ static void *aio_thread(void *unused)
 
     pid = getpid();
 
+    mutex_lock(&lock);
+    pending_threads--;
+    mutex_unlock(&lock);
+    do_spawn_thread();
+
     while (1) {
         struct qemu_paiocb *aiocb;
         ssize_t ret = 0;
@@ -381,11 +390,20 @@ static void *aio_thread(void *unused)
     return NULL;
 }
 
-static void spawn_thread(void)
+static void do_spawn_thread(void)
 {
     sigset_t set, oldset;
 
-    cur_threads++;
+    mutex_lock(&lock);
+    if (!new_threads) {
+        mutex_unlock(&lock);
+        return;
+    }
+
+    new_threads--;
+    pending_threads++;
+
+    mutex_unlock(&lock);
 
     /* block all signals */
     if (sigfillset(&set)) die("sigfillset");
@@ -396,6 +414,27 @@ static void spawn_thread(void)
     if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore");
 }
 
+static void spawn_thread_bh_fn(void *opaque)
+{
+    do_spawn_thread();
+}
+
+static void spawn_thread(void)
+{
+    cur_threads++;
+    new_threads++;
+    /* If there are threads being created, they will spawn new workers, so
+     * we don't spend time creating many threads in a loop holding a mutex or
+     * starving the current vcpu.
+     *
+     * If there are no idle threads, ask the main thread to create one, so we
+     * inherit the correct affinity instead of the vcpu affinity.
+     */
+    if (!pending_threads) {
+        qemu_bh_schedule(new_thread_bh);
+    }
+}
+
 static void qemu_paio_submit(struct qemu_paiocb *aiocb)
 {
     aiocb->ret = -EINPROGRESS;
@@ -665,6 +704,7 @@ int paio_init(void)
         die2(ret, "pthread_attr_setdetachstate");
 
     QTAILQ_INIT(&request_list);
+    new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL);
 
     posix_aio_state = s;
     return 0;
-- 
1.7.6
