* [PATCH v3 0/3] thread-pool: fix performance regression
@ 2022-05-14  6:50 Paolo Bonzini
  2022-05-14  6:50 ` [PATCH v3 1/3] thread-pool: optimize scheduling of completion bottom half Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Paolo Bonzini @ 2022-05-14  6:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, nsaenzju

Together, the first two patches fix the performance regression induced by
QemuSemaphore; individually they do not.

The third patch is a small cleanup on top, enabled by the recent
introduction of the min_threads/max_threads knobs for the thread pool.

6.2:
   iops        : min=58051, max=62260, avg=60282.57, stdev=1081.18, samples=30
    clat percentiles (usec):   1.00th=[  490],   99.99th=[  775]
   iops        : min=59401, max=61290, avg=60651.27, stdev=468.24, samples=30
    clat percentiles (usec):   1.00th=[  490],   99.99th=[  717]
   iops        : min=59583, max=60816, avg=60353.43, stdev=282.69, samples=30
    clat percentiles (usec):   1.00th=[  490],   99.99th=[  701]
   iops        : min=58099, max=60713, avg=59739.53, stdev=755.49, samples=30
    clat percentiles (usec):   1.00th=[  494],   99.99th=[  717]
	
patched:
   iops        : min=60616, max=62522, avg=61654.37, stdev=555.67, samples=30
    clat percentiles (usec):   1.00th=[  474],   99.99th=[ 1303]
   iops        : min=61841, max=63600, avg=62878.47, stdev=442.40, samples=30
    clat percentiles (usec):   1.00th=[  465],   99.99th=[  685]
   iops        : min=62976, max=63910, avg=63531.60, stdev=261.05, samples=30
    clat percentiles (usec):   1.00th=[  461],   99.99th=[  693]
   iops        : min=60803, max=63623, avg=62653.37, stdev=808.76, samples=30
    clat percentiles (usec):   1.00th=[  465],   99.99th=[  685]
	
Paolo

v1->v2: support min_threads/max_threads

v2->v3: fix NULL pointer dereference [Nicolas]

Paolo Bonzini (3):
  thread-pool: optimize scheduling of completion bottom half
  thread-pool: replace semaphore with condition variable
  thread-pool: remove stopping variable

 util/thread-pool.c | 74 +++++++++++++++++++---------------------------
 1 file changed, 30 insertions(+), 44 deletions(-)

-- 
2.36.0




* [PATCH v3 1/3] thread-pool: optimize scheduling of completion bottom half
  2022-05-14  6:50 [PATCH v3 0/3] thread-pool: fix performance regression Paolo Bonzini
@ 2022-05-14  6:50 ` Paolo Bonzini
  2022-05-17 15:20   ` Stefan Hajnoczi
  2022-05-14  6:50 ` [PATCH v3 2/3] thread-pool: replace semaphore with condition variable Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2022-05-14  6:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, nsaenzju

The completion bottom half was scheduled within the pool->lock
critical section.  That actually results in worse performance,
because the worker thread can run its own small critical section
and go to sleep before the bottom half starts running.

Note that this simple change does not produce an improvement without
changing the thread pool QemuSemaphore to a condition variable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/thread-pool.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/util/thread-pool.c b/util/thread-pool.c
index 196835b4d3..4979f30ca3 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -127,9 +127,8 @@ static void *worker_thread(void *opaque)
         smp_wmb();
         req->state = THREAD_DONE;
 
-        qemu_mutex_lock(&pool->lock);
-
         qemu_bh_schedule(pool->completion_bh);
+        qemu_mutex_lock(&pool->lock);
     }
 
     pool->cur_threads--;
-- 
2.36.0
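
The change itself is tiny; condensed from the diff above (identifiers as in
util/thread-pool.c, the surrounding worker loop elided), the reordering is:

    /* Old ordering: pool->lock is taken first, so the completion bottom
     * half is scheduled from inside the critical section. */
    qemu_mutex_lock(&pool->lock);
    qemu_bh_schedule(pool->completion_bh);

    /* New ordering: schedule the completion bottom half before re-taking
     * the lock, so that only the worker's own bookkeeping runs inside
     * the critical section. */
    qemu_bh_schedule(pool->completion_bh);
    qemu_mutex_lock(&pool->lock);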




* [PATCH v3 2/3] thread-pool: replace semaphore with condition variable
  2022-05-14  6:50 [PATCH v3 0/3] thread-pool: fix performance regression Paolo Bonzini
  2022-05-14  6:50 ` [PATCH v3 1/3] thread-pool: optimize scheduling of completion bottom half Paolo Bonzini
@ 2022-05-14  6:50 ` Paolo Bonzini
  2022-05-17 12:46   ` Nicolas Saenz Julienne
  2022-05-17 15:19   ` Stefan Hajnoczi
  2022-05-14  6:50 ` [PATCH v3 3/3] thread-pool: remove stopping variable Paolo Bonzini
  2022-05-17 14:30 ` [PATCH v3 0/3] thread-pool: fix performance regression Nicolas Saenz Julienne
  3 siblings, 2 replies; 13+ messages in thread
From: Paolo Bonzini @ 2022-05-14  6:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, nsaenzju, Lukáš Doktor

Since commit f9fc8932b1 ("thread-posix: remove the posix semaphore
support", 2022-04-06) QemuSemaphore has its own mutex and condition
variable; this adds unnecessary overhead on I/O with small block sizes.

Check the QTAILQ directly instead of adding the indirection of a
semaphore's count.  Using a semaphore has not been necessary since
qemu_cond_timedwait was introduced; the new code has to be careful about
spurious wakeups, but it is simpler; for example, thread_pool_cancel does
not have to worry about synchronizing the semaphore count with the number
of elements of pool->request_list.

Note that the return value of qemu_cond_timedwait (0 for timeout, 1 for
signal or spurious wakeup) is different from that of qemu_sem_timedwait
(-1 for timeout, 0 for success).

Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/thread-pool.c | 68 +++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 40 deletions(-)

diff --git a/util/thread-pool.c b/util/thread-pool.c
index 4979f30ca3..6bbf24754a 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -57,7 +57,7 @@ struct ThreadPool {
     QEMUBH *completion_bh;
     QemuMutex lock;
     QemuCond worker_stopped;
-    QemuSemaphore sem;
+    QemuCond request_cond;
     QEMUBH *new_thread_bh;
 
     /* The following variables are only accessed from one AioContext. */
@@ -74,23 +74,6 @@ struct ThreadPool {
     int max_threads;
 };
 
-static inline bool back_to_sleep(ThreadPool *pool, int ret)
-{
-    /*
-     * The semaphore timed out, we should exit the loop except when:
-     *  - There is work to do, we raced with the signal.
-     *  - The max threads threshold just changed, we raced with the signal.
-     *  - The thread pool forces a minimum number of readily available threads.
-     */
-    if (ret == -1 && (!QTAILQ_EMPTY(&pool->request_list) ||
-            pool->cur_threads > pool->max_threads ||
-            pool->cur_threads <= pool->min_threads)) {
-            return true;
-    }
-
-    return false;
-}
-
 static void *worker_thread(void *opaque)
 {
     ThreadPool *pool = opaque;
@@ -99,20 +82,25 @@ static void *worker_thread(void *opaque)
     pool->pending_threads--;
     do_spawn_thread(pool);
 
-    while (!pool->stopping) {
+    while (!pool->stopping && pool->cur_threads <= pool->max_threads) {
         ThreadPoolElement *req;
         int ret;
 
-        do {
+        if (QTAILQ_EMPTY(&pool->request_list)) {
             pool->idle_threads++;
-            qemu_mutex_unlock(&pool->lock);
-            ret = qemu_sem_timedwait(&pool->sem, 10000);
-            qemu_mutex_lock(&pool->lock);
+            ret = qemu_cond_timedwait(&pool->request_cond, &pool->lock, 10000);
             pool->idle_threads--;
-        } while (back_to_sleep(pool, ret));
-        if (ret == -1 || pool->stopping ||
-            pool->cur_threads > pool->max_threads) {
-            break;
+            if (ret == 0 &&
+                QTAILQ_EMPTY(&pool->request_list) &&
+                pool->cur_threads > pool->min_threads) {
+                /* Timed out + no work to do + no need for warm threads = exit.  */
+                break;
+            }
+            /*
+             * Even if there was some work to do, check if there aren't
+             * too many worker threads before picking it up.
+             */
+            continue;
         }
 
         req = QTAILQ_FIRST(&pool->request_list);
@@ -134,6 +122,12 @@ static void *worker_thread(void *opaque)
     pool->cur_threads--;
     qemu_cond_signal(&pool->worker_stopped);
     qemu_mutex_unlock(&pool->lock);
+
+    /*
+     * Wake up another thread, in case we got a wakeup but decided
+     * to exit due to pool->cur_threads > pool->max_threads.
+     */
+    qemu_cond_signal(&pool->worker_stopped);
     return NULL;
 }
 
@@ -229,13 +223,7 @@ static void thread_pool_cancel(BlockAIOCB *acb)
     trace_thread_pool_cancel(elem, elem->common.opaque);
 
     QEMU_LOCK_GUARD(&pool->lock);
-    if (elem->state == THREAD_QUEUED &&
-        /* No thread has yet started working on elem. we can try to "steal"
-         * the item from the worker if we can get a signal from the
-         * semaphore.  Because this is non-blocking, we can do it with
-         * the lock taken and ensure that elem will remain THREAD_QUEUED.
-         */
-        qemu_sem_timedwait(&pool->sem, 0) == 0) {
+    if (elem->state == THREAD_QUEUED) {
         QTAILQ_REMOVE(&pool->request_list, elem, reqs);
         qemu_bh_schedule(pool->completion_bh);
 
@@ -280,7 +268,7 @@ BlockAIOCB *thread_pool_submit_aio(ThreadPool *pool,
     }
     QTAILQ_INSERT_TAIL(&pool->request_list, req, reqs);
     qemu_mutex_unlock(&pool->lock);
-    qemu_sem_post(&pool->sem);
+    qemu_cond_signal(&pool->request_cond);
     return &req->common;
 }
 
@@ -323,7 +311,7 @@ void thread_pool_update_params(ThreadPool *pool, AioContext *ctx)
      * We either have to:
      *  - Increase the number available of threads until over the min_threads
      *    threshold.
-     *  - Decrease the number of available threads until under the max_threads
+     *  - Bump the worker threads so that they exit, until under the max_threads
      *    threshold.
      *  - Do nothing. The current number of threads fall in between the min and
      *    max thresholds. We'll let the pool manage itself.
@@ -333,7 +321,7 @@ void thread_pool_update_params(ThreadPool *pool, AioContext *ctx)
     }
 
     for (int i = pool->cur_threads; i > pool->max_threads; i--) {
-        qemu_sem_post(&pool->sem);
+        qemu_cond_signal(&pool->request_cond);
     }
 
     qemu_mutex_unlock(&pool->lock);
@@ -350,7 +338,7 @@ static void thread_pool_init_one(ThreadPool *pool, AioContext *ctx)
     pool->completion_bh = aio_bh_new(ctx, thread_pool_completion_bh, pool);
     qemu_mutex_init(&pool->lock);
     qemu_cond_init(&pool->worker_stopped);
-    qemu_sem_init(&pool->sem, 0);
+    qemu_cond_init(&pool->request_cond);
     pool->new_thread_bh = aio_bh_new(ctx, spawn_thread_bh_fn, pool);
 
     QLIST_INIT(&pool->head);
@@ -383,15 +371,15 @@ void thread_pool_free(ThreadPool *pool)
 
     /* Wait for worker threads to terminate */
     pool->stopping = true;
+    qemu_cond_broadcast(&pool->request_cond);
     while (pool->cur_threads > 0) {
-        qemu_sem_post(&pool->sem);
         qemu_cond_wait(&pool->worker_stopped, &pool->lock);
     }
 
     qemu_mutex_unlock(&pool->lock);
 
     qemu_bh_delete(pool->completion_bh);
-    qemu_sem_destroy(&pool->sem);
+    qemu_cond_destroy(&pool->request_cond);
     qemu_cond_destroy(&pool->worker_stopped);
     qemu_mutex_destroy(&pool->lock);
     g_free(pool);
-- 
2.36.0
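
Stripped of the min/max-threads policy, the wait loop this patch arrives at
is the standard "re-check the predicate under the lock after every wakeup"
pattern. A condensed sketch (identifiers as in the diff above; request
processing is elided, and the real code drops the lock while running each
request):

    ThreadPoolElement *req;
    int ret;

    qemu_mutex_lock(&pool->lock);
    for (;;) {
        if (QTAILQ_EMPTY(&pool->request_list)) {
            /* 0 means timeout; non-zero means signal or spurious wakeup.
             * Either way the queue is checked again before dequeuing. */
            ret = qemu_cond_timedwait(&pool->request_cond, &pool->lock, 10000);
            if (ret == 0 && QTAILQ_EMPTY(&pool->request_list)) {
                break;      /* idle timeout: let this worker exit */
            }
            continue;       /* woken up: re-evaluate the queue first */
        }
        req = QTAILQ_FIRST(&pool->request_list);    /* non-NULL, checked above */
        QTAILQ_REMOVE(&pool->request_list, req, reqs);
        /* ... run the request with the lock dropped, then re-take it ... */
    }
    qemu_mutex_unlock(&pool->lock);

Because the queue itself is the wait predicate, there is no separate
semaphore count to keep consistent, which is why thread_pool_cancel can now
simply unlink a THREAD_QUEUED element under the lock.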




* [PATCH v3 3/3] thread-pool: remove stopping variable
  2022-05-14  6:50 [PATCH v3 0/3] thread-pool: fix performance regression Paolo Bonzini
  2022-05-14  6:50 ` [PATCH v3 1/3] thread-pool: optimize scheduling of completion bottom half Paolo Bonzini
  2022-05-14  6:50 ` [PATCH v3 2/3] thread-pool: replace semaphore with condition variable Paolo Bonzini
@ 2022-05-14  6:50 ` Paolo Bonzini
  2022-05-17 15:20   ` Stefan Hajnoczi
  2022-05-17 14:30 ` [PATCH v3 0/3] thread-pool: fix performance regression Nicolas Saenz Julienne
  3 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2022-05-14  6:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, nsaenzju

Just setting the max threads to 0 is enough to stop all workers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/thread-pool.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/util/thread-pool.c b/util/thread-pool.c
index 6bbf24754a..507c998827 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -69,7 +69,6 @@ struct ThreadPool {
     int idle_threads;
     int new_threads;     /* backlog of threads we need to create */
     int pending_threads; /* threads created but not running yet */
-    bool stopping;
     int min_threads;
     int max_threads;
 };
@@ -82,7 +81,7 @@ static void *worker_thread(void *opaque)
     pool->pending_threads--;
     do_spawn_thread(pool);
 
-    while (!pool->stopping && pool->cur_threads <= pool->max_threads) {
+    while (pool->cur_threads <= pool->max_threads) {
         ThreadPoolElement *req;
         int ret;
 
@@ -370,7 +369,7 @@ void thread_pool_free(ThreadPool *pool)
     pool->new_threads = 0;
 
     /* Wait for worker threads to terminate */
-    pool->stopping = true;
+    pool->max_threads = 0;
     qemu_cond_broadcast(&pool->request_cond);
     while (pool->cur_threads > 0) {
         qemu_cond_wait(&pool->worker_stopped, &pool->lock);
-- 
2.36.0
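
Why zeroing max_threads is sufficient follows from the loop condition
introduced by the previous patch; a condensed sketch (identifiers from the
diffs above):

    /* thread_pool_free(), with pool->lock held */
    pool->max_threads = 0;
    qemu_cond_broadcast(&pool->request_cond);

    /* worker_thread(): every live worker is counted in cur_threads, so
     * cur_threads >= 1 > max_threads and the condition below is now
     * false; each woken worker exits instead of waiting again. */
    while (pool->cur_threads <= pool->max_threads) {
        ...
    }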




* Re: [PATCH v3 2/3] thread-pool: replace semaphore with condition variable
  2022-05-14  6:50 ` [PATCH v3 2/3] thread-pool: replace semaphore with condition variable Paolo Bonzini
@ 2022-05-17 12:46   ` Nicolas Saenz Julienne
  2022-05-17 14:18     ` Paolo Bonzini
  2022-05-17 15:19   ` Stefan Hajnoczi
  1 sibling, 1 reply; 13+ messages in thread
From: Nicolas Saenz Julienne @ 2022-05-17 12:46 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: stefanha, Lukáš Doktor

Hi Paolo,

On Sat, 2022-05-14 at 08:50 +0200, Paolo Bonzini wrote:

[...]

>  static void *worker_thread(void *opaque)
>  {
>      ThreadPool *pool = opaque;
> @@ -99,20 +82,25 @@ static void *worker_thread(void *opaque)
>      pool->pending_threads--;
>      do_spawn_thread(pool);
>  
> -    while (!pool->stopping) {
> +    while (!pool->stopping && pool->cur_threads <= pool->max_threads) {
>          ThreadPoolElement *req;
>          int ret;
>  
> -        do {
> +        if (QTAILQ_EMPTY(&pool->request_list)) {
>              pool->idle_threads++;
> -            qemu_mutex_unlock(&pool->lock);
> -            ret = qemu_sem_timedwait(&pool->sem, 10000);
> -            qemu_mutex_lock(&pool->lock);
> +            ret = qemu_cond_timedwait(&pool->request_cond, &pool->lock, 10000);
>              pool->idle_threads--;
> -        } while (back_to_sleep(pool, ret));
> -        if (ret == -1 || pool->stopping ||
> -            pool->cur_threads > pool->max_threads) {
> -            break;
> +            if (ret == 0 &&
> +                QTAILQ_EMPTY(&pool->request_list) &&
> +                pool->cur_threads > pool->min_threads) {
> +                /* Timed out + no work to do + no need for warm threads = exit.  */
> +                break;
> +            }

 Some comments:

- A completely idle pool will now never be able to lose its threads, as the
  'pool->cur_threads <= pool->max_threads' condition is only checked after a
  non-timeout wakeup.

- You don't take into account the possibility of being woken up with an empty
  queue, which I believe is possible:

	CPU0:					      CPU1:
						   // in worker_thread(), queue empty
						   qemu_cond_timedwait();

	acb = thread_pool_submit_aio();
		    ...
	    qemu_cond_signal();
	thread_pool_cancel(acb);
						   // wakes-up, ret == 1
					           req = QTAILQ_FIRST(&pool->request_list);
					           QTAILQ_REMOVE(&pool->request_list, req, reqs);
						   // segmentation fault....

		------------------------------------------------------

	CPU0:					      CPU1:
						   // in worker_thread(), queue empty
						   qemu_cond_timedwait();
	thread_pool_free()
		...
            qemu_cond_broadcast();
						   // wakes-up, ret == 1
					           req = QTAILQ_FIRST(&pool->request_list);
					           QTAILQ_REMOVE(&pool->request_list, req, reqs);
						   // segmentation fault....

Regards,

-- 
Nicolás Sáenz




* Re: [PATCH v3 2/3] thread-pool: replace semaphore with condition variable
  2022-05-17 12:46   ` Nicolas Saenz Julienne
@ 2022-05-17 14:18     ` Paolo Bonzini
  2022-05-17 14:37       ` Nicolas Saenz Julienne
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2022-05-17 14:18 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, qemu-devel; +Cc: stefanha, Lukáš Doktor

On 5/17/22 14:46, Nicolas Saenz Julienne wrote:
>> -    while (!pool->stopping) {
>> +    while (!pool->stopping && pool->cur_threads <= pool->max_threads) {
>>           ThreadPoolElement *req;
>>           int ret;
>>   
>> -        do {
>> +        if (QTAILQ_EMPTY(&pool->request_list)) {
>>               pool->idle_threads++;
>> -            qemu_mutex_unlock(&pool->lock);
>> -            ret = qemu_sem_timedwait(&pool->sem, 10000);
>> -            qemu_mutex_lock(&pool->lock);
>> +            ret = qemu_cond_timedwait(&pool->request_cond, &pool->lock, 10000);
>>               pool->idle_threads--;
>> -        } while (back_to_sleep(pool, ret));
>> -        if (ret == -1 || pool->stopping ||
>> -            pool->cur_threads > pool->max_threads) {
>> -            break;
>> +            if (ret == 0 &&
>> +                QTAILQ_EMPTY(&pool->request_list) &&
>> +                pool->cur_threads > pool->min_threads) {
>> +                /* Timed out + no work to do + no need for warm threads = exit.  */
>> +                break;
>> +            }
> 
>   Some comments:
> 
> - A completely idle pool will now never be able to lose its threads, as the
>    'pool->cur_threads <= pool->max_threads' condition is only checked after a
>    non-timeout wakeup.

Are you sure?  The full code is:

             ret = qemu_cond_timedwait(&pool->request_cond, &pool->lock, 10000);
             pool->idle_threads--;
             if (ret == 0 &&
                 QTAILQ_EMPTY(&pool->request_list) &&
                 pool->cur_threads > pool->min_threads) {
                /* Timed out + no work to do + no need for warm threads = exit.  */
                 break;
             }
             /*
              * Even if there was some work to do, check if there aren't
              * too many worker threads before picking it up.
              */
             continue;

That is, it won't immediately pick up the job after _any_ wait,
whether successful or not.  It will first of all "continue" to
check pool->cur_threads <= pool->max_threads.

This is also the reason why I had to add a qemu_cond_signal() at the
bottom of the worker thread (because maybe it got a signal to act on a
non-empty queue, but decided to exit instead).

> - You don't take into account the possibility of being woken up with an empty
>    queue. Which I belive possible:

It's absolutely possible, but the difference between v2 and v3 _should_
be the fix.  Of course I could have screwed up, but it seems correct
this time.

Paolo



* Re: [PATCH v3 0/3] thread-pool: fix performance regression
  2022-05-14  6:50 [PATCH v3 0/3] thread-pool: fix performance regression Paolo Bonzini
                   ` (2 preceding siblings ...)
  2022-05-14  6:50 ` [PATCH v3 3/3] thread-pool: remove stopping variable Paolo Bonzini
@ 2022-05-17 14:30 ` Nicolas Saenz Julienne
  3 siblings, 0 replies; 13+ messages in thread
From: Nicolas Saenz Julienne @ 2022-05-17 14:30 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: stefanha

On Sat, 2022-05-14 at 08:50 +0200, Paolo Bonzini wrote:
> Together, these two patches fix the performance regression induced by
> QemuSemaphore; individually they don't though.
> 
> The third patch is a small cleanup on top, that was enabled by the
> recent introduction of min_threads/max_threads knobs for the
> thread pool.
> 
> 6.2:
>    iops        : min=58051, max=62260, avg=60282.57, stdev=1081.18, samples=30
>     clat percentiles (usec):   1.00th=[  490],   99.99th=[  775]
>    iops        : min=59401, max=61290, avg=60651.27, stdev=468.24, samples=30
>     clat percentiles (usec):   1.00th=[  490],   99.99th=[  717]
>    iops        : min=59583, max=60816, avg=60353.43, stdev=282.69, samples=30
>     clat percentiles (usec):   1.00th=[  490],   99.99th=[  701]
>    iops        : min=58099, max=60713, avg=59739.53, stdev=755.49, samples=30
>     clat percentiles (usec):   1.00th=[  494],   99.99th=[  717]
> 	
> patched:
>    iops        : min=60616, max=62522, avg=61654.37, stdev=555.67, samples=30
>     clat percentiles (usec):   1.00th=[  474],   99.99th=[ 1303]
>    iops        : min=61841, max=63600, avg=62878.47, stdev=442.40, samples=30
>     clat percentiles (usec):   1.00th=[  465],   99.99th=[  685]
>    iops        : min=62976, max=63910, avg=63531.60, stdev=261.05, samples=30
>     clat percentiles (usec):   1.00th=[  461],   99.99th=[  693]
>    iops        : min=60803, max=63623, avg=62653.37, stdev=808.76, samples=30
>     clat percentiles (usec):   1.00th=[  465],   99.99th=[  685]
> 	
> Paolo
> 
> v1->v2: support min_threads/max_threads
> 
> v2->v3: fix NULL pointer dereference [Nicolas]
> 
> Paolo Bonzini (3):
>   thread-pool: optimize scheduling of completion bottom half
>   thread-pool: replace semaphore with condition variable
>   thread-pool: remove stopping variable
> 
>  util/thread-pool.c | 74 +++++++++++++++++++---------------------------
>  1 file changed, 30 insertions(+), 44 deletions(-)


For the whole series:

Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Thanks,

-- 
Nicolás Sáenz




* Re: [PATCH v3 2/3] thread-pool: replace semaphore with condition variable
  2022-05-17 14:18     ` Paolo Bonzini
@ 2022-05-17 14:37       ` Nicolas Saenz Julienne
  0 siblings, 0 replies; 13+ messages in thread
From: Nicolas Saenz Julienne @ 2022-05-17 14:37 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: stefanha, Lukáš Doktor

On Tue, 2022-05-17 at 16:18 +0200, Paolo Bonzini wrote:
> On 5/17/22 14:46, Nicolas Saenz Julienne wrote:
> > > -    while (!pool->stopping) {
> > > +    while (!pool->stopping && pool->cur_threads <= pool->max_threads) {
> > >           ThreadPoolElement *req;
> > >           int ret;
> > >   
> > > -        do {
> > > +        if (QTAILQ_EMPTY(&pool->request_list)) {
> > >               pool->idle_threads++;
> > > -            qemu_mutex_unlock(&pool->lock);
> > > -            ret = qemu_sem_timedwait(&pool->sem, 10000);
> > > -            qemu_mutex_lock(&pool->lock);
> > > +            ret = qemu_cond_timedwait(&pool->request_cond, &pool->lock, 10000);
> > >               pool->idle_threads--;
> > > -        } while (back_to_sleep(pool, ret));
> > > -        if (ret == -1 || pool->stopping ||
> > > -            pool->cur_threads > pool->max_threads) {
> > > -            break;
> > > +            if (ret == 0 &&
> > > +                QTAILQ_EMPTY(&pool->request_list) &&
> > > +                pool->cur_threads > pool->min_threads) {
> > > +                /* Timed out + no work to do + no need for warm threads = exit.  */
> > > +                break;
> > > +            }
> > 
> >   Some comments:
> > 
> > - A completely idle pool will now never be able to lose its threads, as the
> >    'pool->cur_threads <= pool->max_threads' condition is only checked after a
> >    non-timeout wakeup.
> 
> Are you sure?  The full code is:
> 
>              ret = qemu_cond_timedwait(&pool->request_cond, &pool->lock, 10000);
>              pool->idle_threads--;
>              if (ret == 0 &&
>                  QTAILQ_EMPTY(&pool->request_list) &&
>                  pool->cur_threads > pool->min_threads) {
>                  /* Timed out + no work to do + no need for warm threads  exit.  */
>                  break;
>              }
>              /*
>               * Even if there was some work to do, check if there aren't
>               * too many worker threads before picking it up.
>               */
>              continue;
> 

OK, I somehow missed this 'continue', I wonder if I managed to re-review v2 by
accident... Anyway, it looks fine to me now.

Regards,

-- 
Nicolás Sáenz




* Re: [PATCH v3 2/3] thread-pool: replace semaphore with condition variable
  2022-05-14  6:50 ` [PATCH v3 2/3] thread-pool: replace semaphore with condition variable Paolo Bonzini
  2022-05-17 12:46   ` Nicolas Saenz Julienne
@ 2022-05-17 15:19   ` Stefan Hajnoczi
  2022-05-17 15:21     ` Paolo Bonzini
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-05-17 15:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, nsaenzju, Lukáš Doktor


On Sat, May 14, 2022 at 08:50:11AM +0200, Paolo Bonzini wrote:
> @@ -134,6 +122,12 @@ static void *worker_thread(void *opaque)
>      pool->cur_threads--;
>      qemu_cond_signal(&pool->worker_stopped);
>      qemu_mutex_unlock(&pool->lock);
> +
> +    /*
> +     * Wake up another thread, in case we got a wakeup but decided
> +     * to exit due to pool->cur_threads > pool->max_threads.
> +     */
> +    qemu_cond_signal(&pool->worker_stopped);

&pool->worker_stopped? Was this supposed to be &pool->request_cond?



* Re: [PATCH v3 1/3] thread-pool: optimize scheduling of completion bottom half
  2022-05-14  6:50 ` [PATCH v3 1/3] thread-pool: optimize scheduling of completion bottom half Paolo Bonzini
@ 2022-05-17 15:20   ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-05-17 15:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, nsaenzju


On Sat, May 14, 2022 at 08:50:10AM +0200, Paolo Bonzini wrote:
> The completion bottom half was scheduled within the pool->lock
> critical section.  That actually results in worse performance,
> because the worker thread can run its own small critical section
> and go to sleep before the bottom half starts running.
> 
> Note that this simple change does not produce an improvement without
> changing the thread pool QemuSemaphore to a condition variable.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  util/thread-pool.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [PATCH v3 3/3] thread-pool: remove stopping variable
  2022-05-14  6:50 ` [PATCH v3 3/3] thread-pool: remove stopping variable Paolo Bonzini
@ 2022-05-17 15:20   ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-05-17 15:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, nsaenzju


On Sat, May 14, 2022 at 08:50:12AM +0200, Paolo Bonzini wrote:
> Just setting the max threads to 0 is enough to stop all workers.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  util/thread-pool.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [PATCH v3 2/3] thread-pool: replace semaphore with condition variable
  2022-05-17 15:19   ` Stefan Hajnoczi
@ 2022-05-17 15:21     ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2022-05-17 15:21 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Nicolas Saenz Julienne, Lukáš Doktor

On Tue, May 17, 2022 at 5:20 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Sat, May 14, 2022 at 08:50:11AM +0200, Paolo Bonzini wrote:
> > @@ -134,6 +122,12 @@ static void *worker_thread(void *opaque)
> >      pool->cur_threads--;
> >      qemu_cond_signal(&pool->worker_stopped);
> >      qemu_mutex_unlock(&pool->lock);
> > +
> > +    /*
> > +     * Wake up another thread, in case we got a wakeup but decided
> > +     * to exit due to pool->cur_threads > pool->max_threads.
> > +     */
> > +    qemu_cond_signal(&pool->worker_stopped);
>
> &pool->worker_stopped? Was this supposed to be &pool->request_cond?

Yes, of course.

Paolo
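
With the stray condition variable swapped as agreed above, the exit path of
worker_thread() presumably ends up as follows (a sketch based on this
exchange, not a quote of the final code):

    pool->cur_threads--;
    qemu_cond_signal(&pool->worker_stopped);   /* let thread_pool_free() proceed */
    qemu_mutex_unlock(&pool->lock);

    /*
     * Pass the wakeup on to another worker, in case this thread consumed
     * a request_cond signal but decided to exit because
     * pool->cur_threads > pool->max_threads.
     */
    qemu_cond_signal(&pool->request_cond);
    return NULL;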



