* [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
@ 2021-04-23 21:11 Vivek Goyal
  2021-04-24  6:12 ` Liu Bo
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Vivek Goyal @ 2021-04-23 21:11 UTC (permalink / raw)
  To: virtio-fs-list; +Cc: Michey Mehta

This is just an RFC patch for now. I am still running performance numbers
to see whether this method of switching is good enough or not. I did one
run and seemed to get higher performance at deeper queue depths. There
were a few tests where I did not match the low queue depth performance
of running with no thread pool; maybe that is run-to-run variation.

For low queue depth workloads, inline processing works well. But high
queue depth (and multi-process/multi-thread) workloads can benefit from
the parallel processing of a thread pool.

This patch is an experiment which tries to switch between inline and
thread-pool processing. If only a small number of requests is received
on the queue, they are processed inline; otherwise the requests are
handed over to a thread pool for processing.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
===================================================================
--- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:03:46.175920039 -0400
+++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:56:37.793722634 -0400
@@ -446,6 +446,15 @@ err:
 static __thread bool clone_fs_called;
 
 /* Process one FVRequest in a thread pool */
+static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
+{
+    FVRequest *req = data;
+    GThreadPool *pool = user_data;
+
+    g_thread_pool_push(pool, req, NULL);
+}
+
+/* Process one FVRequest in a thread pool */
 static void fv_queue_worker(gpointer data, gpointer user_data)
 {
     struct fv_QueueInfo *qi = user_data;
@@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
     struct fuse_session *se = qi->virtio_dev->se;
     GThreadPool *pool = NULL;
     GList *req_list = NULL;
+    int nr_req = 0;
 
     if (se->thread_pool_size) {
         fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue %d\n",
@@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
             }
 
             req->reply_sent = false;
-
-            if (!se->thread_pool_size) {
-                req_list = g_list_prepend(req_list, req);
-            } else {
-                g_thread_pool_push(pool, req, NULL);
-            }
+            req_list = g_list_prepend(req_list, req);
+            nr_req++;
         }
 
         pthread_mutex_unlock(&qi->vq_lock);
         vu_dispatch_unlock(qi->virtio_dev);
 
         /* Process all the requests. */
-        if (!se->thread_pool_size && req_list != NULL) {
-            g_list_foreach(req_list, fv_queue_worker, qi);
+        if (req_list != NULL) {
+            if (!se->thread_pool_size || nr_req <= 2) {
+                g_list_foreach(req_list, fv_queue_worker, qi);
+            } else  {
+                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
+            }
             g_list_free(req_list);
             req_list = NULL;
+            nr_req = 0;
         }
     }
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-23 21:11 [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing Vivek Goyal
@ 2021-04-24  6:12 ` Liu Bo
  2021-04-26 11:39   ` Vivek Goyal
  2021-04-25  2:29 ` [Virtio-fs] [External] " Jiachen Zhang
  2021-04-26 17:34 ` [Virtio-fs] " Vivek Goyal
  2 siblings, 1 reply; 12+ messages in thread
From: Liu Bo @ 2021-04-24  6:12 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta

On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> This is just an RFC patch for now. I am still running performance numbers
> and see if this method of switching is good enough or not. I did one run
> and seemed to get higher performance on deeper queue depths. There were
> few tests where I did not match lower queue depth performance with 
> no thread pool. May be run to run variation.
> 
> For low queue depth workloads, inline processing works well. But for 
> high queue depth (and multiple process/thread) workloads, parallel
> processing of thread pool can benefit.
> 
> This patch is an experiment which tries to switch to between inline and
> thread-pool processing. If number of requests received on queue is
> 1, inline processing is done. Otherwise requests are handed over to
> a thread pool for processing.
>

I'm looking forward to the results showing at how many requests the
benefit beats the overhead of using thread pools.

This is a good idea indeed, and the switch mechanism also applies to
async IO frameworks like io_uring.

thanks,
liubo

> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> ===================================================================
> --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:03:46.175920039 -0400
> +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:56:37.793722634 -0400
> @@ -446,6 +446,15 @@ err:
>  static __thread bool clone_fs_called;
>  
>  /* Process one FVRequest in a thread pool */
> +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> +{
> +    FVRequest *req = data;
> +    GThreadPool *pool = user_data;
> +
> +    g_thread_pool_push(pool, req, NULL);
> +}
> +
> +/* Process one FVRequest in a thread pool */
>  static void fv_queue_worker(gpointer data, gpointer user_data)
>  {
>      struct fv_QueueInfo *qi = user_data;
> @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
>      struct fuse_session *se = qi->virtio_dev->se;
>      GThreadPool *pool = NULL;
>      GList *req_list = NULL;
> +    int nr_req = 0;
>  
>      if (se->thread_pool_size) {
>          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue %d\n",
> @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
>              }
>  
>              req->reply_sent = false;
> -
> -            if (!se->thread_pool_size) {
> -                req_list = g_list_prepend(req_list, req);
> -            } else {
> -                g_thread_pool_push(pool, req, NULL);
> -            }
> +            req_list = g_list_prepend(req_list, req);
> +            nr_req++;
>          }
>  
>          pthread_mutex_unlock(&qi->vq_lock);
>          vu_dispatch_unlock(qi->virtio_dev);
>  
>          /* Process all the requests. */
> -        if (!se->thread_pool_size && req_list != NULL) {
> -            g_list_foreach(req_list, fv_queue_worker, qi);
> +        if (req_list != NULL) {
> +            if (!se->thread_pool_size || nr_req <= 2) {
> +                g_list_foreach(req_list, fv_queue_worker, qi);
> +            } else  {
> +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> +            }
>              g_list_free(req_list);
>              req_list = NULL;
> +            nr_req = 0;
>          }
>      }
>  
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-23 21:11 [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing Vivek Goyal
  2021-04-24  6:12 ` Liu Bo
@ 2021-04-25  2:29 ` Jiachen Zhang
  2021-04-26 15:11   ` Vivek Goyal
  2021-04-26 17:34 ` [Virtio-fs] " Vivek Goyal
  2 siblings, 1 reply; 12+ messages in thread
From: Jiachen Zhang @ 2021-04-25  2:29 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta


On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:

> This is just an RFC patch for now. I am still running performance numbers
> and see if this method of switching is good enough or not. I did one run
> and seemed to get higher performance on deeper queue depths. There were
> few tests where I did not match lower queue depth performance with
> no thread pool. May be run to run variation.
>
>
I think this is interesting. I guess the switching threshold between the
thread-pool and the inline method may change as the hardware or OS kernel
changes. Looking forward to the results.

Jiachen


> For low queue depth workloads, inline processing works well. But for
> high queue depth (and multiple process/thread) workloads, parallel
> processing of thread pool can benefit.
>
> This patch is an experiment which tries to switch to between inline and
> thread-pool processing. If number of requests received on queue is
> 1, inline processing is done. Otherwise requests are handed over to
> a thread pool for processing.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> ===================================================================
> --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c    2021-04-23
> 10:03:46.175920039 -0400
> +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c 2021-04-23
> 10:56:37.793722634 -0400
> @@ -446,6 +446,15 @@ err:
>  static __thread bool clone_fs_called;
>
>  /* Process one FVRequest in a thread pool */
> +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> +{
> +    FVRequest *req = data;
> +    GThreadPool *pool = user_data;
> +
> +    g_thread_pool_push(pool, req, NULL);
> +}
> +
> +/* Process one FVRequest in a thread pool */
>  static void fv_queue_worker(gpointer data, gpointer user_data)
>  {
>      struct fv_QueueInfo *qi = user_data;
> @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
>      struct fuse_session *se = qi->virtio_dev->se;
>      GThreadPool *pool = NULL;
>      GList *req_list = NULL;
> +    int nr_req = 0;
>
>      if (se->thread_pool_size) {
>          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue
> %d\n",
> @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
>              }
>
>              req->reply_sent = false;
> -
> -            if (!se->thread_pool_size) {
> -                req_list = g_list_prepend(req_list, req);
> -            } else {
> -                g_thread_pool_push(pool, req, NULL);
> -            }
> +            req_list = g_list_prepend(req_list, req);
> +            nr_req++;
>          }
>
>          pthread_mutex_unlock(&qi->vq_lock);
>          vu_dispatch_unlock(qi->virtio_dev);
>
>          /* Process all the requests. */
> -        if (!se->thread_pool_size && req_list != NULL) {
> -            g_list_foreach(req_list, fv_queue_worker, qi);
> +        if (req_list != NULL) {
> +            if (!se->thread_pool_size || nr_req <= 2) {
> +                g_list_foreach(req_list, fv_queue_worker, qi);
> +            } else  {
> +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> +            }
>              g_list_free(req_list);
>              req_list = NULL;
> +            nr_req = 0;
>          }
>      }
>
>
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-24  6:12 ` Liu Bo
@ 2021-04-26 11:39   ` Vivek Goyal
  2021-04-26 19:03     ` Liu Bo
  0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 11:39 UTC (permalink / raw)
  To: Liu Bo; +Cc: virtio-fs-list, Michey Mehta

On Sat, Apr 24, 2021 at 02:12:44PM +0800, Liu Bo wrote:
> On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> > This is just an RFC patch for now. I am still running performance numbers
> > and see if this method of switching is good enough or not. I did one run
> > and seemed to get higher performance on deeper queue depths. There were
> > few tests where I did not match lower queue depth performance with 
> > no thread pool. May be run to run variation.
> > 
> > For low queue depth workloads, inline processing works well. But for 
> > high queue depth (and multiple process/thread) workloads, parallel
> > processing of thread pool can benefit.
> > 
> > This patch is an experiment which tries to switch to between inline and
> > thread-pool processing. If number of requests received on queue is
> > 1, inline processing is done. Otherwise requests are handed over to
> > a thread pool for processing.
> >
> 
> I'm looking forward to the results showing how many requests would it
> beats the overhead of using thread pools.
> 
> This is a good idea indeed, and the switch mechanism also applies to
> async IO framework like iouring.

Hi Liubo,

I have been thinking of using io_uring. Have you managed to make it work?
Do we get better performance with io_uring?

Thanks
Vivek

> 
> thanks,
> liubo
> 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
> >  1 file changed, 19 insertions(+), 8 deletions(-)
> > 
> > Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> > ===================================================================
> > --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:03:46.175920039 -0400
> > +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:56:37.793722634 -0400
> > @@ -446,6 +446,15 @@ err:
> >  static __thread bool clone_fs_called;
> >  
> >  /* Process one FVRequest in a thread pool */
> > +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> > +{
> > +    FVRequest *req = data;
> > +    GThreadPool *pool = user_data;
> > +
> > +    g_thread_pool_push(pool, req, NULL);
> > +}
> > +
> > +/* Process one FVRequest in a thread pool */
> >  static void fv_queue_worker(gpointer data, gpointer user_data)
> >  {
> >      struct fv_QueueInfo *qi = user_data;
> > @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
> >      struct fuse_session *se = qi->virtio_dev->se;
> >      GThreadPool *pool = NULL;
> >      GList *req_list = NULL;
> > +    int nr_req = 0;
> >  
> >      if (se->thread_pool_size) {
> >          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue %d\n",
> > @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
> >              }
> >  
> >              req->reply_sent = false;
> > -
> > -            if (!se->thread_pool_size) {
> > -                req_list = g_list_prepend(req_list, req);
> > -            } else {
> > -                g_thread_pool_push(pool, req, NULL);
> > -            }
> > +            req_list = g_list_prepend(req_list, req);
> > +            nr_req++;
> >          }
> >  
> >          pthread_mutex_unlock(&qi->vq_lock);
> >          vu_dispatch_unlock(qi->virtio_dev);
> >  
> >          /* Process all the requests. */
> > -        if (!se->thread_pool_size && req_list != NULL) {
> > -            g_list_foreach(req_list, fv_queue_worker, qi);
> > +        if (req_list != NULL) {
> > +            if (!se->thread_pool_size || nr_req <= 2) {
> > +                g_list_foreach(req_list, fv_queue_worker, qi);
> > +            } else  {
> > +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> > +            }
> >              g_list_free(req_list);
> >              req_list = NULL;
> > +            nr_req = 0;
> >          }
> >      }
> >  
> > 
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://listman.redhat.com/mailman/listinfo/virtio-fs
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-25  2:29 ` [Virtio-fs] [External] " Jiachen Zhang
@ 2021-04-26 15:11   ` Vivek Goyal
  2021-04-26 17:40     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 15:11 UTC (permalink / raw)
  To: Jiachen Zhang; +Cc: virtio-fs-list, Michey Mehta

On Sun, Apr 25, 2021 at 10:29:14AM +0800, Jiachen Zhang wrote:
> On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > This is just an RFC patch for now. I am still running performance numbers
> > and see if this method of switching is good enough or not. I did one run
> > and seemed to get higher performance on deeper queue depths. There were
> > few tests where I did not match lower queue depth performance with
> > no thread pool. May be run to run variation.
> >
> >
> I think this is interesting. I guess the switch number between thread-pool
> and the inline method may change as the hardware or OS kernel changes.
> Looking forward to the results.

Hi Jiachen,

Agreed that the threshold to switch between inline and thread-pool will
vary based on hardware and many other factors. Having said that, there
is no good way to determine the most optimized threshold for a
particular setup.

Maybe we will have to just go with a default which works reasonably
well and also provide a knob for the user to alter that threshold. At
least it would be useful for experimenting with different thresholds.

Vivek

> 
> Jiachen
> 
> 
> > For low queue depth workloads, inline processing works well. But for
> > high queue depth (and multiple process/thread) workloads, parallel
> > processing of thread pool can benefit.
> >
> > This patch is an experiment which tries to switch to between inline and
> > thread-pool processing. If number of requests received on queue is
> > 1, inline processing is done. Otherwise requests are handed over to
> > a thread pool for processing.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
> >  1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> > ===================================================================
> > --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c    2021-04-23
> > 10:03:46.175920039 -0400
> > +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c 2021-04-23
> > 10:56:37.793722634 -0400
> > @@ -446,6 +446,15 @@ err:
> >  static __thread bool clone_fs_called;
> >
> >  /* Process one FVRequest in a thread pool */
> > +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> > +{
> > +    FVRequest *req = data;
> > +    GThreadPool *pool = user_data;
> > +
> > +    g_thread_pool_push(pool, req, NULL);
> > +}
> > +
> > +/* Process one FVRequest in a thread pool */
> >  static void fv_queue_worker(gpointer data, gpointer user_data)
> >  {
> >      struct fv_QueueInfo *qi = user_data;
> > @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
> >      struct fuse_session *se = qi->virtio_dev->se;
> >      GThreadPool *pool = NULL;
> >      GList *req_list = NULL;
> > +    int nr_req = 0;
> >
> >      if (se->thread_pool_size) {
> >          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue
> > %d\n",
> > @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
> >              }
> >
> >              req->reply_sent = false;
> > -
> > -            if (!se->thread_pool_size) {
> > -                req_list = g_list_prepend(req_list, req);
> > -            } else {
> > -                g_thread_pool_push(pool, req, NULL);
> > -            }
> > +            req_list = g_list_prepend(req_list, req);
> > +            nr_req++;
> >          }
> >
> >          pthread_mutex_unlock(&qi->vq_lock);
> >          vu_dispatch_unlock(qi->virtio_dev);
> >
> >          /* Process all the requests. */
> > -        if (!se->thread_pool_size && req_list != NULL) {
> > -            g_list_foreach(req_list, fv_queue_worker, qi);
> > +        if (req_list != NULL) {
> > +            if (!se->thread_pool_size || nr_req <= 2) {
> > +                g_list_foreach(req_list, fv_queue_worker, qi);
> > +            } else  {
> > +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> > +            }
> >              g_list_free(req_list);
> >              req_list = NULL;
> > +            nr_req = 0;
> >          }
> >      }
> >
> >
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://listman.redhat.com/mailman/listinfo/virtio-fs
> >
> >


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-23 21:11 [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing Vivek Goyal
  2021-04-24  6:12 ` Liu Bo
  2021-04-25  2:29 ` [Virtio-fs] [External] " Jiachen Zhang
@ 2021-04-26 17:34 ` Vivek Goyal
  2 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 17:34 UTC (permalink / raw)
  To: virtio-fs-list; +Cc: Michey Mehta

On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> This is just an RFC patch for now. I am still running performance numbers
> and see if this method of switching is good enough or not. I did one run
> and seemed to get higher performance on deeper queue depths. There were
> few tests where I did not match lower queue depth performance with 
> no thread pool. May be run to run variation.

Hi All,

So I ran fio jobs with auto switch enabled and uploaded all the data
here. The final results are also pasted at the end of this email.

https://github.com/rhvgoyal/virtiofs-tests/tree/master/performance-results/auto-switch-inline-thread-pool/april-26-2021

Looks like an auto switch threshold of 3 is working pretty well for me.
That is, if the queue depth is 3 or less, do inline processing;
otherwise switch to the thread pool. Of course this will vary from
system to system and with I/O size as well; I have only tested with
bs=4K.

I think it might be worth providing the user a knob, say
"--thread-pool-threshold=<nr>", so they can control it. The default
could be 3, maybe.
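
For illustration, a minimal sketch of what that decision could look like
in fv_queue_thread(). The se->thread_pool_threshold field (and however it
gets populated from such an option) is an assumption for this sketch, not
something the patch above adds:

if (req_list != NULL) {
    /* Hypothetical: threshold would come from --thread-pool-threshold,
     * defaulting to 3. */
    if (!se->thread_pool_size || nr_req <= se->thread_pool_threshold) {
        /* Few requests: process them inline in this queue thread. */
        g_list_foreach(req_list, fv_queue_worker, qi);
    } else {
        /* Deeper queue: hand everything to the thread pool. */
        g_list_foreach(req_list, fv_queue_push_to_pool, pool);
    }
    g_list_free(req_list);
    req_list = NULL;
    nr_req = 0;
}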

Thanks
Vivek

NAME                    WORKLOAD                Bandwidth       IOPS            
tpool-16                seqread-psync           62.1mb          15.5k           
no-tpool                seqread-psync           89.3mb          22.3k           
auto-switch-3           seqread-psync           89.6mb          22.4k           
auto-switch-4           seqread-psync           88.8mb          22.2k           

tpool-16                seqread-psync-multi     256.1mb         64.0k           
no-tpool                seqread-psync-multi     280.6mb         70.1k           
auto-switch-3           seqread-psync-multi     282.9mb         70.7k           
auto-switch-4           seqread-psync-multi     284.9mb         71.2k           

tpool-16                seqread-libaio          327.7mb         81.9k           
no-tpool                seqread-libaio          306.5mb         76.6k           
auto-switch-3           seqread-libaio          361.8mb         90.4k           
auto-switch-4           seqread-libaio          365.3mb         91.3k           

tpool-16                seqread-libaio-multi    354.5mb         88.6k           
no-tpool                seqread-libaio-multi    300.7mb         75.1k           
auto-switch-3           seqread-libaio-multi    375.4mb         93.8k           
auto-switch-4           seqread-libaio-multi    374.3mb         93.5k           

tpool-16                randread-psync          64.5mb          16.1k           
no-tpool                randread-psync          88.5mb          22.1k           
auto-switch-3           randread-psync          87.5mb          21.8k           
auto-switch-4           randread-psync          87.7mb          21.9k           

tpool-16                randread-psync-multi    268.0mb         67.0k           
no-tpool                randread-psync-multi    282.5mb         70.6k           
auto-switch-3           randread-psync-multi    283.3mb         70.8k           
auto-switch-4           randread-psync-multi    282.6mb         70.6k           

tpool-16                randread-libaio         325.1mb         81.2k           
no-tpool                randread-libaio         303.0mb         75.7k           
auto-switch-3           randread-libaio         360.8mb         90.2k           
auto-switch-4           randread-libaio         361.0mb         90.2k           

tpool-16                randread-libaio-multi   352.2mb         88.0k           
no-tpool                randread-libaio-multi   289.6mb         72.4k           
auto-switch-3           randread-libaio-multi   373.5mb         93.3k           
auto-switch-4           randread-libaio-multi   372.0mb         93.0k           

tpool-16                seqwrite-psync          65.7mb          16.4k           
no-tpool                seqwrite-psync          87.6mb          21.9k           
auto-switch-3           seqwrite-psync          86.5mb          21.6k           
auto-switch-4           seqwrite-psync          85.9mb          21.4k           

tpool-16                seqwrite-psync-multi    270.9mb         67.7k           
no-tpool                seqwrite-psync-multi    268.0mb         67.0k           
auto-switch-3           seqwrite-psync-multi    267.7mb         66.9k           
auto-switch-4           seqwrite-psync-multi    263.7mb         65.9k           

tpool-16                seqwrite-libaio         293.5mb         73.3k           
no-tpool                seqwrite-libaio         286.1mb         71.5k           
auto-switch-3           seqwrite-libaio         333.7mb         83.4k           
auto-switch-4           seqwrite-libaio         330.5mb         82.6k           

tpool-16                seqwrite-libaio-multi   415.0mb         103.7k          
no-tpool                seqwrite-libaio-multi   281.6mb         70.4k           
auto-switch-3           seqwrite-libaio-multi   417.4mb         104.3k          
auto-switch-4           seqwrite-libaio-multi   401.6mb         100.4k          

tpool-16                randwrite-psync         65.1mb          16.2k           
no-tpool                randwrite-psync         87.9mb          21.9k           
auto-switch-3           randwrite-psync         86.6mb          21.6k           
auto-switch-4           randwrite-psync         85.9mb          21.4k           

tpool-16                randwrite-psync-multi   268.9mb         67.2k           
no-tpool                randwrite-psync-multi   261.0mb         65.2k           
auto-switch-3           randwrite-psync-multi   261.2mb         65.3k           
auto-switch-4           randwrite-psync-multi   263.7mb         65.9k           

tpool-16                randwrite-libaio        300.8mb         75.2k           
no-tpool                randwrite-libaio        287.4mb         71.8k           
auto-switch-3           randwrite-libaio        328.2mb         82.0k           
auto-switch-4           randwrite-libaio        335.4mb         83.8k           

tpool-16                randwrite-libaio-multi  415.4mb         103.8k          
no-tpool                randwrite-libaio-multi  271.7mb         67.9k           
auto-switch-3           randwrite-libaio-multi  415.7mb         103.9k          
auto-switch-4           randwrite-libaio-multi  396.5mb         99.1k           

tpool-16                randrw-psync            35.3mb/11.7mb   9044/3019       
no-tpool                randrw-psync            52.9mb/17.7mb   13.2k/4536      
auto-switch-3           randrw-psync            52.7mb/17.6mb   13.1k/4516      
auto-switch-4           randrw-psync            52.7mb/17.6mb   13.1k/4514      

tpool-16                randrw-psync-multi      151.9mb/50.8mb  37.9k/12.7k     
no-tpool                randrw-psync-multi      160.5mb/53.7mb  40.1k/13.4k     
auto-switch-3           randrw-psync-multi      167.6mb/56.1mb  41.9k/14.0k     
auto-switch-4           randrw-psync-multi      164.6mb/55.1mb  41.1k/13.7k     

tpool-16                randrw-libaio           83.0mb/27.8mb   20.7k/7121      
no-tpool                randrw-libaio           87.2mb/29.2mb   21.7k/7479      
auto-switch-3           randrw-libaio           87.9mb/29.4mb   21.9k/7544      
auto-switch-4           randrw-libaio           86.4mb/28.9mb   21.6k/7417      

tpool-16                randrw-libaio-multi     201.6mb/67.5mb  50.4k/16.8k     
no-tpool                randrw-libaio-multi     187.5mb/62.8mb  46.8k/15.7k     
auto-switch-3           randrw-libaio-multi     218.0mb/73.0mb  54.5k/18.2k     
auto-switch-4           randrw-libaio-multi     213.4mb/71.5mb  53.3k/17.8k     


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-26 15:11   ` Vivek Goyal
@ 2021-04-26 17:40     ` Dr. David Alan Gilbert
  2021-04-26 17:51       ` Vivek Goyal
  0 siblings, 1 reply; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-26 17:40 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Sun, Apr 25, 2021 at 10:29:14AM +0800, Jiachen Zhang wrote:
> > On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > This is just an RFC patch for now. I am still running performance numbers
> > > and see if this method of switching is good enough or not. I did one run
> > > and seemed to get higher performance on deeper queue depths. There were
> > > few tests where I did not match lower queue depth performance with
> > > no thread pool. May be run to run variation.
> > >
> > >
> > I think this is interesting. I guess the switch number between thread-pool
> > and the inline method may change as the hardware or OS kernel changes.
> > Looking forward to the results.
> 
> Hi Jiachen,
> 
> Agreed that threshold to switch between inline and thread-pool will vary
> based on hardware and so many other factors. Having said that, there is
> no good way to determine what's the most optimized threshold for a
> particular setup. 

I can imagine something based on the % of time a thread is utilised as
being an indication you might need more/less; but that's an even more
complicated heuristic.

> Maybe we will have to just go with a default which works reasonably
> well and also provide a knob to user to alter that threshold. Atleast
> it can be useful to experiment with different thresholds.

Dave

> Vivek
> 
> > 
> > Jiachen
> > 
> > 
> > > For low queue depth workloads, inline processing works well. But for
> > > high queue depth (and multiple process/thread) workloads, parallel
> > > processing of thread pool can benefit.
> > >
> > > This patch is an experiment which tries to switch to between inline and
> > > thread-pool processing. If number of requests received on queue is
> > > 1, inline processing is done. Otherwise requests are handed over to
> > > a thread pool for processing.
> > >
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > ---
> > >  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
> > >  1 file changed, 19 insertions(+), 8 deletions(-)
> > >
> > > Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> > > ===================================================================
> > > --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c    2021-04-23
> > > 10:03:46.175920039 -0400
> > > +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c 2021-04-23
> > > 10:56:37.793722634 -0400
> > > @@ -446,6 +446,15 @@ err:
> > >  static __thread bool clone_fs_called;
> > >
> > >  /* Process one FVRequest in a thread pool */
> > > +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> > > +{
> > > +    FVRequest *req = data;
> > > +    GThreadPool *pool = user_data;
> > > +
> > > +    g_thread_pool_push(pool, req, NULL);
> > > +}
> > > +
> > > +/* Process one FVRequest in a thread pool */
> > >  static void fv_queue_worker(gpointer data, gpointer user_data)
> > >  {
> > >      struct fv_QueueInfo *qi = user_data;
> > > @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
> > >      struct fuse_session *se = qi->virtio_dev->se;
> > >      GThreadPool *pool = NULL;
> > >      GList *req_list = NULL;
> > > +    int nr_req = 0;
> > >
> > >      if (se->thread_pool_size) {
> > >          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue
> > > %d\n",
> > > @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
> > >              }
> > >
> > >              req->reply_sent = false;
> > > -
> > > -            if (!se->thread_pool_size) {
> > > -                req_list = g_list_prepend(req_list, req);
> > > -            } else {
> > > -                g_thread_pool_push(pool, req, NULL);
> > > -            }
> > > +            req_list = g_list_prepend(req_list, req);
> > > +            nr_req++;
> > >          }
> > >
> > >          pthread_mutex_unlock(&qi->vq_lock);
> > >          vu_dispatch_unlock(qi->virtio_dev);
> > >
> > >          /* Process all the requests. */
> > > -        if (!se->thread_pool_size && req_list != NULL) {
> > > -            g_list_foreach(req_list, fv_queue_worker, qi);
> > > +        if (req_list != NULL) {
> > > +            if (!se->thread_pool_size || nr_req <= 2) {
> > > +                g_list_foreach(req_list, fv_queue_worker, qi);
> > > +            } else  {
> > > +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> > > +            }
> > >              g_list_free(req_list);
> > >              req_list = NULL;
> > > +            nr_req = 0;
> > >          }
> > >      }
> > >
> > >
> > > _______________________________________________
> > > Virtio-fs mailing list
> > > Virtio-fs@redhat.com
> > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > >
> > >
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-26 17:40     ` Dr. David Alan Gilbert
@ 2021-04-26 17:51       ` Vivek Goyal
  2021-04-26 17:59         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 17:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-fs-list, Michey Mehta

On Mon, Apr 26, 2021 at 06:40:28PM +0100, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > On Sun, Apr 25, 2021 at 10:29:14AM +0800, Jiachen Zhang wrote:
> > > On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > 
> > > > This is just an RFC patch for now. I am still running performance numbers
> > > > and see if this method of switching is good enough or not. I did one run
> > > > and seemed to get higher performance on deeper queue depths. There were
> > > > few tests where I did not match lower queue depth performance with
> > > > no thread pool. May be run to run variation.
> > > >
> > > >
> > > I think this is interesting. I guess the switch number between thread-pool
> > > and the inline method may change as the hardware or OS kernel changes.
> > > Looking forward to the results.
> > 
> > Hi Jiachen,
> > 
> > Agreed that threshold to switch between inline and thread-pool will vary
> > based on hardware and so many other factors. Having said that, there is
> > no good way to determine what's the most optimized threshold for a
> > particular setup. 
> 
> I can imagine something based on the % of time a thread is utilised as
> being an indication you might need more/less; but that's an even more
> complicated heuristic.

Yes, that's an even more complicated heuristic.

Also, I am not sure how to account for the case where the thread blocks
in the kernel waiting for I/O to finish. The thread is not keeping the
CPU busy, but at the same time it can't make progress.
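
One conceivable way to tell the two apart (just an illustration, not
something this patch needs): sample wall-clock time and per-thread CPU
time over the same stretch of work and compare the deltas.

#include <stdint.h>
#include <time.h>

static uint64_t clock_ns(clockid_t id)
{
    struct timespec ts;

    clock_gettime(id, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

struct busy_sample {
    uint64_t wall; /* CLOCK_MONOTONIC delta */
    uint64_t cpu;  /* CLOCK_THREAD_CPUTIME_ID delta */
};

/*
 * If wall grows much faster than cpu, the thread spent the difference
 * blocked (e.g. waiting for I/O) rather than computing.
 */
static struct busy_sample sample_work(void (*work)(void *), void *arg)
{
    uint64_t w0 = clock_ns(CLOCK_MONOTONIC);
    uint64_t c0 = clock_ns(CLOCK_THREAD_CPUTIME_ID);

    work(arg);

    return (struct busy_sample){
        .wall = clock_ns(CLOCK_MONOTONIC) - w0,
        .cpu  = clock_ns(CLOCK_THREAD_CPUTIME_ID) - c0,
    };
}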

Vivek

> 
> > Maybe we will have to just go with a default which works reasonably
> > well and also provide a knob to user to alter that threshold. Atleast
> > it can be useful to experiment with different thresholds.
> 
> Dave
> 
> > Vivek
> > 
> > > 
> > > Jiachen
> > > 
> > > 
> > > > For low queue depth workloads, inline processing works well. But for
> > > > high queue depth (and multiple process/thread) workloads, parallel
> > > > processing of thread pool can benefit.
> > > >
> > > > This patch is an experiment which tries to switch to between inline and
> > > > thread-pool processing. If number of requests received on queue is
> > > > 1, inline processing is done. Otherwise requests are handed over to
> > > > a thread pool for processing.
> > > >
> > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
> > > >  1 file changed, 19 insertions(+), 8 deletions(-)
> > > >
> > > > Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> > > > ===================================================================
> > > > --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c    2021-04-23
> > > > 10:03:46.175920039 -0400
> > > > +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c 2021-04-23
> > > > 10:56:37.793722634 -0400
> > > > @@ -446,6 +446,15 @@ err:
> > > >  static __thread bool clone_fs_called;
> > > >
> > > >  /* Process one FVRequest in a thread pool */
> > > > +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> > > > +{
> > > > +    FVRequest *req = data;
> > > > +    GThreadPool *pool = user_data;
> > > > +
> > > > +    g_thread_pool_push(pool, req, NULL);
> > > > +}
> > > > +
> > > > +/* Process one FVRequest in a thread pool */
> > > >  static void fv_queue_worker(gpointer data, gpointer user_data)
> > > >  {
> > > >      struct fv_QueueInfo *qi = user_data;
> > > > @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
> > > >      struct fuse_session *se = qi->virtio_dev->se;
> > > >      GThreadPool *pool = NULL;
> > > >      GList *req_list = NULL;
> > > > +    int nr_req = 0;
> > > >
> > > >      if (se->thread_pool_size) {
> > > >          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue
> > > > %d\n",
> > > > @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
> > > >              }
> > > >
> > > >              req->reply_sent = false;
> > > > -
> > > > -            if (!se->thread_pool_size) {
> > > > -                req_list = g_list_prepend(req_list, req);
> > > > -            } else {
> > > > -                g_thread_pool_push(pool, req, NULL);
> > > > -            }
> > > > +            req_list = g_list_prepend(req_list, req);
> > > > +            nr_req++;
> > > >          }
> > > >
> > > >          pthread_mutex_unlock(&qi->vq_lock);
> > > >          vu_dispatch_unlock(qi->virtio_dev);
> > > >
> > > >          /* Process all the requests. */
> > > > -        if (!se->thread_pool_size && req_list != NULL) {
> > > > -            g_list_foreach(req_list, fv_queue_worker, qi);
> > > > +        if (req_list != NULL) {
> > > > +            if (!se->thread_pool_size || nr_req <= 2) {
> > > > +                g_list_foreach(req_list, fv_queue_worker, qi);
> > > > +            } else  {
> > > > +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> > > > +            }
> > > >              g_list_free(req_list);
> > > >              req_list = NULL;
> > > > +            nr_req = 0;
> > > >          }
> > > >      }
> > > >
> > > >
> > > > _______________________________________________
> > > > Virtio-fs mailing list
> > > > Virtio-fs@redhat.com
> > > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > > >
> > > >
> > 
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://listman.redhat.com/mailman/listinfo/virtio-fs
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-26 17:51       ` Vivek Goyal
@ 2021-04-26 17:59         ` Dr. David Alan Gilbert
  2021-04-28 13:40           ` Vivek Goyal
  0 siblings, 1 reply; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-26 17:59 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Mon, Apr 26, 2021 at 06:40:28PM +0100, Dr. David Alan Gilbert wrote:
> > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > On Sun, Apr 25, 2021 at 10:29:14AM +0800, Jiachen Zhang wrote:
> > > > On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > 
> > > > > This is just an RFC patch for now. I am still running performance numbers
> > > > > and see if this method of switching is good enough or not. I did one run
> > > > > and seemed to get higher performance on deeper queue depths. There were
> > > > > few tests where I did not match lower queue depth performance with
> > > > > no thread pool. May be run to run variation.
> > > > >
> > > > >
> > > > I think this is interesting. I guess the switch number between thread-pool
> > > > and the inline method may change as the hardware or OS kernel changes.
> > > > Looking forward to the results.
> > > 
> > > Hi Jiachen,
> > > 
> > > Agreed that threshold to switch between inline and thread-pool will vary
> > > based on hardware and so many other factors. Having said that, there is
> > > no good way to determine what's the most optimized threshold for a
> > > particular setup. 
> > 
> > I can imagine something based on the % of time a thread is utilised as
> > being an indication you might need more/less; but that's an even more
> > complicated heuristic.
> 
> Yes, that's even more complicated heuristic.
> 
> Also not sure how to account for when thread blocks in kernel waiting for
> I/O to finish. Thread is not keeping cpu busy but at the same time
> can't make progress.

Does it matter whether the thread is busy on CPU or blocked on IO?  If
you just look at the time the thread is not free, that gives you a
utilisation.

Dave


> Vivek
> 
> > 
> > > Maybe we will have to just go with a default which works reasonably
> > > well and also provide a knob to user to alter that threshold. Atleast
> > > it can be useful to experiment with different thresholds.
> > 
> > Dave
> > 
> > > Vivek
> > > 
> > > > 
> > > > Jiachen
> > > > 
> > > > 
> > > > > For low queue depth workloads, inline processing works well. But for
> > > > > high queue depth (and multiple process/thread) workloads, parallel
> > > > > processing of thread pool can benefit.
> > > > >
> > > > > This patch is an experiment which tries to switch to between inline and
> > > > > thread-pool processing. If number of requests received on queue is
> > > > > 1, inline processing is done. Otherwise requests are handed over to
> > > > > a thread pool for processing.
> > > > >
> > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > > ---
> > > > >  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
> > > > >  1 file changed, 19 insertions(+), 8 deletions(-)
> > > > >
> > > > > Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> > > > > ===================================================================
> > > > > --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c    2021-04-23
> > > > > 10:03:46.175920039 -0400
> > > > > +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c 2021-04-23
> > > > > 10:56:37.793722634 -0400
> > > > > @@ -446,6 +446,15 @@ err:
> > > > >  static __thread bool clone_fs_called;
> > > > >
> > > > >  /* Process one FVRequest in a thread pool */
> > > > > +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> > > > > +{
> > > > > +    FVRequest *req = data;
> > > > > +    GThreadPool *pool = user_data;
> > > > > +
> > > > > +    g_thread_pool_push(pool, req, NULL);
> > > > > +}
> > > > > +
> > > > > +/* Process one FVRequest in a thread pool */
> > > > >  static void fv_queue_worker(gpointer data, gpointer user_data)
> > > > >  {
> > > > >      struct fv_QueueInfo *qi = user_data;
> > > > > @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
> > > > >      struct fuse_session *se = qi->virtio_dev->se;
> > > > >      GThreadPool *pool = NULL;
> > > > >      GList *req_list = NULL;
> > > > > +    int nr_req = 0;
> > > > >
> > > > >      if (se->thread_pool_size) {
> > > > >          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue
> > > > > %d\n",
> > > > > @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
> > > > >              }
> > > > >
> > > > >              req->reply_sent = false;
> > > > > -
> > > > > -            if (!se->thread_pool_size) {
> > > > > -                req_list = g_list_prepend(req_list, req);
> > > > > -            } else {
> > > > > -                g_thread_pool_push(pool, req, NULL);
> > > > > -            }
> > > > > +            req_list = g_list_prepend(req_list, req);
> > > > > +            nr_req++;
> > > > >          }
> > > > >
> > > > >          pthread_mutex_unlock(&qi->vq_lock);
> > > > >          vu_dispatch_unlock(qi->virtio_dev);
> > > > >
> > > > >          /* Process all the requests. */
> > > > > -        if (!se->thread_pool_size && req_list != NULL) {
> > > > > -            g_list_foreach(req_list, fv_queue_worker, qi);
> > > > > +        if (req_list != NULL) {
> > > > > +            if (!se->thread_pool_size || nr_req <= 2) {
> > > > > +                g_list_foreach(req_list, fv_queue_worker, qi);
> > > > > +            } else  {
> > > > > +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> > > > > +            }
> > > > >              g_list_free(req_list);
> > > > >              req_list = NULL;
> > > > > +            nr_req = 0;
> > > > >          }
> > > > >      }
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Virtio-fs mailing list
> > > > > Virtio-fs@redhat.com
> > > > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > > > >
> > > > >
> > > 
> > > _______________________________________________
> > > Virtio-fs mailing list
> > > Virtio-fs@redhat.com
> > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > -- 
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-26 11:39   ` Vivek Goyal
@ 2021-04-26 19:03     ` Liu Bo
  2021-04-26 19:10       ` Vivek Goyal
  0 siblings, 1 reply; 12+ messages in thread
From: Liu Bo @ 2021-04-26 19:03 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta

On Mon, Apr 26, 2021 at 07:39:54AM -0400, Vivek Goyal wrote:
> On Sat, Apr 24, 2021 at 02:12:44PM +0800, Liu Bo wrote:
> > On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> > > This is just an RFC patch for now. I am still running performance numbers
> > > and see if this method of switching is good enough or not. I did one run
> > > and seemed to get higher performance on deeper queue depths. There were
> > > few tests where I did not match lower queue depth performance with 
> > > no thread pool. May be run to run variation.
> > > 
> > > For low queue depth workloads, inline processing works well. But for 
> > > high queue depth (and multiple process/thread) workloads, parallel
> > > processing of thread pool can benefit.
> > > 
> > > This patch is an experiment which tries to switch to between inline and
> > > thread-pool processing. If number of requests received on queue is
> > > 1, inline processing is done. Otherwise requests are handed over to
> > > a thread pool for processing.
> > >
> > 
> > I'm looking forward to the results showing how many requests would it
> > beats the overhead of using thread pools.
> > 
> > This is a good idea indeed, and the switch mechanism also applies to
> > async IO framework like iouring.
> 
> Hi Liubo,
> 
> I have been thinking of using io_uring. Have you managed to make it work.
> Do we get better performance with io_uring.
>
Hi Vivek,

With fuse-backend-rs, I did some experiments using the Rust async
framework and the io_uring wrapper ringbahn; my code provides a few
Rust coroutines to serve the fuse processing loop.

I tested the same 8k random read workload on three setups:
a) single thread
b) threads=4 multiple threads
c) coroutines=4 async

The performance tests showed the expected behaviour: for IO-intensive
workloads, "async" beats a single thread, though it comes with some
overhead and reaches about 80% of "multiple threads".

Note that the above tests were done with no limit on cpu/mem
resources; when limiting the cpu to 1, "async" performs best, given
that the async io kthreads were not limited.

So it looks like all three setups have pros and cons; it'd be great if
we could switch between them on the fly.

thanks,
liubo

> Thanks
> Vivek
> 
> > 
> > thanks,
> > liubo
> > 
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > ---
> > >  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
> > >  1 file changed, 19 insertions(+), 8 deletions(-)
> > > 
> > > Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> > > ===================================================================
> > > --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:03:46.175920039 -0400
> > > +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:56:37.793722634 -0400
> > > @@ -446,6 +446,15 @@ err:
> > >  static __thread bool clone_fs_called;
> > >  
> > >  /* Process one FVRequest in a thread pool */
> > > +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> > > +{
> > > +    FVRequest *req = data;
> > > +    GThreadPool *pool = user_data;
> > > +
> > > +    g_thread_pool_push(pool, req, NULL);
> > > +}
> > > +
> > > +/* Process one FVRequest in a thread pool */
> > >  static void fv_queue_worker(gpointer data, gpointer user_data)
> > >  {
> > >      struct fv_QueueInfo *qi = user_data;
> > > @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
> > >      struct fuse_session *se = qi->virtio_dev->se;
> > >      GThreadPool *pool = NULL;
> > >      GList *req_list = NULL;
> > > +    int nr_req = 0;
> > >  
> > >      if (se->thread_pool_size) {
> > >          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue %d\n",
> > > @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
> > >              }
> > >  
> > >              req->reply_sent = false;
> > > -
> > > -            if (!se->thread_pool_size) {
> > > -                req_list = g_list_prepend(req_list, req);
> > > -            } else {
> > > -                g_thread_pool_push(pool, req, NULL);
> > > -            }
> > > +            req_list = g_list_prepend(req_list, req);
> > > +            nr_req++;
> > >          }
> > >  
> > >          pthread_mutex_unlock(&qi->vq_lock);
> > >          vu_dispatch_unlock(qi->virtio_dev);
> > >  
> > >          /* Process all the requests. */
> > > -        if (!se->thread_pool_size && req_list != NULL) {
> > > -            g_list_foreach(req_list, fv_queue_worker, qi);
> > > +        if (req_list != NULL) {
> > > +            if (!se->thread_pool_size || nr_req <= 2) {
> > > +                g_list_foreach(req_list, fv_queue_worker, qi);
> > > +            } else  {
> > > +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> > > +            }
> > >              g_list_free(req_list);
> > >              req_list = NULL;
> > > +            nr_req = 0;
> > >          }
> > >      }
> > >  
> > > 
> > > _______________________________________________
> > > Virtio-fs mailing list
> > > Virtio-fs@redhat.com
> > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-26 19:03     ` Liu Bo
@ 2021-04-26 19:10       ` Vivek Goyal
  0 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 19:10 UTC (permalink / raw)
  To: Liu Bo; +Cc: virtio-fs-list, Michey Mehta

On Tue, Apr 27, 2021 at 03:03:43AM +0800, Liu Bo wrote:
> On Mon, Apr 26, 2021 at 07:39:54AM -0400, Vivek Goyal wrote:
> > On Sat, Apr 24, 2021 at 02:12:44PM +0800, Liu Bo wrote:
> > > On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> > > > This is just an RFC patch for now. I am still running performance numbers
> > > > and see if this method of switching is good enough or not. I did one run
> > > > and seemed to get higher performance on deeper queue depths. There were
> > > > few tests where I did not match lower queue depth performance with 
> > > > no thread pool. May be run to run variation.
> > > > 
> > > > For low queue depth workloads, inline processing works well. But for 
> > > > high queue depth (and multiple process/thread) workloads, parallel
> > > > processing of thread pool can benefit.
> > > > 
> > > > This patch is an experiment which tries to switch to between inline and
> > > > thread-pool processing. If number of requests received on queue is
> > > > 1, inline processing is done. Otherwise requests are handed over to
> > > > a thread pool for processing.
> > > >
> > > 
> > > I'm looking forward to the results showing how many requests would it
> > > beats the overhead of using thread pools.
> > > 
> > > This is a good idea indeed, and the switch mechanism also applies to
> > > async IO framework like iouring.
> > 
> > Hi Liubo,
> > 
> > I have been thinking of using io_uring. Have you managed to make it work.
> > Do we get better performance with io_uring.
> >
> Hi Vivek,
> 
> With fuse-backend-rs, I did some experiments using rust async
> framework and io_uring wrapper ringbahn, and my code is written to
> provide a few rust coroutines to serve fuse process loop.
> 
> I tested the same 8k random read workload on three setups,
> a) single thread
> b) threads=4 multiple threads
> c) coroutines=4 async 
> 
> the performance tests showed some expected feedbacks, in terms of IO
> intensive workloads, "async" beats single thread, it comes with
> overheads though, it performs about 80% of "multiple threads".
> 
> Note that the above tests were done with no limit of cpu/mem
> resources, by limiting cpu to 1, "async" performs the best given the
> async io kthreads were not limited.

So if the number of cpus is not limited for virtiofsd, then async (c)
did not perform better than "multiple threads" (b)?

IOW, async performed better than multiple threads only when the number
of cpus was limited to 1 for virtiofsd.

Did I understand that right?

> 
> So it looks like all three setups have pros and cons, it'd be great if
> we can do switching between them on the fly.

Agreed. It probably makes sense to add support for async I/O (probably
using io_uring) and provide a knob to let the user select whichever I/O
interface they want to use.
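
Purely as an illustration of the shape io_uring support could take
(this is not anything virtiofsd does today; the helper names below are
made up and liburing is assumed), a read submitted through a ring might
look roughly like this:

#include <errno.h>
#include <sys/types.h>
#include <liburing.h>

static struct io_uring ring;

/* One ring per queue thread; 64 entries is an arbitrary choice. */
static int ring_init(void)
{
    return io_uring_queue_init(64, &ring, 0);
}

/* Queue one read instead of blocking in pread(); req is handed back
 * on completion via the cqe user data. */
static int submit_read(int fd, void *buf, unsigned len, off_t off, void *req)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

    if (!sqe) {
        return -EAGAIN; /* submission queue full, fall back to inline/pool */
    }
    io_uring_prep_read(sqe, fd, buf, len, off);
    io_uring_sqe_set_data(sqe, req);
    return io_uring_submit(&ring);
}

/* Reap one completion; *res is bytes read or -errno. Returns the req. */
static void *reap_one(int *res)
{
    struct io_uring_cqe *cqe;
    void *req;

    io_uring_wait_cqe(&ring, &cqe);
    *res = cqe->res;
    req = io_uring_cqe_get_data(cqe);
    io_uring_cqe_seen(&ring, cqe);
    return req;
}

The same kind of knob discussed above could then pick between inline
processing, the thread pool, and a ring like this.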

Vivek

> 
> thanks,
> liubo
> 
> > Thanks
> > Vivek
> > 
> > > 
> > > thanks,
> > > liubo
> > > 
> > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/fuse_virtio.c |   27 +++++++++++++++++++--------
> > > >  1 file changed, 19 insertions(+), 8 deletions(-)
> > > > 
> > > > Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
> > > > ===================================================================
> > > > --- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:03:46.175920039 -0400
> > > > +++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c	2021-04-23 10:56:37.793722634 -0400
> > > > @@ -446,6 +446,15 @@ err:
> > > >  static __thread bool clone_fs_called;
> > > >  
> > > >  /* Process one FVRequest in a thread pool */
> > > > +static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
> > > > +{
> > > > +    FVRequest *req = data;
> > > > +    GThreadPool *pool = user_data;
> > > > +
> > > > +    g_thread_pool_push(pool, req, NULL);
> > > > +}
> > > > +
> > > > +/* Process one FVRequest in a thread pool */
> > > >  static void fv_queue_worker(gpointer data, gpointer user_data)
> > > >  {
> > > >      struct fv_QueueInfo *qi = user_data;
> > > > @@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
> > > >      struct fuse_session *se = qi->virtio_dev->se;
> > > >      GThreadPool *pool = NULL;
> > > >      GList *req_list = NULL;
> > > > +    int nr_req = 0;
> > > >  
> > > >      if (se->thread_pool_size) {
> > > >          fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue %d\n",
> > > > @@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
> > > >              }
> > > >  
> > > >              req->reply_sent = false;
> > > > -
> > > > -            if (!se->thread_pool_size) {
> > > > -                req_list = g_list_prepend(req_list, req);
> > > > -            } else {
> > > > -                g_thread_pool_push(pool, req, NULL);
> > > > -            }
> > > > +            req_list = g_list_prepend(req_list, req);
> > > > +            nr_req++;
> > > >          }
> > > >  
> > > >          pthread_mutex_unlock(&qi->vq_lock);
> > > >          vu_dispatch_unlock(qi->virtio_dev);
> > > >  
> > > >          /* Process all the requests. */
> > > > -        if (!se->thread_pool_size && req_list != NULL) {
> > > > -            g_list_foreach(req_list, fv_queue_worker, qi);
> > > > +        if (req_list != NULL) {
> > > > +            if (!se->thread_pool_size || nr_req <= 2) {
> > > > +                g_list_foreach(req_list, fv_queue_worker, qi);
> > > > +            } else  {
> > > > +                g_list_foreach(req_list, fv_queue_push_to_pool, pool);
> > > > +            }
> > > >              g_list_free(req_list);
> > > >              req_list = NULL;
> > > > +            nr_req = 0;
> > > >          }
> > > >      }
> > > >  
> > > > 
> > > > _______________________________________________
> > > > Virtio-fs mailing list
> > > > Virtio-fs@redhat.com
> > > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > > 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
  2021-04-26 17:59         ` Dr. David Alan Gilbert
@ 2021-04-28 13:40           ` Vivek Goyal
  0 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2021-04-28 13:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-fs-list, Michey Mehta

On Mon, Apr 26, 2021 at 06:59:49PM +0100, Dr. David Alan Gilbert wrote:

[..]
> > > I can imagine something based on the % of time a thread is utilised as
> > > being an indication you might need more/less; but that's an even more
> > > complicated heuristic.
> > 
> > Yes, that's even more complicated heuristic.
> > 
> > Also not sure how to account for when thread blocks in kernel waiting for
> > I/O to finish. Thread is not keeping cpu busy but at the same time
> > can't make progress.
> 
> Does it matter if the thread is blocked due to CPU or IO?  If you can
> just look at the time the thread is non-free then that gives you a
> utilisation.

Hmm, I guess you are right. So one way of doing this would be to take
time snapshots:

t1 = gettimeofday()
thread-goes-to-sleep
thread-wakes-up
t2 = gettimeofday()
thread-pops-requests
thread-processes-requests-inline
t3 = gettimeofday()

So effectively t2-t1 is free time and t3-t2 is busy time, so the
%utilization should be:

%util = (t3 - t2) * 100 / (t3 - t1)
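
As a minimal sketch of that measurement (illustrative only; it uses
CLOCK_MONOTONIC rather than gettimeofday(), and none of this is in the
patch):

#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/*
 * t1 = loop entry (before sleeping on the queue kick), t2 = wakeup,
 * t3 = after inline processing. Returns busy time as a percentage of
 * the whole iteration.
 */
static unsigned int loop_utilization(uint64_t t1, uint64_t t2, uint64_t t3)
{
    uint64_t total = t3 - t1;

    return total ? (unsigned int)((t3 - t2) * 100 / total) : 0;
}

A running average of this over a few loop iterations could then drive
the inline vs. thread-pool decision.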

Once we switch to the thread pool, we will also need something to go
back to inline processing, i.e. some notion of how busy the thread pool
is. If the thread pool's %utilization drops low enough, switch back to
inline processing. This bit might be trickier because we use the glib
thread pool, and the glib API might not provide us any of this data.

https://developer.gnome.org/glib/2.68/glib-Thread-Pools.html
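
That said, GLib does expose a couple of coarse counters that might serve
as a rough proxy (whether they are a good enough signal is an open
question):

#include <glib.h>

/* Rough check: nothing queued for the pool right now. GLib has no
 * per-thread utilization API, so this is only a proxy. */
static gboolean pool_looks_idle(GThreadPool *pool)
{
    /* Tasks pushed but not yet picked up by a worker thread. */
    guint backlog = g_thread_pool_unprocessed(pool);

    /* g_thread_pool_get_num_threads(pool) is also available if the
     * number of live workers turns out to be a useful signal. */
    return backlog == 0;
}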

I feel that a heuristic based on the number of requests is very simple
to implement, so maybe we can start with that for now.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-04-28 13:40 UTC | newest]

Thread overview: 12+ messages
2021-04-23 21:11 [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing Vivek Goyal
2021-04-24  6:12 ` Liu Bo
2021-04-26 11:39   ` Vivek Goyal
2021-04-26 19:03     ` Liu Bo
2021-04-26 19:10       ` Vivek Goyal
2021-04-25  2:29 ` [Virtio-fs] [External] " Jiachen Zhang
2021-04-26 15:11   ` Vivek Goyal
2021-04-26 17:40     ` Dr. David Alan Gilbert
2021-04-26 17:51       ` Vivek Goyal
2021-04-26 17:59         ` Dr. David Alan Gilbert
2021-04-28 13:40           ` Vivek Goyal
2021-04-26 17:34 ` [Virtio-fs] " Vivek Goyal
