* [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
@ 2021-04-23 21:11 Vivek Goyal
2021-04-24 6:12 ` Liu Bo
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Vivek Goyal @ 2021-04-23 21:11 UTC (permalink / raw)
To: virtio-fs-list; +Cc: Michey Mehta
This is just an RFC patch for now. I am still running performance numbers
to see if this method of switching is good enough or not. I did one run
and seemed to get higher performance at deeper queue depths. There were
a few tests where I did not match the low queue depth performance of the
no-thread-pool case. That may be run-to-run variation.

For low queue depth workloads, inline processing works well. But for
high queue depth (and multiple process/thread) workloads, the parallel
processing of a thread pool can help.

This patch is an experiment which tries to switch between inline and
thread-pool processing. If the number of requests received on the queue
is small (at most 2 with the current threshold), inline processing is
done. Otherwise requests are handed over to a thread pool for processing.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
tools/virtiofsd/fuse_virtio.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
Index: rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c
===================================================================
--- rhvgoyal-qemu.orig/tools/virtiofsd/fuse_virtio.c 2021-04-23 10:03:46.175920039 -0400
+++ rhvgoyal-qemu/tools/virtiofsd/fuse_virtio.c 2021-04-23 10:56:37.793722634 -0400
@@ -446,6 +446,15 @@ err:
static __thread bool clone_fs_called;
+/* Push one FVRequest to the thread pool */
+static void fv_queue_push_to_pool(gpointer data, gpointer user_data)
+{
+ FVRequest *req = data;
+ GThreadPool *pool = user_data;
+
+ g_thread_pool_push(pool, req, NULL);
+}
+
/* Process one FVRequest in a thread pool */
static void fv_queue_worker(gpointer data, gpointer user_data)
{
struct fv_QueueInfo *qi = user_data;
@@ -605,6 +614,7 @@ static void *fv_queue_thread(void *opaqu
struct fuse_session *se = qi->virtio_dev->se;
GThreadPool *pool = NULL;
GList *req_list = NULL;
+ int nr_req = 0;
if (se->thread_pool_size) {
fuse_log(FUSE_LOG_DEBUG, "%s: Creating thread pool for Queue %d\n",
@@ -686,22 +696,23 @@ static void *fv_queue_thread(void *opaqu
}
req->reply_sent = false;
-
- if (!se->thread_pool_size) {
- req_list = g_list_prepend(req_list, req);
- } else {
- g_thread_pool_push(pool, req, NULL);
- }
+ req_list = g_list_prepend(req_list, req);
+ nr_req++;
}
pthread_mutex_unlock(&qi->vq_lock);
vu_dispatch_unlock(qi->virtio_dev);
/* Process all the requests. */
- if (!se->thread_pool_size && req_list != NULL) {
- g_list_foreach(req_list, fv_queue_worker, qi);
+ if (req_list != NULL) {
+ if (!se->thread_pool_size || nr_req <= 2) {
+ g_list_foreach(req_list, fv_queue_worker, qi);
+ } else {
+ g_list_foreach(req_list, fv_queue_push_to_pool, pool);
+ }
g_list_free(req_list);
req_list = NULL;
+ nr_req = 0;
}
}
* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-23 21:11 [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing Vivek Goyal
@ 2021-04-24 6:12 ` Liu Bo
2021-04-26 11:39 ` Vivek Goyal
2021-04-25 2:29 ` [Virtio-fs] [External] " Jiachen Zhang
2021-04-26 17:34 ` [Virtio-fs] " Vivek Goyal
2 siblings, 1 reply; 12+ messages in thread
From: Liu Bo @ 2021-04-24 6:12 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta
On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> This is just an RFC patch for now. I am still running performance numbers
> to see if this method of switching is good enough or not. I did one run
> and seemed to get higher performance at deeper queue depths. There were
> a few tests where I did not match the low queue depth performance of the
> no-thread-pool case. That may be run-to-run variation.
>
> For low queue depth workloads, inline processing works well. But for
> high queue depth (and multiple process/thread) workloads, the parallel
> processing of a thread pool can help.
>
> This patch is an experiment which tries to switch between inline and
> thread-pool processing. If the number of requests received on the queue
> is small (at most 2 with the current threshold), inline processing is
> done. Otherwise requests are handed over to a thread pool for processing.
>
I'm looking forward to the results showing how many requests it takes
to beat the overhead of using thread pools.
This is a good idea indeed, and the switching mechanism also applies to
async IO frameworks like io_uring.
thanks,
liubo
* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-23 21:11 [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing Vivek Goyal
2021-04-24 6:12 ` Liu Bo
@ 2021-04-25 2:29 ` Jiachen Zhang
2021-04-26 15:11 ` Vivek Goyal
2021-04-26 17:34 ` [Virtio-fs] " Vivek Goyal
2 siblings, 1 reply; 12+ messages in thread
From: Jiachen Zhang @ 2021-04-25 2:29 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta
On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> This is just an RFC patch for now. I am still running performance numbers
> to see if this method of switching is good enough or not. I did one run
> and seemed to get higher performance at deeper queue depths. There were
> a few tests where I did not match the low queue depth performance of the
> no-thread-pool case. That may be run-to-run variation.
>
>
I think this is interesting. I guess the switchover point between the
thread-pool and the inline method may change as the hardware or OS kernel
changes. Looking forward to the results.
Jiachen
* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-24 6:12 ` Liu Bo
@ 2021-04-26 11:39 ` Vivek Goyal
2021-04-26 19:03 ` Liu Bo
0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 11:39 UTC (permalink / raw)
To: Liu Bo; +Cc: virtio-fs-list, Michey Mehta
On Sat, Apr 24, 2021 at 02:12:44PM +0800, Liu Bo wrote:
> On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> > This is just an RFC patch for now. I am still running performance numbers
> > to see if this method of switching is good enough or not. I did one run
> > and seemed to get higher performance at deeper queue depths. There were
> > a few tests where I did not match the low queue depth performance of the
> > no-thread-pool case. That may be run-to-run variation.
> >
> > For low queue depth workloads, inline processing works well. But for
> > high queue depth (and multiple process/thread) workloads, the parallel
> > processing of a thread pool can help.
> >
> > This patch is an experiment which tries to switch between inline and
> > thread-pool processing. If the number of requests received on the queue
> > is small (at most 2 with the current threshold), inline processing is
> > done. Otherwise requests are handed over to a thread pool for processing.
> >
>
> I'm looking forward to the results showing how many requests it takes
> to beat the overhead of using thread pools.
>
> This is a good idea indeed, and the switching mechanism also applies to
> async IO frameworks like io_uring.
Hi Liubo,
I have been thinking of using io_uring. Have you managed to make it
work? Do we get better performance with io_uring?
Thanks
Vivek
* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-25 2:29 ` [Virtio-fs] [External] " Jiachen Zhang
@ 2021-04-26 15:11 ` Vivek Goyal
2021-04-26 17:40 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 15:11 UTC (permalink / raw)
To: Jiachen Zhang; +Cc: virtio-fs-list, Michey Mehta
On Sun, Apr 25, 2021 at 10:29:14AM +0800, Jiachen Zhang wrote:
> On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > This is just an RFC patch for now. I am still running performance numbers
> > to see if this method of switching is good enough or not. I did one run
> > and seemed to get higher performance at deeper queue depths. There were
> > a few tests where I did not match the low queue depth performance of the
> > no-thread-pool case. That may be run-to-run variation.
> >
> >
> I think this is interesting. I guess the switchover point between the
> thread-pool and the inline method may change as the hardware or OS kernel
> changes. Looking forward to the results.
Hi Jiachen,
Agreed that the threshold to switch between inline and thread-pool
processing will vary based on hardware and many other factors. Having
said that, there is no good way to determine the most optimized
threshold for a particular setup.
Maybe we will have to just go with a default which works reasonably
well and also provide a knob for the user to alter that threshold. At
least it would be useful for experimenting with different thresholds.
Vivek
* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-23 21:11 [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing Vivek Goyal
2021-04-24 6:12 ` Liu Bo
2021-04-25 2:29 ` [Virtio-fs] [External] " Jiachen Zhang
@ 2021-04-26 17:34 ` Vivek Goyal
2 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 17:34 UTC (permalink / raw)
To: virtio-fs-list; +Cc: Michey Mehta
On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> This is just an RFC patch for now. I am still running performance numbers
> to see if this method of switching is good enough or not. I did one run
> and seemed to get higher performance at deeper queue depths. There were
> a few tests where I did not match the low queue depth performance of the
> no-thread-pool case. That may be run-to-run variation.
Hi All,
So I ran fio jobs with auto-switch enabled and uploaded all the data
here. The final results are also pasted at the end of this email.
https://github.com/rhvgoyal/virtiofs-tests/tree/master/performance-results/auto-switch-inline-thread-pool/april-26-2021
Looks like an auto-switch threshold of 3 is working pretty well for me.
That is, at a queue depth of 3 or less, do inline processing; otherwise
switch to the thread pool. Of course this will vary from system to
system and with I/O size as well. I have only tested with bs=4K.
I think it might be worth providing the user a knob, say
"--thread-pool-threshold=<nr>", to control it. By default it could
be 3, maybe.
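For illustration, a minimal C sketch of what the dispatch decision
could look like with such a knob; the thread_pool_threshold field and
the helper names are hypothetical stand-ins, not the actual virtiofsd
option wiring:

#include <glib.h>

/*
 * Sketch of the dispatch decision with a user-tunable threshold in
 * place of the hard-coded constant from the patch. The callbacks
 * mirror the shape of fv_queue_worker()/fv_queue_push_to_pool().
 */
struct session_opts {
    int thread_pool_size;       /* 0 => always inline (existing option) */
    int thread_pool_threshold;  /* inline if nr_req <= this (new knob)  */
};

static void process_inline(gpointer data, gpointer user_data)
{
    /* Stand-in for fv_queue_worker(): process one request here. */
    (void)data;
    (void)user_data;
}

static void push_to_pool(gpointer data, gpointer user_data)
{
    /* Hand one request over to the thread pool. */
    g_thread_pool_push((GThreadPool *)user_data, data, NULL);
}

static void dispatch(struct session_opts *opts, GList *req_list,
                     int nr_req, GThreadPool *pool, gpointer qi)
{
    if (!opts->thread_pool_size || nr_req <= opts->thread_pool_threshold) {
        g_list_foreach(req_list, process_inline, qi);
    } else {
        g_list_foreach(req_list, push_to_pool, pool);
    }
}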
Thanks
Vivek
NAME WORKLOAD Bandwidth IOPS
tpool-16 seqread-psync 62.1mb 15.5k
no-tpool seqread-psync 89.3mb 22.3k
auto-switch-3 seqread-psync 89.6mb 22.4k
auto-switch-4 seqread-psync 88.8mb 22.2k
tpool-16 seqread-psync-multi 256.1mb 64.0k
no-tpool seqread-psync-multi 280.6mb 70.1k
auto-switch-3 seqread-psync-multi 282.9mb 70.7k
auto-switch-4 seqread-psync-multi 284.9mb 71.2k
tpool-16 seqread-libaio 327.7mb 81.9k
no-tpool seqread-libaio 306.5mb 76.6k
auto-switch-3 seqread-libaio 361.8mb 90.4k
auto-switch-4 seqread-libaio 365.3mb 91.3k
tpool-16 seqread-libaio-multi 354.5mb 88.6k
no-tpool seqread-libaio-multi 300.7mb 75.1k
auto-switch-3 seqread-libaio-multi 375.4mb 93.8k
auto-switch-4 seqread-libaio-multi 374.3mb 93.5k
tpool-16 randread-psync 64.5mb 16.1k
no-tpool randread-psync 88.5mb 22.1k
auto-switch-3 randread-psync 87.5mb 21.8k
auto-switch-4 randread-psync 87.7mb 21.9k
tpool-16 randread-psync-multi 268.0mb 67.0k
no-tpool randread-psync-multi 282.5mb 70.6k
auto-switch-3 randread-psync-multi 283.3mb 70.8k
auto-switch-4 randread-psync-multi 282.6mb 70.6k
tpool-16 randread-libaio 325.1mb 81.2k
no-tpool randread-libaio 303.0mb 75.7k
auto-switch-3 randread-libaio 360.8mb 90.2k
auto-switch-4 randread-libaio 361.0mb 90.2k
tpool-16 randread-libaio-multi 352.2mb 88.0k
no-tpool randread-libaio-multi 289.6mb 72.4k
auto-switch-3 randread-libaio-multi 373.5mb 93.3k
auto-switch-4 randread-libaio-multi 372.0mb 93.0k
tpool-16 seqwrite-psync 65.7mb 16.4k
no-tpool seqwrite-psync 87.6mb 21.9k
auto-switch-3 seqwrite-psync 86.5mb 21.6k
auto-switch-4 seqwrite-psync 85.9mb 21.4k
tpool-16 seqwrite-psync-multi 270.9mb 67.7k
no-tpool seqwrite-psync-multi 268.0mb 67.0k
auto-switch-3 seqwrite-psync-multi 267.7mb 66.9k
auto-switch-4 seqwrite-psync-multi 263.7mb 65.9k
tpool-16 seqwrite-libaio 293.5mb 73.3k
no-tpool seqwrite-libaio 286.1mb 71.5k
auto-switch-3 seqwrite-libaio 333.7mb 83.4k
auto-switch-4 seqwrite-libaio 330.5mb 82.6k
tpool-16 seqwrite-libaio-multi 415.0mb 103.7k
no-tpool seqwrite-libaio-multi 281.6mb 70.4k
auto-switch-3 seqwrite-libaio-multi 417.4mb 104.3k
auto-switch-4 seqwrite-libaio-multi 401.6mb 100.4k
tpool-16 randwrite-psync 65.1mb 16.2k
no-tpool randwrite-psync 87.9mb 21.9k
auto-switch-3 randwrite-psync 86.6mb 21.6k
auto-switch-4 randwrite-psync 85.9mb 21.4k
tpool-16 randwrite-psync-multi 268.9mb 67.2k
no-tpool randwrite-psync-multi 261.0mb 65.2k
auto-switch-3 randwrite-psync-multi 261.2mb 65.3k
auto-switch-4 randwrite-psync-multi 263.7mb 65.9k
tpool-16 randwrite-libaio 300.8mb 75.2k
no-tpool randwrite-libaio 287.4mb 71.8k
auto-switch-3 randwrite-libaio 328.2mb 82.0k
auto-switch-4 randwrite-libaio 335.4mb 83.8k
tpool-16 randwrite-libaio-multi 415.4mb 103.8k
no-tpool randwrite-libaio-multi 271.7mb 67.9k
auto-switch-3 randwrite-libaio-multi 415.7mb 103.9k
auto-switch-4 randwrite-libaio-multi 396.5mb 99.1k
tpool-16 randrw-psync 35.3mb/11.7mb 9044/3019
no-tpool randrw-psync 52.9mb/17.7mb 13.2k/4536
auto-switch-3 randrw-psync 52.7mb/17.6mb 13.1k/4516
auto-switch-4 randrw-psync 52.7mb/17.6mb 13.1k/4514
tpool-16 randrw-psync-multi 151.9mb/50.8mb 37.9k/12.7k
no-tpool randrw-psync-multi 160.5mb/53.7mb 40.1k/13.4k
auto-switch-3 randrw-psync-multi 167.6mb/56.1mb 41.9k/14.0k
auto-switch-4 randrw-psync-multi 164.6mb/55.1mb 41.1k/13.7k
tpool-16 randrw-libaio 83.0mb/27.8mb 20.7k/7121
no-tpool randrw-libaio 87.2mb/29.2mb 21.7k/7479
auto-switch-3 randrw-libaio 87.9mb/29.4mb 21.9k/7544
auto-switch-4 randrw-libaio 86.4mb/28.9mb 21.6k/7417
tpool-16 randrw-libaio-multi 201.6mb/67.5mb 50.4k/16.8k
no-tpool randrw-libaio-multi 187.5mb/62.8mb 46.8k/15.7k
auto-switch-3 randrw-libaio-multi 218.0mb/73.0mb 54.5k/18.2k
auto-switch-4 randrw-libaio-multi 213.4mb/71.5mb 53.3k/17.8k
* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-26 15:11 ` Vivek Goyal
@ 2021-04-26 17:40 ` Dr. David Alan Gilbert
2021-04-26 17:51 ` Vivek Goyal
0 siblings, 1 reply; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-26 17:40 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta
* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Sun, Apr 25, 2021 at 10:29:14AM +0800, Jiachen Zhang wrote:
> > On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > > This is just an RFC patch for now. I am still running performance numbers
> > > to see if this method of switching is good enough or not. I did one run
> > > and seemed to get higher performance at deeper queue depths. There were
> > > a few tests where I did not match the low queue depth performance of the
> > > no-thread-pool case. That may be run-to-run variation.
> > >
> > >
> > I think this is interesting. I guess the switchover point between the
> > thread-pool and the inline method may change as the hardware or OS kernel
> > changes. Looking forward to the results.
>
> Hi Jiachen,
>
> Agreed that the threshold to switch between inline and thread-pool
> processing will vary based on hardware and many other factors. Having
> said that, there is no good way to determine the most optimized
> threshold for a particular setup.
I can imagine something based on the % of time a thread is utilised
being an indication that you might need more/fewer threads; but that's
an even more complicated heuristic.
> Maybe we will have to just go with a default which works reasonably
> well and also provide a knob for the user to alter that threshold. At
> least it would be useful for experimenting with different thresholds.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-26 17:40 ` Dr. David Alan Gilbert
@ 2021-04-26 17:51 ` Vivek Goyal
2021-04-26 17:59 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 17:51 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs-list, Michey Mehta
On Mon, Apr 26, 2021 at 06:40:28PM +0100, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > On Sun, Apr 25, 2021 at 10:29:14AM +0800, Jiachen Zhang wrote:
> > > On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > > This is just an RFC patch for now. I am still running performance numbers
> > > > to see if this method of switching is good enough or not. I did one run
> > > > and seemed to get higher performance at deeper queue depths. There were
> > > > a few tests where I did not match the low queue depth performance of the
> > > > no-thread-pool case. That may be run-to-run variation.
> > > >
> > > >
> > > I think this is interesting. I guess the switchover point between the
> > > thread-pool and the inline method may change as the hardware or OS kernel
> > > changes. Looking forward to the results.
> >
> > Hi Jiachen,
> >
> > Agreed that the threshold to switch between inline and thread-pool
> > processing will vary based on hardware and many other factors. Having
> > said that, there is no good way to determine the most optimized
> > threshold for a particular setup.
>
> I can imagine something based on the % of time a thread is utilised
> being an indication that you might need more/fewer threads; but that's
> an even more complicated heuristic.
Yes, that's an even more complicated heuristic.

Also, I am not sure how to account for the case when a thread blocks in
the kernel waiting for I/O to finish. The thread is not keeping the CPU
busy, but at the same time it can't make progress.
Vivek
* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-26 17:51 ` Vivek Goyal
@ 2021-04-26 17:59 ` Dr. David Alan Gilbert
2021-04-28 13:40 ` Vivek Goyal
0 siblings, 1 reply; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2021-04-26 17:59 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta
* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Mon, Apr 26, 2021 at 06:40:28PM +0100, Dr. David Alan Gilbert wrote:
> > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > On Sun, Apr 25, 2021 at 10:29:14AM +0800, Jiachen Zhang wrote:
> > > > On Sat, Apr 24, 2021 at 5:11 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > > This is just an RFC patch for now. I am still running performance numbers
> > > > > to see if this method of switching is good enough or not. I did one run
> > > > > and seemed to get higher performance at deeper queue depths. There were
> > > > > a few tests where I did not match the low queue depth performance of the
> > > > > no-thread-pool case. That may be run-to-run variation.
> > > > >
> > > > >
> > > > I think this is interesting. I guess the switchover point between the
> > > > thread-pool and the inline method may change as the hardware or OS kernel
> > > > changes. Looking forward to the results.
> > >
> > > Hi Jiachen,
> > >
> > > Agreed that the threshold to switch between inline and thread-pool
> > > processing will vary based on hardware and many other factors. Having
> > > said that, there is no good way to determine the most optimized
> > > threshold for a particular setup.
> >
> > I can imagine something based on the % of time a thread is utilised
> > being an indication that you might need more/fewer threads; but that's
> > an even more complicated heuristic.
>
> Yes, that's an even more complicated heuristic.
>
> Also, I am not sure how to account for the case when a thread blocks in
> the kernel waiting for I/O to finish. The thread is not keeping the CPU
> busy, but at the same time it can't make progress.
Does it matter if the thread is blocked due to CPU or IO? If you can
just look at the time the thread is non-free then that gives you a
utilisation.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-26 11:39 ` Vivek Goyal
@ 2021-04-26 19:03 ` Liu Bo
2021-04-26 19:10 ` Vivek Goyal
0 siblings, 1 reply; 12+ messages in thread
From: Liu Bo @ 2021-04-26 19:03 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, Michey Mehta
On Mon, Apr 26, 2021 at 07:39:54AM -0400, Vivek Goyal wrote:
> On Sat, Apr 24, 2021 at 02:12:44PM +0800, Liu Bo wrote:
> > On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> > > This is just an RFC patch for now. I am still running performance numbers
> > > to see if this method of switching is good enough or not. I did one run
> > > and seemed to get higher performance at deeper queue depths. There were
> > > a few tests where I did not match the low queue depth performance of the
> > > no-thread-pool case. That may be run-to-run variation.
> > >
> > > For low queue depth workloads, inline processing works well. But for
> > > high queue depth (and multiple process/thread) workloads, the parallel
> > > processing of a thread pool can help.
> > >
> > > This patch is an experiment which tries to switch between inline and
> > > thread-pool processing. If the number of requests received on the queue
> > > is small (at most 2 with the current threshold), inline processing is
> > > done. Otherwise requests are handed over to a thread pool for processing.
> > >
> >
> > I'm looking forward to the results showing how many requests it takes
> > to beat the overhead of using thread pools.
> >
> > This is a good idea indeed, and the switching mechanism also applies to
> > async IO frameworks like io_uring.
>
> Hi Liubo,
>
> I have been thinking of using io_uring. Have you managed to make it
> work? Do we get better performance with io_uring?
>
Hi Vivek,
With fuse-backend-rs, I did some experiments using the Rust async
framework and the io_uring wrapper ringbahn; my code provides a few
Rust coroutines to serve the FUSE processing loop.
I tested the same 8k random read workload on three setups:
a) single thread
b) threads=4 multiple threads
c) coroutines=4 async
The performance tests showed the expected behavior: for IO-intensive
workloads, "async" beats the single thread. It comes with overhead,
though; it performs at about 80% of "multiple threads".
Note that the above tests were done with no limit on cpu/mem
resources. When limiting the cpu to 1, "async" performs the best,
given that the async io kthreads were not limited.
So it looks like all three setups have pros and cons; it'd be great if
we could switch between them on the fly.
thanks,
liubo
* Re: [Virtio-fs] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-26 19:03 ` Liu Bo
@ 2021-04-26 19:10 ` Vivek Goyal
0 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2021-04-26 19:10 UTC (permalink / raw)
To: Liu Bo; +Cc: virtio-fs-list, Michey Mehta
On Tue, Apr 27, 2021 at 03:03:43AM +0800, Liu Bo wrote:
> On Mon, Apr 26, 2021 at 07:39:54AM -0400, Vivek Goyal wrote:
> > On Sat, Apr 24, 2021 at 02:12:44PM +0800, Liu Bo wrote:
> > > On Fri, Apr 23, 2021 at 05:11:30PM -0400, Vivek Goyal wrote:
> > > > This is just an RFC patch for now. I am still running performance numbers
> > > > to see if this method of switching is good enough or not. I did one run
> > > > and seemed to get higher performance at deeper queue depths. There were
> > > > a few tests where I did not match the low queue depth performance of the
> > > > no-thread-pool case. That may be run-to-run variation.
> > > >
> > > > For low queue depth workloads, inline processing works well. But for
> > > > high queue depth (and multiple process/thread) workloads, the parallel
> > > > processing of a thread pool can help.
> > > >
> > > > This patch is an experiment which tries to switch between inline and
> > > > thread-pool processing. If the number of requests received on the queue
> > > > is small (at most 2 with the current threshold), inline processing is
> > > > done. Otherwise requests are handed over to a thread pool for processing.
> > > >
> > >
> > > I'm looking forward to the results showing how many requests it takes
> > > to beat the overhead of using thread pools.
> > >
> > > This is a good idea indeed, and the switching mechanism also applies to
> > > async IO frameworks like io_uring.
> >
> > Hi Liubo,
> >
> > I have been thinking of using io_uring. Have you managed to make it
> > work? Do we get better performance with io_uring?
> >
> Hi Vivek,
>
> With fuse-backend-rs, I did some experiments using the Rust async
> framework and the io_uring wrapper ringbahn; my code provides a few
> Rust coroutines to serve the FUSE processing loop.
>
> I tested the same 8k random read workload on three setups:
> a) single thread
> b) threads=4 multiple threads
> c) coroutines=4 async
>
> The performance tests showed the expected behavior: for IO-intensive
> workloads, "async" beats the single thread. It comes with overhead,
> though; it performs at about 80% of "multiple threads".
>
> Note that the above tests were done with no limit on cpu/mem
> resources. When limiting the cpu to 1, "async" performs the best,
> given that the async io kthreads were not limited.
So if the number of cpus is not limited for virtiofsd, then async (c)
did not perform better than "multiple threads" (b)?
IOW, async performed better than multiple threads only when the number
of cpus was limited to 1 for virtiofsd.
Did I understand that right?
>
> So it looks like all three setups have pros and cons; it'd be great if
> we could switch between them on the fly.
Agreed. It probably will make sense to add support for async I/O
(probably using io_uring) and provide a knob to let the user select
whichever I/O interface they want to use.
Vivek
* Re: [Virtio-fs] [External] [RFC] [PATCH] virtiofsd: Auto switch between inline and thread-pool processing
2021-04-26 17:59 ` Dr. David Alan Gilbert
@ 2021-04-28 13:40 ` Vivek Goyal
0 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2021-04-28 13:40 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs-list, Michey Mehta
On Mon, Apr 26, 2021 at 06:59:49PM +0100, Dr. David Alan Gilbert wrote:
[..]
> > > I can imagine something based on the % of time a thread is utilised
> > > being an indication that you might need more/fewer threads; but that's
> > > an even more complicated heuristic.
> >
> > Yes, that's an even more complicated heuristic.
> >
> > Also, I am not sure how to account for the case when a thread blocks in
> > the kernel waiting for I/O to finish. The thread is not keeping the CPU
> > busy, but at the same time it can't make progress.
>
> Does it matter if the thread is blocked due to CPU or IO? If you can
> just look at the time the thread is non-free then that gives you a
> utilisation.
Hmm, I guess you are right. So one way of doing this would be to take
time snapshots:

t1 = gettimeofday()
thread-goes-to-sleep
thread-wakes-up
t2 = gettimeofday()
thread-pops-requests
thread-processes-requests-inline
t3 = gettimeofday()

So effectively t2 - t1 is free time and t3 - t2 is busy time, so the
% utilization should be:

%util = (t3 - t2) * 100 / (t3 - t1)
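For reference, a minimal C sketch of that bookkeeping, assuming
CLOCK_MONOTONIC snapshots at the t1/t2/t3 points above; the
moving-average smoothing is an added assumption, not part of the
proposal:

#include <time.h>

/* Nanoseconds between two CLOCK_MONOTONIC samples. */
static long long elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000000000LL + (b->tv_nsec - a->tv_nsec);
}

static double util_pct; /* smoothed % utilization of the queue thread */

/*
 * Fold one loop iteration into the running utilization estimate.
 * t1..t3 are the snapshots described above; the exponential moving
 * average (alpha = 1/8, an arbitrary choice) smooths run-to-run noise.
 */
static void account_iteration(const struct timespec *t1,
                              const struct timespec *t2,
                              const struct timespec *t3)
{
    long long total = elapsed_ns(t1, t3); /* sleep + processing */
    long long busy = elapsed_ns(t2, t3);  /* processing only */

    if (total > 0) {
        double sample = busy * 100.0 / total;
        util_pct += (sample - util_pct) / 8.0;
    }
}

With something like this, the queue thread could switch to the thread
pool once util_pct stays above some high-water mark, and back once it
drops.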
Once we switch to the thread pool, we will also need something to go
back to inline processing: some notion of how busy the thread pool is.
If the thread pool's % utilization is low, switch back to inline
processing. This bit might be more tricky because we use the glib
thread pool, and the glib API might not provide us any of this data.
https://developer.gnome.org/glib/2.68/glib-Thread-Pools.html
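That said, glib does expose a couple of counters,
g_thread_pool_unprocessed() and g_thread_pool_get_num_threads(), that
could approximate pool busyness. A rough sketch; the idleness
condition here is a made-up heuristic:

#include <glib.h>

/*
 * Decide whether the pool looks idle enough to go back to inline
 * processing. Some hysteresis (e.g. "idle for N consecutive
 * iterations") would be needed to avoid flapping between modes.
 */
static gboolean pool_looks_idle(GThreadPool *pool)
{
    guint backlog = g_thread_pool_unprocessed(pool);     /* queued, not yet started */
    guint running = g_thread_pool_get_num_threads(pool); /* threads currently in use */

    return backlog == 0 && running <= 1;
}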
I feel that a heuristic based on the number of requests is very simple
to implement, so maybe we can implement that for now.
Thanks
Vivek