* 9p: requests efficiency
@ 2019-11-15 1:10 Christian Schoenebeck
2019-11-15 13:26 ` Greg Kurz
0 siblings, 1 reply; 4+ messages in thread
From: Christian Schoenebeck @ 2019-11-15 1:10 UTC (permalink / raw)
To: qemu-devel; +Cc: Greg Kurz
I'm currently reading up on how client requests (T messages) are dispatched
in general by 9pfs, to understand where the potential inefficiencies I am
encountering come from.
I mean 9pfs is pretty fast on raw I/O (read/write requests), provided that the
message payload on the guest side is chosen large enough (e.g.
trans=virtio,version=9p2000.L,msize=4194304,...); with that I already come
close to my test disk's theoretical maximum performance on read/write tests.
But obviously these are huge 9p requests.
However, when there is a large number of (i.e. small) 9p requests, no matter
what the actual request type is, I am encountering severe performance issues
with 9pfs, and I am trying to understand whether this could be improved with
reasonable effort.
If I understand it correctly, each incoming request (T message) is dispatched
to its own qemu coroutine queue. So individual requests should already be
processed in parallel, right?
Best regards,
Christian Schoenebeck
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 9p: requests efficiency
2019-11-15 1:10 9p: requests efficiency Christian Schoenebeck
@ 2019-11-15 13:26 ` Greg Kurz
2019-11-20 23:37 ` Christian Schoenebeck
0 siblings, 1 reply; 4+ messages in thread
From: Greg Kurz @ 2019-11-15 13:26 UTC (permalink / raw)
To: Christian Schoenebeck; +Cc: qemu-devel
On Fri, 15 Nov 2019 02:10:50 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> I'm currently reading up on how client requests (T messages) are dispatched
> in general by 9pfs, to understand where the potential inefficiencies I am
> encountering come from.
>
> I mean 9pfs is pretty fast on raw I/O (read/write requests), provided that the
> message payload on the guest side is chosen large enough (e.g.
> trans=virtio,version=9p2000.L,msize=4194304,...); with that I already come
> close to my test disk's theoretical maximum performance on read/write tests.
> But obviously these are huge 9p requests.
>
> However, when there is a large number of (i.e. small) 9p requests, no matter
> what the actual request type is, I am encountering severe performance issues
> with 9pfs, and I am trying to understand whether this could be improved with
> reasonable effort.
>
Thanks for doing that. This is typically the kind of effort I never
dared to start on my own.
> If I understand it correctly, each incoming request (T message) is dispatched
> to its own qemu coroutine queue. So individual requests should already be
> processed in parallel, right?
>
Sort of, but not exactly. The real parallelization, i.e. doing parallel
processing with concurrent threads, doesn't take place on a per-request
basis. A typical request is broken down into several calls to the backend,
which may block because the backend itself calls a syscall that may block
in the kernel. Each backend call is thus handled by its own thread from the
mainloop thread pool (see hw/9pfs/coth.[ch] for details). The rest of the
9p code, basically everything in 9p.c, is serialized in the mainloop thread.
Cheers,
--
Greg
> Best regards,
> Christian Schoenebeck
>
>
* Re: 9p: requests efficiency
2019-11-15 13:26 ` Greg Kurz
@ 2019-11-20 23:37 ` Christian Schoenebeck
2019-11-23 19:26 ` Christian Schoenebeck
0 siblings, 1 reply; 4+ messages in thread
From: Christian Schoenebeck @ 2019-11-20 23:37 UTC (permalink / raw)
To: qemu-devel; +Cc: Greg Kurz, Christian Schoenebeck
On Friday, 15 November 2019 14:26:56 CET Greg Kurz wrote:
> > However, when there is a large number of (i.e. small) 9p requests, no
> > matter what the actual request type is, I am encountering severe
> > performance issues with 9pfs, and I am trying to understand whether this
> > could be improved with reasonable effort.
>
> Thanks for doing that. This is typically the kind of effort I never
> dared to start on my own.
If you don't mind, I'll still ask some more questions, just in case you can
answer them off the top of your head.
> > If I understand it correctly, each incoming request (T message) is
> > dispatched to its own qemu coroutine queue. So individual requests should
> > already be processed in parallel, right?
>
> Sort of, but not exactly. The real parallelization, i.e. doing parallel
> processing with concurrent threads, doesn't take place on a per-request
> basis.
Ok, I see. I was just reading that each request triggers this call sequence:
handle_9p_output() -> pdu_submit() -> qemu_co_queue_init(&pdu->complete)
and I misinterpreted specifically that latter call as an implied thread
creation, because that's what happens in somewhat similar cooperative
dispatch frameworks like "Grand Central Dispatch" or std::async.
But now I realize the entire QEMU coroutine framework is really just managing
execution stacks, not threads per se. The QEMU docs often use the term
"threads", which is IMO misleading for what it really does.
> A typical request is broken down into several calls to the backend
> which may block because the backend itself calls a syscall that may block
> in the kernel. Each backend call is thus handled by its own thread from the
> mainloop thread pool (see hw/9pfs/coth.[ch] for details). The rest of the
> 9p code, basically everything in 9p.c, is serialized in the mainloop thread.
So the precise parallelism fork points in 9pfs (where tasks are dispatched to
other threads) are the *_co_*() functions, specifically wherever they use
v9fs_co_run_in_worker( X ), correct? Or are there more fork points than those?
If so, I haven't yet understood precisely how v9fs_co_run_in_worker() works.
I understand now how QEMU coroutines work, and that the idea of
v9fs_co_run_in_worker() is to dispatch the passed code block to a worker
thread while immediately returning to the main thread and continuing there
with other coroutines until the worker thread's dispatched block has
finished. But how exactly that happens inside v9fs_co_run_in_worker() is not
yet clear to me.
Also, where are the worker threads actually spawned?
Best regards,
Christian Schoenebeck
* Re: 9p: requests efficiency
2019-11-20 23:37 ` Christian Schoenebeck
@ 2019-11-23 19:26 ` Christian Schoenebeck
0 siblings, 0 replies; 4+ messages in thread
From: Christian Schoenebeck @ 2019-11-23 19:26 UTC (permalink / raw)
To: qemu-devel; +Cc: Christian Schoenebeck, Greg Kurz
On Thursday, 21 November 2019 00:37:36 CET Christian Schoenebeck wrote:
> If so, I haven't yet understood precisely how v9fs_co_run_in_worker()
> works. I understand now how QEMU coroutines work, and that the idea of
> v9fs_co_run_in_worker() is to dispatch the passed code block to a worker
> thread while immediately returning to the main thread and continuing there
> with other coroutines until the worker thread's dispatched block has
> finished. But how exactly that happens inside v9fs_co_run_in_worker() is
> not yet clear to me.
>
> Also, where are the worker threads actually spawned?
Never mind about these questions; I figured them out myself in the
meantime.
Another question though: I see that you introduced those readdir mutex locks
with commit 7cde47d4a89. Do you remember the scenario where this concurrency
issue happened?
A V9fsDir exists per file ID (fid). So I think concurrency should only arise
there if the same fid (of a directory) is shared and used concurrently on the
guest side; otherwise, if the fid is used by only one thread on the guest
side, all 9p requests on that fid should end up being completely serialized
end to end.
Best regards,
Christian Schoenebeck