* comparison of coroutine backends
@ 2022-03-18  8:48 Paolo Bonzini
  2022-03-21 10:40 ` Stefan Hajnoczi
  2022-03-21 16:30 ` Daniel P. Berrangé
  0 siblings, 2 replies; 3+ messages in thread
From: Paolo Bonzini @ 2022-03-18  8:48 UTC (permalink / raw)
  To: qemu-devel, qemu block, Kevin Wolf, Stefan Hajnoczi, Hanna Reitz

Hi all,

Based on the previous discussions, here is a comparison of the various
possibilities for implementing coroutine backends in QEMU, with their
respective advantages and disadvantages.

I'm adding a third possibility for stackless coroutines, which is to
use the LLVM/clang builtins.  I believe that would still require a
source-to-source translator, but it would offload to the compiler the
complicated bits such as liveness analysis.

1) Stackful coroutines:
Advantages:
- no changes to current code

Disadvantages:
- portability issues regarding shadow stacks (SafeStack, CET)
- portability/nonconformance issues regarding TLS
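
To make "no changes to current code" concrete, here is a minimal sketch of a
stackful coroutine built on POSIX ucontext. It is not QEMU's implementation;
StackfulCo, current_co, co_start, co_resume, co_yield_value and produce_pair
are hypothetical names chosen for illustration. The point is that an ordinary
nested function can yield, because the whole call stack is switched. The
private stack and the global "current coroutine" pointer are also where the
shadow-stack and TLS concerns listed above would come from.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical coroutine object: owns a private stack and two contexts. */
typedef struct StackfulCo {
    ucontext_t caller;  /* context of whoever resumed the coroutine */
    ucontext_t self;    /* context of the coroutine itself */
    char *stack;        /* separately allocated stack */
    int yielded;        /* value handed out by the last yield */
    int done;           /* set once the entry function has finished */
} StackfulCo;

/* Assumption: a single thread, so a plain global is enough here. */
static StackfulCo *current_co;

/* Suspend the running coroutine and switch back to its caller. */
static void co_yield_value(int value)
{
    StackfulCo *co = current_co;

    assert(co != NULL);
    co->yielded = value;
    swapcontext(&co->self, &co->caller);
}

/* An ordinary nested helper: it yields without being transformed. */
static void produce_pair(int base)
{
    co_yield_value(base);
    co_yield_value(base + 1);
}

static void co_entry(void)
{
    produce_pair(10);
    produce_pair(20);
    current_co->done = 1;
    /* Returning here resumes uc_link, which is the caller context. */
}

/* Prepare the coroutine; returns 0 on success, -1 on failure. */
static int co_start(StackfulCo *co, size_t stack_size)
{
    co->stack = malloc(stack_size);
    if (!co->stack) {
        return -1;
    }
    co->yielded = 0;
    co->done = 0;
    if (getcontext(&co->self) < 0) {
        free(co->stack);
        co->stack = NULL;
        return -1;
    }
    co->self.uc_stack.ss_sp = co->stack;
    co->self.uc_stack.ss_size = stack_size;
    co->self.uc_link = &co->caller;
    makecontext(&co->self, co_entry, 0);
    return 0;
}

/* Run until the next yield; returns nonzero while a value was yielded. */
static int co_resume(StackfulCo *co)
{
    current_co = co;
    swapcontext(&co->caller, &co->self);
    return !co->done;
}
```

After co_start(&co, 64 * 1024), each call to co_resume(&co) runs the coroutine
up to its next yield and leaves the value in co.yielded.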

Another possible advantage is that it allows using the same function for
both coroutine and non-coroutine context.  I'm listing this separately
because I'm not sure that's desirable, as it prevents compile-time
checking of calls to coroutine_fn.  Compile-time checking would be
possible using clang -fthread-safety if we forgo the ability to use the
same function in both scenarios.
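
One way such compile-time checking might look is sketched below, assuming
Clang's thread safety analysis (its diagnostics are enabled with
-Wthread-safety). The TSA macro, the CoContext capability and all function
names are hypothetical and not part of QEMU. "Being inside a coroutine" is
modelled as a capability that coroutine-only functions require. On compilers
other than clang the annotations expand to nothing.

```c
#include <assert.h>

#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x) /* other compilers: the annotations vanish */
#endif

/* Hypothetical capability meaning "running inside a coroutine". */
struct TSA(capability("coroutine")) CoContext {
    int depth;
};
static struct CoContext co_context;

/* Entering and leaving coroutine context acquires/releases the capability. */
static void co_context_enter(void)
    TSA(acquire_capability(co_context)) TSA(no_thread_safety_analysis);
static void co_context_leave(void)
    TSA(release_capability(co_context)) TSA(no_thread_safety_analysis);

/* A coroutine-only function: callers must hold the capability. */
static int co_only_step(int x) TSA(requires_capability(co_context));

static void co_context_enter(void)
{
    co_context.depth++;
}

static void co_context_leave(void)
{
    co_context.depth--;
}

static int co_only_step(int x)
{
    assert(co_context.depth > 0); /* runtime counterpart of the static check */
    return x + 1;
}

/* Allowed: the call happens between enter and leave. */
static int run_in_coroutine(int x)
{
    int r;

    co_context_enter();
    r = co_only_step(x);
    co_context_leave();
    return r;
}
```

Calling co_only_step() from a function that neither holds nor requires the
capability is the case the analysis is meant to flag. That is also why a
single function could no longer serve both coroutine and non-coroutine
callers.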


2) "Duff's device" stackless coroutines
Advantages:
- no portability issues regarding both shadow stacks and TLS
- compiles to good old C code
- compile-time checking of "coroutine-only" but not awaitable functions
- debuggability: stack frames should be easy to inspect

Disadvantages:
- complex source-to-source translator
- more complex build process
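
As a rough illustration of the kind of C code a "Duff's device" translator
could emit, here is a hand-written sketch. CounterFrame, counter_init and
counter_resume are hypothetical names, and this is not the output of any
existing translator or PoC. Locals that live across a yield move into an
explicit frame, and a switch on the saved state jumps back into the middle of
the loop on resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical frame: everything that must survive a yield lives here. */
typedef struct CounterFrame {
    int state;  /* resume point: 0 = start, 1 = after the yield, 2 = done */
    int i;      /* loop variable, live across the yield */
    int limit;  /* number of values to produce */
} CounterFrame;

static void counter_init(CounterFrame *f, int limit)
{
    f->state = 0;
    f->i = 0;
    f->limit = limit;
}

/*
 * Resume the coroutine.  Returns true and stores the next value in *out
 * each time it yields; returns false once it has finished.
 */
static bool counter_resume(CounterFrame *f, int *out)
{
    switch (f->state) {
    case 0:
        for (f->i = 0; f->i < f->limit; f->i++) {
            *out = f->i;
            f->state = 1;
            return true;    /* yield */
    case 1:;                /* resume lands here, inside the loop body */
        }
        f->state = 2;
        /* fall through */
    case 2:
        return false;
    default:
        assert(!"invalid coroutine state");
        return false;
    }
}
```

Because the frame is a plain struct, a suspended coroutine can be inspected by
printing it, but reconstructing a chain of suspended callers still depends on
how frames are linked together.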


3) C++20 stackless coroutines
Advantages:
- no portability issues regarding both shadow stacks and TLS
- no code to write outside QEMU
- simpler build process

Disadvantages:
- requires a new compiler
- it's C++
- no compile-time checking of "coroutine-only" but not awaitable functions


4) LLVM stackless coroutines
Advantages:
- no portability issues regarding both shadow stacks and TLS
- no code to write outside QEMU

Disadvantages:
- still requires a source-to-source translator, albeit a relatively simple one
- more complex build process
- requires a new compiler and doesn't support GCC


Note that (2) would still have a build dependency on libclang.
However, code generation could still be done with GCC, and with
any compiler version.

I'll also put it in a table, though I understand that some choices
here might be debatable:

                          stackful      Duff's device            C++20              LLVM
==============================================================================================
Code to write/maintain    ++ [1]             ---                   +++              - [2]
Changes to existing code  ++ [3]             -                     --               -
Community acceptance      ++                 ++                    --               ?
Code or PoC exists        ++                 +                     -                --
==============================================================================================
Portability               --                 ++                    +                -
Debuggability             -                  ++                    ?                ?
Performance               -                  ++ [4]                ++               ++

[1] I'm penalizing stackful coroutines here because their worse portability
also affects future maintainability.

[2] This is an educated guess.

[3] If we decide to remove the possibility of using the same function for
both coroutine and non-coroutine context, the changes to existing code
would be the same as for Duff's device and LLVM coroutines.

[4] Slightly worse than C++20 coroutines for the PoC, but that is mostly due
to implementation choices that are easy to change.


Stackful coroutines are obviously pretty good, or we wouldn't have used them.
They might be a local optimum though, as shown by the negative points in terms
of portability, debuggability and performance.

Both Duff's device and LLVM would be more or less transparent to the part of
the community that doesn't care about the coroutines.  The translator would
probably be write-and-forget (though I'm not sure about the API stability of
libclang, which would be a major factor), but it would still be a substantial
amount of work to commit to.

Thanks,

Paolo




* Re: comparison of coroutine backends
From: Stefan Hajnoczi @ 2022-03-21 10:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Hanna Reitz, qemu-devel, qemu block


On Fri, Mar 18, 2022 at 09:48:37AM +0100, Paolo Bonzini wrote:
> Hi all,
> 
> based on the previous discussions here is a comparison of the various
> possibilities for implementing coroutine backends in QEMU and the
> respective advantages and disadvantages.
> 
> I'm adding a third possibility for stackless coroutines, which is to
> use the LLVM/clang builtins.  I believe that would still require a
> source-to-source translator, but it would offload to the compiler the
> complicated bits such as liveness analysis.
> 
> 1) Stackful coroutines:
> Advantages:
> - no changes to current code
> 
> Disadvantages:
> - portability issues regarding shadow stacks (SafeStack, CET)
> - portability/nonconformance issues regarding TLS
> 
> Another possible advantage is that it allows using the same function for
> both coroutine and non-coroutine context.  I'm listing this separately
> because I'm not sure that's desirable, as it prevents compile-time
> checking of calls to coroutine_fn.  Compile-time checking would be
> possible using clang -fthread-safety if we forgo the ability to use the
> same function in both scenarios.
> 
> 
> 2) "Duff's device" stackless coroutines
> Advantages:

- Supports gcc and clang

> - no portability issues regarding both shadow stacks and TLS
> - compiles to good old C code
> - compile-time checking of "coroutine-only" but not awaitable functions
> - debuggability: stack frames should be easy to inspect

The user needs to understand how the coroutine runtime works in order to
get a backtrace of a suspended coroutine. More likely a GDB Python
script will be needed for this.

> Disadvantages:
> - complex source-to-source translator
> - more complex build process
> 
> 
> 3) C++20 stackless coroutines
> Advantages:
> - no portability issues regarding both shadow stacks and TLS
> - no code to write outside QEMU
> - simpler build process
> 
> Disadvantages:
> - requires a new compiler
> - it's C++

- raises questions about C++ usage in QEMU, which seem to be
  controversial

> - no compile-time checking of "coroutine-only" but not awaitable functions
> 
> 
> 4) LLVM stackless coroutines
> Advantages:
> - no portability issues regarding both shadow stacks and TLS
> - no code to write outside QEMU
> 
> Disadvantages:
> - relatively simple source-to-source translator
> - more complex build process
> - requires a new compiler and doesn't support GCC
> 
> 
> Note that (2) would still have a build dependency on libclang.
> However the code generation could still be done with GCC and with
> any compiler version.
> 
> I'll also put it in a table, though I understand that some choices
> here might be debatable:
> 
>                          stackful      Duff's device            C++20              LLVM
> ==============================================================================================
> Code to write/maintain    ++ [1]             ---                   +++              - [2]
> Changes to existing code  ++ [3]             -                     --               -
> Community acceptance      ++                 ++                    --               ?
> Code or PoC exists        ++                 +                     -                --
> ==============================================================================================
> Portability               --                 ++                    +                -
> Debuggability             -                  ++                    ?                ?
> Performance               -                  ++ [4]                ++               ++
> 
> [1] I'm penalizing stackful coroutines here because the worse portability
> has an impact on future maintainability too.
> 
> [2] This is an educated guess.
> 
> [3] If we decide to remove the possibility of using the same function for
> both coroutine and non-coroutine context, the changes to existing code
> would be the same as for Duff's device and LLVM coroutines.
> 
> [4] Slightly worse than C++20 coroutines for the PoC, but that is mostly due
> to implementation choices that are easy to change.
> 
> 
> Stackful coroutines are obviously pretty good, or we wouldn't have used them.
> They might be a local optimum though, as shown by the negative points in terms
> of portability, debuggability and performance.
> 
> Both Duff's device and LLVM would be more or less transparent to the part of
> the community that doesn't care about the coroutines.  The translator would
> probably be write-and-forget (though I'm not sure about the API stability of
> libclang, which would be a major factor), but it would still be a substantial
> amount of work to commit to.

I don't see a clear winner but here is my order of preference:
1. Stackful - the devil we know
2. Duff's device - a temporary (wasteful) step before native compiler support?
3. LLVM - actually not bad but requires dropping gcc support
4. C++20 - I worry adding C++ into the codebase will cause friction

Ideally gcc and clang would support C coroutines natively, making the
choice simple. Is it worth treating this as a long term project and
working with LLVM/clang and gcc to add native C coroutine support to
compilers? We still have stackful coroutines in the short term.

Stefan



* Re: comparison of coroutine backends
From: Daniel P. Berrangé @ 2022-03-21 16:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Hanna Reitz, Stefan Hajnoczi, qemu-devel, qemu block

On Fri, Mar 18, 2022 at 09:48:37AM +0100, Paolo Bonzini wrote:
> Hi all,
> 
> based on the previous discussions here is a comparison of the various
> possibilities for implementing coroutine backends in QEMU and the
> respective advantages and disadvantages.
> 
> I'm adding a third possibility for stackless coroutines, which is to
> use the LLVM/clang builtins.  I believe that would still require a
> source-to-source translator, but it would offload to the compiler the
> complicated bits such as liveness analysis.
> 
> 1) Stackful coroutines:
> Advantages:
> - no changes to current code
> 
> Disadvantages:
> - portability issues regarding shadow stacks (SafeStack, CET)
> - portability/nonconformance issues regarding TLS
> 
> Another possible advantage is that it allows using the same function for
> both coroutine and non-coroutine context.  I'm listing this separately
> because I'm not sure that's desirable, as it prevents compile-time
> checking of calls to coroutine_fn.  Compile-time checking would be
> possible using clang -fthread-safety if we forgo the ability to use the
> same function in both scenarios.
> 
> 
> 2) "Duff's device" stackless coroutines
> Advantages:
> - no portability issues regarding both shadow stacks and TLS
> - compiles to good old C code
> - compile-time checking of "coroutine-only" but not awaitable functions
> - debuggability: stack frames should be easy to inspect
> 
> Disadvantages:
> - complex source-to-source translator

I guess I'm still a bit fuzzy on the actual implications of this
point. Is this a one-time hit to write it, or is it something
that is going to need periodic (even frequent) updates to cope
with new places in which we use coroutines? Presumably most
maintainers won't have to care about or look at the details of
it?

> - more complex build process
> 
> 
> 3) C++20 stackless coroutines
> Advantages:
> - no portability issues regarding both shadow stacks and TLS
> - no code to write outside QEMU
> - simpler build process
> 
> Disadvantages:
> - requires a new compiler
> - it's C++
> - no compile-time checking of "coroutine-only" but not awaitable functions
> 
> 
> 4) LLVM stackless coroutines
> Advantages:
> - no portability issues regarding both shadow stacks and TLS
> - no code to write outside QEMU
> 
> Disadvantages:
> - relatively simple source-to-source translator
> - more complex build process
> - requires a new compiler and doesn't support GCC
> 
> 
> Note that (2) would still have a build dependency on libclang.
> However the code generation could still be done with GCC and with
> any compiler version.

We looked at using libclang for some code generation in libvirt, via
its Python API binding. While we didn't go forward with it (yet), it
looked promising as a library to use. IIRC, it was viable from the clang
version available in RHEL-8 onwards, as versions before that point
were not compatible with the current Python binding. I think it would
cover all the mainstream platforms that QEMU officially targets and
tests in CI right now.

I think libclang might also be an interesting framework on which to
experiment with code analysis checks, to augment (or even replace)
some of what is done by checkpatch.pl.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



