* Portable inline asm to get address of TLS variable
@ 2022-02-16 17:46 Stefan Hajnoczi
  2022-02-16 18:13 ` Florian Weimer
  2022-02-16 22:28 ` Paolo Bonzini
  0 siblings, 2 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-02-16 17:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Florian Weimer, qemu-devel, Serge Guelton

Hi,
I've been trying to make the inline asm that gets the address of a TLS
variable for QEMU coroutines pass QEMU's GitLab CI.
https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89

The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't
allowed in -fPIC shared libraries) so builds fail with ./configure
--enable-modules. While I was tackling this I stumbled on this:

  void *dst_ptr;
  asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var))

What's nice about it:
- It's portable, there are no arch-specific assembly instructions.
- It works for both -fPIC and non-PIC.

However, I wonder if the compiler might reuse a register that already
contains the address. Then we'd have the coroutine problem again when
qemu_coroutine_yield() is called between the earlier address calculation
and the asm volatile statement.
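
To make the concern concrete, here is a minimal sketch of the code shape in question. The names fake_yield() and same_address_across_yield() are made up for illustration; fake_yield() merely stands in for qemu_coroutine_yield() and does not switch threads. On a single thread both addresses trivially match. The open question is whether the compiler is allowed to keep the first address in a register across the call and hand it to the asm statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread-local variable standing in for coroutine state. */
static __thread int tls_var;

/*
 * Stand-in for qemu_coroutine_yield(): an opaque, non-inlined call.
 * A real yield could resume the caller on a different thread.
 */
__attribute__((noinline)) void fake_yield(void)
{
    __asm__ volatile("" ::: "memory");
}

/*
 * Takes the address the ordinary way, "yields", then takes it again
 * through the empty asm statement.  Returns 1 if both addresses match.
 * &tls_var in the input operand is still a plain C expression, so the
 * compiler may reuse the value it computed for 'before'.
 */
int same_address_across_yield(void)
{
    int *before = &tls_var;
    void *after;

    fake_yield();
    __asm__ volatile("" : "=r"(after) : "0"(&tls_var));

    assert(after != NULL);
    return (void *)before == after;
}
```

Compiling this with optimization and reading the generated assembly shows whether the address is computed once or twice for a given compiler and set of flags.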

Thoughts?

Stefan


* Re: Portable inline asm to get address of TLS variable
  2022-02-16 17:46 Portable inline asm to get address of TLS variable Stefan Hajnoczi
@ 2022-02-16 18:13 ` Florian Weimer
  2022-02-16 20:28   ` Stefan Hajnoczi
  2022-02-16 22:28 ` Paolo Bonzini
  1 sibling, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2022-02-16 18:13 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Richard Henderson, qemu-devel, Serge Guelton

* Stefan Hajnoczi:

> I've been trying to make the inline asm that gets the address of a TLS
> variable for QEMU coroutines pass QEMU's GitLab CI.
> https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89
>
> The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't
> allowed in -fPIC shared libraries) so builds fail with ./configure
> --enable-modules. While I was tackling this I stumbled on this:
>
>   void *dst_ptr;
>   asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var))
>
> What's nice about it:
> - It's portable, there are no arch-specific assembly instructions.
> - It works for both -fPIC and non-PIC.
>
> However, I wonder if the compiler might reuse a register that already
> contains the address. Then we'd have the coroutine problem again when
> qemu_coroutine_yield() is called between the earlier address calculation
> and the asm volatile statement.
>
> Thoughts?

Sorry, I don't see why this isn't equivalent to a plain &tls_var.
What exactly are you trying to achieve?

Thanks,
Florian




* Re: Portable inline asm to get address of TLS variable
  2022-02-16 18:13 ` Florian Weimer
@ 2022-02-16 20:28   ` Stefan Hajnoczi
  2022-02-16 20:33     ` Stefan Hajnoczi
  2022-02-16 20:40     ` Florian Weimer
  0 siblings, 2 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-02-16 20:28 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Richard Henderson, qemu-devel, Stefan Hajnoczi, Serge Guelton

On Wed, 16 Feb 2022 at 18:14, Florian Weimer <fweimer@redhat.com> wrote:
>
> * Stefan Hajnoczi:
>
> > I've been trying to make the inline asm that gets the address of a TLS
> > variable for QEMU coroutines pass QEMU's GitLab CI.
> > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89
> >
> > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't
> > allowed in -fPIC shared libraries) so builds fail with ./configure
> > --enable-modules. While I was tackling this I stumbled on this:
> >
> >   void *dst_ptr;
> >   asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var))
> >
> > What's nice about it:
> > - It's portable, there are no arch-specific assembly instructions.
> > - It works for both -fPIC and non-PIC.
> >
> > However, I wonder if the compiler might reuse a register that already
> > contains the address. Then we'd have the coroutine problem again when
> > qemu_coroutine_yield() is called between the earlier address calculation
> > and the asm volatile statement.
> >
> > Thoughts?
>
> Sorry, I don't see why this isn't equivalent to a plain &tls_var.
> What exactly are you trying to achieve?

&tls_var, except forcing the compiler to calculate the address from scratch.

The goal is to avoid stale TLS variable addresses when a coroutine
yields in one thread and is resumed in another thread.

Stefan



* Re: Portable inline asm to get address of TLS variable
  2022-02-16 20:28   ` Stefan Hajnoczi
@ 2022-02-16 20:33     ` Stefan Hajnoczi
  2022-02-16 20:46       ` Florian Weimer
  2022-02-16 20:40     ` Florian Weimer
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-02-16 20:33 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Richard Henderson, qemu-devel, Stefan Hajnoczi, Serge Guelton

On Wed, 16 Feb 2022 at 20:28, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> On Wed, 16 Feb 2022 at 18:14, Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Stefan Hajnoczi:
> >
> > > I've been trying to make the inline asm that gets the address of a TLS
> > > variable for QEMU coroutines pass QEMU's GitLab CI.
> > > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89
> > >
> > > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't
> > > allowed in -fPIC shared libraries) so builds fail with ./configure
> > > --enable-modules. While I was tackling this I stumbled on this:
> > >
> > >   void *dst_ptr;
> > >   asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var))
> > >
> > > What's nice about it:
> > > - It's portable, there are no arch-specific assembly instructions.
> > > - It works for both -fPIC and non-PIC.
> > >
> > > However, I wonder if the compiler might reuse a register that already
> > > contains the address. Then we'd have the coroutine problem again when
> > > qemu_coroutine_yield() is called between the earlier address calculation
> > > and the asm volatile statement.
> > >
> > > Thoughts?
> >
> > Sorry, I don't see why this isn't equivalent to a plain &tls_var.
> > What exactly are you trying to achieve?
>
> &tls_var, except forcing the compiler to calculate the address from scratch.
>
> The goal is to avoid stale TLS variable addresses when a coroutine
> yields in one thread and is resumed in another thread.

I'm basically asking whether the &tls_var input operand is treated as
volatile and part of the inline assembly or whether it's just regular
C code that the compiler may optimize with the surrounding function?

Stefan



* Re: Portable inline asm to get address of TLS variable
  2022-02-16 20:28   ` Stefan Hajnoczi
  2022-02-16 20:33     ` Stefan Hajnoczi
@ 2022-02-16 20:40     ` Florian Weimer
  2022-02-17  9:28       ` Stefan Hajnoczi
  1 sibling, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2022-02-16 20:40 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Richard Henderson, qemu-devel, Stefan Hajnoczi, Serge Guelton

* Stefan Hajnoczi:

> On Wed, 16 Feb 2022 at 18:14, Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Stefan Hajnoczi:
>>
>> > I've been trying to make the inline asm that gets the address of a TLS
>> > variable for QEMU coroutines pass QEMU's GitLab CI.
>> > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89
>> >
>> > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't
>> > allowed in -fPIC shared libraries) so builds fail with ./configure
>> > --enable-modules. While I was tackling this I stumbled on this:
>> >
>> >   void *dst_ptr;
>> >   asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var))
>> >
>> > What's nice about it:
>> > - It's portable, there are no arch-specific assembly instructions.
>> > - It works for both -fPIC and non-PIC.
>> >
>> > However, I wonder if the compiler might reuse a register that already
>> > contains the address. Then we'd have the coroutine problem again when
>> > qemu_coroutine_yield() is called between the earlier address calculation
>> > and the asm volatile statement.
>> >
>> > Thoughts?
>>
>> Sorry, I don't see why this isn't equivalent to a plain &tls_var.
>> What exactly are you trying to achieve?
>
> &tls_var, except forcing the compiler to calculate the address from scratch.

I think you can compute

  (void *) &tls_var - __builtin_thread_pointer ();

to get the offset.  On many targets, GCC folds away the thread pointer
load, but that doesn't change the outcome.  Then it boils down to
getting access to the thread pointer, and you can get that behind a
compiler barrier (in a separate function).
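
A rough sketch of that idea follows, under several assumptions: the compiler provides __builtin_thread_pointer() for the target, the variable lives in static TLS (for example in the main executable) so that its offset from the thread pointer is the same in every thread, and the helper names are invented for this example. The arithmetic is done on uintptr_t to avoid subtracting unrelated pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical TLS variable, assumed to live in static TLS. */
static __thread int tls_var;

/* Offset of tls_var from the thread pointer: an ordinary global, not TLS. */
static uintptr_t tls_var_offset;

/*
 * Fetch the thread pointer in a separate, non-inlined function.  The
 * empty asm with a "memory" clobber acts as a compiler barrier so that
 * calls to this function are not merged or hoisted by the optimizer.
 */
__attribute__((noinline)) uintptr_t current_thread_pointer(void)
{
    __asm__ volatile("" ::: "memory");
    return (uintptr_t)__builtin_thread_pointer();
}

/* Rebuild the address from the current thread pointer on every call. */
int *tls_var_addr(void)
{
    return (int *)(current_thread_pointer() + tls_var_offset);
}

/* Record the offset once, early, from any thread. */
void tls_var_offset_init(void)
{
    tls_var_offset = (uintptr_t)&tls_var -
                     (uintptr_t)__builtin_thread_pointer();

    /* Sanity check on the initializing thread. */
    assert(tls_var_addr() == &tls_var);
}
```

Callers would run tls_var_offset_init() once at startup and then use tls_var_addr() wherever &tls_var was used before. For TLS variables in dlopen()ed modules the offset is generally not constant across threads, so this sketch does not cover that case.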

But going against ABI and toolchain in this way is really no long-term
solution.  You need to switch to stackless co-routines, or we need to
provide proper ABI-level support for this.  Today it's the thread
pointer, tomorrow it's the shadow stack pointer, and the day after that,
it's the SafeStack pointer.  And further down the road, it's some thread
state for garbage collection support.  Or something like that.

Thanks,
Florian




* Re: Portable inline asm to get address of TLS variable
  2022-02-16 20:33     ` Stefan Hajnoczi
@ 2022-02-16 20:46       ` Florian Weimer
  2022-02-17  9:30         ` Stefan Hajnoczi
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2022-02-16 20:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Richard Henderson, qemu-devel, Stefan Hajnoczi, Serge Guelton

* Stefan Hajnoczi:

> I'm basically asking whether the &tls_var input operand is treated as
> volatile and part of the inline assembly or whether it's just regular
> C code that the compiler may optimize with the surrounding function?

&tls_var is evaluated outside of the inline assembly; any compiler
barrier will come after that.  It's subject to CSE (or whatever it's
called).  Three asm statements in a row

  asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
  asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
  asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));

result in

	movq	tls_var@gottpoff(%rip), %rax
	addq	%fs:0, %rax
	movq	%rax, %rdx
	movq	%rax, %rdx

which is probably not what you want.
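
For anyone who wants to reproduce this, the three statements can be wrapped in a small translation unit like the sketch below (the function names are invented here). Compile it with optimization and inspect the assembly, for example with gcc -O2 -S. The exact instruction sequence depends on the compiler version, the flags, and the TLS model in use, so the listing above is one possible outcome, not a guaranteed one.

```c
#include <assert.h>

/* Hypothetical TLS variable; the name follows the thread. */
__thread int tls_var;

/*
 * Three identical asm statements in a row.  Each input operand is an
 * ordinary C expression, so the optimizer may compute &tls_var once
 * and reuse the result for all three statements.
 */
void *three_asm_statements(void)
{
    void *dst_ptr;

    __asm__ volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
    __asm__ volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
    __asm__ volatile("" : "=r"(dst_ptr) : "0"(&tls_var));

    return dst_ptr;
}

/* Kept separate so the check does not clutter the assembly above. */
void check_three_asm_statements(void)
{
    assert(three_asm_statements() == (void *)&tls_var);
}
```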

Thanks,
Florian




* Re: Portable inline asm to get address of TLS variable
  2022-02-16 17:46 Portable inline asm to get address of TLS variable Stefan Hajnoczi
  2022-02-16 18:13 ` Florian Weimer
@ 2022-02-16 22:28 ` Paolo Bonzini
  1 sibling, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2022-02-16 22:28 UTC (permalink / raw)
  To: Stefan Hajnoczi, Richard Henderson
  Cc: Florian Weimer, qemu-devel, Serge Guelton

On 2/16/22 18:46, Stefan Hajnoczi wrote:
> However, I wonder if the compiler might reuse a register that already
> contains the address. Then we'd have the coroutine problem again when
> qemu_coroutine_yield() is called between the earlier address calculation
> and the asm volatile statement.

Yes, the compiler should be able to reuse the register.  volatile only 
says that the contents of the "asm" cannot be subject to e.g. loop 
optimizations:

	for (i = 0; i < 10; i++) {
		asm("# assembly": "=r"(k) : "0"(10));
		j += k;
	}

will likely execute the asm once, while "asm volatile" (or an asm
without outputs, which is always volatile) will execute it ten times.

However, the input of the assembly can be evaluated only once either 
way.  For example, in the case above you might have "movl $10, %edx" 
outside the loop even with asm volatile.
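
The loop can be turned into two complete functions for side-by-side inspection, as in the sketch below. The function names are invented, and the asm template is left empty so that it assembles on any target. Both variants compute the same result; the difference, if any, only shows up in the generated assembly, and it depends on the compiler and optimization level.

```c
#include <assert.h>

/* Plain asm: the compiler may hoist it out of the loop and run it once. */
int loop_plain_asm(void)
{
    int i, j = 0, k;

    for (i = 0; i < 10; i++) {
        __asm__("" : "=r"(k) : "0"(10));
        j += k;
    }
    return j;
}

/*
 * asm volatile: the statement itself stays inside the loop, but loading
 * the constant 10 into the input register may still happen only once.
 */
int loop_volatile_asm(void)
{
    int i, j = 0, k;

    for (i = 0; i < 10; i++) {
        __asm__ volatile("" : "=r"(k) : "0"(10));
        j += k;
    }
    return j;
}

/* Both variants add 10 ten times. */
void check_loops(void)
{
    assert(loop_plain_asm() == 100);
    assert(loop_volatile_asm() == 100);
}
```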

One way to fix it for modules could be to define a (global, non-TLS) 
variable in QEMU with the %fs-based offset of the relevant thread-local 
variable, and initialize it before modules are loaded.

Paolo



* Re: Portable inline asm to get address of TLS variable
  2022-02-16 20:40     ` Florian Weimer
@ 2022-02-17  9:28       ` Stefan Hajnoczi
  2022-02-17 11:40         ` Paolo Bonzini
                           ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-02-17  9:28 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton

On Wed, Feb 16, 2022 at 09:40:34PM +0100, Florian Weimer wrote:
> * Stefan Hajnoczi:
> 
> > On Wed, 16 Feb 2022 at 18:14, Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Stefan Hajnoczi:
> >>
> >> > I've been trying to make the inline asm that gets the address of a TLS
> >> > variable for QEMU coroutines pass QEMU's GitLab CI.
> >> > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89
> >> >
> >> > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't
> >> > allowed in -fPIC shared libraries) so builds fail with ./configure
> >> > --enable-modules. While I was tackling this I stumbled on this:
> >> >
> >> >   void *dst_ptr;
> >> >   asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var))
> >> >
> >> > What's nice about it:
> >> > - It's portable, there are no arch-specific assembly instructions.
> >> > - It works for both -fPIC and non-PIC.
> >> >
> >> > However, I wonder if the compiler might reuse a register that already
> >> > contains the address. Then we'd have the coroutine problem again when
> >> > qemu_coroutine_yield() is called between the earlier address calculation
> >> > and the asm volatile statement.
> >> >
> >> > Thoughts?
> >>
> >> Sorry, I don't see why this isn't equivalent to a plain &tls_var.
> >> What exactly are you trying to achieve?
> >
> > &tls_var, except forcing the compiler to calculate the address from scratch.
> 
> I think you can compute
> 
>   (void *) &tls_var - __builtin_thread_pointer ();
> 
> to get the offset.  On many targets, GCC folds away the thread pointer
> load, but that doesn't change the outcome.  Then it boils down to
> getting access to the thread pointer, and you can get that behind a
> compiler barrier (in a separate function).

Interesting, this is something we haven't tried yet. It sounds like it
can be implemented in C without architecture- or ELF-specific inline
assembly.

> But going against ABI and toolchain in this way is really no long-term
> solution.  You need to switch to stackless co-routines, or we need to
> provide proper ABI-level support for this.  Today it's the thread
> pointer, tomorrow it's the shadow stack pointer, and the day after that,
> it's the SafeStack pointer.  And further down the road, it's some thread
> state for garbage collection support.  Or something like that.

Yes, understood :(. This does feel like solving an undefined behavior
problem by adding more undefined behavior on top!

Stackless coroutines have been tried in the past using Continuation
Passing C (https://github.com/kerneis/cpc). Ideally we'd use a solution
built into the compiler though. I'm concerned that CPC might not be
supported or available everywhere QEMU needs to run now and in the
future.

I took a quick look at C++20 coroutines since they are available in
compilers but the primitives look hard to use even from C++, let alone
from C.

If you have any suggestions for stackless coroutine implementations,
please let me know!

Stefan


* Re: Portable inline asm to get address of TLS variable
  2022-02-16 20:46       ` Florian Weimer
@ 2022-02-17  9:30         ` Stefan Hajnoczi
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-02-17  9:30 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton

On Wed, Feb 16, 2022 at 09:46:02PM +0100, Florian Weimer wrote:
> * Stefan Hajnoczi:
> 
> > I'm basically asking whether the &tls_var input operand is treated as
> > volatile and part of the inline assembly or whether it's just regular
> > C code that the compiler may optimize with the surrounding function?
> 
> &tls_var is evaluated outside of the inline assembly; any compiler
> barrier will come after that.  It's subject to CSE (or whatever it's
> called).  Three asm statements in a row
> 
>   asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
>   asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
>   asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
> 
> result in
> 
> 	movq	tls_var@gottpoff(%rip), %rax
> 	addq	%fs:0, %rax
> 	movq	%rax, %rdx
> 	movq	%rax, %rdx
> 
> which is probably not what you want.

Right, the approach I suggested doesn't work. Thanks for sharing the
example!

Stefan


* Re: Portable inline asm to get address of TLS variable
  2022-02-17  9:28       ` Stefan Hajnoczi
@ 2022-02-17 11:40         ` Paolo Bonzini
  2022-02-17 15:02           ` Serge Guelton
  2022-02-17 14:59         ` Serge Guelton
  2022-03-01 11:54         ` Florian Weimer
  2 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2022-02-17 11:40 UTC (permalink / raw)
  To: Stefan Hajnoczi, Florian Weimer
  Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton

On 2/17/22 10:28, Stefan Hajnoczi wrote:
>> But going against ABI and toolchain in this way is really no long-term
>> solution.  You need to switch to stackless co-routines, or we need to
>> provide proper ABI-level support for this.  Today it's the thread
>> pointer, tomorrow it's the shadow stack pointer, and the day after that,
>> it's the SafeStack pointer.  And further down the road, it's some thread
>> state for garbage collection support.  Or something like that.
> 
> Yes, understood :(. This does feel like solving an undefined behavior
> problem by adding more undefined behavior on top!

Yes, this is the kind of thing that I generally despise when I see other
programs do it...  it's easy to dig ourselves into the same hole.

> I took a quick look at C++20 coroutines since they are available in
> compilers but the primitives look hard to use even from C++, let alone
> from C.

They're C++ only in GCC, too.  I really think that QEMU should be 
compilable in C++, but I'm not sure how easy a sell it is.

Paolo




* Re: Portable inline asm to get address of TLS variable
  2022-02-17  9:28       ` Stefan Hajnoczi
  2022-02-17 11:40         ` Paolo Bonzini
@ 2022-02-17 14:59         ` Serge Guelton
  2022-03-01 11:54         ` Florian Weimer
  2 siblings, 0 replies; 19+ messages in thread
From: Serge Guelton @ 2022-02-17 14:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Florian Weimer, Stefan Hajnoczi, Richard Henderson, qemu-devel

> I took a quick look at C++20 coroutines since they are available in
> compilers but the primitives look hard to use even from C++, let alone
> from C.

Same story here :-/



* Re: Portable inline asm to get address of TLS variable
  2022-02-17 11:40         ` Paolo Bonzini
@ 2022-02-17 15:02           ` Serge Guelton
  2022-02-17 15:11             ` Stefan Hajnoczi
  2022-02-17 15:51             ` Paolo Bonzini
  0 siblings, 2 replies; 19+ messages in thread
From: Serge Guelton @ 2022-02-17 15:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Florian Weimer, Stefan Hajnoczi, Richard Henderson, qemu-devel,
	Stefan Hajnoczi

On Thu, Feb 17, 2022 at 12:40:40PM +0100, Paolo Bonzini wrote:
> On 2/17/22 10:28, Stefan Hajnoczi wrote:
> >>But going against ABI and toolchain in this way is really no long-term
> >>solution.  You need to switch to stackless co-routines, or we need to
> >>provide proper ABI-level support for this.  Today it's the thread
> >>pointer, tomorrow it's the shadow stack pointer, and the day after that,
> >>it's the SafeStack pointer.  And further down the road, it's some thread
> >>state for garbage collection support.  Or something like that.
> >
> >Yes, understood :(. This does feel like solving an undefined behavior
> >problem by adding more undefined behavior on top!
> 
> Yes, this is the kind of thing that I generally despise when I see
> other programs do it...  it's easy to dig ourselves into the same
> hole.
> 
> >I took a quick look at C++20 coroutines since they are available in
> >compilers but the primitives look hard to use even from C++, let alone
> >from C.
> 
> They're C++ only in GCC, too.  I really think that QEMU should be
> compilable in C++, but I'm not sure how easy a sell it is.

It's perfectly fine to have one compilation unit written in C++ with a few
symbols in `extern "C"`. No need to touch the other parts of the project.



* Re: Portable inline asm to get address of TLS variable
  2022-02-17 15:02           ` Serge Guelton
@ 2022-02-17 15:11             ` Stefan Hajnoczi
  2022-02-17 15:51             ` Paolo Bonzini
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-02-17 15:11 UTC (permalink / raw)
  To: Serge Guelton
  Cc: Florian Weimer, Paolo Bonzini, Richard Henderson, qemu-devel,
	Stefan Hajnoczi

On Thu, Feb 17, 2022 at 04:02:16PM +0100, Serge Guelton wrote:
> On Thu, Feb 17, 2022 at 12:40:40PM +0100, Paolo Bonzini wrote:
> > On 2/17/22 10:28, Stefan Hajnoczi wrote:
> > >>But going against ABI and toolchain in this way is really no long-term
> > >>solution.  You need to switch to stackless co-routines, or we need to
> > >>provide proper ABI-level support for this.  Today it's the thread
> > >>pointer, tomorrow it's the shadow stack pointer, and the day after that,
> > >>it's the SafeStack pointer.  And further down the road, it's some thread
> > >>state for garbage collection support.  Or something like that.
> > >
> > >Yes, understood :(. This does feel like solving an undefined behavior
> > >problem by adding more undefined behavior on top!
> > 
> > Yes, this is the kind of thing that I generally despise when I see
> > other programs do it...  it's easy to dig ourselves into the same
> > hole.
> > 
> > >I took a quick look at C++20 coroutines since they are available in
> > >compilers but the primitives look hard to use even from C++, let alone
> > >from C.
> > 
> > They're C++ only in GCC, too.  I really think that QEMU should be
> > compilable in C++, but I'm not sure how easy a sell it is.
> 
> > It's perfectly fine to have one compilation unit written in C++ with a few
> > symbols in `extern "C"`. No need to touch the other parts of the project.
> 

I don't think that's possible in this case because the coroutine
functions are spread throughout the codebase. All coroutine functions
need to be in C++ source units so the compiler can transform them and
emit code callable as a coroutine.

Stefan


* Re: Portable inline asm to get address of TLS variable
  2022-02-17 15:02           ` Serge Guelton
  2022-02-17 15:11             ` Stefan Hajnoczi
@ 2022-02-17 15:51             ` Paolo Bonzini
  1 sibling, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2022-02-17 15:51 UTC (permalink / raw)
  To: Serge Guelton
  Cc: Florian Weimer, Stefan Hajnoczi, Richard Henderson, qemu-devel,
	Stefan Hajnoczi

On 2/17/22 16:02, Serge Guelton wrote:
>>> I took a quick look at C++20 coroutines since they are available in
>>> compilers but the primitives look hard to use even from C++, let alone
>>> from C.
>>
>> They're C++ only in GCC, too.  I really think that QEMU should be
>> compilable in C++, but I'm not sure how easy a sell it is.
> It's perfectly fine to have one compilation unit written in C++ with a few
> symbols in `extern "C"`. No need to touch the other parts of the project.

It's not just one compilation unit, it's everything that uses coroutines 
so basically all of block/.  But yes, good point---it means for example 
that you don't have to deal as much with lack of operators in C++ enums, 
which would be a huge PITA in compiling QEMU with C++.  There would 
still be some churn such as adding extern "C" blocks to headers, etc.
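
As a small illustration of that header churn, the usual pattern looks like the sketch below. The function name co_helper_add() is purely hypothetical and not taken from QEMU. The guard is a no-op when the header is read by a C compiler and gives the declarations C linkage when it is included from a C++ source unit.

```c
#include <assert.h>

/* What a shared header might contain (hypothetical declaration). */
#ifdef __cplusplus
extern "C" {
#endif

int co_helper_add(int a, int b);

#ifdef __cplusplus
}
#endif

/* A definition in a C source file; callable from C and C++ alike. */
int co_helper_add(int a, int b)
{
    return a + b;
}

/* Trivial self-check of the example function. */
void co_helper_self_test(void)
{
    assert(co_helper_add(2, 3) == 5);
}
```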

The main change with C++20 coroutines would be to introduce co_await, 
co_return and std::future<> everywhere, which is also a pretty 
substantial change (possibly an improvement in the case of co_await and 
co_return, but still a lot of work).

That said, it's certainly valuable to try and get at least 
tests/unit/test-coroutine.c to run with C++ coroutines, and see how much 
work that is.

Paolo



* Re: Portable inline asm to get address of TLS variable
  2022-02-17  9:28       ` Stefan Hajnoczi
  2022-02-17 11:40         ` Paolo Bonzini
  2022-02-17 14:59         ` Serge Guelton
@ 2022-03-01 11:54         ` Florian Weimer
  2022-03-01 13:39           ` Stefan Hajnoczi
  2 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2022-03-01 11:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton

* Stefan Hajnoczi:

>> But going against ABI and toolchain in this way is really no long-term
>> solution.  You need to switch to stackless co-routines, or we need to
>> provide proper ABI-level support for this.  Today it's the thread
>> pointer, tomorrow it's the shadow stack pointer, and the day after that,
>> it's the SafeStack pointer.  And further down the road, it's some thread
>> state for garbage collection support.  Or something like that.
>
> Yes, understood :(. This does feel like solving an undefined behavior
> problem by adding more undefined behavior on top!
>
> Stackless coroutines have been tried in the past using Continuation
> Passing C (https://github.com/kerneis/cpc). Ideally we'd use a solution
> built into the compiler though. I'm concerned that CPC might not be
> supported or available everywhere QEMU needs to run now and in the
> future.

That seems to require an entirely different toolchain (based on CIL).
It's one way to solve the ABI issues, but perhaps not the direction
you want to go in.

> I took a quick look at C++20 coroutines since they are available in
> compilers but the primitives look hard to use even from C++, let alone
> from C.

Could you go into details what makes them hard to use?  Is it because
coroutines are infectious across the call stack?

Thanks,
Florian




* Re: Portable inline asm to get address of TLS variable
  2022-03-01 11:54         ` Florian Weimer
@ 2022-03-01 13:39           ` Stefan Hajnoczi
  2022-04-19 11:32             ` Florian Weimer
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-03-01 13:39 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton

On Tue, Mar 01, 2022 at 12:54:49PM +0100, Florian Weimer wrote:
> > I took a quick look at C++20 coroutines since they are available in
> > compilers but the primitives look hard to use even from C++, let alone
> > from C.
> 
> Could you go into details what makes them hard to use?  Is it because
> coroutines are infectious across the call stack?

Here is the simplest tutorial on C++20 coroutines I found:
https://itnext.io/c-20-coroutines-complete-guide-7c3fc08db89d

The amount of boilerplate for trivial coroutine functions is ridiculous.

Stefan


* Re: Portable inline asm to get address of TLS variable
  2022-03-01 13:39           ` Stefan Hajnoczi
@ 2022-04-19 11:32             ` Florian Weimer
  2022-04-19 18:38               ` Thomas Rodgers
  2022-04-20 14:12               ` Stefan Hajnoczi
  0 siblings, 2 replies; 19+ messages in thread
From: Florian Weimer @ 2022-04-19 11:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Thomas Rodgers, Stefan Hajnoczi, Richard Henderson, qemu-devel,
	Serge Guelton

* Stefan Hajnoczi:

> On Tue, Mar 01, 2022 at 12:54:49PM +0100, Florian Weimer wrote:
>> > I took a quick look at C++20 coroutines since they are available in
>> > compilers but the primitives look hard to use even from C++, let alone
>> > from C.
>> 
>> Could you go into details what makes them hard to use?  Is it because
>> coroutines are infectious across the call stack?
>
> Here is the simplest tutorial on C++20 coroutines I found:
> https://itnext.io/c-20-coroutines-complete-guide-7c3fc08db89d
>
> The amount of boilerplate for trivial coroutine functions is ridiculous.

Would an execution agent library reduce that usage overhead?

Cc:ing Thomas, who might know the answer.

Thanks,
Florian




* Re: Portable inline asm to get address of TLS variable
  2022-04-19 11:32             ` Florian Weimer
@ 2022-04-19 18:38               ` Thomas Rodgers
  2022-04-20 14:12               ` Stefan Hajnoczi
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Rodgers @ 2022-04-19 18:38 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Jason Merrill, Stefan Hajnoczi, Richard Henderson, qemu-devel,
	Stefan Hajnoczi, Jonathan Wakely, Serge Guelton

So, this was my primary objection during the standardization of coroutines
for C++20. Red Hat's vote was consistently against adding the feature
without library support, but here we are.

Lewis Baker (formerly at Facebook) has led most of the work since on
defining what that library support might look like. The library where he
has done most of his exploration on the matter is -

https://github.com/lewissbaker/cppcoro

I spoke a bit this morning with one of the C++ committee members most
directly involved in where this is going standardization-wise and the
takeaway about the current expectations is -

C++23 is likely to get at least some minimal library support in the form of
-

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2502r1.pdf

Which defines a generator<T> that models std::ranges::input_range.

But, for anything that involves a library for scheduling asynchronous
execution of coroutines (e.g. tasks<T>'s) on different execution contexts
(threads) that is likely not going to happen until C++26.

I wish I had a better story to tell.

Tom.

On Tue, Apr 19, 2022 at 4:32 AM Florian Weimer <fweimer@redhat.com> wrote:

> * Stefan Hajnoczi:
>
> > On Tue, Mar 01, 2022 at 12:54:49PM +0100, Florian Weimer wrote:
> >> > I took a quick look at C++20 coroutines since they are available in
> >> > compilers but the primitives look hard to use even from C++, let alone
> >> > from C.
> >>
> >> Could you go into details what makes them hard to use?  Is it because
> >> coroutines are infectious across the call stack?
> >
> > Here is the simplest tutorial on C++20 coroutines I found:
> > https://itnext.io/c-20-coroutines-complete-guide-7c3fc08db89d
> >
> > The amount of boilerplate for trivial coroutine functions is ridiculous.
>
> Would an execution agent library reduce that usage overhead?
>
> Cc:ing Thomas, who might know the answer.
>
> Thanks,
> Florian
>
>


* Re: Portable inline asm to get address of TLS variable
  2022-04-19 11:32             ` Florian Weimer
  2022-04-19 18:38               ` Thomas Rodgers
@ 2022-04-20 14:12               ` Stefan Hajnoczi
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-04-20 14:12 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thomas Rodgers, Stefan Hajnoczi, Richard Henderson, qemu-devel,
	Serge Guelton

On Tue, Apr 19, 2022 at 01:32:42PM +0200, Florian Weimer wrote:
> * Stefan Hajnoczi:
> 
> > On Tue, Mar 01, 2022 at 12:54:49PM +0100, Florian Weimer wrote:
> >> > I took a quick look at C++20 coroutines since they are available in
> >> > compilers but the primitives look hard to use even from C++, let alone
> >> > from C.
> >> 
> >> Could you go into details what makes them hard to use?  Is it because
> >> coroutines are infectious across the call stack?
> >
> > Here is the simplest tutorial on C++20 coroutines I found:
> > https://itnext.io/c-20-coroutines-complete-guide-7c3fc08db89d
> >
> > The amount of boilerplate for trivial coroutine functions is ridiculous.
> 
> Would an execution agent library reduce that usage overhead?

Paolo Bonzini wrote a proof-of-concept using C++20 coroutines:
https://lore.kernel.org/all/20220314093203.1420404-1-pbonzini@redhat.com/

Stefan
