* Portable inline asm to get address of TLS variable @ 2022-02-16 17:46 Stefan Hajnoczi 2022-02-16 18:13 ` Florian Weimer 2022-02-16 22:28 ` Paolo Bonzini 0 siblings, 2 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2022-02-16 17:46 UTC (permalink / raw)
To: Richard Henderson; +Cc: Florian Weimer, qemu-devel, Serge Guelton

[-- Attachment #1: Type: text/plain, Size: 880 bytes --]

Hi,
I've been trying to make the inline asm that gets the address of a TLS
variable for QEMU coroutines pass QEMU's GitLab CI.
https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89

The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't
allowed in -fPIC shared libraries) so builds fail with ./configure
--enable-modules. While I was tackling this I stumbled on this:

  void *dst_ptr;
  asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var))

What's nice about it:
- It's portable, there are no arch-specific assembly instructions.
- It works for both -fPIC and non-PIC.

However, I wonder if the compiler might reuse a register that already
contains the address. Then we'd have the coroutine problem again when
qemu_coroutine_yield() is called between the earlier address calculation
and the asm volatile statement.

Thoughts?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-16 17:46 Portable inline asm to get address of TLS variable Stefan Hajnoczi @ 2022-02-16 18:13 ` Florian Weimer 2022-02-16 20:28 ` Stefan Hajnoczi 2022-02-16 22:28 ` Paolo Bonzini 1 sibling, 1 reply; 19+ messages in thread From: Florian Weimer @ 2022-02-16 18:13 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Richard Henderson, qemu-devel, Serge Guelton * Stefan Hajnoczi: > I've been trying to make the inline asm that gets the address of a TLS > variable for QEMU coroutines pass QEMU's GitLab CI. > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89 > > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't > allowed in -fPIC shared libraries) so builds fail with ./configure > --enable-modules. While I was tackling this I stumbled on this: > > void *dst_ptr; > asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var)) > > What's nice about it: > - It's portable, there are no arch-specific assembly instructions. > - It works for both -fPIC and non-PIC. > > However, I wonder if the compiler might reuse a register that already > contains the address. Then we'd have the coroutine problem again when > qemu_coroutine_yield() is called between the earlier address calculation > and the asm volatile statement. > > Thoughts? Sorry, I don't see why this isn't equivalent to a plain &tls_var. What exactly are you trying to achieve? Thanks, Florian ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-16 18:13 ` Florian Weimer @ 2022-02-16 20:28 ` Stefan Hajnoczi 2022-02-16 20:33 ` Stefan Hajnoczi 2022-02-16 20:40 ` Florian Weimer 0 siblings, 2 replies; 19+ messages in thread From: Stefan Hajnoczi @ 2022-02-16 20:28 UTC (permalink / raw) To: Florian Weimer Cc: Richard Henderson, qemu-devel, Stefan Hajnoczi, Serge Guelton On Wed, 16 Feb 2022 at 18:14, Florian Weimer <fweimer@redhat.com> wrote: > > * Stefan Hajnoczi: > > > I've been trying to make the inline asm that gets the address of a TLS > > variable for QEMU coroutines pass QEMU's GitLab CI. > > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89 > > > > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't > > allowed in -fPIC shared libraries) so builds fail with ./configure > > --enable-modules. While I was tackling this I stumbled on this: > > > > void *dst_ptr; > > asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var)) > > > > What's nice about it: > > - It's portable, there are no arch-specific assembly instructions. > > - It works for both -fPIC and non-PIC. > > > > However, I wonder if the compiler might reuse a register that already > > contains the address. Then we'd have the coroutine problem again when > > qemu_coroutine_yield() is called between the earlier address calculation > > and the asm volatile statement. > > > > Thoughts? > > Sorry, I don't see why this isn't equivalent to a plain &tls_var. > What exactly are you trying to achieve? &tls_var, except forcing the compiler to calculate the address from scratch. The goal is to avoid stale TLS variable addresses when a coroutine yields in one thread and is resumed in another thread. Stefan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-16 20:28 ` Stefan Hajnoczi @ 2022-02-16 20:33 ` Stefan Hajnoczi 2022-02-16 20:46 ` Florian Weimer 2022-02-16 20:40 ` Florian Weimer 1 sibling, 1 reply; 19+ messages in thread From: Stefan Hajnoczi @ 2022-02-16 20:33 UTC (permalink / raw) To: Florian Weimer Cc: Richard Henderson, qemu-devel, Stefan Hajnoczi, Serge Guelton On Wed, 16 Feb 2022 at 20:28, Stefan Hajnoczi <stefanha@gmail.com> wrote: > > On Wed, 16 Feb 2022 at 18:14, Florian Weimer <fweimer@redhat.com> wrote: > > > > * Stefan Hajnoczi: > > > > > I've been trying to make the inline asm that gets the address of a TLS > > > variable for QEMU coroutines pass QEMU's GitLab CI. > > > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89 > > > > > > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't > > > allowed in -fPIC shared libraries) so builds fail with ./configure > > > --enable-modules. While I was tackling this I stumbled on this: > > > > > > void *dst_ptr; > > > asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var)) > > > > > > What's nice about it: > > > - It's portable, there are no arch-specific assembly instructions. > > > - It works for both -fPIC and non-PIC. > > > > > > However, I wonder if the compiler might reuse a register that already > > > contains the address. Then we'd have the coroutine problem again when > > > qemu_coroutine_yield() is called between the earlier address calculation > > > and the asm volatile statement. > > > > > > Thoughts? > > > > Sorry, I don't see why this isn't equivalent to a plain &tls_var. > > What exactly are you trying to achieve? > > &tls_var, except forcing the compiler to calculate the address from scratch. > > The goal is to avoid stale TLS variable addresses when a coroutine > yields in one thread and is resumed in another thread. 
I'm basically asking whether the &tls_var input operand is treated as volatile and part of the inline assembly or whether it's just regular C code that the compiler may optimize with the surrounding function? Stefan ^ permalink raw reply [flat|nested] 19+ messages in thread
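To make the stale-address concern concrete: every thread has its own copy of a thread-local variable, so `&tls_var` evaluates to a different address in each thread. The sketch below is only an illustration and not code from QEMU. The names `tls_var`, `record_address` and `tls_addresses_differ` are hypothetical, and it assumes POSIX threads plus the GCC/Clang `__thread` keyword.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical thread-local variable standing in for QEMU's TLS state. */
static __thread int tls_var;

/* Thread body: store this thread's address of tls_var through arg. */
static void *record_address(void *arg)
{
    *(uintptr_t *)arg = (uintptr_t)&tls_var;
    return NULL;
}

/* Returns 1 if a second thread sees tls_var at a different address
 * than the calling thread, 0 otherwise. */
static int tls_addresses_differ(void)
{
    pthread_t thread;
    uintptr_t other = 0;
    int rc = pthread_create(&thread, NULL, record_address, &other);

    assert(rc == 0);
    rc = pthread_join(thread, NULL);
    assert(rc == 0);
    return other != (uintptr_t)&tls_var;
}
```

A coroutine that computed the address before yielding and is then resumed on another thread would keep using the first thread's copy unless the address is recomputed.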
* Re: Portable inline asm to get address of TLS variable 2022-02-16 20:33 ` Stefan Hajnoczi @ 2022-02-16 20:46 ` Florian Weimer 2022-02-17 9:30 ` Stefan Hajnoczi 0 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2022-02-16 20:46 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Richard Henderson, qemu-devel, Stefan Hajnoczi, Serge Guelton

* Stefan Hajnoczi:

> I'm basically asking whether the &tls_var input operand is treated as
> volatile and part of the inline assembly or whether it's just regular
> C code that the compiler may optimize with the surrounding function?

&tls_var is evaluated outside of the inline assembly, any compiler
barrier will come after that. It's subject to CSE (or whatever it's
called). Three asm statements in a row

  asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
  asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));
  asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var));

result in

  movq tls_var@gottpoff(%rip), %rax
  addq %fs:0, %rax
  movq %rax, %rdx
  movq %rax, %rdx

which is probably not what you want.

Thanks,
Florian

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-16 20:46 ` Florian Weimer @ 2022-02-17 9:30 ` Stefan Hajnoczi 0 siblings, 0 replies; 19+ messages in thread From: Stefan Hajnoczi @ 2022-02-17 9:30 UTC (permalink / raw) To: Florian Weimer Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton [-- Attachment #1: Type: text/plain, Size: 928 bytes --] On Wed, Feb 16, 2022 at 09:46:02PM +0100, Florian Weimer wrote: > * Stefan Hajnoczi: > > > I'm basically asking whether the &tls_var input operand is treated as > > volatile and part of the inline assembly or whether it's just regular > > C code that the compiler may optimize with the surrounding function? > > &tls_var is evaluated outside of the inline assembly, any compiler > barrier will come after that. It's subject to CSE (or whatever it's > called. Three asm statements in a row > > asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var)); > asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var)); > asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var)); > > result in > > movq tls_var@gottpoff(%rip), %rax > addq %fs:0, %rax > movq %rax, %rdx > movq %rax, %rdx > > which is probably not what you want. Right, the approach I suggested doesn't work. Thanks for sharing the example! Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-16 20:28 ` Stefan Hajnoczi 2022-02-16 20:33 ` Stefan Hajnoczi @ 2022-02-16 20:40 ` Florian Weimer 2022-02-17 9:28 ` Stefan Hajnoczi 1 sibling, 1 reply; 19+ messages in thread From: Florian Weimer @ 2022-02-16 20:40 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Richard Henderson, qemu-devel, Stefan Hajnoczi, Serge Guelton * Stefan Hajnoczi: > On Wed, 16 Feb 2022 at 18:14, Florian Weimer <fweimer@redhat.com> wrote: >> >> * Stefan Hajnoczi: >> >> > I've been trying to make the inline asm that gets the address of a TLS >> > variable for QEMU coroutines pass QEMU's GitLab CI. >> > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89 >> > >> > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't >> > allowed in -fPIC shared libraries) so builds fail with ./configure >> > --enable-modules. While I was tackling this I stumbled on this: >> > >> > void *dst_ptr; >> > asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var)) >> > >> > What's nice about it: >> > - It's portable, there are no arch-specific assembly instructions. >> > - It works for both -fPIC and non-PIC. >> > >> > However, I wonder if the compiler might reuse a register that already >> > contains the address. Then we'd have the coroutine problem again when >> > qemu_coroutine_yield() is called between the earlier address calculation >> > and the asm volatile statement. >> > >> > Thoughts? >> >> Sorry, I don't see why this isn't equivalent to a plain &tls_var. >> What exactly are you trying to achieve? > > &tls_var, except forcing the compiler to calculate the address from scratch. I think you can compute (void *) &tls_var - __builtin_thread_pointer (); to get the offset. On many targets, GCC folds away the thread pointer load, but that doesn't change the outcome. Then it boils down to getting access to the thread pointer, and you can get that behind a compiler barrier (in a separate function). 
But going against ABI and toolchain in this way is really no long-term solution. You need to switch to stackless co-routines, or we need to provide proper ABI-level support for this. Today it's the thread pointer, tomorrow it's the shadow stack pointer, and the day after that, it's the SafeStack pointer. And further down the road, it's some thread state for garbage collection support. Or something like that. Thanks, Florian ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-16 20:40 ` Florian Weimer @ 2022-02-17 9:28 ` Stefan Hajnoczi 2022-02-17 11:40 ` Paolo Bonzini ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Stefan Hajnoczi @ 2022-02-17 9:28 UTC (permalink / raw) To: Florian Weimer Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton [-- Attachment #1: Type: text/plain, Size: 3061 bytes --] On Wed, Feb 16, 2022 at 09:40:34PM +0100, Florian Weimer wrote: > * Stefan Hajnoczi: > > > On Wed, 16 Feb 2022 at 18:14, Florian Weimer <fweimer@redhat.com> wrote: > >> > >> * Stefan Hajnoczi: > >> > >> > I've been trying to make the inline asm that gets the address of a TLS > >> > variable for QEMU coroutines pass QEMU's GitLab CI. > >> > https://gitlab.com/stefanha/qemu/-/blob/coroutine-tls-fix/include/qemu/coroutine-tls.h#L89 > >> > > >> > The code isn't -fPIC-friendly (R_X86_64_TPOFF32 relocations aren't > >> > allowed in -fPIC shared libraries) so builds fail with ./configure > >> > --enable-modules. While I was tackling this I stumbled on this: > >> > > >> > void *dst_ptr; > >> > asm volatile("" : "=r"(dst_ptr) : "0"(&tls_var)) > >> > > >> > What's nice about it: > >> > - It's portable, there are no arch-specific assembly instructions. > >> > - It works for both -fPIC and non-PIC. > >> > > >> > However, I wonder if the compiler might reuse a register that already > >> > contains the address. Then we'd have the coroutine problem again when > >> > qemu_coroutine_yield() is called between the earlier address calculation > >> > and the asm volatile statement. > >> > > >> > Thoughts? > >> > >> Sorry, I don't see why this isn't equivalent to a plain &tls_var. > >> What exactly are you trying to achieve? > > > > &tls_var, except forcing the compiler to calculate the address from scratch. > > I think you can compute > > (void *) &tls_var - __builtin_thread_pointer (); > > to get the offset. 
On many targets, GCC folds away the thread pointer > load, but that doesn't change the outcome. Then it boils down to > getting access to the thread pointer, and you can get that behind a > compiler barrier (in a separate function). Interesting, this is something we haven't tried yet. It sounds like it can be implemented in C without architecture- or ELF-specific inline assembly. > But going against ABI and toolchain in this way is really no long-term > solution. You need to switch to stackless co-routines, or we need to > provide proper ABI-level support for this. Today it's the thread > pointer, tomorrow it's the shadow stack pointer, and the day after that, > it's the SafeStack pointer. And further down the road, it's some thread > state for garbage collection support. Or something like that. Yes, understood :(. This does feel like solving an undefined behavior problem by adding more undefined behavior on top! Stackless coroutines have been tried in the past using Continuation Passing C (https://github.com/kerneis/cpc). Ideally we'd use a solution built into the compiler though. I'm concerned that CPC might not be supported or available everywhere QEMU needs to run now and in the future. I took a quick look at C++20 coroutines since they are available in compilers but the primitives look hard to use even from C++, let alone from C. If you have any suggestions for stackless coroutine implementations, please let me know! Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-17 9:28 ` Stefan Hajnoczi @ 2022-02-17 11:40 ` Paolo Bonzini 2022-02-17 15:02 ` Serge Guelton 2022-02-17 14:59 ` Serge Guelton 2022-03-01 11:54 ` Florian Weimer 2 siblings, 1 reply; 19+ messages in thread From: Paolo Bonzini @ 2022-02-17 11:40 UTC (permalink / raw) To: Stefan Hajnoczi, Florian Weimer Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton On 2/17/22 10:28, Stefan Hajnoczi wrote: >> But going against ABI and toolchain in this way is really no long-term >> solution. You need to switch to stackless co-routines, or we need to >> provide proper ABI-level support for this. Today it's the thread >> pointer, tomorrow it's the shadow stack pointer, and the day after that, >> it's the SafeStack pointer. And further down the road, it's some thread >> state for garbage collection support. Or something like that. > > Yes, understood :(. This does feel like solving an undefined behavior > problem by adding more undefined behavior on top! Yes, this is the kind of thing that I generally despise when I see other programs do it... it's easy to dig ourselves in the same hole. > I took a quick look at C++20 coroutines since they are available in > compilers but the primitives look hard to use even from C++, let alone > from C. They're C++ only in GCC, too. I really think that QEMU should be compilable in C++, but I'm not sure how easy a sell it is. Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-17 11:40 ` Paolo Bonzini @ 2022-02-17 15:02 ` Serge Guelton 2022-02-17 15:11 ` Stefan Hajnoczi 2022-02-17 15:51 ` Paolo Bonzini 0 siblings, 2 replies; 19+ messages in thread
From: Serge Guelton @ 2022-02-17 15:02 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Florian Weimer, Stefan Hajnoczi, Richard Henderson, qemu-devel, Stefan Hajnoczi

On Thu, Feb 17, 2022 at 12:40:40PM +0100, Paolo Bonzini wrote:
> On 2/17/22 10:28, Stefan Hajnoczi wrote:
> >>But going against ABI and toolchain in this way is really no long-term
> >>solution. You need to switch to stackless co-routines, or we need to
> >>provide proper ABI-level support for this. Today it's the thread
> >>pointer, tomorrow it's the shadow stack pointer, and the day after that,
> >>it's the SafeStack pointer. And further down the road, it's some thread
> >>state for garbage collection support. Or something like that.
> >
> >Yes, understood :(. This does feel like solving an undefined behavior
> >problem by adding more undefined behavior on top!
>
> Yes, this is the kind of thing that I generally despise when I see
> other programs do it... it's easy to dig ourselves in the same
> hole.
>
> >I took a quick look at C++20 coroutines since they are available in
> >compilers but the primitives look hard to use even from C++, let alone
> >from C.
>
> They're C++ only in GCC, too. I really think that QEMU should be
> compilable in C++, but I'm not sure how easy a sell it is.

It's perfectly fine to have one compilation unit written in C++ with a few
symbols in `extern "C"`. No need to touch the other part of the project.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-17 15:02 ` Serge Guelton @ 2022-02-17 15:11 ` Stefan Hajnoczi 2022-02-17 15:51 ` Paolo Bonzini 1 sibling, 0 replies; 19+ messages in thread From: Stefan Hajnoczi @ 2022-02-17 15:11 UTC (permalink / raw) To: Serge Guelton Cc: Florian Weimer, Paolo Bonzini, Richard Henderson, qemu-devel, Stefan Hajnoczi [-- Attachment #1: Type: text/plain, Size: 1660 bytes --] On Thu, Feb 17, 2022 at 04:02:16PM +0100, Serge Guelton wrote: > On Thu, Feb 17, 2022 at 12:40:40PM +0100, Paolo Bonzini wrote: > > On 2/17/22 10:28, Stefan Hajnoczi wrote: > > >>But going against ABI and toolchain in this way is really no long-term > > >>solution. You need to switch to stackless co-routines, or we need to > > >>provide proper ABI-level support for this. Today it's the thread > > >>pointer, tomorrow it's the shadow stack pointer, and the day after that, > > >>it's the SafeStack pointer. And further down the road, it's some thread > > >>state for garbage collection support. Or something like that. > > > > > >Yes, understood :(. This does feel like solving an undefined behavior > > >problem by adding more undefined behavior on top! > > > > Yes, this is the kind of thing that I generally despise when I see > > other programs do it... it's easy to dig ourselves in the same > > hole. > > > > >I took a quick look at C++20 coroutines since they are available in > > >compilers but the primitives look hard to use even from C++, let alone > > >from C. > > > > They're C++ only in GCC, too. I really think that QEMU should be > > compilable in C++, but I'm not sure how easy a sell it is. > > It's perfectly fine to have one compilation unit written in C++ with a few > symbol in `extern "C"`. No need to touch the other part of the project. > I don't think that's possible in this case because the coroutine functions are spread throughout the codebase. 
All coroutine functions need to be in C++ source units so the compiler can transform them and emit code callable as a coroutine. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-17 15:02 ` Serge Guelton 2022-02-17 15:11 ` Stefan Hajnoczi @ 2022-02-17 15:51 ` Paolo Bonzini 1 sibling, 0 replies; 19+ messages in thread From: Paolo Bonzini @ 2022-02-17 15:51 UTC (permalink / raw) To: Serge Guelton Cc: Florian Weimer, Stefan Hajnoczi, Richard Henderson, qemu-devel, Stefan Hajnoczi On 2/17/22 16:02, Serge Guelton wrote: >>> I took a quick look at C++20 coroutines since they are available in >>> compilers but the primitives look hard to use even from C++, let alone >> >from C. >> >> They're C++ only in GCC, too. I really think that QEMU should be >> compilable in C++, but I'm not sure how easy a sell it is. > It's perfectly fine to have one compilation unit written in C++ with a few > symbol in `extern "C"`. No need to touch the other part of the project. It's not just one compilation unit, it's everything that uses coroutines so basically all of block/. But yes, good point---it means for example that you don't have to deal as much with lack of operators in C++ enums, which would be a huge PITA in compiling QEMU with C++. There would still be some churn such as adding extern "C" blocks to headers, etc. The main change with C++20 coroutines would be to introduce co_await, co_return and std::future<> everywhere, which is also a pretty substantial change (possibly an improvement in the case of co_await and co_return, but still a lot of work). That said, it's certainly valuable to try and get at least tests/unit/test-coroutine.c to run with C++ coroutines, and see how much work that is. Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-17 9:28 ` Stefan Hajnoczi 2022-02-17 11:40 ` Paolo Bonzini @ 2022-02-17 14:59 ` Serge Guelton 2022-03-01 11:54 ` Florian Weimer 2 siblings, 0 replies; 19+ messages in thread From: Serge Guelton @ 2022-02-17 14:59 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Florian Weimer, Stefan Hajnoczi, Richard Henderson, qemu-devel > I took a quick look at C++20 coroutines since they are available in > compilers but the primitives look hard to use even from C++, let alone > from C. Same story here :-/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-17 9:28 ` Stefan Hajnoczi 2022-02-17 11:40 ` Paolo Bonzini 2022-02-17 14:59 ` Serge Guelton @ 2022-03-01 11:54 ` Florian Weimer 2022-03-01 13:39 ` Stefan Hajnoczi 2 siblings, 1 reply; 19+ messages in thread
From: Florian Weimer @ 2022-03-01 11:54 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton

* Stefan Hajnoczi:

>> But going against ABI and toolchain in this way is really no long-term
>> solution. You need to switch to stackless co-routines, or we need to
>> provide proper ABI-level support for this. Today it's the thread
>> pointer, tomorrow it's the shadow stack pointer, and the day after that,
>> it's the SafeStack pointer. And further down the road, it's some thread
>> state for garbage collection support. Or something like that.
>
> Yes, understood :(. This does feel like solving an undefined behavior
> problem by adding more undefined behavior on top!
>
> Stackless coroutines have been tried in the past using Continuation
> Passing C (https://github.com/kerneis/cpc). Ideally we'd use a solution
> built into the compiler though. I'm concerned that CPC might not be
> supported or available everywhere QEMU needs to run now and in the
> future.

That seems to require an entirely different toolchain (based on CIL).
It's one way to solve the ABI issues, but perhaps not the direction you
want to go in.

> I took a quick look at C++20 coroutines since they are available in
> compilers but the primitives look hard to use even from C++, let alone
> from C.

Could you go into details what makes them hard to use? Is it because
coroutines are infectious across the call stack?

Thanks,
Florian

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-03-01 11:54 ` Florian Weimer @ 2022-03-01 13:39 ` Stefan Hajnoczi 2022-04-19 11:32 ` Florian Weimer 0 siblings, 1 reply; 19+ messages in thread From: Stefan Hajnoczi @ 2022-03-01 13:39 UTC (permalink / raw) To: Florian Weimer Cc: Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton [-- Attachment #1: Type: text/plain, Size: 567 bytes --] On Tue, Mar 01, 2022 at 12:54:49PM +0100, Florian Weimer wrote: > > I took a quick look at C++20 coroutines since they are available in > > compilers but the primitives look hard to use even from C++, let alone > > from C. > > Could you go into details what makes them hard to use? Is it because > coroutines are infectious across the call stack? Here is the simplest tutorial on C++20 coroutines I found: https://itnext.io/c-20-coroutines-complete-guide-7c3fc08db89d The amount of boilerplate for trivial coroutine functions is ridiculous. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-03-01 13:39 ` Stefan Hajnoczi @ 2022-04-19 11:32 ` Florian Weimer 2022-04-19 18:38 ` Thomas Rodgers 2022-04-20 14:12 ` Stefan Hajnoczi 0 siblings, 2 replies; 19+ messages in thread From: Florian Weimer @ 2022-04-19 11:32 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Thomas Rodgers, Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton * Stefan Hajnoczi: > On Tue, Mar 01, 2022 at 12:54:49PM +0100, Florian Weimer wrote: >> > I took a quick look at C++20 coroutines since they are available in >> > compilers but the primitives look hard to use even from C++, let alone >> > from C. >> >> Could you go into details what makes them hard to use? Is it because >> coroutines are infectious across the call stack? > > Here is the simplest tutorial on C++20 coroutines I found: > https://itnext.io/c-20-coroutines-complete-guide-7c3fc08db89d > > The amount of boilerplate for trivial coroutine functions is ridiculous. Would an execution agent library reduce that usage overhead? Cc:ing Thomas, who might know the answer. Thanks, Florian ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-04-19 11:32 ` Florian Weimer @ 2022-04-19 18:38 ` Thomas Rodgers 2022-04-20 14:12 ` Stefan Hajnoczi 1 sibling, 0 replies; 19+ messages in thread From: Thomas Rodgers @ 2022-04-19 18:38 UTC (permalink / raw) To: Florian Weimer Cc: Jason Merrill, Stefan Hajnoczi, Richard Henderson, qemu-devel, Stefan Hajnoczi, Jonathan Wakely, Serge Guelton [-- Attachment #1: Type: text/plain, Size: 1892 bytes --] So, this was my primary objection during the standardization of coroutines for C++20. Red Hat's vote was consistently against adding the feature without library support, but here we are. Lewis Baker (formerly at Facebook) has led most of the work since on defining what that library support might look like. The library where he has done most of his exploration on the matter is - https://github.com/lewissbaker/cppcoro I spoke a bit this morning with one of the C++ committee members most directly involved in where this is going standardization-wise and the takeaway about the current expectations is - C++23 is likely to get at least some minimal library support in the form of - http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2502r1.pdf Which defines a generator<T> that models std::ranges::input_range. But, for anything that involves a library for scheduling asynchronous execution of coroutines (e.g. tasks<T>'s) on different execution contexts (threads) that is likely not going to happen until C++26. I wish I had a better story to tell. Tom. On Tue, Apr 19, 2022 at 4:32 AM Florian Weimer <fweimer@redhat.com> wrote: > * Stefan Hajnoczi: > > > On Tue, Mar 01, 2022 at 12:54:49PM +0100, Florian Weimer wrote: > >> > I took a quick look at C++20 coroutines since they are available in > >> > compilers but the primitives look hard to use even from C++, let alone > >> > from C. > >> > >> Could you go into details what makes them hard to use? Is it because > >> coroutines are infectious across the call stack? 
> > > > Here is the simplest tutorial on C++20 coroutines I found: > > https://itnext.io/c-20-coroutines-complete-guide-7c3fc08db89d > > > > The amount of boilerplate for trivial coroutine functions is ridiculous. > > Would an execution agent library reduce that usage overhead? > > Cc:ing Thomas, who might know the answer. > > Thanks, > Florian > > [-- Attachment #2: Type: text/html, Size: 2826 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-04-19 11:32 ` Florian Weimer 2022-04-19 18:38 ` Thomas Rodgers @ 2022-04-20 14:12 ` Stefan Hajnoczi 1 sibling, 0 replies; 19+ messages in thread From: Stefan Hajnoczi @ 2022-04-20 14:12 UTC (permalink / raw) To: Florian Weimer Cc: Thomas Rodgers, Stefan Hajnoczi, Richard Henderson, qemu-devel, Serge Guelton [-- Attachment #1: Type: text/plain, Size: 907 bytes --] On Tue, Apr 19, 2022 at 01:32:42PM +0200, Florian Weimer wrote: > * Stefan Hajnoczi: > > > On Tue, Mar 01, 2022 at 12:54:49PM +0100, Florian Weimer wrote: > >> > I took a quick look at C++20 coroutines since they are available in > >> > compilers but the primitives look hard to use even from C++, let alone > >> > from C. > >> > >> Could you go into details what makes them hard to use? Is it because > >> coroutines are infectious across the call stack? > > > > Here is the simplest tutorial on C++20 coroutines I found: > > https://itnext.io/c-20-coroutines-complete-guide-7c3fc08db89d > > > > The amount of boilerplate for trivial coroutine functions is ridiculous. > > Would an execution agent library reduce that usage overhead? Paolo Bonzini wrote a proof-of-concept using C++20 coroutines: https://lore.kernel.org/all/20220314093203.1420404-1-pbonzini@redhat.com/ Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Portable inline asm to get address of TLS variable 2022-02-16 17:46 Portable inline asm to get address of TLS variable Stefan Hajnoczi 2022-02-16 18:13 ` Florian Weimer @ 2022-02-16 22:28 ` Paolo Bonzini 1 sibling, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2022-02-16 22:28 UTC (permalink / raw)
To: Stefan Hajnoczi, Richard Henderson
Cc: Florian Weimer, qemu-devel, Serge Guelton

On 2/16/22 18:46, Stefan Hajnoczi wrote:
> However, I wonder if the compiler might reuse a register that already
> contains the address. Then we'd have the coroutine problem again when
> qemu_coroutine_yield() is called between the earlier address calculation
> and the asm volatile statement.

Yes, the compiler should be able to reuse the register. volatile only
says that the contents of the "asm" cannot be subject to e.g. loop
optimizations:

  for (i = 0; i < 10; i++) {
      asm("# assembly": "=r"(k) : "0"(10));
      j += k;
  }

will likely execute the asm once, while "asm volatile" (or an asm
without outputs, which is always volatile) will execute it ten times.

However, the input of the assembly can be evaluated only once either
way. For example, in the case above you might have "movl $10, %edx"
outside the loop even with asm volatile.

One way to fix it for modules could be to define a (global, non-TLS)
variable in QEMU with the %fs-based offset of the relevant thread-local
variable, and initialize it before modules are loaded.

Paolo

^ permalink raw reply [flat|nested] 19+ messages in thread
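The loop from the message above can be turned into a small compilable sketch. Both functions below return the same value; the difference discussed in the message is only visible in the generated assembly, where the non-volatile asm may be hoisted out of the loop while the volatile one is kept in every iteration (and the constant input may still be loaded just once in either case). The function names are hypothetical, and the asm body is left empty so that the sketch is not tied to one architecture.

```c
#include <assert.h>

/* Plain asm with an output: the compiler may treat it like a pure
 * computation and execute it once outside the loop. */
static int sum_with_plain_asm(void)
{
    int j = 0;

    for (int i = 0; i < 10; i++) {
        int k;

        /* "0" ties the input 10 to the same register as output k. */
        __asm__("" : "=r"(k) : "0"(10));
        j += k;
    }
    return j;
}

/* asm volatile: the statement is kept in every iteration, but the
 * input operand is ordinary C and may still be evaluated only once. */
static int sum_with_asm_volatile(void)
{
    int j = 0;

    for (int i = 0; i < 10; i++) {
        int k;

        __asm__ volatile("" : "=r"(k) : "0"(10));
        j += k;
    }
    return j;
}

/* Both variants compute 10 * 10. */
static void check_sums(void)
{
    assert(sum_with_plain_asm() == 100);
    assert(sum_with_asm_volatile() == 100);
}
```

Compiling with `-O2 -S` and comparing the two functions is one way to observe the behaviour on a given compiler; the exact output will vary between compilers and versions.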