* [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 @ 2017-04-06 10:22 Nikunj A Dadhania 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-06 10:22 UTC (permalink / raw) To: qemu-ppc, david, rth Cc: qemu-devel, alex.bennee, programmingkidx, bharata, nikunj The series enables Multi-Threaded TCG on PPC64 Patch 01: Use atomic_cmpxchg in store conditional 02: Handle first write to page during atomic operation 03: Generate memory barriers for sync/isync and load/store conditional Patches are based on ppc-for-2.10 Tested using following: ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img Todo: * Enable other machine types and PPC32. * More testing for corner cases. Nikunj A Dadhania (3): target/ppc: Emulate LL/SC using cmpxchg helpers cputlb: handle first atomic write to the page target/ppc: Generate fence operations cputlb.c | 8 +++++++- target/ppc/translate.c | 29 ++++++++++++++++++++++++++--- 2 files changed, 33 insertions(+), 4 deletions(-) -- 2.9.3 ^ permalink raw reply [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers 2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania @ 2017-04-06 10:22 ` Nikunj A Dadhania 2017-04-06 15:51 ` Richard Henderson ` (2 more replies) 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page Nikunj A Dadhania ` (2 subsequent siblings) 3 siblings, 3 replies; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-06 10:22 UTC (permalink / raw) To: qemu-ppc, david, rth Cc: qemu-devel, alex.bennee, programmingkidx, bharata, nikunj Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. However, portable parallel code is written assuming only cmpxchg which means that in practice this is a viable alternative. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> --- target/ppc/translate.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index b6abc60..a9c733d 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -73,6 +73,7 @@ static TCGv cpu_cfar; #endif static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32; static TCGv cpu_reserve; +static TCGv cpu_reserve_val; static TCGv cpu_fpscr; static TCGv_i32 cpu_access_type; @@ -181,6 +182,9 @@ void ppc_translate_init(void) cpu_reserve = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, reserve_addr), "reserve_addr"); + cpu_reserve_val = tcg_global_mem_new(cpu_env, + offsetof(CPUPPCState, reserve_val), + "reserve_val"); cpu_fpscr = tcg_global_mem_new(cpu_env, offsetof(CPUPPCState, fpscr), "fpscr"); @@ -3023,7 +3027,7 @@ static void gen_##name(DisasContext *ctx) \ } \ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \ tcg_gen_mov_tl(cpu_reserve, t0); \ - tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \ + tcg_gen_mov_tl(cpu_reserve_val, gpr); \ tcg_temp_free(t0); \ } @@ -3156,14 +3160,28 @@ 
static void gen_conditional_store(DisasContext *ctx, TCGv EA, int reg, int memop) { TCGLabel *l1; + TCGv_i32 tmp = tcg_temp_local_new_i32(); + TCGv t0; + tcg_gen_movi_i32(tmp, 0); tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); l1 = gen_new_label(); tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); + + t0 = tcg_temp_new(); + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], + ctx->mem_idx, DEF_MEMOP(memop)); + tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); + tcg_gen_trunc_tl_i32(tmp, t0); + gen_set_label(l1); + tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT); + tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp); tcg_gen_movi_tl(cpu_reserve, -1); + tcg_gen_movi_tl(cpu_reserve_val, 0); + + tcg_temp_free(t0); + tcg_temp_free_i32(tmp); } #endif -- 2.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania @ 2017-04-06 15:51 ` Richard Henderson 2017-04-07 5:12 ` Nikunj A Dadhania 2017-04-06 15:53 ` Richard Henderson 2017-04-07 5:23 ` David Gibson 2 siblings, 1 reply; 30+ messages in thread From: Richard Henderson @ 2017-04-06 15:51 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david Cc: qemu-devel, alex.bennee, programmingkidx, bharata On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: > + TCGv_i32 tmp = tcg_temp_local_new_i32(); > + TCGv t0; > > + tcg_gen_movi_i32(tmp, 0); > tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); > l1 = gen_new_label(); > tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); > - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); > - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); > + > + t0 = tcg_temp_new(); > + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], > + ctx->mem_idx, DEF_MEMOP(memop)); > + tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); > + tcg_gen_trunc_tl_i32(tmp, t0); > + > gen_set_label(l1); > + tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT); > + tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp); I encourage you to move these two lines up beside the setcond. That way you don't need to use a local tmp, which implies a spill/restore from the stack. r~ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers 2017-04-06 15:51 ` Richard Henderson @ 2017-04-07 5:12 ` Nikunj A Dadhania 0 siblings, 0 replies; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-07 5:12 UTC (permalink / raw) To: Richard Henderson, qemu-ppc, david Cc: qemu-devel, alex.bennee, programmingkidx, bharata Richard Henderson <rth@twiddle.net> writes: > On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: >> + TCGv_i32 tmp = tcg_temp_local_new_i32(); >> + TCGv t0; >> >> + tcg_gen_movi_i32(tmp, 0); >> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); >> l1 = gen_new_label(); >> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); >> - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); >> - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); >> + >> + t0 = tcg_temp_new(); >> + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], >> + ctx->mem_idx, DEF_MEMOP(memop)); >> + tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); >> + tcg_gen_trunc_tl_i32(tmp, t0); >> + >> gen_set_label(l1); >> + tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT); >> + tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp); > > I encourage you to move these two lines up beside the setcond. > That way you don't need to use a local tmp, which implies a > spill/restore from the stack. Sure. Regards Nikunj ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania 2017-04-06 15:51 ` Richard Henderson @ 2017-04-06 15:53 ` Richard Henderson 2017-04-07 5:14 ` Nikunj A Dadhania 2017-04-07 5:23 ` David Gibson 2 siblings, 1 reply; 30+ messages in thread From: Richard Henderson @ 2017-04-06 15:53 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david Cc: qemu-devel, alex.bennee, programmingkidx, bharata On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: > tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); > l1 = gen_new_label(); > tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); > - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); > - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); > + > + t0 = tcg_temp_new(); > + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], > + ctx->mem_idx, DEF_MEMOP(memop)); Actually, I noticed another, existing, problem. This code changes CRF[0] before the user memory write, which might fault. This needs to delay any changes to the architecture visible state until after any exception may be triggered. r~ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers 2017-04-06 15:53 ` Richard Henderson @ 2017-04-07 5:14 ` Nikunj A Dadhania 0 siblings, 0 replies; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-07 5:14 UTC (permalink / raw) To: Richard Henderson, qemu-ppc, david Cc: qemu-devel, alex.bennee, programmingkidx, bharata Richard Henderson <rth@twiddle.net> writes: > On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: >> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); >> l1 = gen_new_label(); >> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); >> - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); >> - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); >> + >> + t0 = tcg_temp_new(); >> + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], >> + ctx->mem_idx, DEF_MEMOP(memop)); > > Actually, I noticed another, existing, problem. > > This code changes CRF[0] before the user memory write, which might fault. This > needs to delay any changes to the architecture visible state until after any > exception may be triggered. Sure, here you are mentioning cpu_so being moved to CRF. Regards Nikunj ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania 2017-04-06 15:51 ` Richard Henderson 2017-04-06 15:53 ` Richard Henderson @ 2017-04-07 5:23 ` David Gibson 2017-04-07 5:42 ` Nikunj A Dadhania 2 siblings, 1 reply; 30+ messages in thread From: David Gibson @ 2017-04-07 5:23 UTC (permalink / raw) To: Nikunj A Dadhania Cc: qemu-ppc, rth, qemu-devel, alex.bennee, programmingkidx, bharata [-- Attachment #1: Type: text/plain, Size: 3389 bytes --] On Thu, Apr 06, 2017 at 03:52:47PM +0530, Nikunj A Dadhania wrote: > Emulating LL/SC with cmpxchg is not correct, since it can suffer from > the ABA problem. However, portable parallel code is written assuming > only cmpxchg which means that in practice this is a viable alternative. > > Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> > --- > target/ppc/translate.c | 24 +++++++++++++++++++++--- > 1 file changed, 21 insertions(+), 3 deletions(-) > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c > index b6abc60..a9c733d 100644 > --- a/target/ppc/translate.c > +++ b/target/ppc/translate.c > @@ -73,6 +73,7 @@ static TCGv cpu_cfar; > #endif > static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32; > static TCGv cpu_reserve; > +static TCGv cpu_reserve_val; > static TCGv cpu_fpscr; > static TCGv_i32 cpu_access_type; > > @@ -181,6 +182,9 @@ void ppc_translate_init(void) > cpu_reserve = tcg_global_mem_new(cpu_env, > offsetof(CPUPPCState, reserve_addr), > "reserve_addr"); > + cpu_reserve_val = tcg_global_mem_new(cpu_env, > + offsetof(CPUPPCState, reserve_val), > + "reserve_val"); I notice that lqarx is not updated. Does that matter? 
> cpu_fpscr = tcg_global_mem_new(cpu_env, > offsetof(CPUPPCState, fpscr), "fpscr"); > @@ -3023,7 +3027,7 @@ static void gen_##name(DisasContext *ctx) \ > } \ > tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \ > tcg_gen_mov_tl(cpu_reserve, t0); \ > - tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \ > + tcg_gen_mov_tl(cpu_reserve_val, gpr); \ > tcg_temp_free(t0); \ > } > > @@ -3156,14 +3160,28 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA, > int reg, int memop) > { > TCGLabel *l1; > + TCGv_i32 tmp = tcg_temp_local_new_i32(); > + TCGv t0; > > + tcg_gen_movi_i32(tmp, 0); > tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so); > l1 = gen_new_label(); > tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); > - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ); > - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop); > + > + t0 = tcg_temp_new(); > + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg], > + ctx->mem_idx, DEF_MEMOP(memop)); > + tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); > + tcg_gen_trunc_tl_i32(tmp, t0); > + > gen_set_label(l1); > + tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT); > + tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp); > tcg_gen_movi_tl(cpu_reserve, -1); > + tcg_gen_movi_tl(cpu_reserve_val, 0); > + > + tcg_temp_free(t0); > + tcg_temp_free_i32(tmp); > } > #endif > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers 2017-04-07 5:23 ` David Gibson @ 2017-04-07 5:42 ` Nikunj A Dadhania 0 siblings, 0 replies; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-07 5:42 UTC (permalink / raw) To: David Gibson Cc: qemu-ppc, rth, qemu-devel, alex.bennee, programmingkidx, bharata David Gibson <david@gibson.dropbear.id.au> writes: > [ Unknown signature status ] > On Thu, Apr 06, 2017 at 03:52:47PM +0530, Nikunj A Dadhania wrote: >> Emulating LL/SC with cmpxchg is not correct, since it can suffer from >> the ABA problem. However, portable parallel code is written assuming >> only cmpxchg which means that in practice this is a viable alternative. >> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> >> --- >> target/ppc/translate.c | 24 +++++++++++++++++++++--- >> 1 file changed, 21 insertions(+), 3 deletions(-) >> >> diff --git a/target/ppc/translate.c b/target/ppc/translate.c >> index b6abc60..a9c733d 100644 >> --- a/target/ppc/translate.c >> +++ b/target/ppc/translate.c >> @@ -73,6 +73,7 @@ static TCGv cpu_cfar; >> #endif >> static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32; >> static TCGv cpu_reserve; >> +static TCGv cpu_reserve_val; >> static TCGv cpu_fpscr; >> static TCGv_i32 cpu_access_type; >> >> @@ -181,6 +182,9 @@ void ppc_translate_init(void) >> cpu_reserve = tcg_global_mem_new(cpu_env, >> offsetof(CPUPPCState, reserve_addr), >> "reserve_addr"); >> + cpu_reserve_val = tcg_global_mem_new(cpu_env, >> + offsetof(CPUPPCState, reserve_val), >> + "reserve_val"); > > I notice that lqarx is not updated. Does that matter? Thats correct, I haven't touched that yet. Most of the locks are implemented using lwarx/stwcx. Regards Nikunj ^ permalink raw reply [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page 2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania @ 2017-04-06 10:22 ` Nikunj A Dadhania 2017-04-06 15:54 ` Richard Henderson 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations Nikunj A Dadhania 2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater 3 siblings, 1 reply; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-06 10:22 UTC (permalink / raw) To: qemu-ppc, david, rth Cc: qemu-devel, alex.bennee, programmingkidx, bharata, nikunj In case where the conditional write is the first write to the page, TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as a special case and set the dirty bit. After that fall through to the actual atomic instruction below. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> --- cputlb.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/cputlb.c b/cputlb.c index f5d056c..743776a 100644 --- a/cputlb.c +++ b/cputlb.c @@ -930,7 +930,13 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr, tlb_addr = tlbe->addr_write; } - /* Notice an IO access, or a notdirty page. */ + /* Check notdirty */ + if (unlikely(tlb_addr & TLB_NOTDIRTY)) { + tlb_set_dirty(ENV_GET_CPU(env), addr); + tlb_addr = tlb_addr & ~TLB_NOTDIRTY; + } + + /* Notice an IO access */ if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) { /* There's really nothing that can be done to support this apart from stop-the-world. */ -- 2.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page Nikunj A Dadhania @ 2017-04-06 15:54 ` Richard Henderson 0 siblings, 0 replies; 30+ messages in thread From: Richard Henderson @ 2017-04-06 15:54 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david Cc: qemu-devel, alex.bennee, programmingkidx, bharata On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: > In case where the conditional write is the first write to the page, > TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as > a special case and set the dirty bit. After that fall through to the > actual atomic instruction below. > > Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> > --- > cputlb.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) Reviewed-by: Richard Henderson <rth@twiddle.net> r~ ^ permalink raw reply [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations 2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page Nikunj A Dadhania @ 2017-04-06 10:22 ` Nikunj A Dadhania 2017-04-06 16:15 ` Richard Henderson 2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater 3 siblings, 1 reply; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-06 10:22 UTC (permalink / raw) To: qemu-ppc, david, rth Cc: qemu-devel, alex.bennee, programmingkidx, bharata, nikunj Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> --- target/ppc/translate.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index a9c733d..87b4fe4 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -2971,6 +2971,7 @@ static void gen_stswx(DisasContext *ctx) /* eieio */ static void gen_eieio(DisasContext *ctx) { + tcg_gen_mb(TCG_MO_LD_ST | TCG_BAR_SC); } #if !defined(CONFIG_USER_ONLY) @@ -3008,6 +3009,7 @@ static void gen_isync(DisasContext *ctx) if (!ctx->pr) { gen_check_tlb_flush(ctx, false); } + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC); gen_stop_exception(ctx); } @@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) \ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \ tcg_gen_mov_tl(cpu_reserve, t0); \ tcg_gen_mov_tl(cpu_reserve_val, gpr); \ + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \ tcg_temp_free(t0); \ } @@ -3196,6 +3199,7 @@ static void gen_##name(DisasContext *ctx) \ if (len > 1) { \ gen_check_align(ctx, t0, (len) - 1); \ } \ + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); \ gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \ tcg_temp_free(t0); \ } @@ -3309,6 +3313,7 @@ static void gen_sync(DisasContext *ctx) if (((l == 
2) || !(ctx->insns_flags & PPC_64B)) && !ctx->pr) { gen_check_tlb_flush(ctx, true); } + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC); } /* wait */ -- 2.9.3 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations Nikunj A Dadhania @ 2017-04-06 16:15 ` Richard Henderson 2017-04-07 5:21 ` Nikunj A Dadhania 0 siblings, 1 reply; 30+ messages in thread From: Richard Henderson @ 2017-04-06 16:15 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david Cc: qemu-devel, alex.bennee, programmingkidx, bharata On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: > @@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) \ > tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \ > tcg_gen_mov_tl(cpu_reserve, t0); \ > tcg_gen_mov_tl(cpu_reserve_val, gpr); \ > + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \ Please put the barrier next to the load. I hope we can merge these soon. > @@ -3196,6 +3199,7 @@ static void gen_##name(DisasContext *ctx) \ > if (len > 1) { \ > gen_check_align(ctx, t0, (len) - 1); \ > } \ > + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); \ > gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \ > tcg_temp_free(t0); \ > } This one is more complicated... The success path includes an atomic_cmpxchg, which itself is a SC barrier. However, the fail path branches across the cmpxchg and so sees no barrier, but one is still required by the architecture. How about a gen_conditional_store that looks like this: { TCGLabel *l1 = gen_new_label(); TCGLabel *l2 = gen_new_label(); TCGv t0; tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); t0 = tcg_temp_new(); tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val, cpu_gpr[reg], ctx->mem_idx, DEF_MEMOP(memop) | MO_ALIGN); tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT); tcg_gen_or_tl(t0, t0, cpu_so); tcg_gen_trunc_tl(cpu_crf[0], t0); tcg_temp_free(t0); tcg_gen_br(l2); gen_set_label(l1); /* Address mismatch implies failure. But we still need to provide the memory barrier semantics of the instruction. 
*/ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); tcg_gen_trunc_tl(cpu_crf[0], cpu_so); gen_set_label(l2); tcg_gen_movi_tl(cpu_reserve, -1); } Note that you don't need to reset cpu_reserve_val at all. I just thought of something we might need to check for this and other ll/sc implementations -- do we need to check for address misalignment along the address comparison failure path? I suspect that we do. r~ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations 2017-04-06 16:15 ` Richard Henderson @ 2017-04-07 5:21 ` Nikunj A Dadhania 2017-04-07 18:19 ` Richard Henderson 0 siblings, 1 reply; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-07 5:21 UTC (permalink / raw) To: Richard Henderson, qemu-ppc, david Cc: qemu-devel, alex.bennee, programmingkidx, bharata Richard Henderson <rth@twiddle.net> writes: > On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote: >> @@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) \ >> tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \ >> tcg_gen_mov_tl(cpu_reserve, t0); \ >> tcg_gen_mov_tl(cpu_reserve_val, gpr); \ >> + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \ > > Please put the barrier next to the load. > I hope we can merge these soon. Sure > >> @@ -3196,6 +3199,7 @@ static void gen_##name(DisasContext *ctx) \ >> if (len > 1) { \ >> gen_check_align(ctx, t0, (len) - 1); \ >> } \ >> + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); \ >> gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \ >> tcg_temp_free(t0); \ >> } > > This one is more complicated... > > The success path includes an atomic_cmpxchg, which itself is a SC barrier. > However, the fail path branches across the cmpxchg and so sees no barrier, but > one is still required by the architecture. How about a gen_conditional_store > that looks like this: > > { > TCGLabel *l1 = gen_new_label(); > TCGLabel *l2 = gen_new_label(); > TCGv t0; > > tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1); > > t0 = tcg_temp_new(); > tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val, > cpu_gpr[reg], ctx->mem_idx, > DEF_MEMOP(memop) | MO_ALIGN); > tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); > tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT); > tcg_gen_or_tl(t0, t0, cpu_so); > tcg_gen_trunc_tl(cpu_crf[0], t0); > tcg_temp_free(t0); > tcg_gen_br(l2); > > gen_set_label(l1); > > /* Address mismatch implies failure. 
But we still need to provide the > memory barrier semantics of the instruction. */ > tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); > tcg_gen_trunc_tl(cpu_crf[0], cpu_so); > > gen_set_label(l2); > tcg_gen_movi_tl(cpu_reserve, -1); > } > > Note that you don't need to reset cpu_reserve_val at all. Sure. > I just thought of something we might need to check for this and other ll/sc > implemetations -- do we need to check for address misalignment along the > address comparison failure path? We do that in the macro: if (len > 1) { \ gen_check_align(ctx, t0, (len) - 1); \ } \ Would we still need a barrier before the alignment check? > I suspect that we do. Regards Nikunj ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations 2017-04-07 5:21 ` Nikunj A Dadhania @ 2017-04-07 18:19 ` Richard Henderson 0 siblings, 0 replies; 30+ messages in thread From: Richard Henderson @ 2017-04-07 18:19 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david Cc: qemu-devel, alex.bennee, programmingkidx, bharata On 04/06/2017 10:21 PM, Nikunj A Dadhania wrote: > We do that in the macro: > > if (len > 1) { \ > gen_check_align(ctx, t0, (len) - 1); \ > } \ > > Would we still need a barrier before the alignment check? Ah, that's where it's been placed. So, the MO_ALIGN flag to tcg_gen_atomic_* takes care of the alignment check too. So we could move this down into the failure path. r~ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania ` (2 preceding siblings ...) 2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations Nikunj A Dadhania @ 2017-04-06 13:26 ` Cédric Le Goater 2017-04-06 13:28 ` G 3 2017-04-07 5:24 ` Nikunj A Dadhania 3 siblings, 2 replies; 30+ messages in thread From: Cédric Le Goater @ 2017-04-06 13:26 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david, rth Cc: programmingkidx, alex.bennee, qemu-devel, bharata Hello Nikunj, On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: > The series enables Multi-Threaded TCG on PPC64 > > Patch 01: Use atomic_cmpxchg in store conditional > 02: Handle first write to page during atomic operation > 03: Generate memory barriers for sync/isync and load/store conditional > > Patches are based on ppc-for-2.10 > > Tested using following: > ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked good : the CPU usage of QEMU reached 760% on the host. > Todo: > * Enable other machine types and PPC32. I am quite ignorant on the topic. Have you looked at what it would take to emulate support of the HW threads ? and the PowerNV machine ? Thanks, C. > * More testing for corner cases. > > Nikunj A Dadhania (3): > target/ppc: Emulate LL/SC using cmpxchg helpers > cputlb: handle first atomic write to the page > target/ppc: Generate fence operations > > cputlb.c | 8 +++++++- > target/ppc/translate.c | 29 ++++++++++++++++++++++++++--- > 2 files changed, 33 insertions(+), 4 deletions(-) > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater @ 2017-04-06 13:28 ` G 3 2017-04-06 13:32 ` Cédric Le Goater 2017-04-07 5:24 ` Nikunj A Dadhania 1 sibling, 1 reply; 30+ messages in thread From: G 3 @ 2017-04-06 13:28 UTC (permalink / raw) To: Cédric Le Goater Cc: Nikunj A Dadhania, qemu-ppc, david, rth, alex.bennee, qemu-devel, bharata On Apr 6, 2017, at 9:26 AM, Cédric Le Goater wrote: > Hello Nikunj, > > On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >> The series enables Multi-Threaded TCG on PPC64 >> >> Patch 01: Use atomic_cmpxchg in store conditional >> 02: Handle first write to page during atomic operation >> 03: Generate memory barriers for sync/isync and load/store >> conditional >> >> Patches are based on ppc-for-2.10 >> >> Tested using following: >> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic >> -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel >> tcg,thread=multi f23.img > > I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked > good : the CPU usage of QEMU reached 760% on the host. What was your guest operating system? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-06 13:28 ` G 3 @ 2017-04-06 13:32 ` Cédric Le Goater 2017-04-06 13:47 ` G 3 2017-04-06 17:08 ` luigi burdo 0 siblings, 2 replies; 30+ messages in thread From: Cédric Le Goater @ 2017-04-06 13:32 UTC (permalink / raw) To: G 3 Cc: Nikunj A Dadhania, qemu-ppc, david, rth, alex.bennee, qemu-devel, bharata On 04/06/2017 03:28 PM, G 3 wrote: > > On Apr 6, 2017, at 9:26 AM, Cédric Le Goater wrote: > >> Hello Nikunj, >> >> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>> The series enables Multi-Threaded TCG on PPC64 >>> >>> Patch 01: Use atomic_cmpxchg in store conditional >>> 02: Handle first write to page during atomic operation >>> 03: Generate memory barriers for sync/isync and load/store conditional >>> >>> Patches are based on ppc-for-2.10 >>> >>> Tested using following: >>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img >> >> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked >> good : the CPU usage of QEMU reached 760% on the host. > > What was your guest operating system? The guest is an Ubuntu 16.04.2 and the host is an Ubuntu 17.04. C. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-06 13:32 ` Cédric Le Goater @ 2017-04-06 13:47 ` G 3 2017-04-06 17:08 ` luigi burdo 1 sibling, 0 replies; 30+ messages in thread From: G 3 @ 2017-04-06 13:47 UTC (permalink / raw) To: Cédric Le Goater Cc: Nikunj A Dadhania, qemu-ppc, david, rth, alex.bennee, qemu-devel, bharata On Apr 6, 2017, at 9:32 AM, Cédric Le Goater wrote: > On 04/06/2017 03:28 PM, G 3 wrote: >> >> On Apr 6, 2017, at 9:26 AM, Cédric Le Goater wrote: >> >>> Hello Nikunj, >>> >>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>>> The series enables Multi-Threaded TCG on PPC64 >>>> >>>> Patch 01: Use atomic_cmpxchg in store conditional >>>> 02: Handle first write to page during atomic operation >>>> 03: Generate memory barriers for sync/isync and load/store >>>> conditional >>>> >>>> Patches are based on ppc-for-2.10 >>>> >>>> Tested using following: >>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none - >>>> nographic -machine pseries,usb=off -m 2G -smp >>>> 8,cores=8,threads=1 -accel tcg,thread=multi f23.img >>> >>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It >>> looked >>> good : the CPU usage of QEMU reached 760% on the host. >> >> What was your guest operating system? > > The guest is an Ubuntu 16.04.2 and the host is an Ubuntu 17.04. > > C. Thank you for the information. What you could do is run QEMU in emulation mode (non-kvm mode) and time how long it takes Ubuntu to boot up with one emulated core vs how long it takes to boot up on say 4 emulated cores. This would be a good start: Boot up times: one core: two cores: four cores: eight cores: ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-06 13:32 ` Cédric Le Goater 2017-04-06 13:47 ` G 3 @ 2017-04-06 17:08 ` luigi burdo 2017-04-06 18:06 ` G 3 1 sibling, 1 reply; 30+ messages in thread From: luigi burdo @ 2017-04-06 17:08 UTC (permalink / raw) To: Cédric Le Goater, G 3 Cc: qemu-devel, qemu-ppc, bharata, david, alex.bennee, rth Hi, I can help test this too on my two BE machines, if someone can help me find where the patch is or where I can download the commits. Thanks Luigi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-06 17:08 ` luigi burdo @ 2017-04-06 18:06 ` G 3 0 siblings, 0 replies; 30+ messages in thread From: G 3 @ 2017-04-06 18:06 UTC (permalink / raw) To: luigi burdo Cc: Cédric Le Goater, qemu-devel, qemu-ppc, bharata, david, alex.bennee, rth On Apr 6, 2017, at 1:08 PM, luigi burdo wrote: > > Hi, I can help test it too on my two BE machines, if someone can help me > find the patches or tell me where I can download the commits. > > Thanks > Luigi Here are the patches: 1/3 https://patchwork.ozlabs.org/patch/747691/ 2/3 https://patchwork.ozlabs.org/patch/747692/ 3/3 https://patchwork.ozlabs.org/patch/747688/ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater 2017-04-06 13:28 ` G 3 @ 2017-04-07 5:24 ` Nikunj A Dadhania 2017-04-07 6:07 ` Cédric Le Goater 1 sibling, 1 reply; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-07 5:24 UTC (permalink / raw) To: Cédric Le Goater, qemu-ppc, david, rth Cc: programmingkidx, alex.bennee, qemu-devel, bharata Cédric Le Goater <clg@kaod.org> writes: > Hello Nikunj, > > On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >> The series enables Multi-Threaded TCG on PPC64 >> >> Patch 01: Use atomic_cmpxchg in store conditional >> 02: Handle first write to page during atomic operation >> 03: Generate memory barriers for sync/isync and load/store conditional >> >> Patches are based on ppc-for-2.10 >> >> Tested using following: >> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img > > I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked > good : the CPU usage of QEMU reached 760% on the host. Cool. >> Todo: >> * Enable other machine types and PPC32. > > I am quite ignorant on the topic. > Have you looked at what it would take to emulate support of the HW > threads ? We would need to implement msgsndp (doorbell support for IPI between threads of same core) > and the PowerNV machine ? Haven't tried it, should work. Just give a shot, let me know if you see problems. Regards Nikunj ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-07 5:24 ` Nikunj A Dadhania @ 2017-04-07 6:07 ` Cédric Le Goater 2017-04-10 16:41 ` Cédric Le Goater 0 siblings, 1 reply; 30+ messages in thread From: Cédric Le Goater @ 2017-04-07 6:07 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david, rth Cc: programmingkidx, alex.bennee, qemu-devel, bharata On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: > Cédric Le Goater <clg@kaod.org> writes: > >> Hello Nikunj, >> >> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>> The series enables Multi-Threaded TCG on PPC64 >>> >>> Patch 01: Use atomic_cmpxchg in store conditional >>> 02: Handle first write to page during atomic operation >>> 03: Generate memory barriers for sync/isync and load/store conditional >>> >>> Patches are based on ppc-for-2.10 >>> >>> Tested using following: >>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img >> >> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked >> good : the CPU usage of QEMU reached 760% on the host. > > Cool. > >>> Todo: >>> * Enable other machine types and PPC32. >> >> I am quite ignorant on the topic. >> Have you looked at what it would take to emulate support of the HW >> threads ? > > We would need to implement msgsndp (doorbell support for IPI between > threads of same core) ok. I get it. Thanks, >> and the PowerNV machine ? > > Haven't tried it, should work. Just give a shot, let me know if you see problems. sure. pnv is still on 2.9, so I will rebase on 2.10, merge your patches and tell you. Thanks, C. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-07 6:07 ` Cédric Le Goater @ 2017-04-10 16:41 ` Cédric Le Goater 2017-04-10 16:44 ` Nikunj A Dadhania 2017-04-10 17:20 ` Alex Bennée 0 siblings, 2 replies; 30+ messages in thread From: Cédric Le Goater @ 2017-04-10 16:41 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david, rth Cc: programmingkidx, alex.bennee, qemu-devel, bharata On 04/07/2017 08:07 AM, Cédric Le Goater wrote: > On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: >> Cédric Le Goater <clg@kaod.org> writes: >> >>> Hello Nikunj, >>> >>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>>> The series enables Multi-Threaded TCG on PPC64 >>>> >>>> Patch 01: Use atomic_cmpxchg in store conditional >>>> 02: Handle first write to page during atomic operation >>>> 03: Generate memory barriers for sync/isync and load/store conditional >>>> >>>> Patches are based on ppc-for-2.10 >>>> >>>> Tested using following: >>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img >>> >>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked >>> good : the CPU usage of QEMU reached 760% on the host. >> >> Cool. >> >>>> Todo: >>>> * Enable other machine types and PPC32. >>> >>> I am quite ignorant on the topic. >>> Have you looked at what it would take to emulate support of the HW >>> threads ? >> >> We would need to implement msgsndp (doorbell support for IPI between >> threads of same core) > > ok. I get it. Thanks, > >>> and the PowerNV machine ? >> >> Haven't tried it, should work. Just give a shot, let me know if you see problems. > > sure. pnv is still on 2.9, so I will rebase on 2.10, merge your > patches and tell you. The system seems to be spinning in skiboot in cpu_idle/relax when starting the linux kernel. It finally boots, but it is rather long. David has merged enough to test if you want to give it a try. 
Cheers, C. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-10 16:41 ` Cédric Le Goater @ 2017-04-10 16:44 ` Nikunj A Dadhania 2017-04-10 16:59 ` Cédric Le Goater 2017-04-10 17:20 ` Alex Bennée 1 sibling, 1 reply; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-10 16:44 UTC (permalink / raw) To: Cédric Le Goater, qemu-ppc, david, rth Cc: programmingkidx, alex.bennee, qemu-devel, bharata Cédric Le Goater <clg@kaod.org> writes: > On 04/07/2017 08:07 AM, Cédric Le Goater wrote: >> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: >>> Cédric Le Goater <clg@kaod.org> writes: >>> >>>> Hello Nikunj, >>>> >>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>>>> The series enables Multi-Threaded TCG on PPC64 >>>>> >>>>> Patch 01: Use atomic_cmpxchg in store conditional >>>>> 02: Handle first write to page during atomic operation >>>>> 03: Generate memory barriers for sync/isync and load/store conditional >>>>> >>>>> Patches are based on ppc-for-2.10 >>>>> >>>>> Tested using following: >>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img >>>> >>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked >>>> good : the CPU usage of QEMU reached 760% on the host. >>> >>> Cool. >>> >>>>> Todo: >>>>> * Enable other machine types and PPC32. >>>> >>>> I am quite ignorant on the topic. >>>> Have you looked at what it would take to emulate support of the HW >>>> threads ? >>> >>> We would need to implement msgsndp (doorbell support for IPI between >>> threads of same core) >> >> ok. I get it. Thanks, >> >>>> and the PowerNV machine ? >>> >>> Haven't tried it, should work. Just give a shot, let me know if you see problems. >> >> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your >> patches and tell you. > > The system seems to be spinning in skiboot in cpu_idle/relax when > starting the linux kernel. 
It finally boots, but it is rather long. > David has merged enough to test if you want to give it a try. I have got your powernv-ipmi-2.9 + ppc64 mttcg patches, and testing them. I too saw delay during boot, but wasn't aware that its caused by mttcg. I will have a look. Regards Nikunj ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-10 16:44 ` Nikunj A Dadhania @ 2017-04-10 16:59 ` Cédric Le Goater 2017-04-10 17:07 ` Nikunj A Dadhania 0 siblings, 1 reply; 30+ messages in thread From: Cédric Le Goater @ 2017-04-10 16:59 UTC (permalink / raw) To: Nikunj A Dadhania, qemu-ppc, david, rth Cc: programmingkidx, alex.bennee, qemu-devel, bharata On 04/10/2017 06:44 PM, Nikunj A Dadhania wrote: > Cédric Le Goater <clg@kaod.org> writes: > >> On 04/07/2017 08:07 AM, Cédric Le Goater wrote: >>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: >>>> Cédric Le Goater <clg@kaod.org> writes: >>>> >>>>> Hello Nikunj, >>>>> >>>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>>>>> The series enables Multi-Threaded TCG on PPC64 >>>>>> >>>>>> Patch 01: Use atomic_cmpxchg in store conditional >>>>>> 02: Handle first write to page during atomic operation >>>>>> 03: Generate memory barriers for sync/isync and load/store conditional >>>>>> >>>>>> Patches are based on ppc-for-2.10 >>>>>> >>>>>> Tested using following: >>>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img >>>>> >>>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked >>>>> good : the CPU usage of QEMU reached 760% on the host. >>>> >>>> Cool. >>>> >>>>>> Todo: >>>>>> * Enable other machine types and PPC32. >>>>> >>>>> I am quite ignorant on the topic. >>>>> Have you looked at what it would take to emulate support of the HW >>>>> threads ? >>>> >>>> We would need to implement msgsndp (doorbell support for IPI between >>>> threads of same core) >>> >>> ok. I get it. Thanks, >>> >>>>> and the PowerNV machine ? >>>> >>>> Haven't tried it, should work. Just give a shot, let me know if you see problems. >>> >>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your >>> patches and tell you. 
>> >> The system seems to be spinning in skiboot in cpu_idle/relax when >> starting the linux kernel. It finally boots, but it is rather long. >> David has merged enough to test if you want to give it a try. > > I have got your powernv-ipmi-2.9 + ppc64 mttcg patches, and testing > them. I too saw delay during boot, but wasn't aware that its caused by > mttcg. I will have a look. You can use David's branch directly now, there is enough support. I am not sure where that is exactly, the kernel is somewhere in early_setup(). It might be the secondary spinloop. thanks, C. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-10 16:59 ` Cédric Le Goater @ 2017-04-10 17:07 ` Nikunj A Dadhania 0 siblings, 0 replies; 30+ messages in thread From: Nikunj A Dadhania @ 2017-04-10 17:07 UTC (permalink / raw) To: Cédric Le Goater, qemu-ppc, david, rth Cc: programmingkidx, alex.bennee, qemu-devel, bharata Cédric Le Goater <clg@kaod.org> writes: > On 04/10/2017 06:44 PM, Nikunj A Dadhania wrote: >> Cédric Le Goater <clg@kaod.org> writes: >> >>> On 04/07/2017 08:07 AM, Cédric Le Goater wrote: >>>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: >>>>> Cédric Le Goater <clg@kaod.org> writes: >>>> >>>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your >>>> patches and tell you. >>> >>> The system seems to be spinning in skiboot in cpu_idle/relax when >>> starting the linux kernel. It finally boots, but it is rather long. >>> David has merged enough to test if you want to give it a try. >> >> I have got your powernv-ipmi-2.9 + ppc64 mttcg patches, and testing >> them. I too saw delay during boot, but wasn't aware that its caused by >> mttcg. I will have a look. > > You can use David's branch directly now, there is enough support. Sure > I am not sure where that is exactly, the kernel is somewhere in > early_setup(). It might be the secondary spinloop. Lot of prints missing, i think I need to add a console. [ 2.303286014,5] INIT: Starting kernel at 0x20010000, fdt at 0x30354908 14865 bytes) [ 43.421998779,5] OPAL: Switch to little-endian OS -> smp_release_cpus() spinning_secondaries = 3 <- smp_release_cpus() [ 0.260526] nvram: Failed to find or create lnx,oops-log partition, err -28 [ 0.264448] nvram: Failed to initialize oops partition! Regards, Nikunj ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-10 16:41 ` Cédric Le Goater 2017-04-10 16:44 ` Nikunj A Dadhania @ 2017-04-10 17:20 ` Alex Bennée 2017-04-11 12:28 ` Cédric Le Goater 1 sibling, 1 reply; 30+ messages in thread From: Alex Bennée @ 2017-04-10 17:20 UTC (permalink / raw) To: Cédric Le Goater Cc: Nikunj A Dadhania, qemu-ppc, david, rth, programmingkidx, qemu-devel, bharata Cédric Le Goater <clg@kaod.org> writes: > On 04/07/2017 08:07 AM, Cédric Le Goater wrote: >> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: >>> Cédric Le Goater <clg@kaod.org> writes: >>> >>>> Hello Nikunj, >>>> >>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>>>> The series enables Multi-Threaded TCG on PPC64 >>>>> >>>>> Patch 01: Use atomic_cmpxchg in store conditional >>>>> 02: Handle first write to page during atomic operation >>>>> 03: Generate memory barriers for sync/isync and load/store conditional >>>>> >>>>> Patches are based on ppc-for-2.10 >>>>> >>>>> Tested using following: >>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img >>>> >>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked >>>> good : the CPU usage of QEMU reached 760% on the host. >>> >>> Cool. >>> >>>>> Todo: >>>>> * Enable other machine types and PPC32. >>>> >>>> I am quite ignorant on the topic. >>>> Have you looked at what it would take to emulate support of the HW >>>> threads ? >>> >>> We would need to implement msgsndp (doorbell support for IPI between >>> threads of same core) >> >> ok. I get it. Thanks, >> >>>> and the PowerNV machine ? >>> >>> Haven't tried it, should work. Just give a shot, let me know if you see problems. >> >> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your >> patches and tell you. > > The system seems to be spinning in skiboot in cpu_idle/relax when > starting the linux kernel. 
It finally boots, but it is rather long. > David has merged enough to test if you want to give it a try. Does PPC have Wait-for-irq or similar "sleeping" instructions? We had to ensure we were not jumping out of the cpu loop and suspend normally. See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example. -- Alex Bennée ^ permalink raw reply [flat|nested] 30+ messages in thread
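The "suspend normally instead of jumping out of the cpu loop" behaviour Alex describes for WFI can be sketched as a standalone pthread model. This is only an illustration of the halt-until-interrupt pattern, not QEMU's actual cpu loop; every name here (VCPUModel, vcpu_wait_for_interrupt, ...) is invented for the sketch:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal model (not QEMU code): on a WFI-style instruction the vCPU
 * thread parks itself on a condition variable instead of busy-spinning
 * in the execution loop; an interrupt from another thread wakes it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    bool irq_pending;
    bool halted;
} VCPUModel;

void vcpu_init(VCPUModel *v)
{
    pthread_mutex_init(&v->lock, NULL);
    pthread_cond_init(&v->wake, NULL);
    v->irq_pending = false;
    v->halted = false;
}

/* Executed by the vCPU thread when it hits a WFI-style instruction. */
void vcpu_wait_for_interrupt(VCPUModel *v)
{
    pthread_mutex_lock(&v->lock);
    v->halted = true;
    while (!v->irq_pending) {
        pthread_cond_wait(&v->wake, &v->lock);  /* sleeps, no busy spin */
    }
    v->irq_pending = false;
    v->halted = false;
    pthread_mutex_unlock(&v->lock);
}

/* Thread entry wrapper so a vCPU thread can be spawned directly. */
void *vcpu_thread_fn(void *arg)
{
    vcpu_wait_for_interrupt(arg);
    return NULL;
}

/* Executed by another thread (device model, other vCPU) to deliver an
 * interrupt and wake the sleeping vCPU. */
void vcpu_raise_interrupt(VCPUModel *v)
{
    pthread_mutex_lock(&v->lock);
    v->irq_pending = true;
    pthread_cond_signal(&v->wake);
    pthread_mutex_unlock(&v->lock);
}
```

Because irq_pending is checked under the lock, the wakeup is not lost even if the interrupt is raised before the vCPU thread reaches the wait.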
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-10 17:20 ` Alex Bennée @ 2017-04-11 12:28 ` Cédric Le Goater 2017-04-11 13:26 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 30+ messages in thread From: Cédric Le Goater @ 2017-04-11 12:28 UTC (permalink / raw) To: Alex Bennée Cc: Nikunj A Dadhania, qemu-ppc, david, rth, programmingkidx, qemu-devel, bharata, Benjamin Herrenschmidt On 04/10/2017 07:20 PM, Alex Bennée wrote: > > Cédric Le Goater <clg@kaod.org> writes: > >> On 04/07/2017 08:07 AM, Cédric Le Goater wrote: >>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote: >>>> Cédric Le Goater <clg@kaod.org> writes: >>>> >>>>> Hello Nikunj, >>>>> >>>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote: >>>>>> The series enables Multi-Threaded TCG on PPC64 >>>>>> >>>>>> Patch 01: Use atomic_cmpxchg in store conditional >>>>>> 02: Handle first write to page during atomic operation >>>>>> 03: Generate memory barriers for sync/isync and load/store conditional >>>>>> >>>>>> Patches are based on ppc-for-2.10 >>>>>> >>>>>> Tested using following: >>>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img >>>>> >>>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked >>>>> good : the CPU usage of QEMU reached 760% on the host. >>>> >>>> Cool. >>>> >>>>>> Todo: >>>>>> * Enable other machine types and PPC32. >>>>> >>>>> I am quite ignorant on the topic. >>>>> Have you looked at what it would take to emulate support of the HW >>>>> threads ? >>>> >>>> We would need to implement msgsndp (doorbell support for IPI between >>>> threads of same core) >>> >>> ok. I get it. Thanks, >>> >>>>> and the PowerNV machine ? >>>> >>>> Haven't tried it, should work. Just give a shot, let me know if you see problems. >>> >>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your >>> patches and tell you. 
>> >> The system seems to be spinning in skiboot in cpu_idle/relax when >> starting the linux kernel. It finally boots, but it is rather long. >> David has merged enough to test if you want to give it a try. > > Does PPC have Wait-for-irq or similar "sleeping" instructions? > > We had to ensure we were not jumping out of the cpu loop and suspend > normally. I really don't know. Ben, now that we have mttcg activated by default on ppc, it takes a while for the linux kernel to do the early setup. I think we are in the code section where we spin loop the secondaries. Most of the time is spent in skiboot under cpu_idle/relax. Any idea where that could come from ? > See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example. Thanks for the hint. C. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-11 12:28 ` Cédric Le Goater @ 2017-04-11 13:26 ` Benjamin Herrenschmidt 2017-04-11 14:04 ` Alex Bennée 0 siblings, 1 reply; 30+ messages in thread From: Benjamin Herrenschmidt @ 2017-04-11 13:26 UTC (permalink / raw) To: Cédric Le Goater, Alex Bennée Cc: Nikunj A Dadhania, qemu-ppc, david, rth, programmingkidx, qemu-devel, bharata On Tue, 2017-04-11 at 14:28 +0200, Cédric Le Goater wrote: > I really don't know. > > Ben, now that we have mttcg activated by default on ppc, it takes > a while for the linux kernel to do the early setup. I think we are > in the code section where we spin loop the secondaries. Most of the > time is spent in skiboot under cpu_idle/relax. > > Any idea where that could come from ? > > > See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example. > > Thanks for the hint. They are spinning, but they have smt_low instructions in the loop, maybe that causes us to do some kind of synchronization as we exit the emulation loop on these? I added that to force relinquishing time to other threads on the pre-MT TCG... There isn't really such a "pause" instruction. At least not yet.... P9 has a wait that is meant to wait for special AS_Notify cycles but will also wait for interrupts. We don't have an mwait at this point. Ben. ^ permalink raw reply [flat|nested] 30+ messages in thread
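The secondary spinloop Ben and Cédric are discussing boils down to polling a release word until the boot CPU publishes an entry point. The host-side sketch below (not skiboot or kernel code) models it with C11 atomics, using sched_yield() as a stand-in for the smt_low/pause hint so a spinning vCPU thread gives up its host timeslice instead of saturating a core; the entry-point value is only illustrative:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Release word the secondaries poll; zero means "keep spinning". */
static atomic_long release_addr = 0;

/* Each secondary spins here until released, then records the entry
 * point it would jump to. */
void *secondary_spinloop(void *arg)
{
    long *entry = arg;
    long val;
    while ((val = atomic_load_explicit(&release_addr,
                                       memory_order_acquire)) == 0) {
        sched_yield();  /* "relax" hint: be polite to other host threads */
    }
    *entry = val;       /* in real firmware: branch to this address */
    return NULL;
}

/* The boot CPU releases the secondaries by publishing an entry point
 * with release semantics, pairing with the acquire load above. */
void release_secondaries(long entry_point)
{
    atomic_store_explicit(&release_addr, entry_point, memory_order_release);
}
```

Under MTTCG each of those spinning vCPUs is a real host thread, which is why a loop like this with no effective pause hint can eat host CPU and slow the rest of boot down.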
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 2017-04-11 13:26 ` Benjamin Herrenschmidt @ 2017-04-11 14:04 ` Alex Bennée 0 siblings, 0 replies; 30+ messages in thread From: Alex Bennée @ 2017-04-11 14:04 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Cédric Le Goater, Nikunj A Dadhania, qemu-ppc, david, rth, programmingkidx, qemu-devel, bharata Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: > On Tue, 2017-04-11 at 14:28 +0200, Cédric Le Goater wrote: >> I really don't know. >> >> Ben, now that we have mttcg activated by default on ppc, it takes >> a while for the linux kernel to do the early setup. I think we are >> in the code section where we spin loop the secondaries. Most of the >> time is spent in skiboot under cpu_idle/relax. >> >> Any idea where that could come from ? >> >> > See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example. >> >> Thanks for the hint. > > They are spinning, but they have smt_low instructions in the loop, > maybe that causes us to do some kind of synchronization as we exit > the emulation loop on these? I added that to force relinquishing time > to other threads on the pre-MT TCG... Yeah, you need to tweak the approach when running with MTTCG, as otherwise you end up causing exits for one vCPU's loop to yield to vCPUs that are already running in other threads. > There isn't really such a "pause" instruction. At least not yet.... P9 > has a wait that is meant to wait for special AS_Notify cycles but will > also wait for interrupts. We don't have an mwait at this point. They are worth implementing. FWIW on ARM we only really handle WFI (Wait-for-interrupt) which will cause the EXCP_HALT and that will put the vCPU into a halted state which can be woken up at the next interrupt. For the other cases YIELD and WFE (wait-for-event) we just treat them as NOPs when MTTCG is enabled (test parallel_cpus).
So they will busy-wait spin around the guest's WFE code but don't trigger expensive longjmps out of the execution loop. This was all done in: c22edfebff target-arm: don't generate WFE/YIELD calls for MTTCG One other thing I noticed while looking through the PPC stuff is that I couldn't see any handling of cpu_reset/powering on. There is a potential race here which ThreadSanitizer will complain about if you start loading up your about-to-be-powered-on vCPU from another thread. The fix here is to defer the setup with async work. See: 062ba099e0 target-arm/powerctl: defer cpu reset work to CPU context -- Alex Bennée ^ permalink raw reply [flat|nested] 30+ messages in thread
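The "defer cpu reset work to CPU context" pattern Alex points at amounts to never touching a vCPU's state from a foreign thread; instead, callers queue a work item that the vCPU's own thread runs at a safe point. The toy pthread model below illustrates that idea only; none of these names are QEMU APIs (QEMU's real mechanism is its run_on_cpu machinery, which is more involved):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void (*WorkFn)(void *opaque);

/* One vCPU-like worker thread with a single pending work slot
 * (a real queue would hold a list; one slot keeps the sketch short). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    WorkFn fn;
    void *opaque;
    bool stop;
    pthread_t thread;
} CPUThreadModel;

static void *cpu_thread_loop(void *arg)
{
    CPUThreadModel *cpu = arg;
    pthread_mutex_lock(&cpu->lock);
    for (;;) {
        if (cpu->fn) {                     /* drain pending work first */
            WorkFn fn = cpu->fn;
            void *opaque = cpu->opaque;
            cpu->fn = NULL;
            pthread_mutex_unlock(&cpu->lock);
            fn(opaque);                    /* runs in the vCPU thread */
            pthread_mutex_lock(&cpu->lock);
        } else if (cpu->stop) {
            break;
        } else {
            pthread_cond_wait(&cpu->cond, &cpu->lock);
        }
    }
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

void cpu_model_start(CPUThreadModel *cpu)
{
    pthread_mutex_init(&cpu->lock, NULL);
    pthread_cond_init(&cpu->cond, NULL);
    cpu->fn = NULL;
    cpu->opaque = NULL;
    cpu->stop = false;
    pthread_create(&cpu->thread, NULL, cpu_thread_loop, cpu);
}

/* Called from any thread: queue work instead of mutating state directly,
 * so vCPU state is only ever touched from the vCPU's own thread. */
void async_run_on_cpu_model(CPUThreadModel *cpu, WorkFn fn, void *opaque)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->fn = fn;
    cpu->opaque = opaque;
    pthread_cond_signal(&cpu->cond);
    pthread_mutex_unlock(&cpu->lock);
}

void cpu_model_stop(CPUThreadModel *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->stop = true;
    pthread_cond_signal(&cpu->cond);
    pthread_mutex_unlock(&cpu->lock);
    pthread_join(cpu->thread, NULL);
}

/* Illustrative "reset" work item: records which thread actually ran it. */
static pthread_t work_ran_on;
static void record_thread_work(void *opaque)
{
    (void)opaque;
    work_ran_on = pthread_self();
}
```

Because the loop drains any pending item before honouring stop, work posted just before shutdown is never lost; that ordering is the part ThreadSanitizer-visible races usually hide in.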
end of thread, other threads:[~2017-04-11 14:04 UTC | newest]

Thread overview: 30+ messages
2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania
2017-04-06 15:51 ` Richard Henderson
2017-04-07 5:12 ` Nikunj A Dadhania
2017-04-06 15:53 ` Richard Henderson
2017-04-07 5:14 ` Nikunj A Dadhania
2017-04-07 5:23 ` David Gibson
2017-04-07 5:42 ` Nikunj A Dadhania
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page Nikunj A Dadhania
2017-04-06 15:54 ` Richard Henderson
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations Nikunj A Dadhania
2017-04-06 16:15 ` Richard Henderson
2017-04-07 5:21 ` Nikunj A Dadhania
2017-04-07 18:19 ` Richard Henderson
2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater
2017-04-06 13:28 ` G 3
2017-04-06 13:32 ` Cédric Le Goater
2017-04-06 13:47 ` G 3
2017-04-06 17:08 ` luigi burdo
2017-04-06 18:06 ` G 3
2017-04-07 5:24 ` Nikunj A Dadhania
2017-04-07 6:07 ` Cédric Le Goater
2017-04-10 16:41 ` Cédric Le Goater
2017-04-10 16:44 ` Nikunj A Dadhania
2017-04-10 16:59 ` Cédric Le Goater
2017-04-10 17:07 ` Nikunj A Dadhania
2017-04-10 17:20 ` Alex Bennée
2017-04-11 12:28 ` Cédric Le Goater
2017-04-11 13:26 ` Benjamin Herrenschmidt
2017-04-11 14:04 ` Alex Bennée