* [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
@ 2017-04-06 10:22 Nikunj A Dadhania
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania
` (3 more replies)
0 siblings, 4 replies; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-06 10:22 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, alex.bennee, programmingkidx, bharata, nikunj
The series enables Multi-Threaded TCG on PPC64
Patch 01: Use atomic_cmpxchg in store conditional
02: Handle first write to page during atomic operation
03: Generate memory barriers for sync/isync and load/store conditional
Patches are based on ppc-for-2.10
Tested using following:
./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
Todo:
* Enable other machine types and PPC32.
* More testing for corner cases.
Nikunj A Dadhania (3):
target/ppc: Emulate LL/SC using cmpxchg helpers
cputlb: handle first atomic write to the page
target/ppc: Generate fence operations
cputlb.c | 8 +++++++-
target/ppc/translate.c | 29 ++++++++++++++++++++++++++---
2 files changed, 33 insertions(+), 4 deletions(-)
--
2.9.3
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania
@ 2017-04-06 10:22 ` Nikunj A Dadhania
2017-04-06 15:51 ` Richard Henderson
` (2 more replies)
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page Nikunj A Dadhania
` (2 subsequent siblings)
3 siblings, 3 replies; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-06 10:22 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, alex.bennee, programmingkidx, bharata, nikunj
Emulating LL/SC with cmpxchg is not correct, since it can suffer from
the ABA problem. However, portable parallel code is written assuming
only cmpxchg which means that in practice this is a viable alternative.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
target/ppc/translate.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b6abc60..a9c733d 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -73,6 +73,7 @@ static TCGv cpu_cfar;
#endif
static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
static TCGv cpu_reserve;
+static TCGv cpu_reserve_val;
static TCGv cpu_fpscr;
static TCGv_i32 cpu_access_type;
@@ -181,6 +182,9 @@ void ppc_translate_init(void)
cpu_reserve = tcg_global_mem_new(cpu_env,
offsetof(CPUPPCState, reserve_addr),
"reserve_addr");
+ cpu_reserve_val = tcg_global_mem_new(cpu_env,
+ offsetof(CPUPPCState, reserve_val),
+ "reserve_val");
cpu_fpscr = tcg_global_mem_new(cpu_env,
offsetof(CPUPPCState, fpscr), "fpscr");
@@ -3023,7 +3027,7 @@ static void gen_##name(DisasContext *ctx) \
} \
tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \
tcg_gen_mov_tl(cpu_reserve, t0); \
- tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \
+ tcg_gen_mov_tl(cpu_reserve_val, gpr); \
tcg_temp_free(t0); \
}
@@ -3156,14 +3160,28 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
int reg, int memop)
{
TCGLabel *l1;
+ TCGv_i32 tmp = tcg_temp_local_new_i32();
+ TCGv t0;
+ tcg_gen_movi_i32(tmp, 0);
tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
l1 = gen_new_label();
tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
- tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
- tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
+
+ t0 = tcg_temp_new();
+ tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
+ ctx->mem_idx, DEF_MEMOP(memop));
+ tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
+ tcg_gen_trunc_tl_i32(tmp, t0);
+
gen_set_label(l1);
+ tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT);
+ tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp);
tcg_gen_movi_tl(cpu_reserve, -1);
+ tcg_gen_movi_tl(cpu_reserve_val, 0);
+
+ tcg_temp_free(t0);
+ tcg_temp_free_i32(tmp);
}
#endif
--
2.9.3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page
2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania
@ 2017-04-06 10:22 ` Nikunj A Dadhania
2017-04-06 15:54 ` Richard Henderson
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations Nikunj A Dadhania
2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater
3 siblings, 1 reply; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-06 10:22 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, alex.bennee, programmingkidx, bharata, nikunj
In case where the conditional write is the first write to the page,
TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as
a special case and set the dirty bit. After that fall through to the
actual atomic instruction below.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
cputlb.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/cputlb.c b/cputlb.c
index f5d056c..743776a 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -930,7 +930,13 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
tlb_addr = tlbe->addr_write;
}
- /* Notice an IO access, or a notdirty page. */
+ /* Check notdirty */
+ if (unlikely(tlb_addr & TLB_NOTDIRTY)) {
+ tlb_set_dirty(ENV_GET_CPU(env), addr);
+ tlb_addr = tlb_addr & ~TLB_NOTDIRTY;
+ }
+
+ /* Notice an IO access */
if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
/* There's really nothing that can be done to
support this apart from stop-the-world. */
--
2.9.3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations
2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page Nikunj A Dadhania
@ 2017-04-06 10:22 ` Nikunj A Dadhania
2017-04-06 16:15 ` Richard Henderson
2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater
3 siblings, 1 reply; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-06 10:22 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, alex.bennee, programmingkidx, bharata, nikunj
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
target/ppc/translate.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index a9c733d..87b4fe4 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2971,6 +2971,7 @@ static void gen_stswx(DisasContext *ctx)
/* eieio */
static void gen_eieio(DisasContext *ctx)
{
+ tcg_gen_mb(TCG_MO_LD_ST | TCG_BAR_SC);
}
#if !defined(CONFIG_USER_ONLY)
@@ -3008,6 +3009,7 @@ static void gen_isync(DisasContext *ctx)
if (!ctx->pr) {
gen_check_tlb_flush(ctx, false);
}
+ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
gen_stop_exception(ctx);
}
@@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) \
tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \
tcg_gen_mov_tl(cpu_reserve, t0); \
tcg_gen_mov_tl(cpu_reserve_val, gpr); \
+ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \
tcg_temp_free(t0); \
}
@@ -3196,6 +3199,7 @@ static void gen_##name(DisasContext *ctx) \
if (len > 1) { \
gen_check_align(ctx, t0, (len) - 1); \
} \
+ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); \
gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \
tcg_temp_free(t0); \
}
@@ -3309,6 +3313,7 @@ static void gen_sync(DisasContext *ctx)
if (((l == 2) || !(ctx->insns_flags & PPC_64B)) && !ctx->pr) {
gen_check_tlb_flush(ctx, true);
}
+ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
}
/* wait */
--
2.9.3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania
` (2 preceding siblings ...)
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations Nikunj A Dadhania
@ 2017-04-06 13:26 ` Cédric Le Goater
2017-04-06 13:28 ` G 3
2017-04-07 5:24 ` Nikunj A Dadhania
3 siblings, 2 replies; 30+ messages in thread
From: Cédric Le Goater @ 2017-04-06 13:26 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david, rth
Cc: programmingkidx, alex.bennee, qemu-devel, bharata
Hello Nikunj,
On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
> The series enables Multi-Threaded TCG on PPC64
>
> Patch 01: Use atomic_cmpxchg in store conditional
> 02: Handle first write to page during atomic operation
> 03: Generate memory barriers for sync/isync and load/store conditional
>
> Patches are based on ppc-for-2.10
>
> Tested using following:
> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
good : the CPU usage of QEMU reached 760% on the host.
> Todo:
> * Enable other machine types and PPC32.
I am quite ignorant on the topic. Have you looked at what it would
take to emulate support of the HW threads ? and the PowerNV machine ?
Thanks,
C.
> * More testing for corner cases.
>
> Nikunj A Dadhania (3):
> target/ppc: Emulate LL/SC using cmpxchg helpers
> cputlb: handle first atomic write to the page
> target/ppc: Generate fence operations
>
> cputlb.c | 8 +++++++-
> target/ppc/translate.c | 29 ++++++++++++++++++++++++++---
> 2 files changed, 33 insertions(+), 4 deletions(-)
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater
@ 2017-04-06 13:28 ` G 3
2017-04-06 13:32 ` Cédric Le Goater
2017-04-07 5:24 ` Nikunj A Dadhania
1 sibling, 1 reply; 30+ messages in thread
From: G 3 @ 2017-04-06 13:28 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Nikunj A Dadhania, qemu-ppc, david, rth, alex.bennee, qemu-devel,
bharata
On Apr 6, 2017, at 9:26 AM, Cédric Le Goater wrote:
> Hello Nikunj,
>
> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>> The series enables Multi-Threaded TCG on PPC64
>>
>> Patch 01: Use atomic_cmpxchg in store conditional
>> 02: Handle first write to page during atomic operation
>> 03: Generate memory barriers for sync/isync and load/store
>> conditional
>>
>> Patches are based on ppc-for-2.10
>>
>> Tested using following:
>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic
>> -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel
>> tcg,thread=multi f23.img
>
> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
> good : the CPU usage of QEMU reached 760% on the host.
What was your guest operating system?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-06 13:28 ` G 3
@ 2017-04-06 13:32 ` Cédric Le Goater
2017-04-06 13:47 ` G 3
2017-04-06 17:08 ` luigi burdo
0 siblings, 2 replies; 30+ messages in thread
From: Cédric Le Goater @ 2017-04-06 13:32 UTC (permalink / raw)
To: G 3
Cc: Nikunj A Dadhania, qemu-ppc, david, rth, alex.bennee, qemu-devel,
bharata
On 04/06/2017 03:28 PM, G 3 wrote:
>
> On Apr 6, 2017, at 9:26 AM, Cédric Le Goater wrote:
>
>> Hello Nikunj,
>>
>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>>> The series enables Multi-Threaded TCG on PPC64
>>>
>>> Patch 01: Use atomic_cmpxchg in store conditional
>>> 02: Handle first write to page during atomic operation
>>> 03: Generate memory barriers for sync/isync and load/store conditional
>>>
>>> Patches are based on ppc-for-2.10
>>>
>>> Tested using following:
>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>>
>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
>> good : the CPU usage of QEMU reached 760% on the host.
>
> What was your guest operating system?
The guest is an Ubuntu 16.04.2 and the host is an Ubuntu 17.04.
C.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-06 13:32 ` Cédric Le Goater
@ 2017-04-06 13:47 ` G 3
2017-04-06 17:08 ` luigi burdo
1 sibling, 0 replies; 30+ messages in thread
From: G 3 @ 2017-04-06 13:47 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Nikunj A Dadhania, qemu-ppc, david, rth, alex.bennee, qemu-devel,
bharata
On Apr 6, 2017, at 9:32 AM, Cédric Le Goater wrote:
> On 04/06/2017 03:28 PM, G 3 wrote:
>>
>> On Apr 6, 2017, at 9:26 AM, Cédric Le Goater wrote:
>>
>>> Hello Nikunj,
>>>
>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>>>> The series enables Multi-Threaded TCG on PPC64
>>>>
>>>> Patch 01: Use atomic_cmpxchg in store conditional
>>>> 02: Handle first write to page during atomic operation
>>>> 03: Generate memory barriers for sync/isync and load/store
>>>> conditional
>>>>
>>>> Patches are based on ppc-for-2.10
>>>>
>>>> Tested using following:
>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -
>>>> nographic -machine pseries,usb=off -m 2G -smp
>>>> 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>>>
>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It
>>> looked
>>> good : the CPU usage of QEMU reached 760% on the host.
>>
>> What was your guest operating system?
>
> The guest is an Ubuntu 16.04.2 and the host is an Ubuntu 17.04.
>
> C.
Thank you for the information.
What you could do is run QEMU in emulation mode (non-kvm mode) and
time how long it takes Ubuntu to boot up with one emulated core vs
how long it takes to boot up on say 4 emulated cores. This would be a
good start:
Boot up times:
one core:
two cores:
four cores:
eight cores:
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania
@ 2017-04-06 15:51 ` Richard Henderson
2017-04-07 5:12 ` Nikunj A Dadhania
2017-04-06 15:53 ` Richard Henderson
2017-04-07 5:23 ` David Gibson
2 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-04-06 15:51 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, alex.bennee, programmingkidx, bharata
On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote:
> + TCGv_i32 tmp = tcg_temp_local_new_i32();
> + TCGv t0;
>
> + tcg_gen_movi_i32(tmp, 0);
> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
> l1 = gen_new_label();
> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
> - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
> - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
> +
> + t0 = tcg_temp_new();
> + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
> + ctx->mem_idx, DEF_MEMOP(memop));
> + tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
> + tcg_gen_trunc_tl_i32(tmp, t0);
> +
> gen_set_label(l1);
> + tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT);
> + tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp);
I encourage you to move these two lines up beside the setcond.
That way you don't need to use a local tmp, which implies a
spill/restore from the stack.
r~
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania
2017-04-06 15:51 ` Richard Henderson
@ 2017-04-06 15:53 ` Richard Henderson
2017-04-07 5:14 ` Nikunj A Dadhania
2017-04-07 5:23 ` David Gibson
2 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-04-06 15:53 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, alex.bennee, programmingkidx, bharata
On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote:
> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
> l1 = gen_new_label();
> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
> - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
> - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
> +
> + t0 = tcg_temp_new();
> + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
> + ctx->mem_idx, DEF_MEMOP(memop));
Actually, I noticed another, existing, problem.
This code changes CRF[0] before the user memory write, which might fault. This
needs to delay any changes to the architecture visible state until after any
exception may be triggered.
r~
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page Nikunj A Dadhania
@ 2017-04-06 15:54 ` Richard Henderson
0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2017-04-06 15:54 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, alex.bennee, programmingkidx, bharata
On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote:
> In case where the conditional write is the first write to the page,
> TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as
> a special case and set the dirty bit. After that fall through to the
> actual atomic instruction below.
>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
> cputlb.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations Nikunj A Dadhania
@ 2017-04-06 16:15 ` Richard Henderson
2017-04-07 5:21 ` Nikunj A Dadhania
0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-04-06 16:15 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, alex.bennee, programmingkidx, bharata
On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote:
> @@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) \
> tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \
> tcg_gen_mov_tl(cpu_reserve, t0); \
> tcg_gen_mov_tl(cpu_reserve_val, gpr); \
> + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \
Please put the barrier next to the load.
I hope we can merge these soon.
> @@ -3196,6 +3199,7 @@ static void gen_##name(DisasContext *ctx) \
> if (len > 1) { \
> gen_check_align(ctx, t0, (len) - 1); \
> } \
> + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); \
> gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \
> tcg_temp_free(t0); \
> }
This one is more complicated...
The success path includes an atomic_cmpxchg, which itself is a SC barrier.
However, the fail path branches across the cmpxchg and so sees no barrier, but
one is still required by the architecture. How about a gen_conditional_store
that looks like this:
{
TCGLabel *l1 = gen_new_label();
TCGLabel *l2 = gen_new_label();
TCGv t0;
tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
t0 = tcg_temp_new();
tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val,
cpu_gpr[reg], ctx->mem_idx,
DEF_MEMOP(memop) | MO_ALIGN);
tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT);
tcg_gen_or_tl(t0, t0, cpu_so);
tcg_gen_trunc_tl(cpu_crf[0], t0);
tcg_temp_free(t0);
tcg_gen_br(l2);
gen_set_label(l1);
/* Address mismatch implies failure. But we still need to provide the
memory barrier semantics of the instruction. */
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
tcg_gen_trunc_tl(cpu_crf[0], cpu_so);
gen_set_label(l2);
tcg_gen_movi_tl(cpu_reserve, -1);
}
Note that you don't need to reset cpu_reserve_val at all.
I just thought of something we might need to check for this and other ll/sc
implemetations -- do we need to check for address misalignment along the
address comparison failure path? I suspect that we do.
r~
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-06 13:32 ` Cédric Le Goater
2017-04-06 13:47 ` G 3
@ 2017-04-06 17:08 ` luigi burdo
2017-04-06 18:06 ` G 3
1 sibling, 1 reply; 30+ messages in thread
From: luigi burdo @ 2017-04-06 17:08 UTC (permalink / raw)
To: Cédric Le Goater, G 3
Cc: qemu-devel, qemu-ppc, bharata, david, alex.bennee, rth
Hi i can help test it too on my two Be machine.
If some one help me to find where is the patch or where i can download the commits
Thanks
Luigi
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-06 17:08 ` luigi burdo
@ 2017-04-06 18:06 ` G 3
0 siblings, 0 replies; 30+ messages in thread
From: G 3 @ 2017-04-06 18:06 UTC (permalink / raw)
To: luigi burdo
Cc: Cédric Le Goater, qemu-devel, qemu-ppc, bharata, david,
alex.bennee, rth
On Apr 6, 2017, at 1:08 PM, luigi burdo wrote:
>
> Hi i can help test it too on my two Be machine.
> If some one help me to find where is the patch or where i can
> download the commits
>
> Thanks
> Luigi
Here are the patches:
1/3 https://patchwork.ozlabs.org/patch/747691/
2/3 https://patchwork.ozlabs.org/patch/747692/
3/3 https://patchwork.ozlabs.org/patch/747688/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
2017-04-06 15:51 ` Richard Henderson
@ 2017-04-07 5:12 ` Nikunj A Dadhania
0 siblings, 0 replies; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-07 5:12 UTC (permalink / raw)
To: Richard Henderson, qemu-ppc, david
Cc: qemu-devel, alex.bennee, programmingkidx, bharata
Richard Henderson <rth@twiddle.net> writes:
> On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote:
>> + TCGv_i32 tmp = tcg_temp_local_new_i32();
>> + TCGv t0;
>>
>> + tcg_gen_movi_i32(tmp, 0);
>> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
>> l1 = gen_new_label();
>> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
>> - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
>> - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
>> +
>> + t0 = tcg_temp_new();
>> + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
>> + ctx->mem_idx, DEF_MEMOP(memop));
>> + tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
>> + tcg_gen_trunc_tl_i32(tmp, t0);
>> +
>> gen_set_label(l1);
>> + tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT);
>> + tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp);
>
> I encourage you to move these two lines up beside the setcond.
> That way you don't need to use a local tmp, which implies a
> spill/restore from the stack.
Sure.
Regards
Nikunj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
2017-04-06 15:53 ` Richard Henderson
@ 2017-04-07 5:14 ` Nikunj A Dadhania
0 siblings, 0 replies; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-07 5:14 UTC (permalink / raw)
To: Richard Henderson, qemu-ppc, david
Cc: qemu-devel, alex.bennee, programmingkidx, bharata
Richard Henderson <rth@twiddle.net> writes:
> On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote:
>> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
>> l1 = gen_new_label();
>> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
>> - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
>> - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
>> +
>> + t0 = tcg_temp_new();
>> + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
>> + ctx->mem_idx, DEF_MEMOP(memop));
>
> Actually, I noticed another, existing, problem.
>
> This code changes CRF[0] before the user memory write, which might fault. This
> needs to delay any changes to the architecture visible state until after any
> exception may be triggered.
Sure, here you are mentioning cpu_so being moved to CRF.
Regards
Nikunj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations
2017-04-06 16:15 ` Richard Henderson
@ 2017-04-07 5:21 ` Nikunj A Dadhania
2017-04-07 18:19 ` Richard Henderson
0 siblings, 1 reply; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-07 5:21 UTC (permalink / raw)
To: Richard Henderson, qemu-ppc, david
Cc: qemu-devel, alex.bennee, programmingkidx, bharata
Richard Henderson <rth@twiddle.net> writes:
> On 04/06/2017 03:22 AM, Nikunj A Dadhania wrote:
>> @@ -3028,6 +3030,7 @@ static void gen_##name(DisasContext *ctx) \
>> tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \
>> tcg_gen_mov_tl(cpu_reserve, t0); \
>> tcg_gen_mov_tl(cpu_reserve_val, gpr); \
>> + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \
>
> Please put the barrier next to the load.
> I hope we can merge these soon.
Sure
>
>> @@ -3196,6 +3199,7 @@ static void gen_##name(DisasContext *ctx) \
>> if (len > 1) { \
>> gen_check_align(ctx, t0, (len) - 1); \
>> } \
>> + tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); \
>> gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \
>> tcg_temp_free(t0); \
>> }
>
> This one is more complicated...
>
> The success path includes an atomic_cmpxchg, which itself is a SC barrier.
> However, the fail path branches across the cmpxchg and so sees no barrier, but
> one is still required by the architecture. How about a gen_conditional_store
> that looks like this:
>
> {
> TCGLabel *l1 = gen_new_label();
> TCGLabel *l2 = gen_new_label();
> TCGv t0;
>
> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
>
> t0 = tcg_temp_new();
> tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val,
> cpu_gpr[reg], ctx->mem_idx,
> DEF_MEMOP(memop) | MO_ALIGN);
> tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
> tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT);
> tcg_gen_or_tl(t0, t0, cpu_so);
> tcg_gen_trunc_tl(cpu_crf[0], t0);
> tcg_temp_free(t0);
> tcg_gen_br(l2);
>
> gen_set_label(l1);
>
> /* Address mismatch implies failure. But we still need to provide the
> memory barrier semantics of the instruction. */
> tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
> tcg_gen_trunc_tl(cpu_crf[0], cpu_so);
>
> gen_set_label(l2);
> tcg_gen_movi_tl(cpu_reserve, -1);
> }
>
> Note that you don't need to reset cpu_reserve_val at all.
Sure.
> I just thought of something we might need to check for this and other ll/sc
> implemetations -- do we need to check for address misalignment along the
> address comparison failure path?
We do that in the macro:
if (len > 1) { \
gen_check_align(ctx, t0, (len) - 1); \
} \
Would we still need a barrier before the alignment check?
> I suspect that we do.
Regards
Nikunj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania
2017-04-06 15:51 ` Richard Henderson
2017-04-06 15:53 ` Richard Henderson
@ 2017-04-07 5:23 ` David Gibson
2017-04-07 5:42 ` Nikunj A Dadhania
2 siblings, 1 reply; 30+ messages in thread
From: David Gibson @ 2017-04-07 5:23 UTC (permalink / raw)
To: Nikunj A Dadhania
Cc: qemu-ppc, rth, qemu-devel, alex.bennee, programmingkidx, bharata
[-- Attachment #1: Type: text/plain, Size: 3389 bytes --]
On Thu, Apr 06, 2017 at 03:52:47PM +0530, Nikunj A Dadhania wrote:
> Emulating LL/SC with cmpxchg is not correct, since it can suffer from
> the ABA problem. However, portable parallel code is written assuming
> only cmpxchg which means that in practice this is a viable alternative.
>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
> target/ppc/translate.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b6abc60..a9c733d 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -73,6 +73,7 @@ static TCGv cpu_cfar;
> #endif
> static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
> static TCGv cpu_reserve;
> +static TCGv cpu_reserve_val;
> static TCGv cpu_fpscr;
> static TCGv_i32 cpu_access_type;
>
> @@ -181,6 +182,9 @@ void ppc_translate_init(void)
> cpu_reserve = tcg_global_mem_new(cpu_env,
> offsetof(CPUPPCState, reserve_addr),
> "reserve_addr");
> + cpu_reserve_val = tcg_global_mem_new(cpu_env,
> + offsetof(CPUPPCState, reserve_val),
> + "reserve_val");
I notice that lqarx is not updated. Does that matter?
> cpu_fpscr = tcg_global_mem_new(cpu_env,
> offsetof(CPUPPCState, fpscr), "fpscr");
> @@ -3023,7 +3027,7 @@ static void gen_##name(DisasContext *ctx) \
> } \
> tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \
> tcg_gen_mov_tl(cpu_reserve, t0); \
> - tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \
> + tcg_gen_mov_tl(cpu_reserve_val, gpr); \
> tcg_temp_free(t0); \
> }
>
> @@ -3156,14 +3160,28 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
> int reg, int memop)
> {
> TCGLabel *l1;
> + TCGv_i32 tmp = tcg_temp_local_new_i32();
> + TCGv t0;
>
> + tcg_gen_movi_i32(tmp, 0);
> tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
> l1 = gen_new_label();
> tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
> - tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
> - tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
> +
> + t0 = tcg_temp_new();
> + tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
> + ctx->mem_idx, DEF_MEMOP(memop));
> + tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
> + tcg_gen_trunc_tl_i32(tmp, t0);
> +
> gen_set_label(l1);
> + tcg_gen_shli_i32(tmp, tmp, CRF_EQ_BIT);
> + tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp);
> tcg_gen_movi_tl(cpu_reserve, -1);
> + tcg_gen_movi_tl(cpu_reserve_val, 0);
> +
> + tcg_temp_free(t0);
> + tcg_temp_free_i32(tmp);
> }
> #endif
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater
2017-04-06 13:28 ` G 3
@ 2017-04-07 5:24 ` Nikunj A Dadhania
2017-04-07 6:07 ` Cédric Le Goater
1 sibling, 1 reply; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-07 5:24 UTC (permalink / raw)
To: Cédric Le Goater, qemu-ppc, david, rth
Cc: programmingkidx, alex.bennee, qemu-devel, bharata
Cédric Le Goater <clg@kaod.org> writes:
> Hello Nikunj,
>
> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>> The series enables Multi-Threaded TCG on PPC64
>>
>> Patch 01: Use atomic_cmpxchg in store conditional
>> 02: Handle first write to page during atomic operation
>> 03: Generate memory barriers for sync/isync and load/store conditional
>>
>> Patches are based on ppc-for-2.10
>>
>> Tested using following:
>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>
> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
> good : the CPU usage of QEMU reached 760% on the host.
Cool.
>> Todo:
>> * Enable other machine types and PPC32.
>
> I am quite ignorant on the topic.
> Have you looked at what it would take to emulate support of the HW
> threads ?
We would need to implement msgsndp (doorbell support for IPI between
threads of same core)
> and the PowerNV machine ?
Haven't tried it, should work. Just give a shot, let me know if you see problems.
Regards
Nikunj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers
2017-04-07 5:23 ` David Gibson
@ 2017-04-07 5:42 ` Nikunj A Dadhania
0 siblings, 0 replies; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-07 5:42 UTC (permalink / raw)
To: David Gibson
Cc: qemu-ppc, rth, qemu-devel, alex.bennee, programmingkidx, bharata
David Gibson <david@gibson.dropbear.id.au> writes:
> [ Unknown signature status ]
> On Thu, Apr 06, 2017 at 03:52:47PM +0530, Nikunj A Dadhania wrote:
>> Emulating LL/SC with cmpxchg is not correct, since it can suffer from
>> the ABA problem. However, portable parallel code is written assuming
>> only cmpxchg which means that in practice this is a viable alternative.
>>
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>> target/ppc/translate.c | 24 +++++++++++++++++++++---
>> 1 file changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index b6abc60..a9c733d 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -73,6 +73,7 @@ static TCGv cpu_cfar;
>> #endif
>> static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
>> static TCGv cpu_reserve;
>> +static TCGv cpu_reserve_val;
>> static TCGv cpu_fpscr;
>> static TCGv_i32 cpu_access_type;
>>
>> @@ -181,6 +182,9 @@ void ppc_translate_init(void)
>> cpu_reserve = tcg_global_mem_new(cpu_env,
>> offsetof(CPUPPCState, reserve_addr),
>> "reserve_addr");
>> + cpu_reserve_val = tcg_global_mem_new(cpu_env,
>> + offsetof(CPUPPCState, reserve_val),
>> + "reserve_val");
>
> I notice that lqarx is not updated. Does that matter?
Thats correct, I haven't touched that yet. Most of the locks are
implemented using lwarx/stwcx.
Regards
Nikunj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-07 5:24 ` Nikunj A Dadhania
@ 2017-04-07 6:07 ` Cédric Le Goater
2017-04-10 16:41 ` Cédric Le Goater
0 siblings, 1 reply; 30+ messages in thread
From: Cédric Le Goater @ 2017-04-07 6:07 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david, rth
Cc: programmingkidx, alex.bennee, qemu-devel, bharata
On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>
>> Hello Nikunj,
>>
>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>>> The series enables Multi-Threaded TCG on PPC64
>>>
>>> Patch 01: Use atomic_cmpxchg in store conditional
>>> 02: Handle first write to page during atomic operation
>>> 03: Generate memory barriers for sync/isync and load/store conditional
>>>
>>> Patches are based on ppc-for-2.10
>>>
>>> Tested using following:
>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>>
>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
>> good : the CPU usage of QEMU reached 760% on the host.
>
> Cool.
>
>>> Todo:
>>> * Enable other machine types and PPC32.
>>
>> I am quite ignorant on the topic.
>> Have you looked at what it would take to emulate support of the HW
>> threads ?
>
> We would need to implement msgsndp (doorbell support for IPI between
> threads of same core)
ok. I get it. Thanks,
>> and the PowerNV machine ?
>
> Haven't tried it, should work. Just give a shot, let me know if you see problems.
sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
patches and tell you.
Thanks,
C.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations
2017-04-07 5:21 ` Nikunj A Dadhania
@ 2017-04-07 18:19 ` Richard Henderson
0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2017-04-07 18:19 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, alex.bennee, programmingkidx, bharata
On 04/06/2017 10:21 PM, Nikunj A Dadhania wrote:
> We do that in the macro:
>
> if (len > 1) { \
> gen_check_align(ctx, t0, (len) - 1); \
> } \
>
> Would we still need a barrier before the alignment check?
Ah, that's where it's been placed.
So, the MO_ALIGN flag to tcg_gen_atomic_* takes care of the alignment check
too. So we could move this down into the failure path.
r~
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-07 6:07 ` Cédric Le Goater
@ 2017-04-10 16:41 ` Cédric Le Goater
2017-04-10 16:44 ` Nikunj A Dadhania
2017-04-10 17:20 ` Alex Bennée
0 siblings, 2 replies; 30+ messages in thread
From: Cédric Le Goater @ 2017-04-10 16:41 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david, rth
Cc: programmingkidx, alex.bennee, qemu-devel, bharata
On 04/07/2017 08:07 AM, Cédric Le Goater wrote:
> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
>> Cédric Le Goater <clg@kaod.org> writes:
>>
>>> Hello Nikunj,
>>>
>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>>>> The series enables Multi-Threaded TCG on PPC64
>>>>
>>>> Patch 01: Use atomic_cmpxchg in store conditional
>>>> 02: Handle first write to page during atomic operation
>>>> 03: Generate memory barriers for sync/isync and load/store conditional
>>>>
>>>> Patches are based on ppc-for-2.10
>>>>
>>>> Tested using following:
>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>>>
>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
>>> good : the CPU usage of QEMU reached 760% on the host.
>>
>> Cool.
>>
>>>> Todo:
>>>> * Enable other machine types and PPC32.
>>>
>>> I am quite ignorant on the topic.
>>> Have you looked at what it would take to emulate support of the HW
>>> threads ?
>>
>> We would need to implement msgsndp (doorbell support for IPI between
>> threads of same core)
>
> ok. I get it. Thanks,
>
>>> and the PowerNV machine ?
>>
>> Haven't tried it, should work. Just give a shot, let me know if you see problems.
>
> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
> patches and tell you.
The system seems to be spinning in skiboot in cpu_idle/relax when
starting the linux kernel. It finally boots, but it is rather long.
David has merged enough to test if you want to give it a try.
Cheers,
C.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-10 16:41 ` Cédric Le Goater
@ 2017-04-10 16:44 ` Nikunj A Dadhania
2017-04-10 16:59 ` Cédric Le Goater
2017-04-10 17:20 ` Alex Bennée
1 sibling, 1 reply; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-10 16:44 UTC (permalink / raw)
To: Cédric Le Goater, qemu-ppc, david, rth
Cc: programmingkidx, alex.bennee, qemu-devel, bharata
Cédric Le Goater <clg@kaod.org> writes:
> On 04/07/2017 08:07 AM, Cédric Le Goater wrote:
>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
>>> Cédric Le Goater <clg@kaod.org> writes:
>>>
>>>> Hello Nikunj,
>>>>
>>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>>>>> The series enables Multi-Threaded TCG on PPC64
>>>>>
>>>>> Patch 01: Use atomic_cmpxchg in store conditional
>>>>> 02: Handle first write to page during atomic operation
>>>>> 03: Generate memory barriers for sync/isync and load/store conditional
>>>>>
>>>>> Patches are based on ppc-for-2.10
>>>>>
>>>>> Tested using following:
>>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>>>>
>>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
>>>> good : the CPU usage of QEMU reached 760% on the host.
>>>
>>> Cool.
>>>
>>>>> Todo:
>>>>> * Enable other machine types and PPC32.
>>>>
>>>> I am quite ignorant on the topic.
>>>> Have you looked at what it would take to emulate support of the HW
>>>> threads ?
>>>
>>> We would need to implement msgsndp (doorbell support for IPI between
>>> threads of same core)
>>
>> ok. I get it. Thanks,
>>
>>>> and the PowerNV machine ?
>>>
>>> Haven't tried it, should work. Just give a shot, let me know if you see problems.
>>
>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
>> patches and tell you.
>
> The system seems to be spinning in skiboot in cpu_idle/relax when
> starting the linux kernel. It finally boots, but it is rather long.
> David has merged enough to test if you want to give it a try.
I have got your powernv-ipmi-2.9 + ppc64 mttcg patches, and testing
them. I too saw delay during boot, but wasn't aware that its caused by
mttcg. I will have a look.
Regards
Nikunj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-10 16:44 ` Nikunj A Dadhania
@ 2017-04-10 16:59 ` Cédric Le Goater
2017-04-10 17:07 ` Nikunj A Dadhania
0 siblings, 1 reply; 30+ messages in thread
From: Cédric Le Goater @ 2017-04-10 16:59 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david, rth
Cc: programmingkidx, alex.bennee, qemu-devel, bharata
On 04/10/2017 06:44 PM, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>
>> On 04/07/2017 08:07 AM, Cédric Le Goater wrote:
>>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
>>>> Cédric Le Goater <clg@kaod.org> writes:
>>>>
>>>>> Hello Nikunj,
>>>>>
>>>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>>>>>> The series enables Multi-Threaded TCG on PPC64
>>>>>>
>>>>>> Patch 01: Use atomic_cmpxchg in store conditional
>>>>>> 02: Handle first write to page during atomic operation
>>>>>> 03: Generate memory barriers for sync/isync and load/store conditional
>>>>>>
>>>>>> Patches are based on ppc-for-2.10
>>>>>>
>>>>>> Tested using following:
>>>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>>>>>
>>>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
>>>>> good : the CPU usage of QEMU reached 760% on the host.
>>>>
>>>> Cool.
>>>>
>>>>>> Todo:
>>>>>> * Enable other machine types and PPC32.
>>>>>
>>>>> I am quite ignorant on the topic.
>>>>> Have you looked at what it would take to emulate support of the HW
>>>>> threads ?
>>>>
>>>> We would need to implement msgsndp (doorbell support for IPI between
>>>> threads of same core)
>>>
>>> ok. I get it. Thanks,
>>>
>>>>> and the PowerNV machine ?
>>>>
>>>> Haven't tried it, should work. Just give a shot, let me know if you see problems.
>>>
>>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
>>> patches and tell you.
>>
>> The system seems to be spinning in skiboot in cpu_idle/relax when
>> starting the linux kernel. It finally boots, but it is rather long.
>> David has merged enough to test if you want to give it a try.
>
> I have got your powernv-ipmi-2.9 + ppc64 mttcg patches, and testing
> them. I too saw delay during boot, but wasn't aware that its caused by
> mttcg. I will have a look.
You can use David's branch directly now, there is enough support.
I am not sure where that is exactly, the kernel is somewhere in
early_setup(). It might be the secondary spinloop.
thanks,
C.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-10 16:59 ` Cédric Le Goater
@ 2017-04-10 17:07 ` Nikunj A Dadhania
0 siblings, 0 replies; 30+ messages in thread
From: Nikunj A Dadhania @ 2017-04-10 17:07 UTC (permalink / raw)
To: Cédric Le Goater, qemu-ppc, david, rth
Cc: programmingkidx, alex.bennee, qemu-devel, bharata
Cédric Le Goater <clg@kaod.org> writes:
> On 04/10/2017 06:44 PM, Nikunj A Dadhania wrote:
>> Cédric Le Goater <clg@kaod.org> writes:
>>
>>> On 04/07/2017 08:07 AM, Cédric Le Goater wrote:
>>>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
>>>>> Cédric Le Goater <clg@kaod.org> writes:
>>>>
>>>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
>>>> patches and tell you.
>>>
>>> The system seems to be spinning in skiboot in cpu_idle/relax when
>>> starting the linux kernel. It finally boots, but it is rather long.
>>> David has merged enough to test if you want to give it a try.
>>
>> I have got your powernv-ipmi-2.9 + ppc64 mttcg patches, and testing
>> them. I too saw delay during boot, but wasn't aware that its caused by
>> mttcg. I will have a look.
>
> You can use David's branch directly now, there is enough support.
Sure
> I am not sure where that is exactly, the kernel is somewhere in
> early_setup(). It might be the secondary spinloop.
Lot of prints missing, i think I need to add a console.
[ 2.303286014,5] INIT: Starting kernel at 0x20010000, fdt at 0x30354908 14865 bytes)
[ 43.421998779,5] OPAL: Switch to little-endian OS
-> smp_release_cpus()
spinning_secondaries = 3
<- smp_release_cpus()
[ 0.260526] nvram: Failed to find or create lnx,oops-log partition, err -28
[ 0.264448] nvram: Failed to initialize oops partition!
Regards,
Nikunj
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-10 16:41 ` Cédric Le Goater
2017-04-10 16:44 ` Nikunj A Dadhania
@ 2017-04-10 17:20 ` Alex Bennée
2017-04-11 12:28 ` Cédric Le Goater
1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2017-04-10 17:20 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Nikunj A Dadhania, qemu-ppc, david, rth, programmingkidx,
qemu-devel, bharata
Cédric Le Goater <clg@kaod.org> writes:
> On 04/07/2017 08:07 AM, Cédric Le Goater wrote:
>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
>>> Cédric Le Goater <clg@kaod.org> writes:
>>>
>>>> Hello Nikunj,
>>>>
>>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>>>>> The series enables Multi-Threaded TCG on PPC64
>>>>>
>>>>> Patch 01: Use atomic_cmpxchg in store conditional
>>>>> 02: Handle first write to page during atomic operation
>>>>> 03: Generate memory barriers for sync/isync and load/store conditional
>>>>>
>>>>> Patches are based on ppc-for-2.10
>>>>>
>>>>> Tested using following:
>>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>>>>
>>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
>>>> good : the CPU usage of QEMU reached 760% on the host.
>>>
>>> Cool.
>>>
>>>>> Todo:
>>>>> * Enable other machine types and PPC32.
>>>>
>>>> I am quite ignorant on the topic.
>>>> Have you looked at what it would take to emulate support of the HW
>>>> threads ?
>>>
>>> We would need to implement msgsndp (doorbell support for IPI between
>>> threads of same core)
>>
>> ok. I get it. Thanks,
>>
>>>> and the PowerNV machine ?
>>>
>>> Haven't tried it, should work. Just give a shot, let me know if you see problems.
>>
>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
>> patches and tell you.
>
> The system seems to be spinning in skiboot in cpu_idle/relax when
> starting the linux kernel. It finally boots, but it is rather long.
> David has merged enough to test if you want to give it a try.
Does PPC have Wait-for-irq or similar "sleeping" instructions?
We had to ensure we were not jumping out of the cpu loop and suspend
normally.
See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example.
--
Alex Bennée
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-10 17:20 ` Alex Bennée
@ 2017-04-11 12:28 ` Cédric Le Goater
2017-04-11 13:26 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 30+ messages in thread
From: Cédric Le Goater @ 2017-04-11 12:28 UTC (permalink / raw)
To: Alex Bennée
Cc: Nikunj A Dadhania, qemu-ppc, david, rth, programmingkidx,
qemu-devel, bharata, Benjamin Herrenschmidt
On 04/10/2017 07:20 PM, Alex Bennée wrote:
>
> Cédric Le Goater <clg@kaod.org> writes:
>
>> On 04/07/2017 08:07 AM, Cédric Le Goater wrote:
>>> On 04/07/2017 07:24 AM, Nikunj A Dadhania wrote:
>>>> Cédric Le Goater <clg@kaod.org> writes:
>>>>
>>>>> Hello Nikunj,
>>>>>
>>>>> On 04/06/2017 12:22 PM, Nikunj A Dadhania wrote:
>>>>>> The series enables Multi-Threaded TCG on PPC64
>>>>>>
>>>>>> Patch 01: Use atomic_cmpxchg in store conditional
>>>>>> 02: Handle first write to page during atomic operation
>>>>>> 03: Generate memory barriers for sync/isync and load/store conditional
>>>>>>
>>>>>> Patches are based on ppc-for-2.10
>>>>>>
>>>>>> Tested using following:
>>>>>> ./ppc64-softmmu/qemu-system-ppc64 -cpu POWER8 -vga none -nographic -machine pseries,usb=off -m 2G -smp 8,cores=8,threads=1 -accel tcg,thread=multi f23.img
>>>>>
>>>>> I tried it with a Ubuntu 16.04.2 guest using stress --cpu 8. It looked
>>>>> good : the CPU usage of QEMU reached 760% on the host.
>>>>
>>>> Cool.
>>>>
>>>>>> Todo:
>>>>>> * Enable other machine types and PPC32.
>>>>>
>>>>> I am quite ignorant on the topic.
>>>>> Have you looked at what it would take to emulate support of the HW
>>>>> threads ?
>>>>
>>>> We would need to implement msgsndp (doorbell support for IPI between
>>>> threads of same core)
>>>
>>> ok. I get it. Thanks,
>>>
>>>>> and the PowerNV machine ?
>>>>
>>>> Haven't tried it, should work. Just give a shot, let me know if you see problems.
>>>
>>> sure. pnv is still on 2.9, so I will rebase on 2.10, merge your
>>> patches and tell you.
>>
>> The system seems to be spinning in skiboot in cpu_idle/relax when
>> starting the linux kernel. It finally boots, but it is rather long.
>> David has merged enough to test if you want to give it a try.
>
> Does PPC have Wait-for-irq or similar "sleeping" instructions?
>
> We had to ensure we were not jumping out of the cpu loop and suspend
> normally.
I really don't know.
Ben, now that we have mttcg activated by default on ppc, it takes
a while for the linux kernel to do the early setup. I think we are
in the code section where we spin loop the secondaries. Most of the
time is spent in skiboot under cpu_idle/relax.
Any idea where that could come from ?
> See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example.
Thanks for the hint.
C.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-11 12:28 ` Cédric Le Goater
@ 2017-04-11 13:26 ` Benjamin Herrenschmidt
2017-04-11 14:04 ` Alex Bennée
0 siblings, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2017-04-11 13:26 UTC (permalink / raw)
To: Cédric Le Goater, Alex Bennée
Cc: Nikunj A Dadhania, qemu-ppc, david, rth, programmingkidx,
qemu-devel, bharata
On Tue, 2017-04-11 at 14:28 +0200, Cédric Le Goater wrote:
> I really don't know.
>
> Ben, now that we have mttcg activated by default on ppc, it takes
> a while for the linux kernel to do the early setup. I think we are
> in the code section where we spin loop the secondaries. Most of the
> time is spent in skiboot under cpu_idle/relax.
>
> Any idea where that could come from ?
>
> > See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example.
>
> Thanks for the hint.
They are spinning, but they have smt_low instructions in the loop,
maybe that causes us to do some kind of synchronization as we exit
the emulation loop on these ? I added that to force relinguish time
to other threads on the pre-MT TCG...
There isn't really such a "pause" instruction. At least not yet.... P9
has a wait that is meant to wait for special AS_Notify cycles but will
also wait for interrupts. We don't have an mwait at this point.
Ben.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64
2017-04-11 13:26 ` Benjamin Herrenschmidt
@ 2017-04-11 14:04 ` Alex Bennée
0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2017-04-11 14:04 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Cédric Le Goater, Nikunj A Dadhania, qemu-ppc, david, rth,
programmingkidx, qemu-devel, bharata
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> On Tue, 2017-04-11 at 14:28 +0200, Cédric Le Goater wrote:
>> I really don't know.
>>
>> Ben, now that we have mttcg activated by default on ppc, it takes
>> a while for the linux kernel to do the early setup. I think we are
>> in the code section where we spin loop the secondaries. Most of the
>> time is spent in skiboot under cpu_idle/relax.
>>
>> Any idea where that could come from ?
>>
>> > See c22edfebff29f63d793032e4fbd42a035bb73e27 for an example.
>>
>> Thanks for the hint.
>
> They are spinning, but they have smt_low instructions in the loop,
> maybe that causes us to do some kind of synchronization as we exit
> the emulation loop on these ? I added that to force relinguish time
> to other threads on the pre-MT TCG...
Yeah you need a tweak the approach when running with MTTCG as otherwise
you end up causing exits for one vCPUs loop to yield to vCPUs that are
already running in other threads.
> There isn't really such a "pause" instruction. At least not yet.... P9
> has a wait that is meant to wait for special AS_Notify cycles but will
> also wait for interrupts. We don't have an mwait at this point.
They are worth implementing. FWIW on ARM we only really handle WFI
(Wait-for-interrupt) which will cause the EXCP_HALT and that will put
the vCPU into a halted state which can be woken up next interrupt.
For the other cases YIELD and WFE (wait-for-event) we just treat them as
NOPs when MTTCG is enabled (test parallel_cpus). So they will busy-wait
spin around the guests wfe code but don't trigger expensive longjmps out
of the execution loop. This was all done in:
c22edfebff target-arm: don't generate WFE/YIELD calls for MTTCG
One other thing I noticed while looking through the PPC stuff is I
couldn't see any handling of cpu_reset/powering on. There is a potential
race here which ThreadSanitizer will complain about if you start loading
up your about-to-be-powered-on-vCPU from another thread. The fix here is
to defer the setup with async work. See:
062ba099e0 target-arm/powerctl: defer cpu reset work to CPU context
--
Alex Bennée
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2017-04-11 14:04 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06 10:22 [Qemu-devel] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Nikunj A Dadhania
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 1/3] target/ppc: Emulate LL/SC using cmpxchg helpers Nikunj A Dadhania
2017-04-06 15:51 ` Richard Henderson
2017-04-07 5:12 ` Nikunj A Dadhania
2017-04-06 15:53 ` Richard Henderson
2017-04-07 5:14 ` Nikunj A Dadhania
2017-04-07 5:23 ` David Gibson
2017-04-07 5:42 ` Nikunj A Dadhania
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 2/3] cputlb: handle first atomic write to the page Nikunj A Dadhania
2017-04-06 15:54 ` Richard Henderson
2017-04-06 10:22 ` [Qemu-devel] [PATCH RFC v1 3/3] target/ppc: Generate fence operations Nikunj A Dadhania
2017-04-06 16:15 ` Richard Henderson
2017-04-07 5:21 ` Nikunj A Dadhania
2017-04-07 18:19 ` Richard Henderson
2017-04-06 13:26 ` [Qemu-devel] [Qemu-ppc] [PATCH RFC v1 0/3] Enable MTTCG on PPC64 Cédric Le Goater
2017-04-06 13:28 ` G 3
2017-04-06 13:32 ` Cédric Le Goater
2017-04-06 13:47 ` G 3
2017-04-06 17:08 ` luigi burdo
2017-04-06 18:06 ` G 3
2017-04-07 5:24 ` Nikunj A Dadhania
2017-04-07 6:07 ` Cédric Le Goater
2017-04-10 16:41 ` Cédric Le Goater
2017-04-10 16:44 ` Nikunj A Dadhania
2017-04-10 16:59 ` Cédric Le Goater
2017-04-10 17:07 ` Nikunj A Dadhania
2017-04-10 17:20 ` Alex Bennée
2017-04-11 12:28 ` Cédric Le Goater
2017-04-11 13:26 ` Benjamin Herrenschmidt
2017-04-11 14:04 ` Alex Bennée
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.