* [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
@ 2016-09-02  6:32 Nikunj A Dadhania
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call Nikunj A Dadhania
                   ` (5 more replies)
  0 siblings, 6 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  6:32 UTC (permalink / raw)
  To: qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel, nikunj, benh

The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
The changes needed to enable PowerPC are pretty simple:

Patch 01: Take the iothread lock during hcall, as an hcall can generate I/O requests
      02: For TCG, we were hardcoding smt as 1; this gets rid of the limitation
      03: Use atomic_cmpxchg in store conditional
      04: With more threads, flush the entry from each CPU.
          This can be optimized further.
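
As a rough plain-C illustration of what patch 03 does at the TCG level, a
load-reserve/store-conditional pair can be emulated with a compare-and-swap.
This is a hypothetical sketch, not QEMU code; reserve_addr/reserve_val merely
mirror the cpu_reserve/cpu_reserve_val TCG globals the patch introduces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustration only: emulate lwarx/stwcx. with a compare-and-swap, as
 * patch 03 does via tcg_gen_atomic_cmpxchg_tl. Not QEMU code. */
static uintptr_t reserve_addr = (uintptr_t)-1;
static uint32_t reserve_val;

/* lwarx-like: load the word and record a reservation on it */
static uint32_t load_reserve(uint32_t *ea)
{
    uint32_t val = __atomic_load_n(ea, __ATOMIC_SEQ_CST);

    reserve_addr = (uintptr_t)ea;
    reserve_val = val;
    return val;
}

/* stwcx.-like: store only if the reservation still holds and memory
 * still contains the reserved value; returns true on success */
static bool store_conditional(uint32_t *ea, uint32_t new_val)
{
    bool ok = false;

    if ((uintptr_t)ea == reserve_addr) {
        uint32_t expected = reserve_val;

        ok = __atomic_compare_exchange_n(ea, &expected, new_val, false,
                                         __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }
    reserve_addr = (uintptr_t)-1;    /* the reservation is always consumed */
    return ok;
}
```

One known weakness of the cmpxchg approach (inherent to this emulation
strategy): a compare-and-swap cannot detect an A-B-A sequence of stores by
another CPU, whereas real hardware would lose the reservation.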

The patches are based on Alex Bennée's base enabling patches for 
MTTCG [1] and Emilio's cmpxchg atomics. The consolidated tree of the 
above patches is here:

https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2

Apart from the above, the PPC patches are based on ppc-for-2.8 and the 
load/store consolidation patches [2].

Series with all dependent patches available here: 
https://github.com/nikunjad/qemu/tree/ppc_mttcg_v1

Testing: 
========

-smp 4,cores=1,threads=4 -accel tcg,thread=multi

TODO
====
Implement the msgsndp instruction (doorbell); newer kernels enable it 
depending on the PVR. I have been using the following workaround to boot:
https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg391966.html
[2] https://lists.gnu.org/archive/html/qemu-ppc/2016-08/msg00265.html

Nikunj A Dadhania (4):
  spapr-hcall: take iothread lock during handler call
  target-ppc: with MTTCG report more threads
  target-ppc: use atomic_cmpxchg for ld/st reservation
  target-ppc: flush tlb from all the cpu

 cputlb.c                | 15 +++++++++++++++
 hw/ppc/spapr_hcall.c    | 11 +++++++++--
 include/exec/exec-all.h |  2 ++
 target-ppc/kvm.c        |  2 +-
 target-ppc/kvm_ppc.h    |  2 +-
 target-ppc/mmu-hash64.c |  2 +-
 target-ppc/translate.c  | 24 +++++++++++++++++++++---
 7 files changed, 50 insertions(+), 8 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call
  2016-09-02  6:32 [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Nikunj A Dadhania
@ 2016-09-02  6:32 ` Nikunj A Dadhania
  2016-09-02  8:53   ` Greg Kurz
  2016-09-02 10:06   ` Thomas Huth
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads Nikunj A Dadhania
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  6:32 UTC (permalink / raw)
  To: qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel, nikunj, benh

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 hw/ppc/spapr_hcall.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index e5eca67..daea7a0 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1075,20 +1075,27 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
                              target_ulong *args)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+    target_ulong ret;
 
     if ((opcode <= MAX_HCALL_OPCODE)
         && ((opcode & 0x3) == 0)) {
         spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
 
         if (fn) {
-            return fn(cpu, spapr, opcode, args);
+            qemu_mutex_lock_iothread();
+            ret = fn(cpu, spapr, opcode, args);
+            qemu_mutex_unlock_iothread();
+            return ret;
         }
     } else if ((opcode >= KVMPPC_HCALL_BASE) &&
                (opcode <= KVMPPC_HCALL_MAX)) {
         spapr_hcall_fn fn = kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
 
         if (fn) {
-            return fn(cpu, spapr, opcode, args);
+            qemu_mutex_lock_iothread();
+            ret = fn(cpu, spapr, opcode, args);
+            qemu_mutex_unlock_iothread();
+            return ret;
         }
     }
 
-- 
2.7.4


* [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads
  2016-09-02  6:32 [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Nikunj A Dadhania
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call Nikunj A Dadhania
@ 2016-09-02  6:32 ` Nikunj A Dadhania
  2016-09-02  9:28   ` Greg Kurz
  2016-09-07  3:51   ` David Gibson
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation Nikunj A Dadhania
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  6:32 UTC (permalink / raw)
  To: qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel, nikunj, benh

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/kvm.c     | 2 +-
 target-ppc/kvm_ppc.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index dcb68b9..20eb450 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
 
 int kvmppc_smt_threads(void)
 {
-    return cap_ppc_smt ? cap_ppc_smt : 1;
+    return cap_ppc_smt ? cap_ppc_smt : 8;
 }
 
 #ifdef TARGET_PPC64
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 5461d10..053db0a 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -128,7 +128,7 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
 
 static inline int kvmppc_smt_threads(void)
 {
-    return 1;
+    return 8;
 }
 
 static inline int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits)
-- 
2.7.4


* [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-02  6:32 [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Nikunj A Dadhania
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call Nikunj A Dadhania
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads Nikunj A Dadhania
@ 2016-09-02  6:32 ` Nikunj A Dadhania
  2016-09-07  4:02   ` David Gibson
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu Nikunj A Dadhania
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  6:32 UTC (permalink / raw)
  To: qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel, nikunj, benh

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 4a882b3..447c13e 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -72,6 +72,7 @@ static TCGv cpu_cfar;
 #endif
 static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca;
 static TCGv cpu_reserve;
+static TCGv cpu_reserve_val;
 static TCGv cpu_fpscr;
 static TCGv_i32 cpu_access_type;
 
@@ -176,6 +177,9 @@ void ppc_translate_init(void)
     cpu_reserve = tcg_global_mem_new(cpu_env,
                                      offsetof(CPUPPCState, reserve_addr),
                                      "reserve_addr");
+    cpu_reserve_val = tcg_global_mem_new(cpu_env,
+                                     offsetof(CPUPPCState, reserve_val),
+                                     "reserve_val");
 
     cpu_fpscr = tcg_global_mem_new(cpu_env,
                                    offsetof(CPUPPCState, fpscr), "fpscr");
@@ -3086,7 +3090,7 @@ static void gen_##name(DisasContext *ctx)                            \
     }                                                                \
     tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);                \
     tcg_gen_mov_tl(cpu_reserve, t0);                                 \
-    tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \
+    tcg_gen_mov_tl(cpu_reserve_val, gpr);                            \
     tcg_temp_free(t0);                                               \
 }
 
@@ -3112,14 +3116,28 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
                                   int reg, int memop)
 {
     TCGLabel *l1;
+    TCGv_i32 tmp = tcg_temp_local_new_i32();
+    TCGv t0;
 
+    tcg_gen_movi_i32(tmp, 0);
     tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
     l1 = gen_new_label();
     tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
-    tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 1 << CRF_EQ);
-    tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
+
+    t0 = tcg_temp_new();
+    tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
+                              ctx->mem_idx, DEF_MEMOP(memop));
+    tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
+    tcg_gen_trunc_tl_i32(tmp, t0);
+
     gen_set_label(l1);
+    tcg_gen_shli_i32(tmp, tmp, CRF_EQ);
+    tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp);
     tcg_gen_movi_tl(cpu_reserve, -1);
+    tcg_gen_movi_tl(cpu_reserve_val, 0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free_i32(tmp);
 }
 #endif
 
-- 
2.7.4


* [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-02  6:32 [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Nikunj A Dadhania
                   ` (2 preceding siblings ...)
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation Nikunj A Dadhania
@ 2016-09-02  6:32 ` Nikunj A Dadhania
  2016-09-02  7:22   ` Benjamin Herrenschmidt
  2016-09-02  6:43 ` [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Cédric Le Goater
  2016-09-02  7:19 ` Benjamin Herrenschmidt
  5 siblings, 1 reply; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  6:32 UTC (permalink / raw)
  To: qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel, nikunj, benh

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 cputlb.c                | 15 +++++++++++++++
 include/exec/exec-all.h |  2 ++
 target-ppc/mmu-hash64.c |  2 +-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/cputlb.c b/cputlb.c
index 64faf47..17ff58e 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -123,6 +123,21 @@ void tlb_flush(CPUState *cpu, int flush_global)
     }
 }
 
+static void tlb_flush_all_async_work(CPUState *cpu, void *opaque)
+{
+    tlb_flush_nocheck(cpu, GPOINTER_TO_INT(opaque));
+}
+
+void tlb_flush_all(CPUState *cpu, int flush_global)
+{
+    CPUState *c;
+
+    CPU_FOREACH(c) {
+        async_run_on_cpu(c, tlb_flush_all_async_work,
+                         GUINT_TO_POINTER(flush_global));
+    }
+}
+
 static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, void *mmu_bitmask)
 {
     CPUArchState *env = cpu->env_ptr;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e9f3bcf..55c344b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -116,6 +116,8 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr);
  * TLB entries, and the argument is ignored.
  */
 void tlb_flush(CPUState *cpu, int flush_global);
+void tlb_flush_all(CPUState *cpu, int flush_global);
+
 /**
  * tlb_flush_page_by_mmuidx:
  * @cpu: CPU whose TLB should be flushed
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 8118143..d852c21 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -912,7 +912,7 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
      * invalidate, and we still don't have a tlb_flush_mask(env, n,
      * mask) in QEMU, we just invalidate all TLBs
      */
-    tlb_flush(CPU(cpu), 1);
+    tlb_flush_all(CPU(cpu), 1);
 }
 
 void ppc_hash64_update_rmls(CPUPPCState *env)
-- 
2.7.4


* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-02  6:32 [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Nikunj A Dadhania
                   ` (3 preceding siblings ...)
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu Nikunj A Dadhania
@ 2016-09-02  6:43 ` Cédric Le Goater
  2016-09-02  6:46   ` Nikunj A Dadhania
  2016-09-02  7:57   ` Thomas Huth
  2016-09-02  7:19 ` Benjamin Herrenschmidt
  5 siblings, 2 replies; 45+ messages in thread
From: Cédric Le Goater @ 2016-09-02  6:43 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

On 09/02/2016 08:32 AM, Nikunj A Dadhania wrote:
> The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
> Changes that were needed to enable PowerPC are pretty simple;
> 
> Patch 01: Take the iothread lock during hcall, as an hcall can generate I/O requests
>       02: For TCG, we were hardcoding smt as 1; this gets rid of the limitation
>       03: Use atomic_cmpxchg in store conditional
>       04: With more threads, flush the entry from each cpu. 
> 	  This can be optimized further.
> 
> The patches are based on the Alex Bennee's base enabling patches for 
> MTTCG[1] and Emilios's cmpxchg atomics. The consolidated tree of the 
> above patches is here:
> 
> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
> 
> Apart from the above, PPC patches are based out of ppc-for-2.8 and 
> load/store consolidation patches [2]
> 
> Series with all dependent patches available here: 
> https://github.com/nikunjad/qemu/tree/ppc_mttcg_v1
> 
> Testing: 
> ========
> 
> -smp 4,cores=1,threads=4 -accel tcg,thread=multi
> 
> TODO
> ====
> Implement msgsndp instructions(door-bell), newer kernels enable it 
> depending on the PVR. I have been using following workaround to boot.
> https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4

You could also introduce a Power8 DD1 in QEMU. From the kernel cputable:

	{	/* Power8 DD1: Does not support doorbell IPIs */
		.pvr_mask		= 0xffffff00,
		.pvr_value		= 0x004d0100,
		.cpu_name		= "POWER8 (raw)",
		.cpu_features		= CPU_FTRS_POWER8_DD1,
		...

Cheers,
C.


* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-02  6:43 ` [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Cédric Le Goater
@ 2016-09-02  6:46   ` Nikunj A Dadhania
  2016-09-02  7:57   ` Thomas Huth
  1 sibling, 0 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  6:46 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

Cédric Le Goater <clg@kaod.org> writes:

> On 09/02/2016 08:32 AM, Nikunj A Dadhania wrote:
>> The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
>> Changes that were needed to enable PowerPC are pretty simple;
>> 
>> Patch 01: Take the iothread lock during hcall, as an hcall can generate I/O requests
>>       02: For TCG, we were hardcoding smt as 1; this gets rid of the limitation
>>       03: Use atomic_cmpxchg in store conditional
>>       04: With more threads, flush the entry from each cpu. 
>> 	  This can be optimized further.
>> 
>> The patches are based on the Alex Bennee's base enabling patches for 
>> MTTCG[1] and Emilios's cmpxchg atomics. The consolidated tree of the 
>> above patches is here:
>> 
>> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>> 
>> Apart from the above, PPC patches are based out of ppc-for-2.8 and 
>> load/store consolidation patches [2]
>> 
>> Series with all dependent patches available here: 
>> https://github.com/nikunjad/qemu/tree/ppc_mttcg_v1
>> 
>> Testing: 
>> ========
>> 
>> -smp 4,cores=1,threads=4 -accel tcg,thread=multi
>> 
>> TODO
>> ====
>> Implement msgsndp instructions(door-bell), newer kernels enable it 
>> depending on the PVR. I have been using following workaround to boot.
>> https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4
>
> You could also introduce a Power8 DD1 in qemu. From the kernel cputable :

I see this as a temporary workaround; it needs a proper fix.

> 	{	/* Power8 DD1: Does not support doorbell IPIs */
> 		.pvr_mask		= 0xffffff00,
> 		.pvr_value		= 0x004d0100,
> 		.cpu_name		= "POWER8 (raw)",
> 		.cpu_features		= CPU_FTRS_POWER8_DD1,
> 		...

Regards
Nikunj


* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-02  6:32 [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Nikunj A Dadhania
                   ` (4 preceding siblings ...)
  2016-09-02  6:43 ` [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Cédric Le Goater
@ 2016-09-02  7:19 ` Benjamin Herrenschmidt
  2016-09-02  7:39   ` Nikunj A Dadhania
  2016-09-07  4:08   ` David Gibson
  5 siblings, 2 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-02  7:19 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
> The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
> Changes that were needed to enable PowerPC are pretty simple;
> 
> Patch 01: Take the iothread lock during hcall, as an hcall can generate I/O requests
>       02: For TCG, we were hardcoding smt as 1; this gets rid of the limitation

If we do this, we need to implement the shared SPRs properly and the
inter-thread doorbells...

>       03: Use atomic_cmpxchg in store conditional
>       04: With more threads, flush the entry from each cpu. 
> 	  This can be optimized further.
> 
> The patches are based on the Alex Bennee's base enabling patches for 
> MTTCG[1] and Emilios's cmpxchg atomics. The consolidated tree of the 
> above patches is here:
> 
> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
> 
> Apart from the above, PPC patches are based out of ppc-for-2.8 and 
> load/store consolidation patches [2]
> 
> Series with all dependent patches available here: 
> https://github.com/nikunjad/qemu/tree/ppc_mttcg_v1
> 
> Testing: 
> ========
> 
> -smp 4,cores=1,threads=4 -accel tcg,thread=multi
> 
> TODO
> ====
> Implement msgsndp instructions(door-bell), newer kernels enable it 
> depending on the PVR. I have been using following workaround to boot.
> https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4
> 
> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg391966.html
> [2] https://lists.gnu.org/archive/html/qemu-ppc/2016-08/msg00265.html
> 
> Nikunj A Dadhania (4):
>   spapr-hcall: take iothread lock during handler call
>   target-ppc: with MTTCG report more threads
>   target-ppc: use atomic_cmpxchg for ld/st reservation
>   target-ppc: flush tlb from all the cpu
> 
>  cputlb.c                | 15 +++++++++++++++
>  hw/ppc/spapr_hcall.c    | 11 +++++++++--
>  include/exec/exec-all.h |  2 ++
>  target-ppc/kvm.c        |  2 +-
>  target-ppc/kvm_ppc.h    |  2 +-
>  target-ppc/mmu-hash64.c |  2 +-
>  target-ppc/translate.c  | 24 +++++++++++++++++++++---
>  7 files changed, 50 insertions(+), 8 deletions(-)
> 


* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu Nikunj A Dadhania
@ 2016-09-02  7:22   ` Benjamin Herrenschmidt
  2016-09-02  7:34     ` Nikunj A Dadhania
  0 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-02  7:22 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  cputlb.c                | 15 +++++++++++++++
>  include/exec/exec-all.h |  2 ++
>  target-ppc/mmu-hash64.c |  2 +-
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/cputlb.c b/cputlb.c
> index 64faf47..17ff58e 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -123,6 +123,21 @@ void tlb_flush(CPUState *cpu, int flush_global)
>      }
>  }
>  
> +static void tlb_flush_all_async_work(CPUState *cpu, void *opaque)
> +{
> +    tlb_flush_nocheck(cpu, GPOINTER_TO_INT(opaque));
> +}
> +
> +void tlb_flush_all(CPUState *cpu, int flush_global)
> +{
> +    CPUState *c;
> +
> +    CPU_FOREACH(c) {
> +        async_run_on_cpu(c, tlb_flush_all_async_work,
> +                         GUINT_TO_POINTER(flush_global));
> +    }
> +}

Hrm... this is asynchronous? It probably needs to be synchronous...
We must provide a guarantee that no other processor can see the old
translation when the tlb invalidation sequence completes. With the
current lazy TLB flush, we already delay the invalidation until
we hit that synchronization point so we need to be synchronous.

Cheers,
Ben.
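
The synchronous behaviour Ben is asking for can be sketched with a blocking
cross-thread dispatch. This is a hypothetical pthreads illustration of the
idea only; QEMU's real run_on_cpu()/async_run_on_cpu() machinery is more
involved, and all names below are invented.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A run_on_cpu-style dispatch that blocks the caller until the target
 * thread has executed the work item. Illustration only, not QEMU code. */
struct work {
    void (*fn)(void *);
    void *arg;
    bool done;
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

static void *cpu_thread(void *opaque)
{
    struct work *w = opaque;

    pthread_mutex_lock(&w->lock);
    w->fn(w->arg);               /* e.g. the TLB flush on the target vCPU */
    w->done = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static void run_on_cpu_sync(struct work *w)
{
    pthread_t t;

    pthread_create(&t, NULL, cpu_thread, w);
    pthread_mutex_lock(&w->lock);
    while (!w->done) {           /* caller may not proceed until the work is done */
        pthread_cond_wait(&w->cond, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    pthread_join(t, NULL);
}

static int flush_count;

static void demo_flush(void *arg)
{
    (void)arg;
    flush_count++;               /* stand-in for tlb_flush_nocheck() */
}

static int run_demo(void)
{
    struct work w = { demo_flush, NULL, false,
                      PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };

    run_on_cpu_sync(&w);
    return flush_count;
}
```

The key difference from the patch's async_run_on_cpu() loop is the wait on
the condition variable: when run_on_cpu_sync() returns, the flush has
demonstrably happened on the target thread.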

>  static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, void
> *mmu_bitmask)
>  {
>      CPUArchState *env = cpu->env_ptr;
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e9f3bcf..55c344b 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -116,6 +116,8 @@ void tlb_flush_page(CPUState *cpu, target_ulong
> addr);
>   * TLB entries, and the argument is ignored.
>   */
>  void tlb_flush(CPUState *cpu, int flush_global);
> +void tlb_flush_all(CPUState *cpu, int flush_global);
> +
>  /**
>   * tlb_flush_page_by_mmuidx:
>   * @cpu: CPU whose TLB should be flushed
> diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
> index 8118143..d852c21 100644
> --- a/target-ppc/mmu-hash64.c
> +++ b/target-ppc/mmu-hash64.c
> @@ -912,7 +912,7 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
>       * invalidate, and we still don't have a tlb_flush_mask(env, n,
>       * mask) in QEMU, we just invalidate all TLBs
>       */
> -    tlb_flush(CPU(cpu), 1);
> +    tlb_flush_all(CPU(cpu), 1);
>  }
>  
>  void ppc_hash64_update_rmls(CPUPPCState *env)


* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-02  7:22   ` Benjamin Herrenschmidt
@ 2016-09-02  7:34     ` Nikunj A Dadhania
  2016-09-04 17:00       ` Alex Bennée
  0 siblings, 1 reply; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  7:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  cputlb.c                | 15 +++++++++++++++
>>  include/exec/exec-all.h |  2 ++
>>  target-ppc/mmu-hash64.c |  2 +-
>>  3 files changed, 18 insertions(+), 1 deletion(-)
>> 
>> diff --git a/cputlb.c b/cputlb.c
>> index 64faf47..17ff58e 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -123,6 +123,21 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>      }
>>  }
>>  
>> +static void tlb_flush_all_async_work(CPUState *cpu, void *opaque)
>> +{
>> +    tlb_flush_nocheck(cpu, GPOINTER_TO_INT(opaque));
>> +}
>> +
>> +void tlb_flush_all(CPUState *cpu, int flush_global)
>> +{
>> +    CPUState *c;
>> +
>> +    CPU_FOREACH(c) {
>> +        async_run_on_cpu(c, tlb_flush_all_async_work,
>> +                         GUINT_TO_POINTER(flush_global));
>> +    }
>> +}
>
> Hrm... this is asynchronous ?

Yes.

> It probably needs to be synchronous...

I see run_on_cpu(), which seems suitable.

> We must provide a guarantee that no other processor can see the old
> translation when the tlb invalidation sequence completes. With the
> current lazy TLB flush, we already delay the invalidation until
> we hit that synchronization point so we need to be synchronous.


>> diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
>> index 8118143..d852c21 100644
>> --- a/target-ppc/mmu-hash64.c
>> +++ b/target-ppc/mmu-hash64.c
>> @@ -912,7 +912,7 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
>>       * invalidate, and we still don't have a tlb_flush_mask(env, n,
>>       * mask) in QEMU, we just invalidate all TLBs
>>       */
>> -    tlb_flush(CPU(cpu), 1);
>> +    tlb_flush_all(CPU(cpu), 1);
>>  }
>>  
>>  void ppc_hash64_update_rmls(CPUPPCState *env)

Regards,
Nikunj


* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-02  7:19 ` Benjamin Herrenschmidt
@ 2016-09-02  7:39   ` Nikunj A Dadhania
  2016-09-02 12:13     ` Benjamin Herrenschmidt
  2016-09-07  4:08   ` David Gibson
  1 sibling, 1 reply; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  7:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
>> The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
>> Changes that were needed to enable PowerPC are pretty simple;
>> 
>> Patch 01: Take the iothread lock during hcall, as an hcall can generate I/O requests
>>       02: For TCG, we were hardcoding smt as 1; this gets rid of the limitation
>
> If we do this, we need to implement the shared SPRs properly

Sure, will have a look at it.

> and the inter-thread doorbells...

You mean the below?

>> TODO
>> ====
>> Implement msgsndp instructions(door-bell), newer kernels enable it 
>> depending on the PVR. I have been using following workaround to boot.
>> https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4

Regards,
Nikunj


* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-02  6:43 ` [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Cédric Le Goater
  2016-09-02  6:46   ` Nikunj A Dadhania
@ 2016-09-02  7:57   ` Thomas Huth
  2016-09-02 11:44     ` Cédric Le Goater
  1 sibling, 1 reply; 45+ messages in thread
From: Thomas Huth @ 2016-09-02  7:57 UTC (permalink / raw)
  To: Cédric Le Goater, Nikunj A Dadhania, qemu-ppc, alex.bennee,
	david, rth
  Cc: qemu-devel

On 02.09.2016 08:43, Cédric Le Goater wrote:
> On 09/02/2016 08:32 AM, Nikunj A Dadhania wrote:
>> The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
>> Changes that were needed to enable PowerPC are pretty simple;
>>
>> Patch 01: Take the iothread lock during hcall, as an hcall can generate I/O requests
>>       02: For TCG, we were hardcoding smt as 1; this gets rid of the limitation
>>       03: Use atomic_cmpxchg in store conditional
>>       04: With more threads, flush the entry from each cpu. 
>> 	  This can be optimized further.
>>
>> The patches are based on the Alex Bennee's base enabling patches for 
>> MTTCG[1] and Emilios's cmpxchg atomics. The consolidated tree of the 
>> above patches is here:
>>
>> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>>
>> Apart from the above, PPC patches are based out of ppc-for-2.8 and 
>> load/store consolidation patches [2]
>>
>> Series with all dependent patches available here: 
>> https://github.com/nikunjad/qemu/tree/ppc_mttcg_v1
>>
>> Testing: 
>> ========
>>
>> -smp 4,cores=1,threads=4 -accel tcg,thread=multi
>>
>> TODO
>> ====
>> Implement msgsndp instructions(door-bell), newer kernels enable it 
>> depending on the PVR. I have been using following workaround to boot.
>> https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4
> 
> You could also introduce a Power8 DD1 in qemu. From the kernel cputable :
> 
> 	{	/* Power8 DD1: Does not support doorbell IPIs */
> 		.pvr_mask		= 0xffffff00,
> 		.pvr_value		= 0x004d0100,
> 		.cpu_name		= "POWER8 (raw)",
> 		.cpu_features		= CPU_FTRS_POWER8_DD1,
> 		...

Please try to avoid that! We just got rid of DD1 in qemu a couple of
months ago:

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=9d6ba75df26d699a6

 Thomas


* Re: [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call Nikunj A Dadhania
@ 2016-09-02  8:53   ` Greg Kurz
  2016-09-02  9:28     ` Nikunj A Dadhania
  2016-09-02 10:06   ` Thomas Huth
  1 sibling, 1 reply; 45+ messages in thread
From: Greg Kurz @ 2016-09-02  8:53 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, alex.bennee, david, rth, qemu-devel

On Fri,  2 Sep 2016 12:02:53 +0530
Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:

> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_hcall.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index e5eca67..daea7a0 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1075,20 +1075,27 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>                               target_ulong *args)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    target_ulong ret;
>  
>      if ((opcode <= MAX_HCALL_OPCODE)
>          && ((opcode & 0x3) == 0)) {
>          spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
>  
>          if (fn) {
> -            return fn(cpu, spapr, opcode, args);
> +            qemu_mutex_lock_iothread();
> +            ret = fn(cpu, spapr, opcode, args);
> +            qemu_mutex_unlock_iothread();
> +            return ret;
>          }
>      } else if ((opcode >= KVMPPC_HCALL_BASE) &&
>                 (opcode <= KVMPPC_HCALL_MAX)) {
>          spapr_hcall_fn fn = kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
>  
>          if (fn) {
> -            return fn(cpu, spapr, opcode, args);
> +            qemu_mutex_lock_iothread();
> +            ret = fn(cpu, spapr, opcode, args);
> +            qemu_mutex_unlock_iothread();
> +            return ret;
>          }
>      }
>  

This will serialize all hypercalls, even when it is not needed... Isn't that
too coarse-grained locking?

Cheers.

--
Greg
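
One way to address Greg's concern, purely as a hypothetical sketch, is a
per-handler flag so that only hypercalls that actually perform I/O take the
global lock. None of these names exist in QEMU; the handler and table are
invented for illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Finer-grained locking sketch: each handler declares whether it needs
 * the global (iothread) lock, and the dispatcher takes it only then. */
typedef uint64_t (*hcall_fn)(uint64_t arg);

struct hcall_entry {
    hcall_fn fn;
    bool needs_big_lock;         /* true only for handlers that do I/O */
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

static uint64_t dispatch(const struct hcall_entry *e, uint64_t arg)
{
    uint64_t ret;

    if (e->needs_big_lock) {
        pthread_mutex_lock(&big_lock);
    }
    ret = e->fn(arg);
    if (e->needs_big_lock) {
        pthread_mutex_unlock(&big_lock);
    }
    return ret;
}

/* Invented example handler: doubles its argument. */
static uint64_t h_example(uint64_t x)
{
    return x * 2;
}

static const struct hcall_entry table[] = {
    { h_example, true },         /* pretend this handler touches devices */
};
```

Handlers that only touch guest memory or per-CPU state could then run
concurrently, at the cost of auditing every handler for its actual needs.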


* Re: [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads Nikunj A Dadhania
@ 2016-09-02  9:28   ` Greg Kurz
  2016-09-02  9:34     ` Nikunj A Dadhania
  2016-09-07  3:51   ` David Gibson
  1 sibling, 1 reply; 45+ messages in thread
From: Greg Kurz @ 2016-09-02  9:28 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, alex.bennee, david, rth, qemu-devel

On Fri,  2 Sep 2016 12:02:54 +0530
Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:

> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---

Shouldn't this patch be the last one, when all other issues have been addressed?

>  target-ppc/kvm.c     | 2 +-
>  target-ppc/kvm_ppc.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index dcb68b9..20eb450 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>  
>  int kvmppc_smt_threads(void)
>  {
> -    return cap_ppc_smt ? cap_ppc_smt : 1;
> +    return cap_ppc_smt ? cap_ppc_smt : 8;

If KVM is there but does not support SMT processor modes, it looks
wrong to return anything but 1. This check needs kvm_enabled().

Also, why 8? This depends on the CPU model. And since real HW allows choosing
the SMT mode, maybe this should be configurable from the command line as
well.

>  }
>  
>  #ifdef TARGET_PPC64
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 5461d10..053db0a 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -128,7 +128,7 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>  
>  static inline int kvmppc_smt_threads(void)
>  {
> -    return 1;
> +    return 8;

Same remark.

>  }
>  
>  static inline int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits)

Cheers.

--
Greg


* Re: [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call
  2016-09-02  8:53   ` Greg Kurz
@ 2016-09-02  9:28     ` Nikunj A Dadhania
  2016-09-02  9:57       ` Greg Kurz
  0 siblings, 1 reply; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  9:28 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, alex.bennee, david, rth, qemu-devel

Greg Kurz <groug@kaod.org> writes:

> On Fri,  2 Sep 2016 12:02:53 +0530
> Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:
>
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_hcall.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index e5eca67..daea7a0 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -1075,20 +1075,27 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>                               target_ulong *args)
>>  {
>>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    target_ulong ret;
>>  
>>      if ((opcode <= MAX_HCALL_OPCODE)
>>          && ((opcode & 0x3) == 0)) {
>>          spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
>>  
>>          if (fn) {
>> -            return fn(cpu, spapr, opcode, args);
>> +            qemu_mutex_lock_iothread();
>> +            ret = fn(cpu, spapr, opcode, args);
>> +            qemu_mutex_unlock_iothread();
>> +            return ret;
>>          }
>>      } else if ((opcode >= KVMPPC_HCALL_BASE) &&
>>                 (opcode <= KVMPPC_HCALL_MAX)) {
>>          spapr_hcall_fn fn = kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
>>  
>>          if (fn) {
>> -            return fn(cpu, spapr, opcode, args);
>> +            qemu_mutex_lock_iothread();
>> +            ret = fn(cpu, spapr, opcode, args);
>> +            qemu_mutex_unlock_iothread();
>> +            return ret;
>>          }
>>      }
>>  
>
> This will serialize all hypercalls, even when it is not needed... Isn't that
> too much coarse grain locking ?

You are right, I was thinking to do this only for emulation case, as
this is not needed for hardware acceleration.

Regards
Nikunj


* Re: [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads
  2016-09-02  9:28   ` Greg Kurz
@ 2016-09-02  9:34     ` Nikunj A Dadhania
  2016-09-02 10:45       ` Greg Kurz
  0 siblings, 1 reply; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-02  9:34 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, alex.bennee, david, rth, qemu-devel

Greg Kurz <groug@kaod.org> writes:

> On Fri,  2 Sep 2016 12:02:54 +0530
> Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:
>
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>
> Shouldn't this patch be the last one, when all other issues have been addressed ?
>
>>  target-ppc/kvm.c     | 2 +-
>>  target-ppc/kvm_ppc.h | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index dcb68b9..20eb450 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>  
>>  int kvmppc_smt_threads(void)
>>  {
>> -    return cap_ppc_smt ? cap_ppc_smt : 1;
>> +    return cap_ppc_smt ? cap_ppc_smt : 8;
>
> If KVM is there but does not support SMT processor modes, it looks
> wrong to return anything but 1. This check needs kvm_enabled().

This also gets called when emulating PPC on PPC. 

> Also, why 8 ? This depends on the CPU model.

Not sure if I need to emulate according to the host cpu model. I had
selected 8, as that was the highest number of threads possible for POWER.

> Also, since real HW allows to choose the SMT mode, maybe this should
> be configurable from the command line as well.
>
>>  }
>>  
>>  #ifdef TARGET_PPC64
>> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
>> index 5461d10..053db0a 100644
>> --- a/target-ppc/kvm_ppc.h
>> +++ b/target-ppc/kvm_ppc.h
>> @@ -128,7 +128,7 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>  
>>  static inline int kvmppc_smt_threads(void)
>>  {
>> -    return 1;
>> +    return 8;
>
> Same remark.
>
>>  }
>>  
>>  static inline int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits)

Regards
Nikunj


* Re: [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call
  2016-09-02  9:28     ` Nikunj A Dadhania
@ 2016-09-02  9:57       ` Greg Kurz
  2016-09-03 16:31         ` Nikunj A Dadhania
  0 siblings, 1 reply; 45+ messages in thread
From: Greg Kurz @ 2016-09-02  9:57 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: alex.bennee, rth, qemu-ppc, qemu-devel, david

On Fri, 02 Sep 2016 14:58:12 +0530
Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:

> Greg Kurz <groug@kaod.org> writes:
> 
> > On Fri,  2 Sep 2016 12:02:53 +0530
> > Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:
> >  
> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr_hcall.c | 11 +++++++++--
> >>  1 file changed, 9 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> >> index e5eca67..daea7a0 100644
> >> --- a/hw/ppc/spapr_hcall.c
> >> +++ b/hw/ppc/spapr_hcall.c
> >> @@ -1075,20 +1075,27 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
> >>                               target_ulong *args)
> >>  {
> >>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +    target_ulong ret;
> >>  
> >>      if ((opcode <= MAX_HCALL_OPCODE)
> >>          && ((opcode & 0x3) == 0)) {
> >>          spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
> >>  
> >>          if (fn) {
> >> -            return fn(cpu, spapr, opcode, args);
> >> +            qemu_mutex_lock_iothread();
> >> +            ret = fn(cpu, spapr, opcode, args);
> >> +            qemu_mutex_unlock_iothread();
> >> +            return ret;
> >>          }
> >>      } else if ((opcode >= KVMPPC_HCALL_BASE) &&
> >>                 (opcode <= KVMPPC_HCALL_MAX)) {
> >>          spapr_hcall_fn fn = kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
> >>  
> >>          if (fn) {
> >> -            return fn(cpu, spapr, opcode, args);
> >> +            qemu_mutex_lock_iothread();
> >> +            ret = fn(cpu, spapr, opcode, args);
> >> +            qemu_mutex_unlock_iothread();
> >> +            return ret;
> >>          }
> >>      }
> >>    
> >
> > This will serialize all hypercalls, even when it is not needed... Isn't that
> > too much coarse grain locking ?  
> 
> You are right, I was thinking to do this only for emulation case, as
> this is not needed for hardware acceleration.
> 

Yes, at the very least. And even in the MTTCG case, shouldn't we serialize only
when we know I/O will actually happen ?

> Regards
> Nikunj
> 
> 

Cheers.

--
Greg


* Re: [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call Nikunj A Dadhania
  2016-09-02  8:53   ` Greg Kurz
@ 2016-09-02 10:06   ` Thomas Huth
  2016-09-03 16:33     ` Nikunj A Dadhania
  1 sibling, 1 reply; 45+ messages in thread
From: Thomas Huth @ 2016-09-02 10:06 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

On 02.09.2016 08:32, Nikunj A Dadhania wrote:
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_hcall.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index e5eca67..daea7a0 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1075,20 +1075,27 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>                               target_ulong *args)
>  {
>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +    target_ulong ret;
>  
>      if ((opcode <= MAX_HCALL_OPCODE)
>          && ((opcode & 0x3) == 0)) {
>          spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
>  
>          if (fn) {
> -            return fn(cpu, spapr, opcode, args);
> +            qemu_mutex_lock_iothread();
> +            ret = fn(cpu, spapr, opcode, args);
> +            qemu_mutex_unlock_iothread();
> +            return ret;
>          }
>      } else if ((opcode >= KVMPPC_HCALL_BASE) &&
>                 (opcode <= KVMPPC_HCALL_MAX)) {
>          spapr_hcall_fn fn = kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
>  
>          if (fn) {
> -            return fn(cpu, spapr, opcode, args);
> +            qemu_mutex_lock_iothread();
> +            ret = fn(cpu, spapr, opcode, args);
> +            qemu_mutex_unlock_iothread();
> +            return ret;
>          }
>      }

I think this will cause a deadlock when running on KVM since the lock is
already taken in kvm_arch_handle_exit() - which calls spapr_hypercall()!

 Thomas


* Re: [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads
  2016-09-02  9:34     ` Nikunj A Dadhania
@ 2016-09-02 10:45       ` Greg Kurz
  2016-09-03 16:34         ` Nikunj A Dadhania
  0 siblings, 1 reply; 45+ messages in thread
From: Greg Kurz @ 2016-09-02 10:45 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, alex.bennee, david, rth, qemu-devel

On Fri, 02 Sep 2016 15:04:47 +0530
Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:

> Greg Kurz <groug@kaod.org> writes:
> 
> > On Fri,  2 Sep 2016 12:02:54 +0530
> > Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:
> >  
> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> >> ---  
> >
> > Shouldn't this patch be the last one, when all other issues have been addressed ?
> >  
> >>  target-ppc/kvm.c     | 2 +-
> >>  target-ppc/kvm_ppc.h | 2 +-
> >>  2 files changed, 2 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> >> index dcb68b9..20eb450 100644
> >> --- a/target-ppc/kvm.c
> >> +++ b/target-ppc/kvm.c
> >> @@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
> >>  
> >>  int kvmppc_smt_threads(void)
> >>  {
> >> -    return cap_ppc_smt ? cap_ppc_smt : 1;
> >> +    return cap_ppc_smt ? cap_ppc_smt : 8;  
> >
> > If KVM is there but does not support SMT processor modes, it looks
> > wrong to return anything but 1. This check needs kvm_enabled().  
> 
> This also gets called when emulating PPC on PPC. 
> 

Yes, and the current value of 1 is the default for non-HV KVM and TCG. If
you want to change the default for MTTCG, then you need separate paths.

> > Also, why 8 ? This depends on the CPU model.  
> 
> Not sure if I need to emulate according to the host cpu model. I had
> selected 8, as that was the highest number of threads possible for POWER.
> 

If running in full emulation and cpu type is POWER7, this shouldn't be
higher than 4.

In the end, something like:

int kvmppc_smt_threads(void)
{
    if (kvm_enabled()) {
        return cap_ppc_smt ? cap_ppc_smt : 1;
    } else {
        return 8_or_4_or_2_or_1_depending_on_the_cpu_model;
    }
}

Cheers.

--
Greg

> > Also, since real HW allows to choose the SMT mode, maybe this should
> > be configurable from the command line as well.
> >  
> >>  }
> >>  
> >>  #ifdef TARGET_PPC64
> >> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> >> index 5461d10..053db0a 100644
> >> --- a/target-ppc/kvm_ppc.h
> >> +++ b/target-ppc/kvm_ppc.h
> >> @@ -128,7 +128,7 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
> >>  
> >>  static inline int kvmppc_smt_threads(void)
> >>  {
> >> -    return 1;
> >> +    return 8;  
> >
> > Same remark.
> >  
> >>  }
> >>  
> >>  static inline int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits)  
> 
> Regards
> Nikunj
> 


* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-02  7:57   ` Thomas Huth
@ 2016-09-02 11:44     ` Cédric Le Goater
  0 siblings, 0 replies; 45+ messages in thread
From: Cédric Le Goater @ 2016-09-02 11:44 UTC (permalink / raw)
  To: Thomas Huth, Nikunj A Dadhania, qemu-ppc, alex.bennee, david, rth
  Cc: qemu-devel

On 09/02/2016 09:57 AM, Thomas Huth wrote:
> On 02.09.2016 08:43, Cédric Le Goater wrote:
>> On 09/02/2016 08:32 AM, Nikunj A Dadhania wrote:
>>> The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
>>> Changes that were needed to enable PowerPC are pretty simple;
>>>
>>> Patch 01: Take a iothread lock during hcall, as hcall can generate io requests
>>>       02: For TCG, we were harcoding smt as 1, this gets rid of the limitation
>>>       03: Use atomic_cmpxchg in store conditional
>>>       04: With more threads, flush the entry from each cpu. 
>>> 	  This can be optimized further.
>>>
>>> The patches are based on the Alex Bennee's base enabling patches for 
>>> MTTCG[1] and Emilios's cmpxchg atomics. The consolidated tree of the 
>>> above patches is here:
>>>
>>> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>>>
>>> Apart from the above, PPC patches are based out of ppc-for-2.8 and 
>>> load/store consolidation patches [2]
>>>
>>> Series with all dependent patches available here: 
>>> https://github.com/nikunjad/qemu/tree/ppc_mttcg_v1
>>>
>>> Testing: 
>>> ========
>>>
>>> -smp 4,cores=1,threads=4 -accel tcg,thread=multi
>>>
>>> TODO
>>> ====
>>> Implement msgsndp instructions(door-bell), newer kernels enable it 
>>> depending on the PVR. I have been using following workaround to boot.
>>> https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4
>>
>> You could also introduce a Power8 DD1 in qemu. From the kernel cputable :
>>
>> 	{	/* Power8 DD1: Does not support doorbell IPIs */
>> 		.pvr_mask		= 0xffffff00,
>> 		.pvr_value		= 0x004d0100,
>> 		.cpu_name		= "POWER8 (raw)",
>> 		.cpu_features		= CPU_FTRS_POWER8_DD1,
>> 		...
> 
> Please try to avoid that! We just got rid of DD1 in qemu a couple of
> months ago:
> 
> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=9d6ba75df26d699a6

Heh. I missed that. So the suggestion was a bit unfortunate.

Thanks,
C.


* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-02  7:39   ` Nikunj A Dadhania
@ 2016-09-02 12:13     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-02 12:13 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

On Fri, 2016-09-02 at 13:09 +0530, Nikunj A Dadhania wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> > 
> > On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
> > > 
> > > The series is a first attempt at enabling Multi-Threaded TCG on
> > > PowerPC.
> > > Changes that were needed to enable PowerPC are pretty simple;
> > > 
> > > Patch 01: Take a iothread lock during hcall, as hcall can
> > > generate io requests
> > >       02: For TCG, we were harcoding smt as 1, this gets rid of
> > > the limitation
> > 
> > If we do this, we need to implement the shared SPRs properly
> 
> Sure, will have a look at it.

My idea was to have a bitmap of SPRs that are shared. When hitting a
shared one, after a write, we call a helper that syncs the update to
all the other threads using run_on_cpu.

> > 
> > and the inter-thread doorbells...
> 
> You mean the below ?
> 
> > 
> > > 
> > > TODO
> > > ====
> > > Implement msgsndp instructions(door-bell), newer kernels enable
> > > it 
> > > depending on the PVR. I have been using following workaround to
> > > boot.
> > > https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba
> > > 3ce1813fcf50bc4
> 
> Regards,
> Nikunj


* Re: [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call
  2016-09-02  9:57       ` Greg Kurz
@ 2016-09-03 16:31         ` Nikunj A Dadhania
  0 siblings, 0 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-03 16:31 UTC (permalink / raw)
  To: Greg Kurz; +Cc: alex.bennee, rth, qemu-ppc, qemu-devel, david

Greg Kurz <groug@kaod.org> writes:

> On Fri, 02 Sep 2016 14:58:12 +0530
> Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:
>
>> Greg Kurz <groug@kaod.org> writes:
>> 
>> > On Fri,  2 Sep 2016 12:02:53 +0530
>> > Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:
>> >  
>> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> >> ---
>> >>  hw/ppc/spapr_hcall.c | 11 +++++++++--
>> >>  1 file changed, 9 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> >> index e5eca67..daea7a0 100644
>> >> --- a/hw/ppc/spapr_hcall.c
>> >> +++ b/hw/ppc/spapr_hcall.c
>> >> @@ -1075,20 +1075,27 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>> >>                               target_ulong *args)
>> >>  {
>> >>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> >> +    target_ulong ret;
>> >>  
>> >>      if ((opcode <= MAX_HCALL_OPCODE)
>> >>          && ((opcode & 0x3) == 0)) {
>> >>          spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
>> >>  
>> >>          if (fn) {
>> >> -            return fn(cpu, spapr, opcode, args);
>> >> +            qemu_mutex_lock_iothread();
>> >> +            ret = fn(cpu, spapr, opcode, args);
>> >> +            qemu_mutex_unlock_iothread();
>> >> +            return ret;
>> >>          }
>> >>      } else if ((opcode >= KVMPPC_HCALL_BASE) &&
>> >>                 (opcode <= KVMPPC_HCALL_MAX)) {
>> >>          spapr_hcall_fn fn = kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
>> >>  
>> >>          if (fn) {
>> >> -            return fn(cpu, spapr, opcode, args);
>> >> +            qemu_mutex_lock_iothread();
>> >> +            ret = fn(cpu, spapr, opcode, args);
>> >> +            qemu_mutex_unlock_iothread();
>> >> +            return ret;
>> >>          }
>> >>      }
>> >>    
>> >
>> > This will serialize all hypercalls, even when it is not needed... Isn't that
>> > too much coarse grain locking ?  
>> 
>> You are right, I was thinking to do this only for emulation case, as
>> this is not needed for hardware acceleration.
>> 
>
> Yes, at the very least. And even in the MTTCG case, shouldn't we serialize only
> when we know I/O will actually happen ?

Yes, I haven't yet figured out what else would need protection apart from
I/O. I have started with coarse-grained locking and will start fine-tuning
once the other issues are sorted out.

Regards,
Nikunj


* Re: [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call
  2016-09-02 10:06   ` Thomas Huth
@ 2016-09-03 16:33     ` Nikunj A Dadhania
  0 siblings, 0 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-03 16:33 UTC (permalink / raw)
  To: Thomas Huth, qemu-ppc, alex.bennee, david, rth; +Cc: qemu-devel

Thomas Huth <thuth@redhat.com> writes:

> On 02.09.2016 08:32, Nikunj A Dadhania wrote:
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_hcall.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index e5eca67..daea7a0 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -1075,20 +1075,27 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>>                               target_ulong *args)
>>  {
>>      sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +    target_ulong ret;
>>  
>>      if ((opcode <= MAX_HCALL_OPCODE)
>>          && ((opcode & 0x3) == 0)) {
>>          spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
>>  
>>          if (fn) {
>> -            return fn(cpu, spapr, opcode, args);
>> +            qemu_mutex_lock_iothread();
>> +            ret = fn(cpu, spapr, opcode, args);
>> +            qemu_mutex_unlock_iothread();
>> +            return ret;
>>          }
>>      } else if ((opcode >= KVMPPC_HCALL_BASE) &&
>>                 (opcode <= KVMPPC_HCALL_MAX)) {
>>          spapr_hcall_fn fn = kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
>>  
>>          if (fn) {
>> -            return fn(cpu, spapr, opcode, args);
>> +            qemu_mutex_lock_iothread();
>> +            ret = fn(cpu, spapr, opcode, args);
>> +            qemu_mutex_unlock_iothread();
>> +            return ret;
>>          }
>>      }
>
> I think this will cause a deadlock when running on KVM since the lock is
> already taken in kvm_arch_handle_exit() - which calls spapr_hypercall()!

Ouch, haven't tried this branch on KVM yet :(
Will change it to emulation-only, as suggested in my previous mails.

Regards,
Nikunj


* Re: [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads
  2016-09-02 10:45       ` Greg Kurz
@ 2016-09-03 16:34         ` Nikunj A Dadhania
  0 siblings, 0 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-03 16:34 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, alex.bennee, david, rth, qemu-devel

Greg Kurz <groug@kaod.org> writes:

> On Fri, 02 Sep 2016 15:04:47 +0530
> Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:
>
>> Greg Kurz <groug@kaod.org> writes:
>> 
>> > On Fri,  2 Sep 2016 12:02:54 +0530
>> > Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> wrote:
>> >  
>> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> >> ---  
>> >
>> > Shouldn't this patch be the last one, when all other issues have been addressed ?
>> >  
>> >>  target-ppc/kvm.c     | 2 +-
>> >>  target-ppc/kvm_ppc.h | 2 +-
>> >>  2 files changed, 2 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> >> index dcb68b9..20eb450 100644
>> >> --- a/target-ppc/kvm.c
>> >> +++ b/target-ppc/kvm.c
>> >> @@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>> >>  
>> >>  int kvmppc_smt_threads(void)
>> >>  {
>> >> -    return cap_ppc_smt ? cap_ppc_smt : 1;
>> >> +    return cap_ppc_smt ? cap_ppc_smt : 8;  
>> >
>> > If KVM is there but does not support SMT processor modes, it looks
>> > wrong to return anything but 1. This check needs kvm_enabled().  
>> 
>> This also gets called when emulating PPC on PPC. 
>> 
>
> Yes and the current value of 1 is the default for non HV KVM and TCG. If
> you want to change the default for MTTCG, then you need separate paths.
>
>> > Also, why 8 ? This depends on the CPU model.  
>> 
>> Not sure if I need to emulate according to the host cpu model. I had
>> selected 8, as that was the highest number of threads possible for POWER.
>> 
>
> If running in full emulation and cpu type is POWER7, this shouldn't be
> higher than 4.
>
> In the end, something like:
>
> int kvmppc_smt_threads(void)
> {
>     if (kvm_enabled()) {
>         return cap_ppc_smt ? cap_ppc_smt : 1;
>     } else {
>         return 8_or_4_or_2_or_1_depending_on_the_cpu_model;
>     }
> }

Sure, will need something like this.

Regards,
Nikunj


* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-02  7:34     ` Nikunj A Dadhania
@ 2016-09-04 17:00       ` Alex Bennée
  2016-09-04 22:17         ` Benjamin Herrenschmidt
  2016-09-05  0:10         ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 45+ messages in thread
From: Alex Bennée @ 2016-09-04 17:00 UTC (permalink / raw)
  To: Nikunj A Dadhania
  Cc: Benjamin Herrenschmidt, qemu-ppc, david, rth, qemu-devel


Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> writes:

> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
>> On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
>>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>>> ---
>>>  cputlb.c                | 15 +++++++++++++++
>>>  include/exec/exec-all.h |  2 ++
>>>  target-ppc/mmu-hash64.c |  2 +-
>>>  3 files changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/cputlb.c b/cputlb.c
>>> index 64faf47..17ff58e 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -123,6 +123,21 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>>      }
>>>  }
>>>
>>> +static void tlb_flush_all_async_work(CPUState *cpu, void *opaque)
>>> +{
>>> +    tlb_flush_nocheck(cpu, GPOINTER_TO_INT(opaque));
>>> +}
>>> +
>>> +void tlb_flush_all(CPUState *cpu, int flush_global)
>>> +{
>>> +    CPUState *c;
>>> +
>>> +    CPU_FOREACH(c) {
>>> +        async_run_on_cpu(c, tlb_flush_all_async_work,
>>> +                         GUINT_TO_POINTER(flush_global));
>>> +    }
>>> +}
>>
>> Hrm... this is asynchronous?
>
> Yes.
>
>> It probably needs to be synchronous...
>
> I see run_on_cpu() which seems suitable.

I'm not so happy with run_on_cpu as it involves busy waiting for the
other CPU to finish.

>> We must provide a guarantee that no other processor can see the old
>> translation when the tlb invalidation sequence completes. With the
>> current lazy TLB flush, we already delay the invalidation until
>> we hit that synchronization point so we need to be synchronous.

When is the synchronisation point? On ARM we end the basic block on
system instructions that mess with the cache. As a result the flush is
done as soon as we exit the run loop on the next instruction.

>
>
>>> diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
>>> index 8118143..d852c21 100644
>>> --- a/target-ppc/mmu-hash64.c
>>> +++ b/target-ppc/mmu-hash64.c
>>> @@ -912,7 +912,7 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
>>>       * invalidate, and we still don't have a tlb_flush_mask(env, n,
>>>       * mask) in QEMU, we just invalidate all TLBs
>>>       */
>>> -    tlb_flush(CPU(cpu), 1);
>>> +    tlb_flush_all(CPU(cpu), 1);
>>>  }
>>>
>>>  void ppc_hash64_update_rmls(CPUPPCState *env)
>
> Regards,
> Nikunj


--
Alex Bennée


* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-04 17:00       ` Alex Bennée
@ 2016-09-04 22:17         ` Benjamin Herrenschmidt
  2016-09-05  0:10         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-04 22:17 UTC (permalink / raw)
  To: Alex Bennée, Nikunj A Dadhania; +Cc: qemu-ppc, david, rth, qemu-devel

On Sun, 2016-09-04 at 18:00 +0100, Alex Bennée wrote:
> > 
> > > We must provide a guarantee that no other processor can see the old
> > > translation when the tlb invalidation sequence completes. With the
> > > current lazy TLB flush, we already delay the invalidation until
> > > we hit that synchronization point so we need to be synchronous.
> 
> When is the synchronisation point? On ARM we end the basic block on
> system instructions that mess with the cache. As a result the flush is
> done as soon as we exit the run loop on the next instruction.

Look for gen_check_tlb_flush() in translated code and check_tlb_flush
elsewhere in target-ppc.

Basically, when we hit tlbie or slbie (TLB or segment invalidation
instruction), we just set a flag indicating that the TLB "needs
flushing".

When we hit an execution synchronizing instruction (isync) or a
ptesync, or if we hit an exception, we do the actual flush.

This isn't 100% architecturally correct but works with every OS out there
and saves quite a bit of churn, especially on context switch when we
invalidate/replace multiple segments or when invalidating a range of pages.

In any case, ptesync especially needs to be the hard sync point: past that
point all translations must be gone and all accesses using the previous
translation completed or retried on all processors.

Another approach would be to fire asynchronous events on the actual
tlbie/slbie instructions and synchronize at the end, but I suspect it won't
be any better, especially since the current code structure can't do
fine-grained invalidation of the qemu TLB anyway; we can only blow it all up.

So better safe than sorry here.

That being said, your statement about basic blocks confuses me a bit. You
mean MTTCG will sync all the threads when exiting a basic block on any CPU?
 
Cheers,
Ben.

> > 
> > 
> > 
> > > 
> > > > 
> > > > diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
> > > > index 8118143..d852c21 100644
> > > > --- a/target-ppc/mmu-hash64.c
> > > > +++ b/target-ppc/mmu-hash64.c
> > > > @@ -912,7 +912,7 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
> > > >       * invalidate, and we still don't have a tlb_flush_mask(env, n,
> > > >       * mask) in QEMU, we just invalidate all TLBs
> > > >       */
> > > > -    tlb_flush(CPU(cpu), 1);
> > > > +    tlb_flush_all(CPU(cpu), 1);
> > > >  }
> > > > 
> > > >  void ppc_hash64_update_rmls(CPUPPCState *env)
> > 
> > Regards,
> > Nikunj
> 
> 
> --
> Alex Bennée


* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-04 17:00       ` Alex Bennée
  2016-09-04 22:17         ` Benjamin Herrenschmidt
@ 2016-09-05  0:10         ` Benjamin Herrenschmidt
  2016-09-06  1:55           ` Nikunj A Dadhania
  1 sibling, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-05  0:10 UTC (permalink / raw)
  To: Alex Bennée, Nikunj A Dadhania; +Cc: qemu-ppc, david, rth, qemu-devel

On Sun, 2016-09-04 at 18:00 +0100, Alex Bennée wrote:

> When is the synchronisation point? On ARM we end the basic block on
> system instructions that mess with the cache. As a result the flush
> is done as soon as we exit the run loop on the next instruction.

Talking of this... Nikunj, I notice that all our TLB flushing is only ever
done on the "current" CPU. I mean today, without MTTCG. That looks
broken already, doesn't it?

Looking at ARM, they do this:

/* IS variants of TLB operations must affect all cores */
static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                             uint64_t value)
{
    CPUState *other_cs;

    CPU_FOREACH(other_cs) {
        tlb_flush(other_cs, 1);
    }
}

I wonder if we should audit all tlb_flush() calls in target-ppc to
do the right thing as well ?

Something like the (untested, not even compiled as I have to run) patch
below.

Now to do things a bit better, we could split the check_tlb_flush() helper
(or pass an argument) to tell it whether to check/flush other CPUs or not.

All the slb operations and tlbiel only need to affect the current CPU, but
broadcast tlbie's (and thus H_REMOVE) should have a global affect. We could
add another flag to the env in addition to the tlb_need_flush, something like
tlb_need_global_flush which is set on tlbie instructions to inform
check_tlb_flush what to do.

diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
index 0ee0e5a..f2302ec 100644
--- a/target-ppc/excp_helper.c
+++ b/target-ppc/excp_helper.c
@@ -959,8 +959,13 @@ static inline void do_rfi(CPUPPCState *env, target_ulong nip, target_ulong msr)
 {
     CPUState *cs = CPU(ppc_env_get_cpu(env));
 
-    /* MSR:POW cannot be set by any form of rfi */
-    msr &= ~(1ULL << MSR_POW);
+    /* These bits cannot be set by RFI on non-BookE systems and so must
+     * be filtered out. 6xx and 7xxx with SW TLB management will put
+     * TLB related junk in there among other things.
+     */
+    if (!(env->excp_model & POWERPC_EXCP_BOOKE)) {
+            msr &= ~(target_ulong)0xf0000;
+    }
 
 #if defined(TARGET_PPC64)
     /* Switching to 32-bit ? Crop the nip */
@@ -990,7 +995,6 @@ void helper_rfi(CPUPPCState *env)
     do_rfi(env, env->spr[SPR_SRR0], env->spr[SPR_SRR1] & 0xfffffffful);
 }
 
-#define MSR_BOOK3S_MASK
 #if defined(TARGET_PPC64)
 void helper_rfid(CPUPPCState *env)
 {
diff --git a/target-ppc/helper_regs.h b/target-ppc/helper_regs.h
index 3d279f1..f3eb21d 100644
--- a/target-ppc/helper_regs.h
+++ b/target-ppc/helper_regs.h
@@ -156,10 +156,15 @@ static inline int hreg_store_msr(CPUPPCState *env, target_ulong value,
 #if !defined(CONFIG_USER_ONLY)
 static inline void check_tlb_flush(CPUPPCState *env)
 {
-    CPUState *cs = CPU(ppc_env_get_cpu(env));
-    if (env->tlb_need_flush) {
-        env->tlb_need_flush = 0;
-        tlb_flush(cs, 1);
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        CPUPPCState *env = &cpu->env;
+        if (env->tlb_need_flush) {
+            env->tlb_need_flush = 0;
+            tlb_flush(cs, 1);
+        }
     }
 }
 #else
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 8118143..a76c92b 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -907,12 +907,16 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
                                target_ulong pte_index,
                                target_ulong pte0, target_ulong pte1)
 {
-    /*
-     * XXX: given the fact that there are too many segments to
-     * invalidate, and we still don't have a tlb_flush_mask(env, n,
-     * mask) in QEMU, we just invalidate all TLBs
-     */
-    tlb_flush(CPU(cpu), 1);
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        /*
+         * XXX: given the fact that there are too many segments to
+         * invalidate, and we still don't have a tlb_flush_mask(env, n,
+         * mask) in QEMU, we just invalidate all TLBs
+         */
+        tlb_flush(cs, 1);
+    }
 }
 
 void ppc_hash64_update_rmls(CPUPPCState *env)
diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 696bb03..1d84fc4 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -2758,6 +2758,7 @@ static inline void booke206_invalidate_ea_tlb(CPUPPCState *env, int tlbn,
 void helper_booke206_tlbivax(CPUPPCState *env, target_ulong address)
 {
     PowerPCCPU *cpu = ppc_env_get_cpu(env);
+    CPUState *other_cs;
 
     if (address & 0x4) {
         /* flush all entries */
@@ -2774,11 +2775,15 @@ void helper_booke206_tlbivax(CPUPPCState *env, target_ulong address)
     if (address & 0x8) {
         /* flush TLB1 entries */
         booke206_invalidate_ea_tlb(env, 1, address);
-        tlb_flush(CPU(cpu), 1);
+        CPU_FOREACH(other_cs) {
+            tlb_flush(other_cs, 1);
+        }
     } else {
         /* flush TLB0 entries */
         booke206_invalidate_ea_tlb(env, 0, address);
-        tlb_flush_page(CPU(cpu), address & MAS2_EPN_MASK);
+        CPU_FOREACH(other_cs) {
+            tlb_flush_page(other_cs, address & MAS2_EPN_MASK);
+        }
     }
 }


Cheers,
Ben.

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-05  0:10         ` Benjamin Herrenschmidt
@ 2016-09-06  1:55           ` Nikunj A Dadhania
  2016-09-06  3:05             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-06  1:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alex Bennée; +Cc: qemu-ppc, david, rth, qemu-devel

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Sun, 2016-09-04 at 18:00 +0100, Alex Bennée wrote:
>
>> When is the synchronisation point? On ARM we end the basic block on
>> system instructions that mess with the cache. As a result the flush
>> is done as soon as we exit the run loop on the next instruction.
>
> Talking of this... Nikunj, I notice all our TLB flushing is only ever
> done on the "current" CPU. I mean today, without MT-TCG. That looks
> broken already, doesn't it?

Without MT-TCG, there was only one cpu, so I think we never hit that
issue.

>
> Looking at ARM, they do this:
>
> /* IS variants of TLB operations must affect all cores */
> static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
>                              uint64_t value)
> {
>     CPUState *other_cs;
>
>     CPU_FOREACH(other_cs) {
>         tlb_flush(other_cs, 1);
>     }
> }
>
> I wonder if we should audit all tlb_flush() calls in target-ppc to
> do the right thing as well ?
>
> Something like the (untested, not even compiled as I have to run) patch
> below.
>
> Now to do things a bit better, we could split the check_tlb_flush() helper
> (or pass an argument) to tell it whether to check/flush other CPUs or not.
>
> All the slb operations and tlbiel only need to affect the current CPU, but
> broadcast tlbie's (and thus H_REMOVE) should have a global effect. We could
> add another flag to the env in addition to the tlb_need_flush, something like
> tlb_need_global_flush which is set on tlbie instructions to inform
> check_tlb_flush what to do.
>
> diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
> index 0ee0e5a..f2302ec 100644
> --- a/target-ppc/excp_helper.c
> +++ b/target-ppc/excp_helper.c
> @@ -959,8 +959,13 @@ static inline void do_rfi(CPUPPCState *env, target_ulong nip, target_ulong msr)
>  {
>      CPUState *cs = CPU(ppc_env_get_cpu(env));
>
> -    /* MSR:POW cannot be set by any form of rfi */
> -    msr &= ~(1ULL << MSR_POW);
> +    /* These bits cannot be set by RFI on non-BookE systems and so must
> +     * be filtered out. 6xx and 7xxx with SW TLB management will put
> +     * TLB related junk in there among other things.
> +     */
> +    if (!(env->excp_model & POWERPC_EXCP_BOOKE)) {
> +            msr &= ~(target_ulong)0xf0000;
> +    }
>
>  #if defined(TARGET_PPC64)
>      /* Switching to 32-bit ? Crop the nip */
> @@ -990,7 +995,6 @@ void helper_rfi(CPUPPCState *env)
>      do_rfi(env, env->spr[SPR_SRR0], env->spr[SPR_SRR1] & 0xfffffffful);
>  }
>
> -#define MSR_BOOK3S_MASK
>  #if defined(TARGET_PPC64)
>  void helper_rfid(CPUPPCState *env)
>  {
> diff --git a/target-ppc/helper_regs.h b/target-ppc/helper_regs.h
> index 3d279f1..f3eb21d 100644
> --- a/target-ppc/helper_regs.h
> +++ b/target-ppc/helper_regs.h
> @@ -156,10 +156,15 @@ static inline int hreg_store_msr(CPUPPCState *env, target_ulong value,
>  #if !defined(CONFIG_USER_ONLY)
>  static inline void check_tlb_flush(CPUPPCState *env)
>  {
> -    CPUState *cs = CPU(ppc_env_get_cpu(env));
> -    if (env->tlb_need_flush) {
> -        env->tlb_need_flush = 0;
> -        tlb_flush(cs, 1);
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +        CPUPPCState *env = &cpu->env;
> +        if (env->tlb_need_flush) {
> +            env->tlb_need_flush = 0;
> +            tlb_flush(cs, 1);
> +        }
>      }
>  }
>  #else
> diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
> index 8118143..a76c92b 100644
> --- a/target-ppc/mmu-hash64.c
> +++ b/target-ppc/mmu-hash64.c
> @@ -907,12 +907,16 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu,
>                                 target_ulong pte_index,
>                                 target_ulong pte0, target_ulong pte1)
>  {
> -    /*
> -     * XXX: given the fact that there are too many segments to
> -     * invalidate, and we still don't have a tlb_flush_mask(env, n,
> -     * mask) in QEMU, we just invalidate all TLBs
> -     */
> -    tlb_flush(CPU(cpu), 1);
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        /*
> +         * XXX: given the fact that there are too many segments to
> +         * invalidate, and we still don't have a tlb_flush_mask(env, n,
> +         * mask) in QEMU, we just invalidate all TLBs
> +         */
> +        tlb_flush(cs, 1);
> +    }
>  }
>
>  void ppc_hash64_update_rmls(CPUPPCState *env)
> diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
> index 696bb03..1d84fc4 100644
> --- a/target-ppc/mmu_helper.c
> +++ b/target-ppc/mmu_helper.c
> @@ -2758,6 +2758,7 @@ static inline void booke206_invalidate_ea_tlb(CPUPPCState *env, int tlbn,
>  void helper_booke206_tlbivax(CPUPPCState *env, target_ulong address)
>  {
>      PowerPCCPU *cpu = ppc_env_get_cpu(env);
> +    CPUState *other_cs;
>
>      if (address & 0x4) {
>          /* flush all entries */
> @@ -2774,11 +2775,15 @@ void helper_booke206_tlbivax(CPUPPCState *env, target_ulong address)
>      if (address & 0x8) {
>          /* flush TLB1 entries */
>          booke206_invalidate_ea_tlb(env, 1, address);
> -        tlb_flush(CPU(cpu), 1);
> +        CPU_FOREACH(other_cs) {
> +            tlb_flush(other_cs, 1);
> +        }
>      } else {
>          /* flush TLB0 entries */
>          booke206_invalidate_ea_tlb(env, 0, address);
> -        tlb_flush_page(CPU(cpu), address & MAS2_EPN_MASK);
> +        CPU_FOREACH(other_cs) {
> +            tlb_flush_page(other_cs, address & MAS2_EPN_MASK);
> +        }
>      }
>  }

Sure, will have a round of testing.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-06  1:55           ` Nikunj A Dadhania
@ 2016-09-06  3:05             ` Benjamin Herrenschmidt
  2016-09-06  4:53               ` Nikunj A Dadhania
  0 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-06  3:05 UTC (permalink / raw)
  To: Nikunj A Dadhania, Alex Bennée; +Cc: qemu-ppc, david, rth, qemu-devel

On Tue, 2016-09-06 at 07:25 +0530, Nikunj A Dadhania wrote:
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> > 
> > On Sun, 2016-09-04 at 18:00 +0100, Alex Bennée wrote:
> > 
> > > 
> > > When is the synchronisation point? On ARM we end the basic block on
> > > system instructions that mess with the cache. As a result the flush
> > > is done as soon as we exit the run loop on the next instruction.
> > 
> > Talking of this... Nikunj, I notice all our TLB flushing is only ever
> > done on the "current" CPU. I mean today, without MT-TCG. That looks
> > broken already, doesn't it?
> 
> Without MT-TCG, there was only one cpu, so I think we never hit that
> issue.

No there isn't. You can start qemu with --smp 4 and have 4 CPUs. It
will alternate between them, but they *will* have different TLBs.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-06  3:05             ` Benjamin Herrenschmidt
@ 2016-09-06  4:53               ` Nikunj A Dadhania
  2016-09-06  5:30                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-06  4:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alex Bennée; +Cc: qemu-ppc, david, rth, qemu-devel

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Tue, 2016-09-06 at 07:25 +0530, Nikunj A Dadhania wrote:
>> > Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>> 
>> > 
>> > On Sun, 2016-09-04 at 18:00 +0100, Alex Bennée wrote:
>> > 
>> > > 
>> > > When is the synchronisation point? On ARM we end the basic block on
>> > > system instructions that mess with the cache. As a result the flush
>> > > is done as soon as we exit the run loop on the next instruction.
>> > 
>> > Talking of this... Nikunj, I notice all our TLB flushing is only ever
>> > done on the "current" CPU. I mean today, without MT-TCG. That looks
>> > broken already, doesn't it?
>> 
>> Without MT-TCG, there was only one cpu, so I think we never hit that
>> issue.
>
> No there isn't. You can start qemu with --smp 4 and have 4 CPUs.

That case was prevented from even starting in the case of TCG. That is why
I had to add "target-ppc: with MTTCG report more threads"

> It will alternate between them, but they *will* have different TLBs.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-06  4:53               ` Nikunj A Dadhania
@ 2016-09-06  5:30                 ` Benjamin Herrenschmidt
  2016-09-06  6:57                   ` Nikunj A Dadhania
  0 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-06  5:30 UTC (permalink / raw)
  To: Nikunj A Dadhania, Alex Bennée; +Cc: qemu-ppc, david, rth, qemu-devel

On Tue, 2016-09-06 at 10:23 +0530, Nikunj A Dadhania wrote:
> >
> > No there isn't. You can start qemu with --smp 4 and have 4 CPUs.
> 
> That case was prevented from even starting in the case of TCG. That is why
> I had to add "target-ppc: with MTTCG report more threads"

No, it works, you are confusing cores and threads ... you always could
start TCG with multiple cores. Try the powernv or the mac model...

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu
  2016-09-06  5:30                 ` Benjamin Herrenschmidt
@ 2016-09-06  6:57                   ` Nikunj A Dadhania
  0 siblings, 0 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-06  6:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alex Bennée; +Cc: qemu-ppc, david, rth, qemu-devel

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Tue, 2016-09-06 at 10:23 +0530, Nikunj A Dadhania wrote:
>> >
>> > No there isn't. You can start qemu with --smp 4 and have 4 CPUs.
>> 
>> That case was prevented from even starting in the case of TCG. That is why
>> I had to add "target-ppc: with MTTCG report more threads"
>
> No, it works, you are confusing cores and threads ... you always could
> start TCG with multiple cores. Try the powernv or the mac model...

Yes, you're right, I forgot about the multi-core case.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads Nikunj A Dadhania
  2016-09-02  9:28   ` Greg Kurz
@ 2016-09-07  3:51   ` David Gibson
  2016-09-07  4:41     ` Nikunj A Dadhania
  1 sibling, 1 reply; 45+ messages in thread
From: David Gibson @ 2016-09-07  3:51 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, alex.bennee, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 1538 bytes --]

On Fri, Sep 02, 2016 at 12:02:54PM +0530, Nikunj A Dadhania wrote:
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/kvm.c     | 2 +-
>  target-ppc/kvm_ppc.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index dcb68b9..20eb450 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>  
>  int kvmppc_smt_threads(void)
>  {
> -    return cap_ppc_smt ? cap_ppc_smt : 1;
> +    return cap_ppc_smt ? cap_ppc_smt : 8;
>  }
>  
>  #ifdef TARGET_PPC64
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 5461d10..053db0a 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -128,7 +128,7 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>  
>  static inline int kvmppc_smt_threads(void)
>  {
> -    return 1;
> +    return 8;
>  }
>  
>  static inline int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits)

This doesn't make sense to me.  Allowing multiple hardware threads to
be emulated in TCG (which I think will require a bunch of fixes) seems
like a separate question to MTTCG - i.e. allowing separate vcpus to be
TCG emulated in separate host threads.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation Nikunj A Dadhania
@ 2016-09-07  4:02   ` David Gibson
  2016-09-07  4:47     ` Nikunj A Dadhania
  0 siblings, 1 reply; 45+ messages in thread
From: David Gibson @ 2016-09-07  4:02 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, alex.bennee, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 3682 bytes --]

On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

This really needs a comment indicating that this implementation isn't
strictly correct (although probably good enough in practice).
Specifically a racing store which happens to store the same value
which was already in memory should clobber the reservation, but won't
with this implementation.

I had a long discussion at KVM Forum with Emilio Cota about this, in
which I discovered just how hard it is to strictly implement
store-conditional semantics in terms of anything else.  So, this is
probably a reasonable substitute, but we should note the fact that
it's not 100%.
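The problem can be seen in miniature with a plain C11 compare-and-swap standing in for the emulated `stcx.` (`emulated_stcx` is a hypothetical helper for illustration, not QEMU's actual code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: emulate stcx. as a compare-and-swap against
 * the value remembered at larx time. A racing store of the *same*
 * value leaves the comparison succeeding, even though real hardware
 * would have cancelled the reservation. */
static bool emulated_stcx(_Atomic int *mem, int reserve_val, int new_val)
{
    int expected = reserve_val;
    /* Succeeds iff *mem still equals the reserved value */
    return atomic_compare_exchange_strong(mem, &expected, new_val);
}
```

With memory and reservation value both 42, a racing store of 42 followed by `emulated_stcx(&mem, 42, 99)` still succeeds; only a store that actually changes the value makes the emulated `stcx.` fail.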

> ---
>  target-ppc/translate.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 4a882b3..447c13e 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -72,6 +72,7 @@ static TCGv cpu_cfar;
>  #endif
>  static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca;
>  static TCGv cpu_reserve;
> +static TCGv cpu_reserve_val;
>  static TCGv cpu_fpscr;
>  static TCGv_i32 cpu_access_type;
>  
> @@ -176,6 +177,9 @@ void ppc_translate_init(void)
>      cpu_reserve = tcg_global_mem_new(cpu_env,
>                                       offsetof(CPUPPCState, reserve_addr),
>                                       "reserve_addr");
> +    cpu_reserve_val = tcg_global_mem_new(cpu_env,
> +                                     offsetof(CPUPPCState, reserve_val),
> +                                     "reserve_val");
>  
>      cpu_fpscr = tcg_global_mem_new(cpu_env,
>                                     offsetof(CPUPPCState, fpscr), "fpscr");
> @@ -3086,7 +3090,7 @@ static void gen_##name(DisasContext *ctx)                            \
>      }                                                                \
>      tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);                \
>      tcg_gen_mov_tl(cpu_reserve, t0);                                 \
> -    tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \
> +    tcg_gen_mov_tl(cpu_reserve_val, gpr);                            \
>      tcg_temp_free(t0);                                               \
>  }
>  
> @@ -3112,14 +3116,28 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
>                                    int reg, int memop)
>  {
>      TCGLabel *l1;
> +    TCGv_i32 tmp = tcg_temp_local_new_i32();
> +    TCGv t0;
>  
> +    tcg_gen_movi_i32(tmp, 0);
>      tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
>      l1 = gen_new_label();
>      tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
> -    tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 1 << CRF_EQ);
> -    tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
> +
> +    t0 = tcg_temp_new();
> +    tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
> +                              ctx->mem_idx, DEF_MEMOP(memop));
> +    tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
> +    tcg_gen_trunc_tl_i32(tmp, t0);
> +
>      gen_set_label(l1);
> +    tcg_gen_shli_i32(tmp, tmp, CRF_EQ);
> +    tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp);
>      tcg_gen_movi_tl(cpu_reserve, -1);
> +    tcg_gen_movi_tl(cpu_reserve_val, 0);
> +
> +    tcg_temp_free(t0);
> +    tcg_temp_free_i32(tmp);
>  }
>  #endif
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-02  7:19 ` Benjamin Herrenschmidt
  2016-09-02  7:39   ` Nikunj A Dadhania
@ 2016-09-07  4:08   ` David Gibson
  2016-09-07  4:48     ` Nikunj A Dadhania
  1 sibling, 1 reply; 45+ messages in thread
From: David Gibson @ 2016-09-07  4:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Nikunj A Dadhania, qemu-ppc, alex.bennee, rth, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2811 bytes --]

On Fri, Sep 02, 2016 at 05:19:21PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
> > The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
> > Changes that were needed to enable PowerPC are pretty simple;
> > 
> > Patch 01: Take an iothread lock during hcall, as hcall can generate io requests
> >       02: For TCG, we were hardcoding smt as 1, this gets rid of the limitation
> 
> If we do this, we need to implement the shared SPRs properly and the
> inter-thread doorbells...

Exactly.  I think we want to get MTTCG working with multiple emulated
cores first, then look into handling emulated hardware
multi-threading.  2/4 suggests you are conflating the two, when
they're really not related AFAICT.

> 
> >       03: Use atomic_cmpxchg in store conditional
> >       04: With more threads, flush the entry from each cpu. 
> > 	  This can be optimized further.
> > 
> > The patches are based on the Alex Bennee's base enabling patches for 
> > MTTCG[1] and Emilio's cmpxchg atomics. The consolidated tree of the 
> > above patches is here:
> > 
> > https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
> > 
> > Apart from the above, PPC patches are based out of ppc-for-2.8 and 
> > load/store consolidation patches [2]
> > 
> > Series with all dependent patches available here: 
> > https://github.com/nikunjad/qemu/tree/ppc_mttcg_v1
> > 
> > Testing: 
> > ========
> > 
> > -smp 4,cores=1,threads=4 -accel tcg,thread=multi
> > 
> > TODO
> > ====
> > Implement msgsndp instructions(door-bell), newer kernels enable it 
> > depending on the PVR. I have been using following workaround to boot.
> > https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4
> > 
> > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg391966.html
> > [2] https://lists.gnu.org/archive/html/qemu-ppc/2016-08/msg00265.html
> > 
> > Nikunj A Dadhania (4):
> >   spapr-hcall: take iothread lock during handler call
> >   target-ppc: with MTTCG report more threads
> >   target-ppc: use atomic_cmpxchg for ld/st reservation
> >   target-ppc: flush tlb from all the cpu
> > 
> >  cputlb.c                | 15 +++++++++++++++
> >  hw/ppc/spapr_hcall.c    | 11 +++++++++--
> >  include/exec/exec-all.h |  2 ++
> >  target-ppc/kvm.c        |  2 +-
> >  target-ppc/kvm_ppc.h    |  2 +-
> >  target-ppc/mmu-hash64.c |  2 +-
> >  target-ppc/translate.c  | 24 +++++++++++++++++++++---
> >  7 files changed, 50 insertions(+), 8 deletions(-)
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads
  2016-09-07  3:51   ` David Gibson
@ 2016-09-07  4:41     ` Nikunj A Dadhania
  0 siblings, 0 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-07  4:41 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, alex.bennee, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Fri, Sep 02, 2016 at 12:02:54PM +0530, Nikunj A Dadhania wrote:
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  target-ppc/kvm.c     | 2 +-
>>  target-ppc/kvm_ppc.h | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index dcb68b9..20eb450 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>  
>>  int kvmppc_smt_threads(void)
>>  {
>> -    return cap_ppc_smt ? cap_ppc_smt : 1;
>> +    return cap_ppc_smt ? cap_ppc_smt : 8;
>>  }
>>  
>>  #ifdef TARGET_PPC64
>> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
>> index 5461d10..053db0a 100644
>> --- a/target-ppc/kvm_ppc.h
>> +++ b/target-ppc/kvm_ppc.h
>> @@ -128,7 +128,7 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy)
>>  
>>  static inline int kvmppc_smt_threads(void)
>>  {
>> -    return 1;
>> +    return 8;
>>  }
>>  
>>  static inline int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits)
>
> This doesn't make sense to me.  Allowing multiple hardware threads to
> be emulated in TCG (which I think will require a bunch of fixes) seems
> like a separate question to MTTCG - i.e. allowing separate vcpus to be
> TCG emulated in separate host threads.

Right, I was planning to drop this for now and introduce it later. As
noted in the other thread, I did not consider the multi-core case.

Regards,
Nikunj

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-07  4:02   ` David Gibson
@ 2016-09-07  4:47     ` Nikunj A Dadhania
  2016-09-07  5:24       ` Benjamin Herrenschmidt
  2016-09-07  5:34       ` David Gibson
  0 siblings, 2 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-07  4:47 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, alex.bennee, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>
> This really needs a comment indicating that this implementation isn't
> strictly correct (although probably good enough in practice).

Sure. And it also does not help if someone uses any store other than a
store conditional; that case isn't taken care of.

The assumption here is that the locking primitives use load with reservation
and store conditional, and no other ld/st variant touches this memory.

> Specifically a racing store which happens to store the same value
> which was already in memory should clobber the reservation, but won't
> with this implementation.
>
> I had a long discussion at KVM Forum with Emilio Cota about this, in
> which I discovered just how hard it is to strictly implement
> store-conditional semantics in terms of anything else.  So, this is
> probably a reasonable substitute, but we should note the fact that
> it's not 100%.

I will update the commit log.

Regards,
Nikunj

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC
  2016-09-07  4:08   ` David Gibson
@ 2016-09-07  4:48     ` Nikunj A Dadhania
  0 siblings, 0 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-07  4:48 UTC (permalink / raw)
  To: David Gibson, Benjamin Herrenschmidt
  Cc: qemu-ppc, alex.bennee, rth, qemu-devel

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Fri, Sep 02, 2016 at 05:19:21PM +1000, Benjamin Herrenschmidt wrote:
>> On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
>> > The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
>> > Changes that were needed to enable PowerPC are pretty simple;
>> > 
>> > Patch 01: Take an iothread lock during hcall, as hcall can generate io requests
>> >       02: For TCG, we were hardcoding smt as 1, this gets rid of the limitation
>> 
>> If we do this, we need to implement the shared SPRs properly and the
>> inter-thread doorbells...
>
> Exactly.  I think we want to get MTTCG working with multiple emulated
> cores first, then look into handling emulated hardware
> multi-threading.  2/4 suggests you are conflating the two, when
> they're really not related AFAICT.

Sure. Will do that in my next version.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-07  4:47     ` Nikunj A Dadhania
@ 2016-09-07  5:24       ` Benjamin Herrenschmidt
  2016-09-07  8:42         ` Nikunj A Dadhania
  2016-09-07  5:34       ` David Gibson
  1 sibling, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-07  5:24 UTC (permalink / raw)
  To: Nikunj A Dadhania, David Gibson; +Cc: qemu-ppc, alex.bennee, rth, qemu-devel

On Wed, 2016-09-07 at 10:17 +0530, Nikunj A Dadhania wrote:
> > David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > 
> > [ Unknown signature status ]
> > On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
> > > 
> > > > > > Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> > 
> > This really needs a comment indicating that this implementation isn't
> > strictly correct (although probably good enough in practice).
> 
> Sure. And it also does not help if someone uses any store other than a
> store conditional; that case isn't taken care of.
> 
> The assumption here is that the locking primitives use load with reservation
> and store conditional, and no other ld/st variant touches this memory.

This is an incorrect assumption. spin_unlock for example uses a normal
store.

That being said, you will observe the difference in value which should
hopefully make things work...

I *hope* we don't have anything that relies on a normal store of the same
value as the atomic breaking the reservation, I *think* we might get away
with it, but it is indeed fishy.

> > Specifically a racing store which happens to store the same value
> > which was already in memory should clobber the reservation, but won't
> > with this implementation.
> > 
> > I had a long discussion at KVM Forum with Emilio Cota about this, in
> > which I discovered just how hard it is to strictly implement
> > store-conditional semantics in terms of anything else.  So, this is
> > probably a reasonable substitute, but we should note the fact that
> > it's not 100%.
> 
> I will update the commit log.
> 
> Regards,
> Nikunj

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-07  4:47     ` Nikunj A Dadhania
  2016-09-07  5:24       ` Benjamin Herrenschmidt
@ 2016-09-07  5:34       ` David Gibson
  2016-09-07  7:13         ` Alex Bennée
  1 sibling, 1 reply; 45+ messages in thread
From: David Gibson @ 2016-09-07  5:34 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, alex.bennee, rth, qemu-devel, benh


On Wed, Sep 07, 2016 at 10:17:42AM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > [ Unknown signature status ]
> > On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> >
> > This really needs a comment indicating that this implementation isn't
> > strictly correct (although probably good enough in practice).
> 
> Sure. And it also does not help if someone uses any store other than
> a store conditional; that case isn't taken care of.
> 
> The assumption here is that the locking primitives use load with
> reservation and store conditional, and that no other ld/st variant
> touches this memory.

So, a) I don't think this really relies on that: an ordinary store
(assuming it changes the value) will still get picked up by the cmpxchg.
Well.. at least after a suitable memory barrier.  Matching memory
models between emulated and host cpus is a whole other kettle of fish.

I think this does matter, IIRC a kernel spin unlock on ppc is a
barrier + plain store, no load locked or store conditional.

> > Specifically a racing store which happens to store the same value
> > which was already in memory should clobber the reservation, but won't
> > with this implementation.
> >
> > I had a long discussion at KVM Forum with Emilio Costa about this, in
> > which I discovered just how hard it is to strictly implement
> > store-conditional semantics in terms of anything else.  So, this is
> > probably a reasonable substitute, but we should note the fact that
> > it's not 100%.
> 
> I will update the commit log.
> 
> Regards,
> Nikunj
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-07  5:34       ` David Gibson
@ 2016-09-07  7:13         ` Alex Bennée
  2016-09-12  1:19           ` David Gibson
  0 siblings, 1 reply; 45+ messages in thread
From: Alex Bennée @ 2016-09-07  7:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Nikunj A Dadhania, qemu-ppc, rth, qemu-devel, benh


David Gibson <david@gibson.dropbear.id.au> writes:

> On Wed, Sep 07, 2016 at 10:17:42AM +0530, Nikunj A Dadhania wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>>
>> > [ Unknown signature status ]
>> > On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
>> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> >
>> > This really needs a comment indicating that this implementation isn't
>> > strictly correct (although probably good enough in practice).
>>
>> Sure. And it also does not help if someone uses any store other than
>> a store conditional; that case isn't taken care of.
>>
>> The assumption here is that the locking primitives use load with
>> reservation and store conditional, and that no other ld/st variant
>> touches this memory.
>
> So, a) I don't think this really relies on that: an ordinary store
> > (assuming it changes the value) will still get picked up by the cmpxchg.
> Well.. at least after a suitable memory barrier.  Matching memory
> models between emulated and host cpus is a whole other kettle of fish.

Have you seen Pranith's memory barrier patches?

>
> I think this does matter, IIRC a kernel spin unlock on ppc is a
> barrier + plain store, no load locked or store conditional.
>
>> > Specifically a racing store which happens to store the same value
>> > which was already in memory should clobber the reservation, but won't
>> > with this implementation.
>> >
>> > I had a long discussion at KVM Forum with Emilio Costa about this, in
>> > which I discovered just how hard it is to strictly implement
>> > store-conditional semantics in terms of anything else.  So, this is
>> > probably a reasonable substitute, but we should note the fact that
>> > it's not 100%.
>>
>> I will update the commit log.
>>
>> Regards,
>> Nikunj
>>


--
Alex Bennée


* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-07  5:24       ` Benjamin Herrenschmidt
@ 2016-09-07  8:42         ` Nikunj A Dadhania
  0 siblings, 0 replies; 45+ messages in thread
From: Nikunj A Dadhania @ 2016-09-07  8:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson
  Cc: qemu-ppc, alex.bennee, rth, qemu-devel

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Wed, 2016-09-07 at 10:17 +0530, Nikunj A Dadhania wrote:
>> > David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> > 
>> > [ Unknown signature status ]
>> > On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
>> > > 
>> > > > > > Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> > 
>> > This really needs a comment indicating that this implementation isn't
>> > strictly correct (although probably good enough in practice).
>> 
>> Sure. And it also does not help if someone uses any store other than
>> a store conditional; that case isn't taken care of.
>> 
>> The assumption here is that the locking primitives use load with
>> reservation and store conditional, and that no other ld/st variant
>> touches this memory.
>
> This is an incorrect assumption. spin_unlock for example uses a normal
> store.
>
> That being said, you will observe the difference in value which should
> hopefully make things work...
>
> I *hope* we don't have anything that relies on a normal store of the same
> value as the atomic breaking the reservation, I *think* we might get away
> with it, but it is indeed fishy.

In arch/powerpc/include/asm/spinlock.h: 

We have __arch_spin_trylock(), which uses the lwarx and "stwcx."
instructions to acquire the lock. Later, during the unlock, a normal
store will do. IMHO, that is the reason it works.

Regards,
Nikunj


* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-07  7:13         ` Alex Bennée
@ 2016-09-12  1:19           ` David Gibson
  2016-09-12  8:39             ` Alex Bennée
  0 siblings, 1 reply; 45+ messages in thread
From: David Gibson @ 2016-09-12  1:19 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Nikunj A Dadhania, qemu-ppc, rth, qemu-devel, benh


On Wed, Sep 07, 2016 at 08:13:31AM +0100, Alex Bennée wrote:
> 
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Wed, Sep 07, 2016 at 10:17:42AM +0530, Nikunj A Dadhania wrote:
> >> David Gibson <david@gibson.dropbear.id.au> writes:
> >>
> >> > [ Unknown signature status ]
> >> > On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
> >> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> >> >
> >> > This really needs a comment indicating that this implementation isn't
> >> > strictly correct (although probably good enough in practice).
> >>
> >> Sure. And it also does not help if someone uses any store other than
> >> a store conditional; that case isn't taken care of.
> >>
> >> The assumption here is that the locking primitives use load with
> >> reservation and store conditional, and that no other ld/st variant
> >> touches this memory.
> >
> > So, a) I don't think this really relies on that: an ordinary store
> > (assuming it changes the value) will still get picked up by the cmpxchg.
> > Well.. at least after a suitable memory barrier.  Matching memory
> > models between emulated and host cpus is a whole other kettle of fish.
> 
> Have you seen Pranith's memory barrier patches?

I have not.

> 
> >
> > I think this does matter, IIRC a kernel spin unlock on ppc is a
> > barrier + plain store, no load locked or store conditional.
> >
> >> > Specifically a racing store which happens to store the same value
> >> > which was already in memory should clobber the reservation, but won't
> >> > with this implementation.
> >> >
> >> > I had a long discussion at KVM Forum with Emilio Costa about this, in
> >> > which I discovered just how hard it is to strictly implement
> >> > store-conditional semantics in terms of anything else.  So, this is
> >> > probably a reasonable substitute, but we should note the fact that
> >> > it's not 100%.
> >>
> >> I will update the commit log.
> >>
> >> Regards,
> >> Nikunj
> >>
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-12  1:19           ` David Gibson
@ 2016-09-12  8:39             ` Alex Bennée
  2016-09-12  9:15               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 45+ messages in thread
From: Alex Bennée @ 2016-09-12  8:39 UTC (permalink / raw)
  To: David Gibson; +Cc: Nikunj A Dadhania, qemu-ppc, rth, qemu-devel, benh


David Gibson <david@gibson.dropbear.id.au> writes:

> On Wed, Sep 07, 2016 at 08:13:31AM +0100, Alex Bennée wrote:
>>
>> David Gibson <david@gibson.dropbear.id.au> writes:
>>
>> > On Wed, Sep 07, 2016 at 10:17:42AM +0530, Nikunj A Dadhania wrote:
>> >> David Gibson <david@gibson.dropbear.id.au> writes:
>> >>
>> >> > [ Unknown signature status ]
>> >> > On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
>> >> >> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> >> >
>> >> > This really needs a comment indicating that this implementation isn't
>> >> > strictly correct (although probably good enough in practice).
>> >>
>> >> Sure. And it also does not help if someone uses any store other than
>> >> a store conditional; that case isn't taken care of.
>> >>
>> >> The assumption here is that the locking primitives use load with
>> >> reservation and store conditional, and that no other ld/st variant
>> >> touches this memory.
>> >
>> > So, a) I don't think this really relies on that: an ordinary store
>> > (assuming it changes the value) will still get picked up by the cmpxchg.
>> > Well.. at least after a suitable memory barrier.  Matching memory
>> > models between emulated and host cpus is a whole other kettle of fish.
>>
>> Have you seen Pranith's memory barrier patches?
>
> I have not.

They are now in Richard's tcg-next queue:

Message-Id: <1473282648-23487-1-git-send-email-rth@twiddle.net>
Subject: [Qemu-devel] [PULL 00/18] tcg queued patches

All the backends support the new fence op; so far only ARM, Alpha and
x86 emit the fence TCGOps, as these are best added by someone who
understands the frontend well.

>> >
>> > I think this does matter, IIRC a kernel spin unlock on ppc is a
>> > barrier + plain store, no load locked or store conditional.
>> >
>> >> > Specifically a racing store which happens to store the same value
>> >> > which was already in memory should clobber the reservation, but won't
>> >> > with this implementation.
>> >> >
>> >> > I had a long discussion at KVM Forum with Emilio Costa about this, in
>> >> > which I discovered just how hard it is to strictly implement
>> >> > store-conditional semantics in terms of anything else.  So, this is
>> >> > probably a reasonable substitute, but we should note the fact that
>> >> > it's not 100%.
>> >>
>> >> I will update the commit log.
>> >>
>> >> Regards,
>> >> Nikunj
>> >>
>>
>>


--
Alex Bennée


* Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation
  2016-09-12  8:39             ` Alex Bennée
@ 2016-09-12  9:15               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-12  9:15 UTC (permalink / raw)
  To: Alex Bennée, David Gibson
  Cc: Nikunj A Dadhania, qemu-ppc, rth, qemu-devel

On Mon, 2016-09-12 at 09:39 +0100, Alex Bennée wrote:
> 
> They are now in Richard's tcg-next queue
> 
> Message-Id: <1473282648-23487-1-git-send-email-rth@twiddle.net>
> Subject: [Qemu-devel] [PULL 00/18] tcg queued patches
> 
> All the backends support the new fence op; so far only ARM, Alpha and
> x86 emit the fence TCGOps, as these are best added by someone who
> understands the frontend well.

Hrm, I should probably have a look ;-)

A bit swamped this week, I'll see what I can do.

Cheers,
Ben.

> >
> > > > I think this does matter, IIRC a kernel spin unlock on ppc is a
> > > > barrier + plain store, no load locked or store conditional.
> > > >
> > > > > > Specifically a racing store which happens to store the same value
> > > > > > which was already in memory should clobber the reservation, but won't
> > > > > > with this implementation.
> > > > > >
> > > > > > I had a long discussion at KVM Forum with Emilio Costa about this, in
> > > > > > which I discovered just how hard it is to strictly implement
> > > > > > store-conditional semantics in terms of anything else.  So, this is
> > > > > > probably a reasonable substitute, but we should note the fact that
> > > > > > it's not 100%.
> > > > >
> > > > > I will update the commit log.
> > > > >
> > > > > Regards,
> > > > > Nikunj
> >
> 
> 
> --
> Alex Bennée


end of thread, other threads:[~2016-09-12  9:15 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-02  6:32 [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Nikunj A Dadhania
2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 1/4] spapr-hcall: take iothread lock during handler call Nikunj A Dadhania
2016-09-02  8:53   ` Greg Kurz
2016-09-02  9:28     ` Nikunj A Dadhania
2016-09-02  9:57       ` Greg Kurz
2016-09-03 16:31         ` Nikunj A Dadhania
2016-09-02 10:06   ` Thomas Huth
2016-09-03 16:33     ` Nikunj A Dadhania
2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads Nikunj A Dadhania
2016-09-02  9:28   ` Greg Kurz
2016-09-02  9:34     ` Nikunj A Dadhania
2016-09-02 10:45       ` Greg Kurz
2016-09-03 16:34         ` Nikunj A Dadhania
2016-09-07  3:51   ` David Gibson
2016-09-07  4:41     ` Nikunj A Dadhania
2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation Nikunj A Dadhania
2016-09-07  4:02   ` David Gibson
2016-09-07  4:47     ` Nikunj A Dadhania
2016-09-07  5:24       ` Benjamin Herrenschmidt
2016-09-07  8:42         ` Nikunj A Dadhania
2016-09-07  5:34       ` David Gibson
2016-09-07  7:13         ` Alex Bennée
2016-09-12  1:19           ` David Gibson
2016-09-12  8:39             ` Alex Bennée
2016-09-12  9:15               ` Benjamin Herrenschmidt
2016-09-02  6:32 ` [Qemu-devel] [PATCH RFC 4/4] target-ppc: flush tlb from all the cpu Nikunj A Dadhania
2016-09-02  7:22   ` Benjamin Herrenschmidt
2016-09-02  7:34     ` Nikunj A Dadhania
2016-09-04 17:00       ` Alex Bennée
2016-09-04 22:17         ` Benjamin Herrenschmidt
2016-09-05  0:10         ` Benjamin Herrenschmidt
2016-09-06  1:55           ` Nikunj A Dadhania
2016-09-06  3:05             ` Benjamin Herrenschmidt
2016-09-06  4:53               ` Nikunj A Dadhania
2016-09-06  5:30                 ` Benjamin Herrenschmidt
2016-09-06  6:57                   ` Nikunj A Dadhania
2016-09-02  6:43 ` [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC Cédric Le Goater
2016-09-02  6:46   ` Nikunj A Dadhania
2016-09-02  7:57   ` Thomas Huth
2016-09-02 11:44     ` Cédric Le Goater
2016-09-02  7:19 ` Benjamin Herrenschmidt
2016-09-02  7:39   ` Nikunj A Dadhania
2016-09-02 12:13     ` Benjamin Herrenschmidt
2016-09-07  4:08   ` David Gibson
2016-09-07  4:48     ` Nikunj A Dadhania
