* [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15
@ 2017-02-22 11:44 Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 01/10] target/ppc: move cpu_[read, write]_xer to cpu.c Nikunj A Dadhania
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

This series contains the implementation of the CA32 and OV32 bits added in
ISA 3.0. Various fixed-point arithmetic instructions are updated to set
the new flags.

Finally, the last patch adds the new instruction mcrxrx, which copies the
carry (CA and CA32) and overflow (OV and OV32) flags into a CR field.

Changelog:
v2:
* Add missing condition in narrow mode (add/subf), multiply and divide
* Drop nego patch, subf implementation is sufficient for setting OV and OV32
* Retain neg[.], as the code is simplified
* Fix OV resetting in compute_ov()

v1:
* Use the ISA 3.0 flag to enable CA32 and OV32
* Rewrite ca32 compute routine
* Add setting of flags for "neg." and "nego."

Nikunj A Dadhania (10):
  target/ppc: move cpu_[read, write]_xer to cpu.c
  target/ppc: optimize gen_write_xer()
  target/ppc: support for 32-bit carry and overflow
  target/ppc: update ca32 in arithmetic add
  target/ppc: update ca32 in arithmetic substract
  target/ppc: update overflow flags for add/sub
  target/ppc: use tcg ops for neg instruction
  target/ppc: add ov32 flag for multiply low insns
  target/ppc: add ov32 flag in divide operations
  target/ppc: add mcrxrx instruction

 target/ppc/Makefile.objs    |   1 +
 target/ppc/cpu.c            |  51 ++++++++++++++++++
 target/ppc/cpu.h            |  21 ++++----
 target/ppc/int_helper.c     |  53 +++++++-----------
 target/ppc/translate.c      | 128 ++++++++++++++++++++++++++++++++++++++------
 target/ppc/translate_init.c |   4 +-
 6 files changed, 194 insertions(+), 64 deletions(-)
 create mode 100644 target/ppc/cpu.c

-- 
2.7.4


* [Qemu-devel] [PATCH v3 01/10] target/ppc: move cpu_[read, write]_xer to cpu.c
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 02/10] target/ppc: optimize gen_write_xer() Nikunj A Dadhania
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target/ppc/Makefile.objs |  1 +
 target/ppc/cpu.c         | 36 ++++++++++++++++++++++++++++++++++++
 target/ppc/cpu.h         | 14 ++------------
 3 files changed, 39 insertions(+), 12 deletions(-)
 create mode 100644 target/ppc/cpu.c

diff --git a/target/ppc/Makefile.objs b/target/ppc/Makefile.objs
index a8c7a30..4f4168f 100644
--- a/target/ppc/Makefile.objs
+++ b/target/ppc/Makefile.objs
@@ -1,4 +1,5 @@
 obj-y += cpu-models.o
+obj-y += cpu.o
 obj-y += translate.o
 ifeq ($(CONFIG_SOFTMMU),y)
 obj-y += machine.o mmu_helper.o mmu-hash32.o monitor.o
diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
new file mode 100644
index 0000000..de3004b
--- /dev/null
+++ b/target/ppc/cpu.c
@@ -0,0 +1,36 @@
+/*
+ *  PowerPC CPU routines for qemu.
+ *
+ * Copyright (c) 2017 Nikunj A Dadhania, IBM Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "cpu-models.h"
+
+target_ulong cpu_read_xer(CPUPPCState *env)
+{
+    return env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
+        (env->ca << XER_CA);
+}
+
+void cpu_write_xer(CPUPPCState *env, target_ulong xer)
+{
+    env->so = (xer >> XER_SO) & 1;
+    env->ov = (xer >> XER_OV) & 1;
+    env->ca = (xer >> XER_CA) & 1;
+    env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
+}
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 425e79d..b559b67 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2343,18 +2343,8 @@ enum {
 
 /*****************************************************************************/
 
-static inline target_ulong cpu_read_xer(CPUPPCState *env)
-{
-    return env->xer | (env->so << XER_SO) | (env->ov << XER_OV) | (env->ca << XER_CA);
-}
-
-static inline void cpu_write_xer(CPUPPCState *env, target_ulong xer)
-{
-    env->so = (xer >> XER_SO) & 1;
-    env->ov = (xer >> XER_OV) & 1;
-    env->ca = (xer >> XER_CA) & 1;
-    env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
-}
+target_ulong cpu_read_xer(CPUPPCState *env);
+void cpu_write_xer(CPUPPCState *env, target_ulong xer);
 
 static inline void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc,
                                         target_ulong *cs_base, uint32_t *flags)
-- 
2.7.4


* [Qemu-devel] [PATCH v3 02/10] target/ppc: optimize gen_write_xer()
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 01/10] target/ppc: move cpu_[read, write]_xer to cpu.c Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow Nikunj A Dadhania
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target/ppc/translate.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3ba2616..b09e16f 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3724,12 +3724,9 @@ static void gen_write_xer(TCGv src)
 {
     tcg_gen_andi_tl(cpu_xer, src,
                     ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
-    tcg_gen_shri_tl(cpu_so, src, XER_SO);
-    tcg_gen_shri_tl(cpu_ov, src, XER_OV);
-    tcg_gen_shri_tl(cpu_ca, src, XER_CA);
-    tcg_gen_andi_tl(cpu_so, cpu_so, 1);
-    tcg_gen_andi_tl(cpu_ov, cpu_ov, 1);
-    tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+    tcg_gen_extract_tl(cpu_so, src, XER_SO, 1);
+    tcg_gen_extract_tl(cpu_ov, src, XER_OV, 1);
+    tcg_gen_extract_tl(cpu_ca, src, XER_CA, 1);
 }
 
 /* mcrxr */
-- 
2.7.4


* [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 01/10] target/ppc: move cpu_[read, write]_xer to cpu.c Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 02/10] target/ppc: optimize gen_write_xer() Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 17:17   ` Richard Henderson
                     ` (2 more replies)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 04/10] target/ppc: update ca32 in arithmetic add Nikunj A Dadhania
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

POWER ISA 3.0 adds CA32 and OV32 status bits in 64-bit mode. Add the flags
and the corresponding defines.

CA32 is updated whenever CA is updated, and OV32 is updated whenever OV
is updated.

Arithmetic instructions:
    * Addition and Subtraction:

        addic, addic., subfic, addc, subfc, adde, subfe, addme, subfme,
        addze, and subfze always update CA and CA32.

        => CA reflects the carry out of bit 0 in 64-bit mode and out of
           bit 32 in 32-bit mode.
        => CA32 reflects the carry out of bit 32 independent of the
           mode.

        => SO and OV reflect overflow of the 64-bit result in 64-bit
           mode and overflow of the low-order 32-bit result in 32-bit
           mode.
        => OV32 reflects overflow of the low-order 32-bit result
           independent of the mode.

    * Multiply Low and Divide:

        For mulld, divd, divde, divdu and divdeu: the SO, OV, and OV32
        bits reflect overflow of the 64-bit result.

        For mullw, divw, divwe, divwu and divweu: the SO, OV, and OV32
        bits reflect overflow of the 32-bit result.

    * Negate with OE=1 (nego):

        In 64-bit mode, if register RA contains
        0x8000_0000_0000_0000, OV and OV32 are set to 1.

        In 32-bit mode, if register RA contains 0x8000_0000, OV and
        OV32 are set to 1.
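
The carry rules above can be sketched in plain C, outside of TCG. This is
only an illustration under stated assumptions: struct carry_flags and
add_carry_flags() are hypothetical names that do not exist in QEMU, and
narrow_mode stands in for the 32-bit mode check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the two carry flags; not a QEMU type. */
struct carry_flags {
    unsigned ca;
    unsigned ca32;
};

/* Carry flags for ra + rb; narrow_mode selects 32-bit mode. */
static inline struct carry_flags add_carry_flags(uint64_t ra, uint64_t rb,
                                                 bool narrow_mode)
{
    struct carry_flags f;
    uint64_t sum = ra + rb;                 /* wraps modulo 2^64 */
    uint32_t lo = (uint32_t)((uint32_t)ra + (uint32_t)rb);

    /* CA32: carry out of the low-order 32 bits, whatever the mode. */
    f.ca32 = lo < (uint32_t)ra;
    /* CA: same as CA32 in 32-bit mode, 64-bit carry out otherwise. */
    f.ca = narrow_mode ? f.ca32 : (unsigned)(sum < ra);
    return f;
}
```

For example, adding 1 to 0xFFFFFFFF in 64-bit mode carries out of the
low word but not out of the doubleword, so CA32 is set while CA stays
clear.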

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target/ppc/cpu.c            | 19 +++++++++++++++++--
 target/ppc/cpu.h            |  7 +++++++
 target/ppc/translate.c      | 29 ++++++++++++++++++++++++-----
 target/ppc/translate_init.c |  4 ++--
 4 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index de3004b..89c1ccb 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -23,8 +23,15 @@
 
 target_ulong cpu_read_xer(CPUPPCState *env)
 {
-    return env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
+    target_ulong xer;
+
+    xer = env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
         (env->ca << XER_CA);
+
+    if (is_isa300(env)) {
+        xer |= (env->ov32 << XER_OV32) | (env->ca32 << XER_CA32);
+    }
+    return xer;
 }
 
 void cpu_write_xer(CPUPPCState *env, target_ulong xer)
@@ -32,5 +39,13 @@ void cpu_write_xer(CPUPPCState *env, target_ulong xer)
     env->so = (xer >> XER_SO) & 1;
     env->ov = (xer >> XER_OV) & 1;
     env->ca = (xer >> XER_CA) & 1;
-    env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
+    if (is_isa300(env)) {
+        env->ov32 = (xer >> XER_OV32) & 1;
+        env->ca32 = (xer >> XER_CA32) & 1;
+        env->xer = xer & ~((1ul << XER_SO) |
+                           (1ul << XER_OV) | (1ul << XER_CA) |
+                           (1ul << XER_OV32) | (1ul << XER_CA32));
+    } else {
+        env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
+    }
 }
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index b559b67..ee2eb45 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -965,6 +965,8 @@ struct CPUPPCState {
     target_ulong so;
     target_ulong ov;
     target_ulong ca;
+    target_ulong ov32;
+    target_ulong ca32;
     /* Reservation address */
     target_ulong reserve_addr;
     /* Reservation value */
@@ -1372,11 +1374,15 @@ int ppc_compat_max_threads(PowerPCCPU *cpu);
 #define XER_SO  31
 #define XER_OV  30
 #define XER_CA  29
+#define XER_OV32  19
+#define XER_CA32  18
 #define XER_CMP  8
 #define XER_BC   0
 #define xer_so  (env->so)
 #define xer_ov  (env->ov)
 #define xer_ca  (env->ca)
+#define xer_ov32  (env->ov32)
+#define xer_ca32  (env->ca32)
 #define xer_cmp ((env->xer >> XER_CMP) & 0xFF)
 #define xer_bc  ((env->xer >> XER_BC)  & 0x7F)
 
@@ -2343,6 +2349,7 @@ enum {
 
 /*****************************************************************************/
 
+#define is_isa300(ctx) (!!(ctx->insns_flags2 & PPC2_ISA300))
 target_ulong cpu_read_xer(CPUPPCState *env);
 void cpu_write_xer(CPUPPCState *env, target_ulong xer);
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b09e16f..c9f6768 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -71,7 +71,7 @@ static TCGv cpu_lr;
 #if defined(TARGET_PPC64)
 static TCGv cpu_cfar;
 #endif
-static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca;
+static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
 static TCGv cpu_reserve;
 static TCGv cpu_fpscr;
 static TCGv_i32 cpu_access_type;
@@ -173,6 +173,10 @@ void ppc_translate_init(void)
                                 offsetof(CPUPPCState, ov), "OV");
     cpu_ca = tcg_global_mem_new(cpu_env,
                                 offsetof(CPUPPCState, ca), "CA");
+    cpu_ov32 = tcg_global_mem_new(cpu_env,
+                                  offsetof(CPUPPCState, ov32), "OV32");
+    cpu_ca32 = tcg_global_mem_new(cpu_env,
+                                  offsetof(CPUPPCState, ca32), "CA32");
 
     cpu_reserve = tcg_global_mem_new(cpu_env,
                                      offsetof(CPUPPCState, reserve_addr),
@@ -3703,7 +3707,7 @@ static void gen_tdi(DisasContext *ctx)
 
 /***                          Processor control                            ***/
 
-static void gen_read_xer(TCGv dst)
+static void gen_read_xer(DisasContext *ctx, TCGv dst)
 {
     TCGv t0 = tcg_temp_new();
     TCGv t1 = tcg_temp_new();
@@ -3715,15 +3719,30 @@ static void gen_read_xer(TCGv dst)
     tcg_gen_or_tl(t0, t0, t1);
     tcg_gen_or_tl(dst, dst, t2);
     tcg_gen_or_tl(dst, dst, t0);
+    if (is_isa300(ctx)) {
+        tcg_gen_shli_tl(t0, cpu_ov32, XER_OV32);
+        tcg_gen_or_tl(dst, dst, t0);
+        tcg_gen_shli_tl(t0, cpu_ca32, XER_CA32);
+        tcg_gen_or_tl(dst, dst, t0);
+    }
     tcg_temp_free(t0);
     tcg_temp_free(t1);
     tcg_temp_free(t2);
 }
 
-static void gen_write_xer(TCGv src)
+static void gen_write_xer(DisasContext *ctx, TCGv src)
 {
-    tcg_gen_andi_tl(cpu_xer, src,
-                    ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
+    if (is_isa300(ctx)) {
+        tcg_gen_andi_tl(cpu_xer, src,
+                        ~((1u << XER_SO) |
+                          (1u << XER_OV) | (1u << XER_OV32) |
+                          (1u << XER_CA) | (1u << XER_CA32)));
+        tcg_gen_extract_tl(cpu_ov32, src, XER_OV32, 1);
+        tcg_gen_extract_tl(cpu_ca32, src, XER_CA32, 1);
+    } else {
+        tcg_gen_andi_tl(cpu_xer, src,
+                        ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
+    }
     tcg_gen_extract_tl(cpu_so, src, XER_SO, 1);
     tcg_gen_extract_tl(cpu_ov, src, XER_OV, 1);
     tcg_gen_extract_tl(cpu_ca, src, XER_CA, 1);
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index be35cbd..eb667bb 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -107,12 +107,12 @@ static void spr_access_nop(DisasContext *ctx, int sprn, int gprn)
 /* XER */
 static void spr_read_xer (DisasContext *ctx, int gprn, int sprn)
 {
-    gen_read_xer(cpu_gpr[gprn]);
+    gen_read_xer(ctx, cpu_gpr[gprn]);
 }
 
 static void spr_write_xer (DisasContext *ctx, int sprn, int gprn)
 {
-    gen_write_xer(cpu_gpr[gprn]);
+    gen_write_xer(ctx, cpu_gpr[gprn]);
 }
 
 /* LR */
-- 
2.7.4


* [Qemu-devel] [PATCH v3 04/10] target/ppc: update ca32 in arithmetic add
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
                   ` (2 preceding siblings ...)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 17:20   ` Richard Henderson
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 05/10] target/ppc: update ca32 in arithmetic substract Nikunj A Dadhania
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

Add a routine to compute CA32: gen_op_arith_compute_ca32().

In 64-bit mode, use this routine to compute CA32. In 32-bit mode, CA and
CA32 have the same value.
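
The routine relies on a bit identity: each bit of a sum is the XOR of the
two operand bits and the carry into that bit, so bit 32 of
(arg0 ^ arg1 ^ res) is the carry into bit 32, which is the carry out of
the low-order 32 bits. A plain C sketch of that identity follows;
ca32_from_xor() is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/*
 * CA32 for res = a + b (optionally plus a carry-in): the carry into
 * bit 32, counting from the least significant bit as tcg extract does.
 */
static inline unsigned ca32_from_xor(uint64_t a, uint64_t b, uint64_t res)
{
    return (unsigned)(((a ^ b ^ res) >> 32) & 1);
}
```

The result has to be the full 64-bit sum; a sum that was already
truncated to 32 bits has lost the bit this check looks at.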

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target/ppc/translate.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c9f6768..9165450 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -816,6 +816,23 @@ static inline void gen_op_arith_compute_ov(DisasContext *ctx, TCGv arg0,
     tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
 }
 
+static inline void gen_op_arith_compute_ca32(DisasContext *ctx,
+                                             TCGv res, TCGv arg0, TCGv arg1,
+                                             int sub)
+{
+    TCGv t0;
+
+    if (!is_isa300(ctx)) {
+        return;
+    }
+
+    t0 = tcg_temp_new();
+    tcg_gen_xor_tl(t0, arg0, arg1);
+    tcg_gen_xor_tl(t0, t0, res);
+    tcg_gen_extract_tl(cpu_ca32, t0, 32, 1);
+    tcg_temp_free(t0);
+}
+
 /* Common add function */
 static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1,
                                     TCGv arg2, bool add_ca, bool compute_ca,
@@ -842,6 +859,9 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1,
             tcg_temp_free(t1);
             tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);   /* extract bit 32 */
             tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+            if (is_isa300(ctx)) {
+                tcg_gen_mov_tl(cpu_ca32, cpu_ca);
+            }
         } else {
             TCGv zero = tcg_const_tl(0);
             if (add_ca) {
@@ -850,6 +870,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1,
             } else {
                 tcg_gen_add2_tl(t0, cpu_ca, arg1, zero, arg2, zero);
             }
+            gen_op_arith_compute_ca32(ctx, t0, arg1, arg2, 0);
             tcg_temp_free(zero);
         }
     } else {
-- 
2.7.4


* [Qemu-devel] [PATCH v3 05/10] target/ppc: update ca32 in arithmetic substract
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
                   ` (3 preceding siblings ...)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 04/10] target/ppc: update ca32 in arithmetic add Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 17:21   ` Richard Henderson
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 06/10] target/ppc: update overflow flags for add/sub Nikunj A Dadhania
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target/ppc/translate.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 9165450..f3f92aa 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -827,7 +827,12 @@ static inline void gen_op_arith_compute_ca32(DisasContext *ctx,
     }
 
     t0 = tcg_temp_new();
-    tcg_gen_xor_tl(t0, arg0, arg1);
+    if (sub) {
+        tcg_gen_not_tl(t0, arg0);
+        tcg_gen_xor_tl(t0, t0, arg1);
+    } else {
+        tcg_gen_xor_tl(t0, arg0, arg1);
+    }
     tcg_gen_xor_tl(t0, t0, res);
     tcg_gen_extract_tl(cpu_ca32, t0, 32, 1);
     tcg_temp_free(t0);
@@ -1378,17 +1383,22 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1,
             tcg_temp_free(t1);
             tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);    /* extract bit 32 */
             tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+            if (is_isa300(ctx)) {
+                tcg_gen_mov_tl(cpu_ca32, cpu_ca);
+            }
         } else if (add_ca) {
             TCGv zero, inv1 = tcg_temp_new();
             tcg_gen_not_tl(inv1, arg1);
             zero = tcg_const_tl(0);
             tcg_gen_add2_tl(t0, cpu_ca, arg2, zero, cpu_ca, zero);
             tcg_gen_add2_tl(t0, cpu_ca, t0, cpu_ca, inv1, zero);
+            gen_op_arith_compute_ca32(ctx, t0, inv1, arg2, 0);
             tcg_temp_free(zero);
             tcg_temp_free(inv1);
         } else {
             tcg_gen_setcond_tl(TCG_COND_GEU, cpu_ca, arg2, arg1);
             tcg_gen_sub_tl(t0, arg2, arg1);
+            gen_op_arith_compute_ca32(ctx, t0, arg1, arg2, 1);
         }
     } else if (add_ca) {
         /* Since we're ignoring carry-out, we can simplify the
-- 
2.7.4


* [Qemu-devel] [PATCH v3 06/10] target/ppc: update overflow flags for add/sub
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
                   ` (4 preceding siblings ...)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 05/10] target/ppc: update ca32 in arithmetic substract Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 17:26   ` Richard Henderson
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 07/10] target/ppc: use tcg ops for neg instruction Nikunj A Dadhania
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

* SO and OV reflect overflow of the 64-bit result in 64-bit mode and
  overflow of the low-order 32-bit result in 32-bit mode

* OV32 reflects overflow of the low-order 32-bit result independent of
  the mode
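
As a rough plain C model of the add case (not the TCG code itself): an
add overflows when both operands have the same sign and the result has
the other sign. Evaluating that on the whole word gives a mask whose
bit 63 is the 64-bit overflow and whose bit 31 is the 32-bit overflow.
struct ov_flags and add_overflow_flags() are hypothetical names, and
narrow_mode stands in for NARROW_MODE(ctx).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the overflow flags; not a QEMU type. */
struct ov_flags {
    unsigned ov;
    unsigned ov32;
};

static inline struct ov_flags add_overflow_flags(uint64_t ra, uint64_t rb,
                                                 bool narrow_mode)
{
    struct ov_flags f;
    uint64_t res = ra + rb;
    /* Bit set where the operands agree and the result differs. */
    uint64_t v = (res ^ rb) & ~(ra ^ rb);

    f.ov32 = (unsigned)((v >> 31) & 1);           /* mode independent */
    f.ov = narrow_mode ? f.ov32 : (unsigned)((v >> 63) & 1);
    return f;
}
```

SO is sticky and would be ORed with the new OV value, as the existing
code already does.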

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target/ppc/translate.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f3f92aa..43366e7 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -809,10 +809,19 @@ static inline void gen_op_arith_compute_ov(DisasContext *ctx, TCGv arg0,
         tcg_gen_andc_tl(cpu_ov, cpu_ov, t0);
     }
     tcg_temp_free(t0);
-    if (NARROW_MODE(ctx)) {
-        tcg_gen_ext32s_tl(cpu_ov, cpu_ov);
+    if (is_isa300(ctx)) {
+        tcg_gen_extract_tl(cpu_ov32, cpu_ov, 31, 1);
+        if (NARROW_MODE(ctx)) {
+            tcg_gen_mov_tl(cpu_ov, cpu_ov32);
+        } else {
+            tcg_gen_extract_tl(cpu_ov, cpu_ov, 63, 1);
+        }
+    } else {
+        if (NARROW_MODE(ctx)) {
+            tcg_gen_ext32s_tl(cpu_ov, cpu_ov);
+        }
+        tcg_gen_shri_tl(cpu_ov, cpu_ov, TARGET_LONG_BITS - 1);
     }
-    tcg_gen_shri_tl(cpu_ov, cpu_ov, TARGET_LONG_BITS - 1);
     tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
 }
 
-- 
2.7.4


* [Qemu-devel] [PATCH v3 07/10] target/ppc: use tcg ops for neg instruction
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
                   ` (5 preceding siblings ...)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 06/10] target/ppc: update overflow flags for add/sub Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 08/10] target/ppc: add ov32 flag for multiply low insns Nikunj A Dadhania
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target/ppc/translate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 43366e7..19e6292 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1486,7 +1486,10 @@ static inline void gen_op_arith_neg(DisasContext *ctx, bool compute_ov)
 
 static void gen_neg(DisasContext *ctx)
 {
-    gen_op_arith_neg(ctx, 0);
+    tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
+    if (unlikely(Rc(ctx->opcode))) {
+        gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+    }
 }
 
 static void gen_nego(DisasContext *ctx)
-- 
2.7.4


* [Qemu-devel] [PATCH v3 08/10] target/ppc: add ov32 flag for multiply low insns
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
                   ` (6 preceding siblings ...)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 07/10] target/ppc: use tcg ops for neg instruction Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 09/10] target/ppc: add ov32 flag in divide operations Nikunj A Dadhania
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

For Multiply Word:
SO, OV, and OV32 bits reflect overflow of the 32-bit result

For Multiply Doubleword:
SO, OV, and OV32 bits reflect overflow of the 64-bit result
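
For the word case, the overflow condition can be modelled in plain C by
forming the exact 64-bit product and checking whether it still fits in
32 bits. mullw_overflow() is a hypothetical helper for illustration
only; in the patch, OV32 is simply a copy of the OV value computed this
way.

```c
#include <stdint.h>

/* Returns 1 if the signed 32x32 product does not fit in 32 bits. */
static inline unsigned mullw_overflow(int32_t a, int32_t b)
{
    /* The product of two 32-bit values always fits in 64 bits. */
    int64_t prod = (int64_t)a * (int64_t)b;

    return prod < INT32_MIN || prod > INT32_MAX;
}
```

The doubleword case follows the same idea with a 128-bit product, which
is why both flags get the same value for these instructions.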

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target/ppc/translate.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 19e6292..ba3387e 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1288,6 +1288,9 @@ static void gen_mullwo(DisasContext *ctx)
     tcg_gen_sari_i32(t0, t0, 31);
     tcg_gen_setcond_i32(TCG_COND_NE, t0, t0, t1);
     tcg_gen_extu_i32_tl(cpu_ov, t0);
+    if (is_isa300(ctx)) {
+        tcg_gen_mov_tl(cpu_ov32, cpu_ov);
+    }
     tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
 
     tcg_temp_free_i32(t0);
@@ -1349,6 +1352,9 @@ static void gen_mulldo(DisasContext *ctx)
 
     tcg_gen_sari_i64(t0, t0, 63);
     tcg_gen_setcond_i64(TCG_COND_NE, cpu_ov, t0, t1);
+    if (is_isa300(ctx)) {
+        tcg_gen_mov_tl(cpu_ov32, cpu_ov);
+    }
     tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
 
     tcg_temp_free_i64(t0);
-- 
2.7.4


* [Qemu-devel] [PATCH v3 09/10] target/ppc: add ov32 flag in divide operations
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
                   ` (7 preceding siblings ...)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 08/10] target/ppc: add ov32 flag for multiply low insns Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 10/10] target/ppc: add mcrxrx instruction Nikunj A Dadhania
  2017-02-23  3:27 ` [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 David Gibson
  10 siblings, 0 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

Add helper_div_compute_ov() in int_helper.c for updating the overflow
flags.

For Divide Word:
SO, OV, and OV32 bits reflect overflow of the 32-bit result

For Divide Doubleword:
SO, OV, and OV32 bits reflect overflow of the 64-bit result
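
A standalone model of the flag update, as a sketch only. struct
div_flags, divw_overflows() and div_update_flags() are hypothetical
names and not part of this patch; the signed word divide is assumed to
overflow when the divisor is zero or when the most negative value is
divided by -1.

```c
#include <stdint.h>

/* Hypothetical holder for the flags touched by divide; not a QEMU type. */
struct div_flags {
    unsigned so;
    unsigned ov;
    unsigned ov32;
};

/* Signed word divide: the quotient is not representable in these cases. */
static inline int divw_overflows(int32_t ra, int32_t rb)
{
    return rb == 0 || (ra == INT32_MIN && rb == -1);
}

/* With OE=0 nothing changes; with OE=1, OV32 tracks OV and SO is sticky. */
static inline void div_update_flags(struct div_flags *f, int oe, int overflow)
{
    if (!oe) {
        return;
    }
    f->ov = overflow ? 1 : 0;
    f->ov32 = f->ov;
    if (overflow) {
        f->so = 1;
    }
}
```

Note that a later divide that does not overflow clears OV and OV32 but
leaves SO set.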

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target/ppc/int_helper.c | 53 +++++++++++++++++++------------------------------
 target/ppc/translate.c  | 10 ++++++++--
 2 files changed, 28 insertions(+), 35 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index dd0a892..1cad62f 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -28,6 +28,22 @@
 /*****************************************************************************/
 /* Fixed point operations helpers */
 
+static inline void helper_div_compute_ov(CPUPPCState *env, uint32_t oe,
+                                         int overflow)
+{
+    if (oe) {
+        if (unlikely(overflow)) {
+            env->so = env->ov = 1;
+        } else {
+            env->ov = 0;
+        }
+
+        if (is_isa300(env)) {
+            env->ov32 = env->ov;
+        }
+    }
+}
+
 target_ulong helper_divweu(CPUPPCState *env, target_ulong ra, target_ulong rb,
                            uint32_t oe)
 {
@@ -48,14 +64,7 @@ target_ulong helper_divweu(CPUPPCState *env, target_ulong ra, target_ulong rb,
         rt = 0; /* Undefined */
     }
 
-    if (oe) {
-        if (unlikely(overflow)) {
-            env->so = env->ov = 1;
-        } else {
-            env->ov = 0;
-        }
-    }
-
+    helper_div_compute_ov(env, oe, overflow);
     return (target_ulong)rt;
 }
 
@@ -80,14 +89,7 @@ target_ulong helper_divwe(CPUPPCState *env, target_ulong ra, target_ulong rb,
         rt = 0; /* Undefined */
     }
 
-    if (oe) {
-        if (unlikely(overflow)) {
-            env->so = env->ov = 1;
-        } else {
-            env->ov = 0;
-        }
-    }
-
+    helper_div_compute_ov(env, oe, overflow);
     return (target_ulong)rt;
 }
 
@@ -104,14 +106,7 @@ uint64_t helper_divdeu(CPUPPCState *env, uint64_t ra, uint64_t rb, uint32_t oe)
         rt = 0; /* Undefined */
     }
 
-    if (oe) {
-        if (unlikely(overflow)) {
-            env->so = env->ov = 1;
-        } else {
-            env->ov = 0;
-        }
-    }
-
+    helper_div_compute_ov(env, oe, overflow);
     return rt;
 }
 
@@ -126,15 +121,7 @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
         rt = 0; /* Undefined */
     }
 
-    if (oe) {
-
-        if (unlikely(overflow)) {
-            env->so = env->ov = 1;
-        } else {
-            env->ov = 0;
-        }
-    }
-
+    helper_div_compute_ov(env, oe, overflow);
     return rt;
 }
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index ba3387e..99dfcf7 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1024,6 +1024,9 @@ static inline void gen_op_arith_divw(DisasContext *ctx, TCGv ret, TCGv arg1,
     }
     if (compute_ov) {
         tcg_gen_extu_i32_tl(cpu_ov, t2);
+        if (is_isa300(ctx)) {
+            tcg_gen_extu_i32_tl(cpu_ov32, t2);
+        }
         tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
     }
     tcg_temp_free_i32(t0);
@@ -1095,6 +1098,9 @@ static inline void gen_op_arith_divd(DisasContext *ctx, TCGv ret, TCGv arg1,
     }
     if (compute_ov) {
         tcg_gen_mov_tl(cpu_ov, t2);
+        if (is_isa300(ctx)) {
+            tcg_gen_mov_tl(cpu_ov32, t2);
+        }
         tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
     }
     tcg_temp_free_i64(t0);
@@ -1113,10 +1119,10 @@ static void glue(gen_, name)(DisasContext *ctx)
                       cpu_gpr[rA(ctx->opcode)], cpu_gpr[rB(ctx->opcode)],     \
                       sign, compute_ov);                                      \
 }
-/* divwu  divwu.  divwuo  divwuo.   */
+/* divdu  divdu.  divduo  divduo.   */
 GEN_INT_ARITH_DIVD(divdu, 0x0E, 0, 0);
 GEN_INT_ARITH_DIVD(divduo, 0x1E, 0, 1);
-/* divw  divw.  divwo  divwo.   */
+/* divd  divd.  divdo  divdo.   */
 GEN_INT_ARITH_DIVD(divd, 0x0F, 1, 0);
 GEN_INT_ARITH_DIVD(divdo, 0x1F, 1, 1);
 
-- 
2.7.4


* [Qemu-devel] [PATCH v3 10/10] target/ppc: add mcrxrx instruction
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
                   ` (8 preceding siblings ...)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 09/10] target/ppc: add ov32 flag in divide operations Nikunj A Dadhania
@ 2017-02-22 11:44 ` Nikunj A Dadhania
  2017-02-23  3:27 ` [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 David Gibson
  10 siblings, 0 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-22 11:44 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, bharata, nikunj

mcrxrx: Move to CR from XER Extended

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target/ppc/translate.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 99dfcf7..90400fc 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3826,6 +3826,28 @@ static void gen_mcrxr(DisasContext *ctx)
     tcg_gen_movi_tl(cpu_ca, 0);
 }
 
+#ifdef TARGET_PPC64
+/* mcrxrx */
+static void gen_mcrxrx(DisasContext *ctx)
+{
+    TCGv t0 = tcg_temp_new();
+    TCGv t1 = tcg_temp_new();
+    TCGv_i32 dst = cpu_crf[crfD(ctx->opcode)];
+
+    /* copy OV and OV32 */
+    tcg_gen_shli_tl(t0, cpu_ov, 1);
+    tcg_gen_or_tl(t0, t0, cpu_ov32);
+    tcg_gen_shli_tl(t0, t0, 2);
+    /* copy CA and CA32 */
+    tcg_gen_shli_tl(t1, cpu_ca, 1);
+    tcg_gen_or_tl(t1, t1, cpu_ca32);
+    tcg_gen_or_tl(t0, t0, t1);
+    tcg_gen_trunc_tl_i32(dst, t0);
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+#endif
+
 /* mfcr mfocrf */
 static void gen_mfcr(DisasContext *ctx)
 {
@@ -6495,6 +6517,7 @@ GEN_HANDLER(mtcrf, 0x1F, 0x10, 0x04, 0x00000801, PPC_MISC),
 #if defined(TARGET_PPC64)
 GEN_HANDLER(mtmsrd, 0x1F, 0x12, 0x05, 0x001EF801, PPC_64B),
 GEN_HANDLER_E(setb, 0x1F, 0x00, 0x04, 0x0003F801, PPC_NONE, PPC2_ISA300),
+GEN_HANDLER_E(mcrxrx, 0x1F, 0x00, 0x12, 0x007FF801, PPC_NONE, PPC2_ISA300),
 #endif
 GEN_HANDLER(mtmsr, 0x1F, 0x12, 0x04, 0x001EF801, PPC_MISC),
 GEN_HANDLER(mtspr, 0x1F, 0x13, 0x0E, 0x00000000, PPC_MISC),
-- 
2.7.4
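
A minimal stand-alone sketch of the packing that gen_mcrxrx() performs, written
in plain C rather than TCG ops. The name mcrxrx_pack and the uint64_t flag
arguments are assumptions made for illustration only; the shifts mirror the
tcg_gen_shli_tl/tcg_gen_or_tl sequence in the hunk above, so the 4-bit CR field
reads OV, OV32, CA, CA32 from most to least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not QEMU code. Each flag is assumed to hold 0 or 1,
 * as the split-out ov/ov32/ca/ca32 fields do in the patch.
 */
static uint32_t mcrxrx_pack(uint64_t ov, uint64_t ov32,
                            uint64_t ca, uint64_t ca32)
{
    assert(ov <= 1 && ov32 <= 1 && ca <= 1 && ca32 <= 1);

    uint64_t t0 = (ov << 1) | ov32;   /* OV:OV32 */
    t0 <<= 2;                         /* move the pair into the two high bits */
    uint64_t t1 = (ca << 1) | ca32;   /* CA:CA32 */

    return (uint32_t)(t0 | t1);       /* value written to the CR field */
}
```

With OV=1, OV32=0, CA=1, CA32=1 the field comes out as binary 1011.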

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow Nikunj A Dadhania
@ 2017-02-22 17:17   ` Richard Henderson
  2017-02-22 17:20   ` Richard Henderson
  2017-02-23  3:21   ` [Qemu-devel] " David Gibson
  2 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2017-02-22 17:17 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, bharata

On 02/22/2017 10:44 PM, Nikunj A Dadhania wrote:
> POWER ISA 3.0 adds CA32 and OV32 status in 64-bit mode. Add the flags
> and corresponding defines.
>
> Moreover, CA32 is updated when CA is updated and OV32 is updated when OV
> is updated.
>
> Arithmetic instructions:
>     * Addition and Subtraction:
>
>         addic, addic., subfic, addc, subfc, adde, subfe, addme, subfme,
>         addze, and subfze always updates CA and CA32.
>
>         => CA reflects the carry out of bit 0 in 64-bit mode and out of
>            bit 32 in 32-bit mode.
>         => CA32 reflects the carry out of bit 32 independent of the
>            mode.
>
>         => SO and OV reflects overflow of the 64-bit result in 64-bit
>            mode and overflow of the low-order 32-bit result in 32-bit
>            mode
>         => OV32 reflects overflow of the low-order 32-bit independent of
>            the mode
>
>     * Multiply Low and Divide:
>
>         For mulld, divd, divde, divdu and divdeu: SO, OV, and OV32 bits
>         reflects overflow of the 64-bit result
>
>         For mullw, divw, divwe, divwu and divweu: SO, OV, and OV32 bits
>         reflects overflow of the 32-bit result
>
>      * Negate with OE=1 (nego)
>
>        For 64-bit mode if the register RA contains
>        0x8000_0000_0000_0000, OV and OV32 are set to 1.
>
>        For 32-bit mode if the register RA contains 0x8000_0000, OV and
>        OV32 are set to 1.
>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target/ppc/cpu.c            | 19 +++++++++++++++++--
>  target/ppc/cpu.h            |  7 +++++++
>  target/ppc/translate.c      | 29 ++++++++++++++++++++++++-----
>  target/ppc/translate_init.c |  4 ++--
>  4 files changed, 50 insertions(+), 9 deletions(-)
>
> diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
> index de3004b..89c1ccb 100644
> --- a/target/ppc/cpu.c
> +++ b/target/ppc/cpu.c
> @@ -23,8 +23,15 @@
>
>  target_ulong cpu_read_xer(CPUPPCState *env)
>  {
> -    return env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
> +    target_ulong xer;
> +
> +    xer = env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
>          (env->ca << XER_CA);
> +
> +    if (is_isa300(env)) {
> +        xer |= (env->ov32 << XER_OV32) | (env->ca32 << XER_CA32);
> +    }
> +    return xer;
>  }
>
>  void cpu_write_xer(CPUPPCState *env, target_ulong xer)
> @@ -32,5 +39,13 @@ void cpu_write_xer(CPUPPCState *env, target_ulong xer)
>      env->so = (xer >> XER_SO) & 1;
>      env->ov = (xer >> XER_OV) & 1;
>      env->ca = (xer >> XER_CA) & 1;
> -    env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
> +    if (is_isa300(env)) {
> +        env->ov32 = (xer >> XER_OV32) & 1;
> +        env->ca32 = (xer >> XER_CA32) & 1;
> +        env->xer = xer & ~((1ul << XER_SO) |
> +                           (1ul << XER_OV) | (1ul << XER_CA) |
> +                           (1ul << XER_OV32) | (1ul << XER_CA32));
> +    } else {
> +        env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
> +    }
>  }
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index b559b67..ee2eb45 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -965,6 +965,8 @@ struct CPUPPCState {
>      target_ulong so;
>      target_ulong ov;
>      target_ulong ca;
> +    target_ulong ov32;
> +    target_ulong ca32;
>      /* Reservation address */
>      target_ulong reserve_addr;
>      /* Reservation value */
> @@ -1372,11 +1374,15 @@ int ppc_compat_max_threads(PowerPCCPU *cpu);
>  #define XER_SO  31
>  #define XER_OV  30
>  #define XER_CA  29
> +#define XER_OV32  19
> +#define XER_CA32  18
>  #define XER_CMP  8
>  #define XER_BC   0
>  #define xer_so  (env->so)
>  #define xer_ov  (env->ov)
>  #define xer_ca  (env->ca)
> +#define xer_ov32  (env->ov32)
> +#define xer_ca32  (env->ca32)
>  #define xer_cmp ((env->xer >> XER_CMP) & 0xFF)
>  #define xer_bc  ((env->xer >> XER_BC)  & 0x7F)
>
> @@ -2343,6 +2349,7 @@ enum {
>
>  /*****************************************************************************/
>
> +#define is_isa300(ctx) (!!(ctx->insns_flags2 & PPC2_ISA300))
>  target_ulong cpu_read_xer(CPUPPCState *env);
>  void cpu_write_xer(CPUPPCState *env, target_ulong xer);
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b09e16f..c9f6768 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -71,7 +71,7 @@ static TCGv cpu_lr;
>  #if defined(TARGET_PPC64)
>  static TCGv cpu_cfar;
>  #endif
> -static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca;
> +static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
>  static TCGv cpu_reserve;
>  static TCGv cpu_fpscr;
>  static TCGv_i32 cpu_access_type;
> @@ -173,6 +173,10 @@ void ppc_translate_init(void)
>                                  offsetof(CPUPPCState, ov), "OV");
>      cpu_ca = tcg_global_mem_new(cpu_env,
>                                  offsetof(CPUPPCState, ca), "CA");
> +    cpu_ov32 = tcg_global_mem_new(cpu_env,
> +                                  offsetof(CPUPPCState, ov32), "OV32");
> +    cpu_ca32 = tcg_global_mem_new(cpu_env,
> +                                  offsetof(CPUPPCState, ca32), "CA32");
>
>      cpu_reserve = tcg_global_mem_new(cpu_env,
>                                       offsetof(CPUPPCState, reserve_addr),
> @@ -3703,7 +3707,7 @@ static void gen_tdi(DisasContext *ctx)
>
>  /***                          Processor control                            ***/
>
> -static void gen_read_xer(TCGv dst)
> +static void gen_read_xer(DisasContext *ctx, TCGv dst)
>  {
>      TCGv t0 = tcg_temp_new();
>      TCGv t1 = tcg_temp_new();
> @@ -3715,15 +3719,30 @@ static void gen_read_xer(TCGv dst)
>      tcg_gen_or_tl(t0, t0, t1);
>      tcg_gen_or_tl(dst, dst, t2);
>      tcg_gen_or_tl(dst, dst, t0);
> +    if (is_isa300(ctx)) {
> +        tcg_gen_shli_tl(t0, cpu_ov32, XER_OV32);
> +        tcg_gen_or_tl(dst, dst, t0);
> +        tcg_gen_shli_tl(t0, cpu_ca32, XER_CA32);
> +        tcg_gen_or_tl(dst, dst, t0);
> +    }
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
>      tcg_temp_free(t2);
>  }
>
> -static void gen_write_xer(TCGv src)
> +static void gen_write_xer(DisasContext *ctx, TCGv src)
>  {
> -    tcg_gen_andi_tl(cpu_xer, src,
> -                    ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
> +    if (is_isa300(ctx)) {
> +        tcg_gen_andi_tl(cpu_xer, src,
> +                        ~((1u << XER_SO) |
> +                          (1u << XER_OV) | (1u << XER_OV32) |
> +                          (1u << XER_CA) | (1u << XER_CA32)));
> +        tcg_gen_extract_tl(cpu_ov32, src, XER_OV32, 1);
> +        tcg_gen_extract_tl(cpu_ca32, src, XER_CA32, 1);
> +    } else {
> +        tcg_gen_andi_tl(cpu_xer, src,
> +                        ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
> +    }

You just determined that power8 does not store all of the bits that are 
written.  We ought to clear more bits here.  Indeed I suspect that the ANDI 
will be able to be shared between these paths.
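
A rough model of what sharing the mask could look like, in plain C rather than
TCG ops. It assumes the intent is to clear SO, OV, CA, OV32 and CA32 from the
stored word on every CPU and to capture the five flags unconditionally.
struct xer_state and model_write_xer are hypothetical names, not QEMU code, and
whether further reserved bits should also be cleared is left open here.

```c
#include <stdint.h>

/* Bit positions as defined in the patch. */
#define XER_SO   31
#define XER_OV   30
#define XER_CA   29
#define XER_OV32 19
#define XER_CA32 18

/* Hypothetical stand-in for the split-out XER state. */
struct xer_state {
    uint64_t xer;                    /* XER bits that are not split out */
    uint64_t so, ov, ca, ov32, ca32; /* one flag per field, 0 or 1 */
};

static void model_write_xer(struct xer_state *s, uint64_t src)
{
    /* One mask for both the ISA 3.00 and the older path. */
    const uint64_t flags = (1ull << XER_SO) | (1ull << XER_OV) |
                           (1ull << XER_CA) | (1ull << XER_OV32) |
                           (1ull << XER_CA32);

    s->xer  = src & ~flags;
    s->so   = (src >> XER_SO) & 1;
    s->ov   = (src >> XER_OV) & 1;
    s->ca   = (src >> XER_CA) & 1;
    s->ov32 = (src >> XER_OV32) & 1;
    s->ca32 = (src >> XER_CA32) & 1;
}
```

In this model the only per-CPU decision left is on the read side, where the
extra flags are either merged back in or not.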


r~

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow Nikunj A Dadhania
  2017-02-22 17:17   ` Richard Henderson
@ 2017-02-22 17:20   ` Richard Henderson
  2017-02-23  6:40     ` Nikunj A Dadhania
  2017-02-23  3:21   ` [Qemu-devel] " David Gibson
  2 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-02-22 17:20 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, bharata

Bah.  Hit return too soon...

On 02/22/2017 10:44 PM, Nikunj A Dadhania wrote:
> -static void gen_read_xer(TCGv dst)
> +static void gen_read_xer(DisasContext *ctx, TCGv dst)
>  {
>      TCGv t0 = tcg_temp_new();
>      TCGv t1 = tcg_temp_new();
> @@ -3715,15 +3719,30 @@ static void gen_read_xer(TCGv dst)
>      tcg_gen_or_tl(t0, t0, t1);
>      tcg_gen_or_tl(dst, dst, t2);
>      tcg_gen_or_tl(dst, dst, t0);
> +    if (is_isa300(ctx)) {
> +        tcg_gen_shli_tl(t0, cpu_ov32, XER_OV32);
> +        tcg_gen_or_tl(dst, dst, t0);
> +        tcg_gen_shli_tl(t0, cpu_ca32, XER_CA32);
> +        tcg_gen_or_tl(dst, dst, t0);
> +    }
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
>      tcg_temp_free(t2);
>  }
>
> -static void gen_write_xer(TCGv src)
> +static void gen_write_xer(DisasContext *ctx, TCGv src)
>  {
> -    tcg_gen_andi_tl(cpu_xer, src,
> -                    ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
> +    if (is_isa300(ctx)) {
> +        tcg_gen_andi_tl(cpu_xer, src,
> +                        ~((1u << XER_SO) |
> +                          (1u << XER_OV) | (1u << XER_OV32) |
> +                          (1u << XER_CA) | (1u << XER_CA32)));
> +        tcg_gen_extract_tl(cpu_ov32, src, XER_OV32, 1);
> +        tcg_gen_extract_tl(cpu_ca32, src, XER_CA32, 1);
> +    } else {
> +        tcg_gen_andi_tl(cpu_xer, src,
> +                        ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
> +    }
>      tcg_gen_extract_tl(cpu_so, src, XER_SO, 1);
>      tcg_gen_extract_tl(cpu_ov, src, XER_OV, 1);
>      tcg_gen_extract_tl(cpu_ca, src, XER_CA, 1);

These functions are becoming quite large.  Are they performance critical enough 
that they need to stay as inline code, or should they be moved to helpers and 
share code with cpu_read/write_xer?


r~

* Re: [Qemu-devel] [PATCH v3 04/10] target/ppc: update ca32 in arithmetic add
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 04/10] target/ppc: update ca32 in arithmetic add Nikunj A Dadhania
@ 2017-02-22 17:20   ` Richard Henderson
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2017-02-22 17:20 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, bharata

On 02/22/2017 10:44 PM, Nikunj A Dadhania wrote:
> Adds routine to compute ca32 - gen_op_arith_compute_ca32
>
> For 64-bit mode use the compute ca32 routine. While for 32-bit mode, CA
> and CA32 will have same value.
>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH v3 05/10] target/ppc: update ca32 in arithmetic substract
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 05/10] target/ppc: update ca32 in arithmetic substract Nikunj A Dadhania
@ 2017-02-22 17:21   ` Richard Henderson
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2017-02-22 17:21 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, bharata

On 02/22/2017 10:44 PM, Nikunj A Dadhania wrote:
> +    if (sub) {
> +        tcg_gen_not_tl(t0, arg0);
> +        tcg_gen_xor_tl(t0, t0, arg1);

Use tcg_gen_eqv_tl here instead of the not + xor pair.
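
For reference, eqv computes ~(a ^ b), which is the same value as (~a) ^ b, so
the not/xor pair collapses into a single op. A small plain C check of that
identity follows; the helper names are made up for illustration and are not
TCG functions.

```c
#include <stdint.h>

/* What the quoted hunk does: complement arg0, then xor with arg1. */
static uint64_t not_then_xor(uint64_t a, uint64_t b)
{
    return ~a ^ b;
}

/* Equivalence (XNOR) computed in one step. */
static uint64_t eqv64(uint64_t a, uint64_t b)
{
    return ~(a ^ b);
}
```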

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH v3 06/10] target/ppc: update overflow flags for add/sub
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 06/10] target/ppc: update overflow flags for add/sub Nikunj A Dadhania
@ 2017-02-22 17:26   ` Richard Henderson
  2017-02-23  4:46     ` Nikunj A Dadhania
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-02-22 17:26 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, bharata

On 02/22/2017 10:44 PM, Nikunj A Dadhania wrote:
> * SO and OV reflects overflow of the 64-bit result in 64-bit mode and
>   overflow of the low-order 32-bit result in 32-bit mode
>
> * OV32 reflects overflow of the low-order 32-bit independent of the mode
>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target/ppc/translate.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index f3f92aa..43366e7 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -809,10 +809,19 @@ static inline void gen_op_arith_compute_ov(DisasContext *ctx, TCGv arg0,
>          tcg_gen_andc_tl(cpu_ov, cpu_ov, t0);
>      }
>      tcg_temp_free(t0);
> -    if (NARROW_MODE(ctx)) {
> -        tcg_gen_ext32s_tl(cpu_ov, cpu_ov);
> +    if (is_isa300(ctx)) {
> +        tcg_gen_extract_tl(cpu_ov32, cpu_ov, 31, 1);
> +        if (NARROW_MODE(ctx)) {
> +            tcg_gen_mov_tl(cpu_ov, cpu_ov32);
> +        } else {
> +            tcg_gen_extract_tl(cpu_ov, cpu_ov, 63, 1);
> +        }
> +    } else {
> +        if (NARROW_MODE(ctx)) {
> +            tcg_gen_ext32s_tl(cpu_ov, cpu_ov);
> +        }
> +        tcg_gen_shri_tl(cpu_ov, cpu_ov, TARGET_LONG_BITS - 1);
>      }
> -    tcg_gen_shri_tl(cpu_ov, cpu_ov, TARGET_LONG_BITS - 1);

We're computing this two different ways for no reason.  How about

   if (NARROW_MODE(ctx)) {
     tcg_gen_extract_tl(cpu_ov, cpu_ov, 31, 1);
     if (is_isa300(ctx)) {
         tcg_gen_mov_tl(cpu_ov32, cpu_ov);
     }
   } else {
     if (is_isa300(ctx)) {
         tcg_gen_extract_tl(cpu_ov32, cpu_ov, 31, 1);
     }
     tcg_gen_extract_tl(cpu_ov, cpu_ov, 63, 1);
   }
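
The same flow as a plain C model, for readers who want to check it outside TCG.
It assumes the intermediate value has the 64-bit overflow condition in bit 63
and the 32-bit one in bit 31, as the extracts above imply. split_ov,
struct ov_flags and the prev_ov32 parameter are hypothetical names used only
for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

struct ov_flags {
    uint64_t ov;
    uint64_t ov32;
};

static uint64_t extract_bit(uint64_t v, unsigned pos)
{
    return (v >> pos) & 1;
}

/*
 * ovvec:     intermediate overflow word (bit 63 / bit 31 significant)
 * prev_ov32: OV32 value that is kept when the CPU is not ISA 3.00
 */
static struct ov_flags split_ov(uint64_t ovvec, bool narrow, bool isa300,
                                uint64_t prev_ov32)
{
    struct ov_flags f = { .ov = 0, .ov32 = prev_ov32 };

    if (narrow) {
        f.ov = extract_bit(ovvec, 31);
        if (isa300) {
            f.ov32 = f.ov;               /* same bit in 32-bit mode */
        }
    } else {
        if (isa300) {
            f.ov32 = extract_bit(ovvec, 31);
        }
        f.ov = extract_bit(ovvec, 63);
    }
    return f;
}
```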


r~

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow Nikunj A Dadhania
  2017-02-22 17:17   ` Richard Henderson
  2017-02-22 17:20   ` Richard Henderson
@ 2017-02-23  3:21   ` David Gibson
  2017-02-23  5:09     ` Nikunj A Dadhania
  2017-02-23  7:02     ` Nikunj A Dadhania
  2 siblings, 2 replies; 31+ messages in thread
From: David Gibson @ 2017-02-23  3:21 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, bharata

On Wed, Feb 22, 2017 at 05:14:36PM +0530, Nikunj A Dadhania wrote:
> POWER ISA 3.0 adds CA32 and OV32 status in 64-bit mode. Add the flags
> and corresponding defines.
> 
> Moreover, CA32 is updated when CA is updated and OV32 is updated when OV
> is updated.
> 
> Arithmetic instructions:
>     * Addition and Subtraction:
> 
>         addic, addic., subfic, addc, subfc, adde, subfe, addme, subfme,
>         addze, and subfze always updates CA and CA32.
> 
>         => CA reflects the carry out of bit 0 in 64-bit mode and out of
>            bit 32 in 32-bit mode.
>         => CA32 reflects the carry out of bit 32 independent of the
>            mode.
> 
>         => SO and OV reflects overflow of the 64-bit result in 64-bit
>            mode and overflow of the low-order 32-bit result in 32-bit
>            mode
>         => OV32 reflects overflow of the low-order 32-bit independent of
>            the mode
> 
>     * Multiply Low and Divide:
> 
>         For mulld, divd, divde, divdu and divdeu: SO, OV, and OV32 bits
>         reflects overflow of the 64-bit result
> 
>         For mullw, divw, divwe, divwu and divweu: SO, OV, and OV32 bits
>         reflects overflow of the 32-bit result
> 
>      * Negate with OE=1 (nego)
> 
>        For 64-bit mode if the register RA contains
>        0x8000_0000_0000_0000, OV and OV32 are set to 1.
> 
>        For 32-bit mode if the register RA contains 0x8000_0000, OV and
>        OV32 are set to 1.
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target/ppc/cpu.c            | 19 +++++++++++++++++--
>  target/ppc/cpu.h            |  7 +++++++
>  target/ppc/translate.c      | 29 ++++++++++++++++++++++++-----
>  target/ppc/translate_init.c |  4 ++--
>  4 files changed, 50 insertions(+), 9 deletions(-)
> 
> diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
> index de3004b..89c1ccb 100644
> --- a/target/ppc/cpu.c
> +++ b/target/ppc/cpu.c
> @@ -23,8 +23,15 @@
>  
>  target_ulong cpu_read_xer(CPUPPCState *env)
>  {
> -    return env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
> +    target_ulong xer;
> +
> +    xer = env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
>          (env->ca << XER_CA);
> +
> +    if (is_isa300(env)) {
> +        xer |= (env->ov32 << XER_OV32) | (env->ca32 << XER_CA32);
> +    }
> +    return xer;
>  }
>  
>  void cpu_write_xer(CPUPPCState *env, target_ulong xer)
> @@ -32,5 +39,13 @@ void cpu_write_xer(CPUPPCState *env, target_ulong xer)
>      env->so = (xer >> XER_SO) & 1;
>      env->ov = (xer >> XER_OV) & 1;
>      env->ca = (xer >> XER_CA) & 1;
> -    env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
> +    if (is_isa300(env)) {
> +        env->ov32 = (xer >> XER_OV32) & 1;
> +        env->ca32 = (xer >> XER_CA32) & 1;

I think these might as well be unconditional - as long as the read_xer
doesn't read the bits back, the guest won't care that we track them in
internal state.

I'm also wondering if it might be worth adding a xer_mask to the env,
instead of explicitly checking isa300 all over the place.
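
One way the xer_mask idea might look, as a plain C model rather than actual
QEMU code. The assumption is that the flags are always tracked internally and
that a per-CPU mask decides which bits a read returns. struct xer_model, its
xer_mask field, XER_NEW_FLAGS and model_read_xer are all hypothetical.

```c
#include <stdint.h>

/* Bit positions as defined in the patch. */
#define XER_SO   31
#define XER_OV   30
#define XER_CA   29
#define XER_OV32 19
#define XER_CA32 18

/* Bits that only exist from ISA 3.00 onwards. */
#define XER_NEW_FLAGS ((1ull << XER_OV32) | (1ull << XER_CA32))

/* Hypothetical stand-in for the relevant part of the CPU state. */
struct xer_model {
    uint64_t xer;                    /* XER bits that are not split out */
    uint64_t so, ov, ca, ov32, ca32; /* always tracked, 0 or 1 each */
    uint64_t xer_mask;               /* bits this CPU model exposes */
};

static uint64_t model_read_xer(const struct xer_model *s)
{
    uint64_t v = s->xer |
                 (s->so << XER_SO) | (s->ov << XER_OV) | (s->ca << XER_CA) |
                 (s->ov32 << XER_OV32) | (s->ca32 << XER_CA32);

    /* No isa300 test: the mask hides OV32/CA32 on older CPUs. */
    return v & s->xer_mask;
}
```

A pre-3.00 CPU would set xer_mask to ~XER_NEW_FLAGS, while an ISA 3.00 CPU
would leave those two bits set in the mask.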

> +        env->xer = xer & ~((1ul << XER_SO) |
> +                           (1ul << XER_OV) | (1ul << XER_CA) |
> +                           (1ul << XER_OV32) | (1ul << XER_CA32));
> +    } else {
> +        env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
> +    }

And you can definitely use the stricter mask for both archs.  If it's
ISA300, you've stashed them elsewhere; if it's not, those bits are
invalid anyway.

(Incidentally, given the modern balance between the cost of
instructions and cachelines, I wonder if all these split-out bits of
the XER are a good idea in any case, but that would be a big change,
out of scope for what you're attempting here.)

>  }
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index b559b67..ee2eb45 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -965,6 +965,8 @@ struct CPUPPCState {
>      target_ulong so;
>      target_ulong ov;
>      target_ulong ca;
> +    target_ulong ov32;
> +    target_ulong ca32;
>      /* Reservation address */
>      target_ulong reserve_addr;
>      /* Reservation value */
> @@ -1372,11 +1374,15 @@ int ppc_compat_max_threads(PowerPCCPU *cpu);
>  #define XER_SO  31
>  #define XER_OV  30
>  #define XER_CA  29
> +#define XER_OV32  19
> +#define XER_CA32  18
>  #define XER_CMP  8
>  #define XER_BC   0
>  #define xer_so  (env->so)
>  #define xer_ov  (env->ov)
>  #define xer_ca  (env->ca)
> +#define xer_ov32  (env->ov32)
> +#define xer_ca32  (env->ca32)
>  #define xer_cmp ((env->xer >> XER_CMP) & 0xFF)
>  #define xer_bc  ((env->xer >> XER_BC)  & 0x7F)
>  
> @@ -2343,6 +2349,7 @@ enum {
>  
>  /*****************************************************************************/
>  
> +#define is_isa300(ctx) (!!(ctx->insns_flags2 & PPC2_ISA300))
>  target_ulong cpu_read_xer(CPUPPCState *env);
>  void cpu_write_xer(CPUPPCState *env, target_ulong xer);
>  
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b09e16f..c9f6768 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -71,7 +71,7 @@ static TCGv cpu_lr;
>  #if defined(TARGET_PPC64)
>  static TCGv cpu_cfar;
>  #endif
> -static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca;
> +static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
>  static TCGv cpu_reserve;
>  static TCGv cpu_fpscr;
>  static TCGv_i32 cpu_access_type;
> @@ -173,6 +173,10 @@ void ppc_translate_init(void)
>                                  offsetof(CPUPPCState, ov), "OV");
>      cpu_ca = tcg_global_mem_new(cpu_env,
>                                  offsetof(CPUPPCState, ca), "CA");
> +    cpu_ov32 = tcg_global_mem_new(cpu_env,
> +                                  offsetof(CPUPPCState, ov32), "OV32");
> +    cpu_ca32 = tcg_global_mem_new(cpu_env,
> +                                  offsetof(CPUPPCState, ca32), "CA32");
>  
>      cpu_reserve = tcg_global_mem_new(cpu_env,
>                                       offsetof(CPUPPCState, reserve_addr),
> @@ -3703,7 +3707,7 @@ static void gen_tdi(DisasContext *ctx)
>  
>  /***                          Processor control                            ***/
>  
> -static void gen_read_xer(TCGv dst)
> +static void gen_read_xer(DisasContext *ctx, TCGv dst)
>  {
>      TCGv t0 = tcg_temp_new();
>      TCGv t1 = tcg_temp_new();
> @@ -3715,15 +3719,30 @@ static void gen_read_xer(TCGv dst)
>      tcg_gen_or_tl(t0, t0, t1);
>      tcg_gen_or_tl(dst, dst, t2);
>      tcg_gen_or_tl(dst, dst, t0);
> +    if (is_isa300(ctx)) {
> +        tcg_gen_shli_tl(t0, cpu_ov32, XER_OV32);
> +        tcg_gen_or_tl(dst, dst, t0);
> +        tcg_gen_shli_tl(t0, cpu_ca32, XER_CA32);
> +        tcg_gen_or_tl(dst, dst, t0);

Could you use 2 deposits here, instead of 2 shifts and 2 ors?
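
To make the comparison concrete, here is a plain C model of a deposit: it
overwrites a bit field of the destination in one step, so the separate shift
and or disappear. deposit_field is a hypothetical helper written for this
sketch; it is not the TCG op itself, only an illustration of the semantics
assumed here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replace the len-bit field of base that starts at bit pos with the low
 * len bits of field, leaving every other bit of base untouched.
 */
static uint64_t deposit_field(uint64_t base, unsigned pos, unsigned len,
                              uint64_t field)
{
    assert(len > 0 && len < 64 && pos + len <= 64);

    uint64_t mask = ((1ull << len) - 1) << pos;

    return (base & ~mask) | ((field << pos) & mask);
}
```

Depositing a 1-bit flag at position 19 and another at position 18 yields the
same word as the two shift/or pairs whenever those bits start out clear, and
unlike a plain or it also clears a stale bit when the flag is 0.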

> +    }
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
>      tcg_temp_free(t2);
>  }
>  
> -static void gen_write_xer(TCGv src)
> +static void gen_write_xer(DisasContext *ctx, TCGv src)
>  {
> -    tcg_gen_andi_tl(cpu_xer, src,
> -                    ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
> +    if (is_isa300(ctx)) {
> +        tcg_gen_andi_tl(cpu_xer, src,
> +                        ~((1u << XER_SO) |
> +                          (1u << XER_OV) | (1u << XER_OV32) |
> +                          (1u << XER_CA) | (1u << XER_CA32)));
> +        tcg_gen_extract_tl(cpu_ov32, src, XER_OV32, 1);
> +        tcg_gen_extract_tl(cpu_ca32, src, XER_CA32, 1);
> +    } else {
> +        tcg_gen_andi_tl(cpu_xer, src,
> +                        ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
> +    }
>      tcg_gen_extract_tl(cpu_so, src, XER_SO, 1);
>      tcg_gen_extract_tl(cpu_ov, src, XER_OV, 1);
>      tcg_gen_extract_tl(cpu_ca, src, XER_CA, 1);
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index be35cbd..eb667bb 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -107,12 +107,12 @@ static void spr_access_nop(DisasContext *ctx, int sprn, int gprn)
>  /* XER */
>  static void spr_read_xer (DisasContext *ctx, int gprn, int sprn)
>  {
> -    gen_read_xer(cpu_gpr[gprn]);
> +    gen_read_xer(ctx, cpu_gpr[gprn]);
>  }
>  
>  static void spr_write_xer (DisasContext *ctx, int sprn, int gprn)
>  {
> -    gen_write_xer(cpu_gpr[gprn]);
> +    gen_write_xer(ctx, cpu_gpr[gprn]);
>  }
>  
>  /* LR */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15
  2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
                   ` (9 preceding siblings ...)
  2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 10/10] target/ppc: add mcrxrx instruction Nikunj A Dadhania
@ 2017-02-23  3:27 ` David Gibson
  10 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2017-02-23  3:27 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, bharata

On Wed, Feb 22, 2017 at 05:14:33PM +0530, Nikunj A Dadhania wrote:
> This series contains implementation of CA32 and OV32 bits added to the
> ISA 3.0. Various fixed-point arithmetic instructions are updated to take
> care of the newer flags. 
> 
> Finally the last patch adds new instruction mcrxrx, that helps reading 
> the carry (CA and CA32) and the overflow (OV and OV32) flags

I've applied patches 1 & 2 to ppc-for-2.9.

The rest I've left for a resend pending my comments and rth's.

> 
> Changelog:
> v2: 
> * Add missing condition in narrow mode(add/subf), multiply and divide
> * Drop nego patch, subf implementation is sufficient for setting OV and OV32
> * Retaining neg[.], as the code is simplified.
> * Fix OV resetting in compute_ov()
> 
> v1: 
> * Use these ISA 3.0 flag to enable CA32 and OV32
> * Re-write ca32 compute routine
> * Add setting of flags for "neg." and "nego."
> 
> Nikunj A Dadhania (10):
>   target/ppc: move cpu_[read, write]_xer to cpu.c
>   target/ppc: optimize gen_write_xer()
>   target/ppc: support for 32-bit carry and overflow
>   target/ppc: update ca32 in arithmetic add
>   target/ppc: update ca32 in arithmetic substract
>   target/ppc: update overflow flags for add/sub
>   target/ppc: use tcg ops for neg instruction
>   target/ppc: add ov32 flag for multiply low insns
>   target/ppc: add ov32 flag in divide operations
>   target/ppc: add mcrxrx instruction
> 
>  target/ppc/Makefile.objs    |   1 +
>  target/ppc/cpu.c            |  51 ++++++++++++++++++
>  target/ppc/cpu.h            |  21 ++++----
>  target/ppc/int_helper.c     |  53 +++++++-----------
>  target/ppc/translate.c      | 128 ++++++++++++++++++++++++++++++++++++++------
>  target/ppc/translate_init.c |   4 +-
>  6 files changed, 194 insertions(+), 64 deletions(-)
>  create mode 100644 target/ppc/cpu.c
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v3 06/10] target/ppc: update overflow flags for add/sub
  2017-02-22 17:26   ` Richard Henderson
@ 2017-02-23  4:46     ` Nikunj A Dadhania
  0 siblings, 0 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-23  4:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc, david; +Cc: qemu-devel, bharata

Richard Henderson <rth@twiddle.net> writes:

> On 02/22/2017 10:44 PM, Nikunj A Dadhania wrote:
>> * SO and OV reflects overflow of the 64-bit result in 64-bit mode and
>>   overflow of the low-order 32-bit result in 32-bit mode
>>
>> * OV32 reflects overflow of the low-order 32-bit independent of the mode
>>
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  target/ppc/translate.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index f3f92aa..43366e7 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -809,10 +809,19 @@ static inline void gen_op_arith_compute_ov(DisasContext *ctx, TCGv arg0,
>>          tcg_gen_andc_tl(cpu_ov, cpu_ov, t0);
>>      }
>>      tcg_temp_free(t0);
>> -    if (NARROW_MODE(ctx)) {
>> -        tcg_gen_ext32s_tl(cpu_ov, cpu_ov);
>> +    if (is_isa300(ctx)) {
>> +        tcg_gen_extract_tl(cpu_ov32, cpu_ov, 31, 1);
>> +        if (NARROW_MODE(ctx)) {
>> +            tcg_gen_mov_tl(cpu_ov, cpu_ov32);
>> +        } else {
>> +            tcg_gen_extract_tl(cpu_ov, cpu_ov, 63, 1);
>> +        }
>> +    } else {
>> +        if (NARROW_MODE(ctx)) {
>> +            tcg_gen_ext32s_tl(cpu_ov, cpu_ov);
>> +        }
>> +        tcg_gen_shri_tl(cpu_ov, cpu_ov, TARGET_LONG_BITS - 1);
>>      }
>> -    tcg_gen_shri_tl(cpu_ov, cpu_ov, TARGET_LONG_BITS - 1);
>
> We're computing this two different ways for no reason.  How about
>
>    if (NARROW_MODE(ctx)) {
>      tcg_gen_extract_tl(cpu_ov, cpu_ov, 31, 1);
>      if (is_isa300(ctx)) {
>          tcg_gen_mov_tl(cpu_ov32, cpu_ov);
>      }
>    } else {
>      if (is_isa300(ctx)) {
>          tcg_gen_extract_tl(cpu_ov32, cpu_ov, 31, 1);
>      }
>      tcg_gen_extract_tl(cpu_ov, cpu_ov, 63, 1);
>    }

Yes, no need to sign-extend and shift. Will incorporate.

Regards
Nikunj

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-23  3:21   ` [Qemu-devel] " David Gibson
@ 2017-02-23  5:09     ` Nikunj A Dadhania
  2017-02-23  5:32       ` David Gibson
  2017-02-23  7:02     ` Nikunj A Dadhania
  1 sibling, 1 reply; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-23  5:09 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, bharata

David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
>> index de3004b..89c1ccb 100644
>> --- a/target/ppc/cpu.c
>> +++ b/target/ppc/cpu.c
>> @@ -23,8 +23,15 @@
>>  
>>  target_ulong cpu_read_xer(CPUPPCState *env)
>>  {
>> -    return env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
>> +    target_ulong xer;
>> +
>> +    xer = env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
>>          (env->ca << XER_CA);
>> +
>> +    if (is_isa300(env)) {
>> +        xer |= (env->ov32 << XER_OV32) | (env->ca32 << XER_CA32);
>> +    }
>> +    return xer;
>>  }
>>  
>>  void cpu_write_xer(CPUPPCState *env, target_ulong xer)
>> @@ -32,5 +39,13 @@ void cpu_write_xer(CPUPPCState *env, target_ulong xer)
>>      env->so = (xer >> XER_SO) & 1;
>>      env->ov = (xer >> XER_OV) & 1;
>>      env->ca = (xer >> XER_CA) & 1;
>> -    env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
>> +    if (is_isa300(env)) {
>> +        env->ov32 = (xer >> XER_OV32) & 1;
>> +        env->ca32 = (xer >> XER_CA32) & 1;
>
> I think these might as well be unconditional - as long as the read_xer
> doesn't read the bits back, the guest won't care that we track them in
> internal state.

Sure.


> I'm also wondering if it might be worth adding a xer_mask to the env,
> instead of explicitly checking isa300 all over the place.

Let me try that out.

Can we also update ov32/ca32 in all the arithmetic operations as if they
were supported? And, as you suggested, whenever a read is attempted,
only give the relevant bits back (xer_mask). This would save a lot of
conditions in translation (at the cost of a couple more tcg-ops for
non-isa300).
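
For illustration, here is a minimal sketch of the xer_mask idea in plain
C, outside of QEMU. ToyEnv, toy_env_init(), toy_read_xer() and
toy_write_xer() are hypothetical stand-ins (not the real CPUPPCState or
cpu_read/write_xer), the xer_mask field is the proposed one and does not
exist in the patch, and the bit positions are assumed to match the XER_*
constants the patch uses. The write side stashes all five flags
unconditionally and uses the strict mask; only the read side filters by
CPU model.

```c
#include <stdint.h>

/* Assumed bit positions, meant to mirror the XER_* constants in the patch. */
enum { XER_SO = 31, XER_OV = 30, XER_CA = 29, XER_OV32 = 19, XER_CA32 = 18 };

#define XER_SPLIT_BITS ((1ull << XER_SO) | (1ull << XER_OV) | \
                        (1ull << XER_CA) | (1ull << XER_OV32) | \
                        (1ull << XER_CA32))

/* Hypothetical stand-in for the XER-related part of the CPU state. */
typedef struct {
    uint64_t xer;                    /* XER bits that are not split out */
    uint64_t so, ov, ca, ov32, ca32; /* split-out flags, each 0 or 1 */
    uint64_t xer_mask;               /* proposed: bits visible to the guest */
} ToyEnv;

/* Set the mask once per CPU model instead of testing the ISA level later. */
static void toy_env_init(ToyEnv *env, int isa300)
{
    *env = (ToyEnv){0};
    env->xer_mask = ~0ull;
    if (!isa300) {
        env->xer_mask &= ~((1ull << XER_OV32) | (1ull << XER_CA32));
    }
}

/* Write: always stash every flag and always use the strict mask. */
static void toy_write_xer(ToyEnv *env, uint64_t xer)
{
    env->so = (xer >> XER_SO) & 1;
    env->ov = (xer >> XER_OV) & 1;
    env->ca = (xer >> XER_CA) & 1;
    env->ov32 = (xer >> XER_OV32) & 1;
    env->ca32 = (xer >> XER_CA32) & 1;
    env->xer = xer & ~XER_SPLIT_BITS;
}

/* Read: reassemble everything, then hide what this CPU does not have. */
static uint64_t toy_read_xer(const ToyEnv *env)
{
    uint64_t xer = env->xer |
                   (env->so << XER_SO) | (env->ov << XER_OV) |
                   (env->ca << XER_CA) |
                   (env->ov32 << XER_OV32) | (env->ca32 << XER_CA32);
    return xer & env->xer_mask;
}
```

With this shape a pre-ISA 3.0 model still tracks ov32/ca32 internally,
but the guest never sees them, so the is_isa300() checks drop out of both
functions.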

>
>> +        env->xer = xer & ~((1ul << XER_SO) |
>> +                           (1ul << XER_OV) | (1ul << XER_CA) |
>> +                           (1ul << XER_OV32) | (1ul << XER_CA32));
>> +    } else {
>> +        env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
>> +    }
>
> And you can definitely use the stricter mask for both archs.  If it's
> ISA300, you've stashed them elsewhere; if it's not, those bits are
> invalid anyway.
>
> (Incidentally given the modern balance between the cost of
> instructions and cachelines, I wonder if all these split out bits of
> the XER are a good idea in any case, but that would be a big change
> out  of scope for what you're attempting here)

Will have a look at this after finishing isa300. I have faced issues
with RISU wrt having the state stashed in different tcg variables.

Regards
Nikunj

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-23  5:09     ` Nikunj A Dadhania
@ 2017-02-23  5:32       ` David Gibson
  0 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2017-02-23  5:32 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, bharata

On Thu, Feb 23, 2017 at 10:39:47AM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> >> 
> >> diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
> >> index de3004b..89c1ccb 100644
> >> --- a/target/ppc/cpu.c
> >> +++ b/target/ppc/cpu.c
> >> @@ -23,8 +23,15 @@
> >>  
> >>  target_ulong cpu_read_xer(CPUPPCState *env)
> >>  {
> >> -    return env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
> >> +    target_ulong xer;
> >> +
> >> +    xer = env->xer | (env->so << XER_SO) | (env->ov << XER_OV) |
> >>          (env->ca << XER_CA);
> >> +
> >> +    if (is_isa300(env)) {
> >> +        xer |= (env->ov32 << XER_OV32) | (env->ca32 << XER_CA32);
> >> +    }
> >> +    return xer;
> >>  }
> >>  
> >>  void cpu_write_xer(CPUPPCState *env, target_ulong xer)
> >> @@ -32,5 +39,13 @@ void cpu_write_xer(CPUPPCState *env, target_ulong xer)
> >>      env->so = (xer >> XER_SO) & 1;
> >>      env->ov = (xer >> XER_OV) & 1;
> >>      env->ca = (xer >> XER_CA) & 1;
> >> -    env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
> >> +    if (is_isa300(env)) {
> >> +        env->ov32 = (xer >> XER_OV32) & 1;
> >> +        env->ca32 = (xer >> XER_CA32) & 1;
> >
> > I think these might as well be unconditional - as long as the read_xer
> > doesn't read the bits back, the guest won't care that we track them in
> > internal state.
> 
> Sure.
> 
> 
> > I'm also wondering if it might be worth adding a xer_mask to the env,
> > instead of explicitly checking isa300 all over the place.
> 
> Let me try that out.
> 
> Can we also update ov32/ca32 in all the arithmetic operations as if its
> supported. And as you suggested, whenever there is a read attempted,
> only give relevant bits back(xer_mask). This would save lot of
> conditions in translations (couple of more tcg-ops for non-isa300)

So if it was a straight trade-off between conditions and math
operations, I'd pick the extra math every time.  However, in this case
we're trading off math on every execution, versus a condition only on
translation, which should occur less often.  So in this case I suspect
it's worth keeping the conditional.
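
To make the trade-off concrete, here is a toy model in plain C (nothing
here is QEMU API; ToyTB, ToyOp, toy_translate_add() and toy_execute() are
hypothetical names). The ISA check runs once, when the block is
translated, while every emitted op is paid for on each execution of the
block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op kinds emitted for an "add with carry" style instruction. */
typedef enum { OP_ADD, OP_CARRY64, OP_CARRY32 } ToyOp;

/* Hypothetical translated block: a short list of ops. */
typedef struct {
    ToyOp ops[4];
    size_t n_ops;
} ToyTB;

/* Translation time: the ISA check is evaluated once per translation. */
static void toy_translate_add(ToyTB *tb, int isa300)
{
    tb->n_ops = 0;
    tb->ops[tb->n_ops++] = OP_ADD;
    tb->ops[tb->n_ops++] = OP_CARRY64;
    if (isa300) {
        tb->ops[tb->n_ops++] = OP_CARRY32; /* extra op only when needed */
    }
}

/* Execution time: count how many ops run over 'times' executions. */
static uint64_t toy_execute(const ToyTB *tb, uint64_t times)
{
    uint64_t executed = 0;
    for (uint64_t i = 0; i < times; i++) {
        for (size_t j = 0; j < tb->n_ops; j++) {
            executed++;
        }
    }
    return executed;
}
```

Dropping the conditional would move the extra op into every execution on
non-ISA 3.0 CPUs, which is the cost being weighed here.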

> >> +        env->xer = xer & ~((1ul << XER_SO) |
> >> +                           (1ul << XER_OV) | (1ul << XER_CA) |
> >> +                           (1ul << XER_OV32) | (1ul << XER_CA32));
> >> +    } else {
> >> +        env->xer = xer & ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA));
> >> +    }
> >
> > And you can definitely use the stricter mask for both archs.  If it's
> > ISA300, you've stashed them elsewhere; if it's not, those bits are
> > invalid anyway.
> >
> > (Incidentally given the modern balance between the cost of
> > instructions and cachelines, I wonder if all these split out bits of
> > the XER are a good idea in any case, but that would be a big change
> > out  of scope for what you're attempting here)
> 
> Will have a look at this after finishing isa300. I have faced issues
> with RISU wrt having the state stashed in different tcg variables.

Thanks.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-22 17:20   ` Richard Henderson
@ 2017-02-23  6:40     ` Nikunj A Dadhania
  2017-02-23 22:34       ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-23  6:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc, david; +Cc: qemu-devel, bharata

Richard Henderson <rth@twiddle.net> writes:

> Bah.  Hit return too soon...
>
> On 02/22/2017 10:44 PM, Nikunj A Dadhania wrote:
>> -static void gen_read_xer(TCGv dst)
>> +static void gen_read_xer(DisasContext *ctx, TCGv dst)
>>  {
>>      TCGv t0 = tcg_temp_new();
>>      TCGv t1 = tcg_temp_new();
>> @@ -3715,15 +3719,30 @@ static void gen_read_xer(TCGv dst)
>>      tcg_gen_or_tl(t0, t0, t1);
>>      tcg_gen_or_tl(dst, dst, t2);
>>      tcg_gen_or_tl(dst, dst, t0);
>> +    if (is_isa300(ctx)) {
>> +        tcg_gen_shli_tl(t0, cpu_ov32, XER_OV32);
>> +        tcg_gen_or_tl(dst, dst, t0);
>> +        tcg_gen_shli_tl(t0, cpu_ca32, XER_CA32);
>> +        tcg_gen_or_tl(dst, dst, t0);
>> +    }
>>      tcg_temp_free(t0);
>>      tcg_temp_free(t1);
>>      tcg_temp_free(t2);
>>  }
>>
>> -static void gen_write_xer(TCGv src)
>> +static void gen_write_xer(DisasContext *ctx, TCGv src)
>>  {
>> -    tcg_gen_andi_tl(cpu_xer, src,
>> -                    ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
>> +    if (is_isa300(ctx)) {
>> +        tcg_gen_andi_tl(cpu_xer, src,
>> +                        ~((1u << XER_SO) |
>> +                          (1u << XER_OV) | (1u << XER_OV32) |
>> +                          (1u << XER_CA) | (1u << XER_CA32)));
>> +        tcg_gen_extract_tl(cpu_ov32, src, XER_OV32, 1);
>> +        tcg_gen_extract_tl(cpu_ca32, src, XER_CA32, 1);
>> +    } else {
>> +        tcg_gen_andi_tl(cpu_xer, src,
>> +                        ~((1u << XER_SO) | (1u << XER_OV) | (1u << XER_CA)));
>> +    }
>>      tcg_gen_extract_tl(cpu_so, src, XER_SO, 1);
>>      tcg_gen_extract_tl(cpu_ov, src, XER_OV, 1);
>>      tcg_gen_extract_tl(cpu_ca, src, XER_CA, 1);
>
> These functions are becoming quite large.  Are they performance critical enough 
> that they need to stay as inline code, or should they be moved to helpers and 
> share code with cpu_read/write_xer?

Just booting to the login prompt, these are the counts for gen_read/write_xer:

helper_myprint - rd_count 231103, wr_count 68897

And they keep incrementing, so there may be scope for optimization here.

Regards
Nikunj

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-23  3:21   ` [Qemu-devel] " David Gibson
  2017-02-23  5:09     ` Nikunj A Dadhania
@ 2017-02-23  7:02     ` Nikunj A Dadhania
  2017-02-23  9:29       ` David Gibson
  2017-02-23 22:36       ` Richard Henderson
  1 sibling, 2 replies; 31+ messages in thread
From: Nikunj A Dadhania @ 2017-02-23  7:02 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, bharata

David Gibson <david@gibson.dropbear.id.au> writes:

>> -static void gen_read_xer(TCGv dst)
>> +static void gen_read_xer(DisasContext *ctx, TCGv dst)
>>  {
>>      TCGv t0 = tcg_temp_new();
>>      TCGv t1 = tcg_temp_new();
>> @@ -3715,15 +3719,30 @@ static void gen_read_xer(TCGv dst)
>>      tcg_gen_or_tl(t0, t0, t1);
>>      tcg_gen_or_tl(dst, dst, t2);
>>      tcg_gen_or_tl(dst, dst, t0);
>> +    if (is_isa300(ctx)) {
>> +        tcg_gen_shli_tl(t0, cpu_ov32, XER_OV32);
>> +        tcg_gen_or_tl(dst, dst, t0);
>> +        tcg_gen_shli_tl(t0, cpu_ca32, XER_CA32);
>> +        tcg_gen_or_tl(dst, dst, t0);
>
> Could you use 2 deposits here, instead of 2 shifts and 2 ors?

I checked the implementation of tcg_gen_deposit_i64; the resulting code
would have many more ops than 2 shifts + 2 ors.

Regards,
Nikunj

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-23  7:02     ` Nikunj A Dadhania
@ 2017-02-23  9:29       ` David Gibson
  2017-02-23 22:36       ` Richard Henderson
  1 sibling, 0 replies; 31+ messages in thread
From: David Gibson @ 2017-02-23  9:29 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, bharata

On Thu, Feb 23, 2017 at 12:32:44PM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> >> -static void gen_read_xer(TCGv dst)
> >> +static void gen_read_xer(DisasContext *ctx, TCGv dst)
> >>  {
> >>      TCGv t0 = tcg_temp_new();
> >>      TCGv t1 = tcg_temp_new();
> >> @@ -3715,15 +3719,30 @@ static void gen_read_xer(TCGv dst)
> >>      tcg_gen_or_tl(t0, t0, t1);
> >>      tcg_gen_or_tl(dst, dst, t2);
> >>      tcg_gen_or_tl(dst, dst, t0);
> >> +    if (is_isa300(ctx)) {
> >> +        tcg_gen_shli_tl(t0, cpu_ov32, XER_OV32);
> >> +        tcg_gen_or_tl(dst, dst, t0);
> >> +        tcg_gen_shli_tl(t0, cpu_ca32, XER_CA32);
> >> +        tcg_gen_or_tl(dst, dst, t0);
> >
> > Could you use 2 deposits here, instead of 2 shifts and 2 ors?
> 
> I checked the implementation of tcg_gen_deposit_i64, resultant will have much
> more than 2 shifts + 2 ors.

Ok, fair enough.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-23  6:40     ` Nikunj A Dadhania
@ 2017-02-23 22:34       ` Richard Henderson
  2017-02-23 22:53         ` David Gibson
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-02-23 22:34 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, bharata

On 02/23/2017 05:40 PM, Nikunj A Dadhania wrote:
> Richard Henderson <rth@twiddle.net> writes:
>> These functions are becoming quite large.  Are they performance critical enough
>> that they need to stay as inline code, or should they be moved to helpers and
>> share code with cpu_read/write_xer?
>
> Just to boot to login prompt, these are the numbers for gen_read/write_xer:
>
> helper_myprint - rd_count 231103, wr_count 68897
>
> And it keeps on incrementing, maybe scope of optimization here.

That's not very large considering the total number of instructions executed 
during a boot to prompt.

Thoughts, David?


r~

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-23  7:02     ` Nikunj A Dadhania
  2017-02-23  9:29       ` David Gibson
@ 2017-02-23 22:36       ` Richard Henderson
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2017-02-23 22:36 UTC (permalink / raw)
  To: Nikunj A Dadhania, David Gibson; +Cc: bharata, qemu-ppc, qemu-devel

On 02/23/2017 06:02 PM, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
>
>>> -static void gen_read_xer(TCGv dst)
>>> +static void gen_read_xer(DisasContext *ctx, TCGv dst)
>>>  {
>>>      TCGv t0 = tcg_temp_new();
>>>      TCGv t1 = tcg_temp_new();
>>> @@ -3715,15 +3719,30 @@ static void gen_read_xer(TCGv dst)
>>>      tcg_gen_or_tl(t0, t0, t1);
>>>      tcg_gen_or_tl(dst, dst, t2);
>>>      tcg_gen_or_tl(dst, dst, t0);
>>> +    if (is_isa300(ctx)) {
>>> +        tcg_gen_shli_tl(t0, cpu_ov32, XER_OV32);
>>> +        tcg_gen_or_tl(dst, dst, t0);
>>> +        tcg_gen_shli_tl(t0, cpu_ca32, XER_CA32);
>>> +        tcg_gen_or_tl(dst, dst, t0);
>>
>> Could you use 2 deposits here, instead of 2 shifts and 2 ors?
>
> I checked the implementation of tcg_gen_deposit_i64, resultant will have much
> more than 2 shifts + 2 ors.

Well, that depends on the host.  For a host that implements deposit, like 
aarch64 or ppc64, it will be one instruction.
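
For reference, a sketch of what a deposit computes, written as plain C
and not taken from the TCG sources (deposit64_ref() and shift_or() are
hypothetical names). A host without a native bitfield-insert instruction
needs roughly the mask/shift/and/or sequence below, which is where the
extra ops come from; a host that has one can do it in a single
instruction.

```c
#include <stdint.h>

/*
 * Reference semantics of a bitfield deposit: replace 'len' bits of 'dst'
 * starting at bit 'pos' with the low 'len' bits of 'src'.
 * Assumes len >= 1 and pos + len <= 64.
 */
static uint64_t deposit64_ref(uint64_t dst, uint64_t src,
                              unsigned pos, unsigned len)
{
    uint64_t field = (len == 64) ? ~0ull : ((1ull << len) - 1);
    uint64_t mask = field << pos;
    return (dst & ~mask) | ((src << pos) & mask);
}

/* The shift + or form from the patch; 'flag' must be 0 or 1. */
static uint64_t shift_or(uint64_t dst, uint64_t flag, unsigned pos)
{
    return dst | (flag << pos);
}
```

The two agree whenever the target bit of dst is already clear, which
should hold in gen_read_xer() as long as the split-out bits are masked
out of cpu_xer on write. They differ when the bit is already set: the
deposit overwrites it, the or cannot clear it.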


r~

* Re: [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-23 22:34       ` Richard Henderson
@ 2017-02-23 22:53         ` David Gibson
  2017-02-24  0:41           ` [Qemu-devel] [Qemu-ppc] " Nikunj Dadhania
  0 siblings, 1 reply; 31+ messages in thread
From: David Gibson @ 2017-02-23 22:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Nikunj A Dadhania, qemu-ppc, qemu-devel, bharata

On Fri, Feb 24, 2017 at 09:34:32AM +1100, Richard Henderson wrote:
> On 02/23/2017 05:40 PM, Nikunj A Dadhania wrote:
> > Richard Henderson <rth@twiddle.net> writes:
> > > These functions are becoming quite large.  Are they performance critical enough
> > > that they need to stay as inline code, or should they be moved to helpers and
> > > share code with cpu_read/write_xer?
> > 
> > Just to boot to login prompt, these are the numbers for gen_read/write_xer:
> > 
> > helper_myprint - rd_count 231103, wr_count 68897
> > 
> > And it keeps on incrementing, maybe scope of optimization here.
> 
> That's not very large considering the total number of instructions executed
> during a boot to prompt.
> 
> Thoughts, David?

Hm, I'm not clear if that's the number of executions, or the number of
translations.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-23 22:53         ` David Gibson
@ 2017-02-24  0:41           ` Nikunj Dadhania
  2017-02-24  4:50             ` David Gibson
  0 siblings, 1 reply; 31+ messages in thread
From: Nikunj Dadhania @ 2017-02-24  0:41 UTC (permalink / raw)
  To: David Gibson; +Cc: Richard Henderson, qemu-ppc, qemu-devel, Bharata B Rao

On 24 February 2017 at 04:23, David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Feb 24, 2017 at 09:34:32AM +1100, Richard Henderson wrote:
>> On 02/23/2017 05:40 PM, Nikunj A Dadhania wrote:
>> > Richard Henderson <rth@twiddle.net> writes:
>> > > These functions are becoming quite large.  Are they performance critical enough
>> > > that they need to stay as inline code, or should they be moved to helpers and
>> > > share code with cpu_read/write_xer?
>> >
>> > Just to boot to login prompt, these are the numbers for gen_read/write_xer:
>> >
>> > helper_myprint - rd_count 231103, wr_count 68897
>> >
>> > And it keeps on incrementing, maybe scope of optimization here.
>>
>> That's not very large considering the total number of instructions executed
>> during a boot to prompt.
>>
>> Thoughts, David?
>
> Hm, I'm not clear if that's the number of executions, or the number of
> translations.

That is the number of executions.

Regards
Nikunj

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-24  0:41           ` [Qemu-devel] [Qemu-ppc] " Nikunj Dadhania
@ 2017-02-24  4:50             ` David Gibson
  2017-02-24  6:30               ` Richard Henderson
  0 siblings, 1 reply; 31+ messages in thread
From: David Gibson @ 2017-02-24  4:50 UTC (permalink / raw)
  To: Nikunj Dadhania; +Cc: Richard Henderson, qemu-ppc, qemu-devel, Bharata B Rao

On Fri, Feb 24, 2017 at 06:11:30AM +0530, Nikunj Dadhania wrote:
> On 24 February 2017 at 04:23, David Gibson <david@gibson.dropbear.id.au> wrote:
> > On Fri, Feb 24, 2017 at 09:34:32AM +1100, Richard Henderson wrote:
> >> On 02/23/2017 05:40 PM, Nikunj A Dadhania wrote:
> >> > Richard Henderson <rth@twiddle.net> writes:
> >> > > These functions are becoming quite large.  Are they performance critical enough
> >> > > that they need to stay as inline code, or should they be moved to helpers and
> >> > > share code with cpu_read/write_xer?
> >> >
> >> > Just to boot to login prompt, these are the numbers for gen_read/write_xer:
> >> >
> >> > helper_myprint - rd_count 231103, wr_count 68897
> >> >
> >> > And it keeps on incrementing, maybe scope of optimization here.
> >>
> >> That's not very large considering the total number of instructions executed
> >> during a boot to prompt.
> >>
> >> Thoughts, David?
> >
> > Hm, I'm not clear if that's the number of executions, or the number of
> > translations.
> 
> That is number of executions.

Ok, I guess that's not that big, then.  I guess moving them into
helpers would make sense.

Although I guess they'd shrink right down again if we put an
env->xer_mask in.  Thoughts on that option Richard?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-24  4:50             ` David Gibson
@ 2017-02-24  6:30               ` Richard Henderson
  2017-02-27  1:39                 ` David Gibson
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-02-24  6:30 UTC (permalink / raw)
  To: David Gibson, Nikunj Dadhania; +Cc: qemu-ppc, qemu-devel, Bharata B Rao

On 02/24/2017 03:50 PM, David Gibson wrote:
> Although I guess they'd shrink right down again if we put an
> env->xer_mask in.  Thoughts on that option Richard?

Why would xer_mask shrink the code?  I can't see that we'd be able to eliminate 
any code using the mask.


r~

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow
  2017-02-24  6:30               ` Richard Henderson
@ 2017-02-27  1:39                 ` David Gibson
  0 siblings, 0 replies; 31+ messages in thread
From: David Gibson @ 2017-02-27  1:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Nikunj Dadhania, qemu-ppc, qemu-devel, Bharata B Rao

On Fri, Feb 24, 2017 at 05:30:23PM +1100, Richard Henderson wrote:
> On 02/24/2017 03:50 PM, David Gibson wrote:
> > Although I guess they'd shrink right down again if we put an
> > env->xer_mask in.  Thoughts on that option Richard?
> 
> Why would xer_mask shrink the code?  I can't see that we'd be able to
> eliminate any code using the mask.

Uh.. I think I was thinking about the qemu code, not the generated
code.  It means we could unconditionally AND with the xer_mask in some
places, rather than having fiddly conditionals.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

end of thread, other threads:[~2017-02-27  1:46 UTC | newest]

Thread overview: 31+ messages
2017-02-22 11:44 [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 Nikunj A Dadhania
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 01/10] target/ppc: move cpu_[read, write]_xer to cpu.c Nikunj A Dadhania
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 02/10] target/ppc: optimize gen_write_xer() Nikunj A Dadhania
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 03/10] target/ppc: support for 32-bit carry and overflow Nikunj A Dadhania
2017-02-22 17:17   ` Richard Henderson
2017-02-22 17:20   ` Richard Henderson
2017-02-23  6:40     ` Nikunj A Dadhania
2017-02-23 22:34       ` Richard Henderson
2017-02-23 22:53         ` David Gibson
2017-02-24  0:41           ` [Qemu-devel] [Qemu-ppc] " Nikunj Dadhania
2017-02-24  4:50             ` David Gibson
2017-02-24  6:30               ` Richard Henderson
2017-02-27  1:39                 ` David Gibson
2017-02-23  3:21   ` [Qemu-devel] " David Gibson
2017-02-23  5:09     ` Nikunj A Dadhania
2017-02-23  5:32       ` David Gibson
2017-02-23  7:02     ` Nikunj A Dadhania
2017-02-23  9:29       ` David Gibson
2017-02-23 22:36       ` Richard Henderson
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 04/10] target/ppc: update ca32 in arithmetic add Nikunj A Dadhania
2017-02-22 17:20   ` Richard Henderson
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 05/10] target/ppc: update ca32 in arithmetic substract Nikunj A Dadhania
2017-02-22 17:21   ` Richard Henderson
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 06/10] target/ppc: update overflow flags for add/sub Nikunj A Dadhania
2017-02-22 17:26   ` Richard Henderson
2017-02-23  4:46     ` Nikunj A Dadhania
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 07/10] target/ppc: use tcg ops for neg instruction Nikunj A Dadhania
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 08/10] target/ppc: add ov32 flag for multiply low insns Nikunj A Dadhania
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 09/10] target/ppc: add ov32 flag in divide operations Nikunj A Dadhania
2017-02-22 11:44 ` [Qemu-devel] [PATCH v3 10/10] target/ppc: add mcrxrx instruction Nikunj A Dadhania
2017-02-23  3:27 ` [Qemu-devel] [PATCH v3 00/10] POWER9 TCG enablements - part15 David Gibson
