* [RFC 0/9] Add precise/invariant semantics to TGSI
@ 2017-06-11 18:42 Karol Herbst
  2017-06-11 18:42 ` [RFC 3/9] st/glsl_to_tgsi: handle precise modifier Karol Herbst
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev; +Cc: nouveau

Running Tomb Raider on Nouveau, I found some flicker caused by Nouveau
ignoring the precise modifier on variables.

This series adds precise/invariant handling to TGSI, which drivers can then
use to disable certain unsafe optimisations that would otherwise alter
calculations whose results must be identical across shaders.

This series fixes the flicker in Tomb Raider and one CTS test for 4.4 and 4.5.

Note on patch 3: I really dislike how I tell glsl_to_tgsi_visitor to apply the
precise flag to instructions emitted in ir_assignment->rhs->accept(), but I
found no other easy way to handle this. Maybe one of you has a better idea?

Karol Herbst (9):
  tgsi: add precise flag to tgsi_instruction
  tgsi/dump: print _PRECISE modifier on Instrutions
  st/glsl_to_tgsi: handle precise modifier
  tgsi: populate precise
  tgsi/text: parse _PRECISE modifier
  nv50/ir: add precise field to Instruction
  nv50/ir/tgsi: handle precise for most ALU instructions
  nv50/ir: disable mul+add to mad for precise instructions
  nv50/ir/tgsi: split mad to mul+add

 src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
 src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
 src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
 src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
 src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
 src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
 src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
 src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
 src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
 src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
 src/mesa/state_tracker/st_pbo.c                    |  2 +-
 15 files changed, 172 insertions(+), 59 deletions(-)

-- 
2.13.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


* [RFC 1/9] tgsi: add precise flag to tgsi_instruction
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-06-11 18:42   ` Karol Herbst
  2017-06-11 18:42   ` [RFC 2/9] tgsi/dump: print _PRECISE modifier on Instrutions Karol Herbst
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_build.c    | 1 +
 src/gallium/include/pipe/p_shader_tokens.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 00843241f8..55e4d064ed 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -642,6 +642,7 @@ tgsi_default_instruction( void )
    instruction.Label = 0;
    instruction.Texture = 0;
    instruction.Memory = 0;
+   instruction.Precise = 0;
    instruction.Padding = 0;
 
    return instruction;
diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index 1e08d97329..aa0fb3e3b3 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -638,7 +638,8 @@ struct tgsi_instruction
    unsigned Label      : 1;
    unsigned Texture    : 1;
    unsigned Memory     : 1;
-   unsigned Padding    : 2;
+   unsigned Precise    : 1;
+   unsigned Padding    : 1;
 };
 
 /*
-- 
2.13.1

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


* [RFC 2/9] tgsi/dump: print _PRECISE modifier on Instrutions
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-06-11 18:42   ` [RFC 1/9] tgsi: add precise flag to tgsi_instruction Karol Herbst
@ 2017-06-11 18:42   ` Karol Herbst
  2017-06-11 18:42   ` [RFC 4/9] tgsi: populate precise Karol Herbst
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index f6eba7424b..b58e64511c 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -584,6 +584,10 @@ iter_instruction(
       TXT( "_SAT" );
    }
 
+   if (inst->Instruction.Precise) {
+      TXT( "_PRECISE" );
+   }
+
    for (i = 0; i < inst->Instruction.NumDstRegs; i++) {
       const struct tgsi_full_dst_register *dst = &inst->Dst[i];
 
-- 
2.13.1


* [RFC 3/9] st/glsl_to_tgsi: handle precise modifier
  2017-06-11 18:42 [RFC 0/9] Add precise/invariant semantics to TGSI Karol Herbst
@ 2017-06-11 18:42 ` Karol Herbst
       [not found]   ` <20170611184239.7204-4-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-06-11 18:42 ` [RFC 9/9] nv50/ir/tgsi: split mad to mul+add Karol Herbst
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev; +Cc: nouveau

All subexpressions inside an ir_assignment need to be tagged as precise.

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 80 ++++++++++++++++++++++++------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index c5d2e0fcd2..19f90f21fe 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -87,6 +87,13 @@ static int swizzle_for_type(const glsl_type *type, int component = 0)
    return swizzle;
 }
 
+static unsigned is_precise(const ir_variable *ir)
+{
+   if (!ir)
+      return 0;
+   return ir->data.precise || ir->data.invariant;
+}
+
 /**
  * This struct is a corresponding struct to TGSI ureg_src.
  */
@@ -296,6 +303,7 @@ public:
    ir_instruction *ir;
 
    unsigned op:8; /**< TGSI opcode */
+   unsigned precise:1;
    unsigned saturate:1;
    unsigned is_64bit_expanded:1;
    unsigned sampler_base:5;
@@ -435,6 +443,7 @@ public:
    bool have_fma;
    bool use_shared_memory;
    bool has_tex_txf_lz;
+   unsigned precise;
 
    variable_storage *find_variable_storage(ir_variable *var);
 
@@ -505,13 +514,29 @@ public:
                                       st_src_reg src0 = undef_src,
                                       st_src_reg src1 = undef_src,
                                       st_src_reg src2 = undef_src,
-                                      st_src_reg src3 = undef_src);
+                                      st_src_reg src3 = undef_src,
+                                      unsigned precise = 0);
 
    glsl_to_tgsi_instruction *emit_asm(ir_instruction *ir, unsigned op,
                                       st_dst_reg dst, st_dst_reg dst1,
                                       st_src_reg src0 = undef_src,
                                       st_src_reg src1 = undef_src,
                                       st_src_reg src2 = undef_src,
+                                      st_src_reg src3 = undef_src,
+                                      unsigned precise = 0);
+
+   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
+                                      st_dst_reg dst = undef_dst,
+                                      st_src_reg src0 = undef_src,
+                                      st_src_reg src1 = undef_src,
+                                      st_src_reg src2 = undef_src,
+                                      st_src_reg src3 = undef_src);
+
+   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
+                                      st_dst_reg dst, st_dst_reg dst1,
+                                      st_src_reg src0 = undef_src,
+                                      st_src_reg src1 = undef_src,
+                                      st_src_reg src2 = undef_src,
                                       st_src_reg src3 = undef_src);
 
    unsigned get_opcode(unsigned op,
@@ -650,7 +675,8 @@ glsl_to_tgsi_instruction *
 glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
                                st_dst_reg dst, st_dst_reg dst1,
                                st_src_reg src0, st_src_reg src1,
-                               st_src_reg src2, st_src_reg src3)
+                               st_src_reg src2, st_src_reg src3,
+                               unsigned precise)
 {
    glsl_to_tgsi_instruction *inst = new(mem_ctx) glsl_to_tgsi_instruction();
    int num_reladdr = 0, i, j;
@@ -691,6 +717,7 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
    STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);
 
    inst->op = op;
+   inst->precise = precise;
    inst->info = tgsi_get_opcode_info(op);
    inst->dst[0] = dst;
    inst->dst[1] = dst1;
@@ -881,9 +908,28 @@ glsl_to_tgsi_instruction *
 glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
                                st_dst_reg dst,
                                st_src_reg src0, st_src_reg src1,
+                               st_src_reg src2, st_src_reg src3,
+                               unsigned precise)
+{
+   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, precise);
+}
+
+glsl_to_tgsi_instruction *
+glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
+                               st_dst_reg dst,
+                               st_src_reg src0, st_src_reg src1,
+                               st_src_reg src2, st_src_reg src3)
+{
+   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, this->precise);
+}
+
+glsl_to_tgsi_instruction *
+glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
+                               st_dst_reg dst, st_dst_reg dst1,
+                               st_src_reg src0, st_src_reg src1,
                                st_src_reg src2, st_src_reg src3)
 {
-   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3);
+   return emit_asm(ir, op, dst, dst1, src0, src1, src2, src3, this->precise);
 }
 
 /**
@@ -1116,7 +1162,7 @@ glsl_to_tgsi_visitor::emit_arl(ir_instruction *ir,
    if (dst.index >= this->num_address_regs)
       this->num_address_regs = dst.index + 1;
 
-   emit_asm(NULL, op, dst, src0);
+   emit_asm((ir_instruction *)NULL, op, dst, src0);
 }
 
 int
@@ -1406,11 +1452,11 @@ glsl_to_tgsi_visitor::visit(ir_variable *ir)
 void
 glsl_to_tgsi_visitor::visit(ir_loop *ir)
 {
-   emit_asm(NULL, TGSI_OPCODE_BGNLOOP);
+   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BGNLOOP);
 
    visit_exec_list(&ir->body_instructions, this);
 
-   emit_asm(NULL, TGSI_OPCODE_ENDLOOP);
+   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ENDLOOP);
 }
 
 void
@@ -1418,10 +1464,10 @@ glsl_to_tgsi_visitor::visit(ir_loop_jump *ir)
 {
    switch (ir->mode) {
    case ir_loop_jump::jump_break:
-      emit_asm(NULL, TGSI_OPCODE_BRK);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BRK);
       break;
    case ir_loop_jump::jump_continue:
-      emit_asm(NULL, TGSI_OPCODE_CONT);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_CONT);
       break;
    }
 }
@@ -2703,7 +2749,7 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
             st_dst_reg dst = st_dst_reg(get_temp(var->type));
             st_src_reg src = st_src_reg(PROGRAM_OUTPUT, decl->mesa_index,
                                         var->type, component, decl->array_id);
-            emit_asm(NULL, TGSI_OPCODE_FBFETCH, dst, src);
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_FBFETCH, dst, src);
             entry = new(mem_ctx) variable_storage(var, dst.file, dst.index,
                                                   dst.array_id);
          } else {
@@ -3148,7 +3194,10 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
    st_dst_reg l;
    st_src_reg r;
 
+   /* all generated instructions need to be flagged as precise */
+   this->precise = is_precise(ir->lhs->variable_referenced());
    ir->rhs->accept(this);
+   this->precise = 0;
    r = this->result;
 
    l = get_assignment_lhs(ir->lhs, this, &dst_component);
@@ -3233,7 +3282,8 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
        */
       glsl_to_tgsi_instruction *inst, *new_inst;
       inst = (glsl_to_tgsi_instruction *)this->instructions.get_tail();
-      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3]);
+      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3],
+                          is_precise(ir->lhs->variable_referenced()));
       new_inst->saturate = inst->saturate;
       inst->dead_mask = inst->dst[0].writemask;
    } else {
@@ -4072,16 +4122,16 @@ glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference *tail,
 
          deref_arr->array_index->accept(this);
          if (*array_elements != 1)
-            emit_asm(NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
          else
-            emit_asm(NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
 
          if (indirect->file == PROGRAM_UNDEFINED)
             *indirect = temp_reg;
          else {
             temp_dst = st_dst_reg(*indirect);
             temp_dst.writemask = 1;
-            emit_asm(NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
          }
       } else
          *index += array_index->value.u[0] * *array_elements;
@@ -4141,7 +4191,7 @@ glsl_to_tgsi_visitor::canonicalize_gather_offset(st_src_reg offset)
       st_src_reg tmp = get_temp(glsl_type::ivec2_type);
       st_dst_reg tmp_dst = st_dst_reg(tmp);
       tmp_dst.writemask = WRITEMASK_XY;
-      emit_asm(NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
       return tmp;
    }
 
@@ -6777,7 +6827,7 @@ get_mesa_program_tgsi(struct gl_context *ctx,
    v->renumber_registers();
 
    /* Write the END instruction. */
-   v->emit_asm(NULL, TGSI_OPCODE_END);
+   v->emit_asm((ir_instruction *)NULL, TGSI_OPCODE_END);
 
    if (ctx->_Shader->Flags & GLSL_DUMP) {
       _mesa_log("\n");
-- 
2.13.1


* [RFC 4/9] tgsi: populate precise
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-06-11 18:42   ` [RFC 1/9] tgsi: add precise flag to tgsi_instruction Karol Herbst
  2017-06-11 18:42   ` [RFC 2/9] tgsi/dump: print _PRECISE modifier on Instrutions Karol Herbst
@ 2017-06-11 18:42   ` Karol Herbst
       [not found]     ` <20170611184239.7204-5-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-06-11 18:42   ` [RFC 5/9] tgsi/text: parse _PRECISE modifier Karol Herbst
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Only implemented for glsl->tgsi. Other converters just set precise to 0.

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_build.c       |  3 +++
 src/gallium/auxiliary/tgsi/tgsi_ureg.c        | 14 +++++++---
 src/gallium/auxiliary/tgsi/tgsi_ureg.h        | 20 +++++++++++---
 src/gallium/auxiliary/util/u_simple_shaders.c |  2 +-
 src/gallium/state_trackers/nine/nine_shader.c |  6 ++---
 src/mesa/state_tracker/st_atifs_to_tgsi.c     | 38 +++++++++++++--------------
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp    | 12 ++++-----
 src/mesa/state_tracker/st_mesa_to_tgsi.c      |  8 +++---
 src/mesa/state_tracker/st_pbo.c               |  2 +-
 9 files changed, 65 insertions(+), 40 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 55e4d064ed..144a017768 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -651,6 +651,7 @@ tgsi_default_instruction( void )
 static struct tgsi_instruction
 tgsi_build_instruction(unsigned opcode,
                        unsigned saturate,
+                       unsigned precise,
                        unsigned num_dst_regs,
                        unsigned num_src_regs,
                        struct tgsi_header *header)
@@ -665,6 +666,7 @@ tgsi_build_instruction(unsigned opcode,
    instruction = tgsi_default_instruction();
    instruction.Opcode = opcode;
    instruction.Saturate = saturate;
+   instruction.Precise = precise;
    instruction.NumDstRegs = num_dst_regs;
    instruction.NumSrcRegs = num_src_regs;
 
@@ -1061,6 +1063,7 @@ tgsi_build_full_instruction(
 
    *instruction = tgsi_build_instruction(full_inst->Instruction.Opcode,
                                          full_inst->Instruction.Saturate,
+                                         full_inst->Instruction.Precise,
                                          full_inst->Instruction.NumDstRegs,
                                          full_inst->Instruction.NumSrcRegs,
                                          header);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index 5bd779728a..56db2252c5 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -1213,6 +1213,7 @@ struct ureg_emit_insn_result
 ureg_emit_insn(struct ureg_program *ureg,
                unsigned opcode,
                boolean saturate,
+               unsigned precise,
                unsigned num_dst,
                unsigned num_src)
 {
@@ -1226,6 +1227,7 @@ ureg_emit_insn(struct ureg_program *ureg,
    out[0].insn = tgsi_default_instruction();
    out[0].insn.Opcode = opcode;
    out[0].insn.Saturate = saturate;
+   out[0].insn.Precise = precise;
    out[0].insn.NumDstRegs = num_dst;
    out[0].insn.NumSrcRegs = num_src;
 
@@ -1354,7 +1356,8 @@ ureg_insn(struct ureg_program *ureg,
           const struct ureg_dst *dst,
           unsigned nr_dst,
           const struct ureg_src *src,
-          unsigned nr_src )
+          unsigned nr_src,
+          unsigned precise )
 {
    struct ureg_emit_insn_result insn;
    unsigned i;
@@ -1369,6 +1372,7 @@ ureg_insn(struct ureg_program *ureg,
    insn = ureg_emit_insn(ureg,
                          opcode,
                          saturate,
+                         precise,
                          nr_dst,
                          nr_src);
 
@@ -1391,7 +1395,8 @@ ureg_tex_insn(struct ureg_program *ureg,
               const struct tgsi_texture_offset *texoffsets,
               unsigned nr_offset,
               const struct ureg_src *src,
-              unsigned nr_src )
+              unsigned nr_src,
+              unsigned precise )
 {
    struct ureg_emit_insn_result insn;
    unsigned i;
@@ -1406,6 +1411,7 @@ ureg_tex_insn(struct ureg_program *ureg,
    insn = ureg_emit_insn(ureg,
                          opcode,
                          saturate,
+                         precise,
                          nr_dst,
                          nr_src);
 
@@ -1434,7 +1440,8 @@ ureg_memory_insn(struct ureg_program *ureg,
                  unsigned nr_src,
                  unsigned qualifier,
                  unsigned texture,
-                 unsigned format)
+                 unsigned format,
+                 unsigned precise)
 {
    struct ureg_emit_insn_result insn;
    unsigned i;
@@ -1442,6 +1449,7 @@ ureg_memory_insn(struct ureg_program *ureg,
    insn = ureg_emit_insn(ureg,
                          opcode,
                          FALSE,
+                         precise,
                          nr_dst,
                          nr_src);
 
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
index 54f95ba565..105c85abd5 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
@@ -546,7 +546,8 @@ ureg_insn(struct ureg_program *ureg,
           const struct ureg_dst *dst,
           unsigned nr_dst,
           const struct ureg_src *src,
-          unsigned nr_src );
+          unsigned nr_src,
+          unsigned precise);
 
 
 void
@@ -559,7 +560,8 @@ ureg_tex_insn(struct ureg_program *ureg,
               const struct tgsi_texture_offset *texoffsets,
               unsigned nr_offset,
               const struct ureg_src *src,
-              unsigned nr_src );
+              unsigned nr_src,
+              unsigned precise);
 
 
 void
@@ -571,7 +573,8 @@ ureg_memory_insn(struct ureg_program *ureg,
                  unsigned nr_src,
                  unsigned qualifier,
                  unsigned texture,
-                 unsigned format);
+                 unsigned format,
+                 unsigned precise);
 
 /***********************************************************************
  * Internal instruction helpers, don't call these directly:
@@ -586,6 +589,7 @@ struct ureg_emit_insn_result
 ureg_emit_insn(struct ureg_program *ureg,
                unsigned opcode,
                boolean saturate,
+               unsigned precise,
                unsigned num_dst,
                unsigned num_src);
 
@@ -632,6 +636,7 @@ static inline void ureg_##op( struct ureg_program *ureg )       \
                          opcode,                                \
                          FALSE,                                 \
                          0,                                     \
+                         0,                                     \
                          0);                                    \
    ureg_fixup_insn_size( ureg, insn.insn_token );               \
 }
@@ -646,6 +651,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
                          opcode,                                \
                          FALSE,                                 \
                          0,                                     \
+                         0,                                     \
                          1);                                    \
    ureg_emit_src( ureg, src );                                  \
    ureg_fixup_insn_size( ureg, insn.insn_token );               \
@@ -661,6 +667,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
                          opcode,                                \
                          FALSE,                                 \
                          0,                                     \
+                         0,                                     \
                          0);                                    \
    ureg_emit_label( ureg, insn.extended_token, label_token );   \
    ureg_fixup_insn_size( ureg, insn.insn_token );               \
@@ -677,6 +684,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
                          opcode,                                \
                          FALSE,                                 \
                          0,                                     \
+                         0,                                     \
                          1);                                    \
    ureg_emit_label( ureg, insn.extended_token, label_token );   \
    ureg_emit_src( ureg, src );                                  \
@@ -694,6 +702,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          0);                                            \
    ureg_emit_dst( ureg, dst );                                          \
@@ -713,6 +722,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          1);                                            \
    ureg_emit_dst( ureg, dst );                                          \
@@ -733,6 +743,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          2);                                            \
    ureg_emit_dst( ureg, dst );                                          \
@@ -756,6 +767,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          2);                                            \
    ureg_emit_texture( ureg, insn.extended_token, target,                \
@@ -780,6 +792,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          3);                                            \
    ureg_emit_dst( ureg, dst );                                          \
@@ -806,6 +819,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          4);                                            \
    ureg_emit_texture( ureg, insn.extended_token, target,                \
diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c b/src/gallium/auxiliary/util/u_simple_shaders.c
index 5874d0e9aa..79331b5638 100644
--- a/src/gallium/auxiliary/util/u_simple_shaders.c
+++ b/src/gallium/auxiliary/util/u_simple_shaders.c
@@ -954,7 +954,7 @@ util_make_geometry_passthrough_shader(struct pipe_context *pipe,
    }
 
    /* EMIT IMM[0] */
-   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1);
+   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1, 0);
 
    /* END */
    ureg_END(ureg);
diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c
index 40fb6be88f..f405090811 100644
--- a/src/gallium/state_trackers/nine/nine_shader.c
+++ b/src/gallium/state_trackers/nine/nine_shader.c
@@ -1879,7 +1879,7 @@ DECL_SPECIAL(IFC)
     struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
     src[0] = tx_src_param(tx, &tx->insn.src[0]);
     src[1] = tx_src_param(tx, &tx->insn.src[1]);
-    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
+    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
     ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
     return D3D_OK;
 }
@@ -1897,7 +1897,7 @@ DECL_SPECIAL(BREAKC)
     struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
     src[0] = tx_src_param(tx, &tx->insn.src[0]);
     src[1] = tx_src_param(tx, &tx->insn.src[1]);
-    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
+    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
     ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
     ureg_BRK(tx->ureg);
     tx_endcond(tx);
@@ -3029,7 +3029,7 @@ NineTranslateInstruction_Generic(struct shader_translator *tx)
 
     ureg_insn(tx->ureg, tx->insn.info->opcode,
               dst, tx->insn.ndst,
-              src, tx->insn.nsrc);
+              src, tx->insn.nsrc, 0);
     return D3D_OK;
 }
 
diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c
index 338ced56ed..e0a6ff7131 100644
--- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
+++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
@@ -105,18 +105,18 @@ apply_swizzle(struct st_translate *t,
       imm[0] = src;
       imm[1] = ureg_imm4f(t->ureg, 1.0f, 1.0f, 0.0f, 0.0f);
       imm[2] = ureg_imm4f(t->ureg, 0.0f, 0.0f, 1.0f, 1.0f);
-      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3);
+      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3, 0);
 
       if (swizzle == GL_SWIZZLE_STR_DR_ATI) {
          imm[0] = ureg_scalar(src, TGSI_SWIZZLE_Z);
       } else {
          imm[0] = ureg_scalar(src, TGSI_SWIZZLE_W);
       }
-      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1);
+      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1, 0);
 
       imm[0] = ureg_src(tmp[0]);
       imm[1] = ureg_src(tmp[1]);
-      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2, 0);
 
       return ureg_src(tmp[0]);
    }
@@ -170,35 +170,35 @@ prepare_argument(struct st_translate *t, const unsigned argId,
       src = ureg_scalar(src, TGSI_SWIZZLE_W);
       break;
    }
-   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1);
+   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1, 0);
 
    if (srcReg->argMod & GL_COMP_BIT_ATI) {
       struct ureg_src modsrc[2];
       modsrc[0] = ureg_imm1f(t->ureg, 1.0f);
       modsrc[1] = ureg_negate(ureg_src(arg));
 
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
    }
    if (srcReg->argMod & GL_BIAS_BIT_ATI) {
       struct ureg_src modsrc[2];
       modsrc[0] = ureg_src(arg);
       modsrc[1] = ureg_imm1f(t->ureg, -0.5f);
 
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
    }
    if (srcReg->argMod & GL_2X_BIT_ATI) {
       struct ureg_src modsrc[2];
       modsrc[0] = ureg_src(arg);
       modsrc[1] = ureg_src(arg);
 
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
    }
    if (srcReg->argMod & GL_NEGATE_BIT_ATI) {
       struct ureg_src modsrc[2];
       modsrc[0] = ureg_src(arg);
       modsrc[1] = ureg_imm1f(t->ureg, -1.0f);
 
-      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2, 0);
    }
    return  ureg_src(arg);
 }
@@ -217,25 +217,25 @@ emit_special_inst(struct st_translate *t, const struct instruction_desc *desc,
       tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI + 2); /* re-purpose a3 */
       src[0] = ureg_imm1f(t->ureg, 0.5f);
       src[1] = ureg_negate(args[2]);
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2, 0);
       src[0] = ureg_src(tmp[0]);
       src[1] = args[0];
       src[2] = args[1];
-      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
+      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
    } else if (!strcmp(desc->name, "CND0")) {
       src[0] = args[2];
       src[1] = args[1];
       src[2] = args[0];
-      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
+      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
    } else if (!strcmp(desc->name, "DOT2_ADD")) {
       /* note: DP2A is not implemented in most pipe drivers */
       tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI); /* re-purpose a1 */
       src[0] = args[0];
       src[1] = args[1];
-      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2, 0);
       src[0] = ureg_src(tmp[0]);
       src[1] = ureg_scalar(args[2], TGSI_SWIZZLE_Z);
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2, 0);
    }
 }
 
@@ -249,7 +249,7 @@ emit_arith_inst(struct st_translate *t,
       return;
    }
 
-   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount);
+   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount, 0);
 }
 
 static void
@@ -292,7 +292,7 @@ emit_dstmod(struct st_translate *t,
    if (dstMod & GL_SATURATE_BIT_ATI) {
       dst = ureg_saturate(dst);
    }
-   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2);
+   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2, 0);
 }
 
 /**
@@ -334,9 +334,9 @@ compile_setupinst(struct st_translate *t,
       src[1] = t->samplers[r];
       /* the texture target is still unknown, it will be fixed in the draw call */
       ureg_tex_insn(t->ureg, TGSI_OPCODE_TEX, dst, 1, TGSI_TEXTURE_2D,
-                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2);
+                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2, 0);
    } else if (texinst->Opcode == ATI_FRAGMENT_SHADER_PASS_OP) {
-      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
+      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
    }
 
    t->regs_written[t->current_pass][r] = true;
@@ -408,11 +408,11 @@ finalize_shader(struct st_translate *t, unsigned numPasses)
       /* copy the result into the OUT slot */
       dst[0] = t->outputs[t->outputMapping[FRAG_RESULT_COLOR]];
       src[0] = ureg_src(t->temps[0]);
-      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
+      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
    }
 
    /* signal the end of the program */
-   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0);
+   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0, 0);
 }
 
 /**
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 19f90f21fe..ecd9f9f280 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5900,7 +5900,7 @@ compile_tgsi_instruction(struct st_translate *t,
    case TGSI_OPCODE_IF:
    case TGSI_OPCODE_UIF:
       assert(num_dst == 0);
-      ureg_insn(ureg, inst->op, NULL, 0, src, num_src);
+      ureg_insn(ureg, inst->op, NULL, 0, src, num_src, inst->precise);
       return;
 
    case TGSI_OPCODE_TEX:
@@ -5935,7 +5935,7 @@ compile_tgsi_instruction(struct st_translate *t,
                     tex_target,
                     st_translate_texture_type(inst->tex_type),
                     texoffsets, inst->tex_offset_num_offset,
-                    src, num_src);
+                    src, num_src, inst->precise);
       return;
 
    case TGSI_OPCODE_RESQ:
@@ -5966,7 +5966,7 @@ compile_tgsi_instruction(struct st_translate *t,
       assert(src[0].File != TGSI_FILE_NULL);
       ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
                        inst->buffer_access,
-                       tex_target, inst->image_format);
+                       tex_target, inst->image_format, inst->precise);
       break;
 
    case TGSI_OPCODE_STORE:
@@ -5984,19 +5984,19 @@ compile_tgsi_instruction(struct st_translate *t,
       assert(dst[0].File != TGSI_FILE_NULL);
       ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
                        inst->buffer_access,
-                       tex_target, inst->image_format);
+                       tex_target, inst->image_format, inst->precise);
       break;
 
    case TGSI_OPCODE_SCS:
       dst[0] = ureg_writemask(dst[0], TGSI_WRITEMASK_XY);
-      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src);
+      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->precise);
       break;
 
    default:
       ureg_insn(ureg,
                 inst->op,
                 dst, num_dst,
-                src, num_src);
+                src, num_src, inst->precise);
       break;
    }
 }
diff --git a/src/mesa/state_tracker/st_mesa_to_tgsi.c b/src/mesa/state_tracker/st_mesa_to_tgsi.c
index 984ff92130..f11013c116 100644
--- a/src/mesa/state_tracker/st_mesa_to_tgsi.c
+++ b/src/mesa/state_tracker/st_mesa_to_tgsi.c
@@ -558,7 +558,7 @@ compile_instruction(
                                                inst->TexShadow ),
                      TGSI_RETURN_TYPE_FLOAT,
                      NULL, 0,
-                     src, num_src );
+                     src, num_src, 0 );
       return;
 
    case OPCODE_SCS:
@@ -566,7 +566,7 @@ compile_instruction(
       ureg_insn( ureg, 
                  translate_opcode( inst->Opcode ), 
                  dst, num_dst, 
-                 src, num_src );
+                 src, num_src, 0 );
       break;
 
    case OPCODE_XPD:
@@ -574,7 +574,7 @@ compile_instruction(
       ureg_insn( ureg, 
                  translate_opcode( inst->Opcode ), 
                  dst, num_dst, 
-                 src, num_src );
+                 src, num_src, 0 );
       break;
 
    case OPCODE_RSQ:
@@ -593,7 +593,7 @@ compile_instruction(
       ureg_insn( ureg, 
                  translate_opcode( inst->Opcode ), 
                  dst, num_dst, 
-                 src, num_src );
+                 src, num_src, 0);
       break;
    }
 }
diff --git a/src/mesa/state_tracker/st_pbo.c b/src/mesa/state_tracker/st_pbo.c
index 303c8535b2..3dff1609e8 100644
--- a/src/mesa/state_tracker/st_pbo.c
+++ b/src/mesa/state_tracker/st_pbo.c
@@ -528,7 +528,7 @@ create_fs(struct st_context *st, bool download, enum pipe_texture_target target,
       op[0] = ureg_src(temp0);
       op[1] = ureg_src(temp1);
       ureg_memory_insn(ureg, TGSI_OPCODE_STORE, &out, 1, op, 2, 0,
-                             TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE);
+                             TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE, 0);
 
       ureg_release_temporary(ureg, temp1);
    } else {
-- 
2.13.1

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


* [RFC 5/9] tgsi/text: parse _PRECISE modifier
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-06-11 18:42   ` [RFC 4/9] tgsi: populate precise Karol Herbst
@ 2017-06-11 18:42   ` Karol Herbst
  2017-06-12 10:31     ` Nicolai Hähnle
  2017-06-11 18:42   ` [RFC 6/9] nv50/ir: add precise field to Instruction Karol Herbst
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev; +Cc: nouveau

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_text.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 93a05568f4..c5fcb3283d 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -999,6 +999,7 @@ parse_texoffset_operand(
 static boolean
 match_inst(const char **pcur,
            unsigned *saturate,
+           unsigned *precise,
            const struct tgsi_opcode_info *info)
 {
    const char *cur = *pcur;
@@ -1007,6 +1008,7 @@ match_inst(const char **pcur,
    if (str_match_nocase_whole(&cur, info->mnemonic)) {
       *pcur = cur;
       *saturate = 0;
+      *precise = 0;
       return TRUE;
    }
 
@@ -1015,8 +1017,15 @@ match_inst(const char **pcur,
       if (str_match_nocase_whole(&cur, "_SAT")) {
          *pcur = cur;
          *saturate = 1;
-         return TRUE;
       }
+
+      if (str_match_nocase_whole(&cur, "_PRECISE")) {
+         *pcur = cur;
+         *precise = 1;
+      }
+
+      if (*precise || *saturate)
+         return TRUE;
    }
 
    return FALSE;
@@ -1029,6 +1038,7 @@ parse_instruction(
 {
    uint i;
    uint saturate = 0;
+   uint precise = 0;
    const struct tgsi_opcode_info *info;
    struct tgsi_full_instruction inst;
    const char *cur;
@@ -1043,7 +1053,7 @@ parse_instruction(
       cur = ctx->cur;
 
       info = tgsi_get_opcode_info( i );
-      if (match_inst(&cur, &saturate, info)) {
+      if (match_inst(&cur, &saturate, &precise, info)) {
          if (info->num_dst + info->num_src + info->is_tex == 0) {
             ctx->cur = cur;
             break;
@@ -1064,6 +1074,7 @@ parse_instruction(
 
    inst.Instruction.Opcode = i;
    inst.Instruction.Saturate = saturate;
+   inst.Instruction.Precise = precise;
    inst.Instruction.NumDstRegs = info->num_dst;
    inst.Instruction.NumSrcRegs = info->num_src;
 
-- 
2.13.1


* [RFC 6/9] nv50/ir: add precise field to Instruction
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-06-11 18:42   ` [RFC 5/9] tgsi/text: parse _PRECISE modifier Karol Herbst
@ 2017-06-11 18:42   ` Karol Herbst
  2017-06-11 18:42   ` [RFC 7/9] nv50/ir/tgsi: handle precise for most ALU instructions Karol Herbst
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev; +Cc: nouveau

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 5c09fed05c..6835c4fa8c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -884,6 +884,7 @@ public:
    unsigned perPatch   : 1;
    unsigned exit       : 1; // terminate program after insn
    unsigned mask       : 4; // for vector ops
+   unsigned precise    : 1; // prevent algebraic optimisations like mul+add to mad
 
    int8_t postFactor; // MUL/DIV(if < 0) by 1 << postFactor
 
-- 
2.13.1


* [RFC 7/9] nv50/ir/tgsi: handle precise for most ALU instructions
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-06-11 18:42   ` [RFC 6/9] nv50/ir: add precise field to Instruction Karol Herbst
@ 2017-06-11 18:42   ` Karol Herbst
  2017-06-11 18:42   ` [RFC 8/9] nv50/ir: disable mul+add to mad for precise instructions Karol Herbst
  2017-06-12 10:42   ` [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI Nicolai Hähnle
  7 siblings, 0 replies; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev; +Cc: nouveau

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 1264dd4834..c633185893 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -3179,6 +3179,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
          geni->subOp = tgsi::opcodeToSubOp(tgsi.getOpcode());
          if (op == OP_MUL && dstTy == TYPE_F32)
             geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
       }
       break;
    case TGSI_OPCODE_MAD:
@@ -3192,6 +3193,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
          geni = mkOp3(op, dstTy, dst0[c], src0, src1, src2);
          if (dstTy == TYPE_F32)
             geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
       }
       break;
    case TGSI_OPCODE_MOV:
-- 
2.13.1


* [RFC 8/9] nv50/ir: disable mul+add to mad for precise instructions
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-06-11 18:42   ` [RFC 7/9] nv50/ir/tgsi: handle precise for most ALU instructions Karol Herbst
@ 2017-06-11 18:42   ` Karol Herbst
  2017-06-12 10:42   ` [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI Nicolai Hähnle
  7 siblings, 0 replies; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev; +Cc: nouveau

Fixes misrendering in Tomb Raider.

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
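Note on why the fusion changes results: a fused multiply-add rounds once,
while a separate mul followed by an add rounds twice, so the two forms can
disagree in the last bits. The sketch below illustrates this on the CPU in
plain C. It is only an illustration under stated assumptions: the function
names are hypothetical, the "fused" path is emulated by computing in double
(it is not the hardware MAD), and this thread does not state how the nv50
MAD actually rounds.

```c
/* Unfused path: the product is rounded to float before the add,
 * which is what a separate MUL followed by an ADD does. */
static float mul_then_add(float a, float b, float c)
{
    /* volatile forces the product to be stored as a float, so it is
     * rounded here and the compiler cannot contract it into an FMA. */
    volatile float prod = a * b;
    return prod + c;
}

/* Emulated fused path (assumption: single rounding at the end).
 * A float*float product is exact in double. The double add can round
 * in general, but it is exact for the inputs in the usage note. */
static float fused_emulated(float a, float b, float c)
{
    double exact_prod = (double)a * (double)b;
    return (float)(exact_prod + (double)c);
}
```

With a = b = 1.0f + 0x1p-12f and c = -(1.0f + 0x1p-11f), the exact product
is 1 + 2^-11 + 2^-24. The unfused path rounds the product to 1 + 2^-11 and
returns 0, while the emulated fused path keeps the 2^-24 term. A shader
that takes the fused path and another that does not can therefore produce
different values for the same expression.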
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 4c92a1efb5..85f3f44832 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1669,6 +1669,10 @@ AlgebraicOpt::handleABS(Instruction *abs)
 bool
 AlgebraicOpt::handleADD(Instruction *add)
 {
+   // we can't optimize to SAD/MAD if the instruction is tagged as precise
+   if (add->precise)
+      return false;
+
    Value *src0 = add->getSrc(0);
    Value *src1 = add->getSrc(1);
 
@@ -1712,7 +1716,7 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, operation toOp)
       return false;
 
    if (src->getInsn()->saturate || src->getInsn()->postFactor ||
-       src->getInsn()->dnz)
+       src->getInsn()->dnz || src->getInsn()->precise)
       return false;
 
    if (toOp == OP_SAD) {
-- 
2.13.1


* [RFC 9/9] nv50/ir/tgsi: split mad to mul+add
  2017-06-11 18:42 [RFC 0/9] Add precise/invariant semantics to TGSI Karol Herbst
  2017-06-11 18:42 ` [RFC 3/9] st/glsl_to_tgsi: handle precise modifier Karol Herbst
@ 2017-06-11 18:42 ` Karol Herbst
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 0 replies; 19+ messages in thread
From: Karol Herbst @ 2017-06-11 18:42 UTC (permalink / raw)
  To: mesa-dev; +Cc: nouveau

fixes
KHR-GL44.gpu_shader5.precise_qualifier
KHR-GL45.gpu_shader5.precise_qualifier

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index c633185893..cd45e82426 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -3184,6 +3184,20 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       break;
    case TGSI_OPCODE_MAD:
    case TGSI_OPCODE_UMAD:
+      FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
+         val0 = getSSA();
+         src0 = fetchSrc(0, c);
+         src1 = fetchSrc(1, c);
+         src2 = fetchSrc(2, c);
+         geni = mkOp2(OP_MUL, dstTy, val0, src0, src1);
+         if (dstTy == TYPE_F32)
+            geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
+
+         geni = mkOp2(OP_ADD, dstTy, dst0[c], val0, src2);
+         geni->precise = insn->Instruction.Precise;
+      }
+      break;
    case TGSI_OPCODE_SAD:
    case TGSI_OPCODE_FMA:
       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
-- 
2.13.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


* Re: [RFC 5/9] tgsi/text: parse _PRECISE modifier
  2017-06-11 18:42   ` [RFC 5/9] tgsi/text: parse _PRECISE modifier Karol Herbst
@ 2017-06-12 10:31     ` Nicolai Hähnle
  0 siblings, 0 replies; 19+ messages in thread
From: Nicolai Hähnle @ 2017-06-12 10:31 UTC (permalink / raw)
  To: Karol Herbst, mesa-dev; +Cc: nouveau

On 11.06.2017 20:42, Karol Herbst wrote:
> Signed-off-by: Karol Herbst <karolherbst@gmail.com>
> ---
>   src/gallium/auxiliary/tgsi/tgsi_text.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
> index 93a05568f4..c5fcb3283d 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
> @@ -999,6 +999,7 @@ parse_texoffset_operand(
>   static boolean
>   match_inst(const char **pcur,
>              unsigned *saturate,
> +           unsigned *precise,
>              const struct tgsi_opcode_info *info)
>   {
>      const char *cur = *pcur;
> @@ -1007,6 +1008,7 @@ match_inst(const char **pcur,
>      if (str_match_nocase_whole(&cur, info->mnemonic)) {
>         *pcur = cur;
>         *saturate = 0;
> +      *precise = 0;
>         return TRUE;
>      }
>   
> @@ -1015,8 +1017,15 @@ match_inst(const char **pcur,
>         if (str_match_nocase_whole(&cur, "_SAT")) {
>            *pcur = cur;
>            *saturate = 1;
> -         return TRUE;
>         }
> +
> +      if (str_match_nocase_whole(&cur, "_PRECISE")) {
> +         *pcur = cur;
> +         *precise = 1;
> +      }

I think this doesn't properly handle the case where both _SAT and
_PRECISE are present: str_match_nocase_whole only matches a whole word,
so neither suffix matches when the other one follows it.

Cheers,
Nicolai
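
A minimal sketch of one way to accept both suffixes, in either order: match
each suffix as a plain prefix in a loop and apply the whole-word check only
once, after the last suffix. This is not the code from the series. The
helpers match_prefix_nocase(), is_ident_char() and match_inst_sketch() are
hypothetical names written for this illustration and do not exist in
tgsi_text.c.

```c
#include <ctype.h>

/* Hypothetical helper: case-insensitive prefix match.
 * Advances *pcur past the prefix on success. */
static int match_prefix_nocase(const char **pcur, const char *word)
{
    const char *cur = *pcur;

    while (*word) {
        /* A NUL in the input never equals a prefix character, so this
         * cannot read past the end of the input string. */
        if (toupper((unsigned char)*cur) != toupper((unsigned char)*word))
            return 0;
        cur++;
        word++;
    }
    *pcur = cur;
    return 1;
}

/* Hypothetical helper: true if c could continue an identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Accepts "<mnemonic>" followed by _SAT and/or _PRECISE, each at most
 * once and in any order. Outputs are written only on success. */
static int match_inst_sketch(const char **pcur, const char *mnemonic,
                             unsigned *saturate, unsigned *precise)
{
    const char *cur = *pcur;
    unsigned sat = 0, prec = 0;

    if (!match_prefix_nocase(&cur, mnemonic))
        return 0;

    for (;;) {
        if (!prec && match_prefix_nocase(&cur, "_PRECISE")) {
            prec = 1;
            continue;
        }
        if (!sat && match_prefix_nocase(&cur, "_SAT")) {
            sat = 1;
            continue;
        }
        break;
    }

    /* Whole-word check, done once after all suffixes are consumed. */
    if (is_ident_char(*cur))
        return 0;

    *pcur = cur;
    *saturate = sat;
    *precise = prec;
    return 1;
}
```

With this shape, "MUL_SAT_PRECISE" and "MUL_PRECISE_SAT" both set the two
flags, while "MULX" and "MUL_SAT_SAT" are rejected and leave the cursor
untouched so the caller can try the next opcode.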

> +
> +      if (*precise || *saturate)
> +         return TRUE;
>      }
>   
>      return FALSE;
> @@ -1029,6 +1038,7 @@ parse_instruction(
>   {
>      uint i;
>      uint saturate = 0;
> +   uint precise = 0;
>      const struct tgsi_opcode_info *info;
>      struct tgsi_full_instruction inst;
>      const char *cur;
> @@ -1043,7 +1053,7 @@ parse_instruction(
>         cur = ctx->cur;
>   
>         info = tgsi_get_opcode_info( i );
> -      if (match_inst(&cur, &saturate, info)) {
> +      if (match_inst(&cur, &saturate, &precise, info)) {
>            if (info->num_dst + info->num_src + info->is_tex == 0) {
>               ctx->cur = cur;
>               break;
> @@ -1064,6 +1074,7 @@ parse_instruction(
>   
>      inst.Instruction.Opcode = i;
>      inst.Instruction.Saturate = saturate;
> +   inst.Instruction.Precise = precise;
>      inst.Instruction.NumDstRegs = info->num_dst;
>      inst.Instruction.NumSrcRegs = info->num_src;
>   
> 


-- 
Learn how the world really is,
but never forget how it ought to be.

* Re: [Mesa-dev] [RFC 4/9] tgsi: populate precise
       [not found]     ` <20170611184239.7204-5-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-06-12 10:33       ` Nicolai Hähnle
  0 siblings, 0 replies; 19+ messages in thread
From: Nicolai Hähnle @ 2017-06-12 10:33 UTC (permalink / raw)
  To: Karol Herbst, mesa-dev; +Cc: nouveau

On 11.06.2017 20:42, Karol Herbst wrote:
> Only implemented for glsl->tgsi. Other converters just set precise to 0.
> 
> Signed-off-by: Karol Herbst <karolherbst@gmail.com>
> ---
>   src/gallium/auxiliary/tgsi/tgsi_build.c       |  3 +++
>   src/gallium/auxiliary/tgsi/tgsi_ureg.c        | 14 +++++++---
>   src/gallium/auxiliary/tgsi/tgsi_ureg.h        | 20 +++++++++++---
>   src/gallium/auxiliary/util/u_simple_shaders.c |  2 +-
>   src/gallium/state_trackers/nine/nine_shader.c |  6 ++---
>   src/mesa/state_tracker/st_atifs_to_tgsi.c     | 38 +++++++++++++--------------
>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp    | 12 ++++-----
>   src/mesa/state_tracker/st_mesa_to_tgsi.c      |  8 +++---
>   src/mesa/state_tracker/st_pbo.c               |  2 +-
>   9 files changed, 65 insertions(+), 40 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
> index 55e4d064ed..144a017768 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_build.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
> @@ -651,6 +651,7 @@ tgsi_default_instruction( void )
>   static struct tgsi_instruction
>   tgsi_build_instruction(unsigned opcode,
>                          unsigned saturate,
> +                       unsigned precise,
>                          unsigned num_dst_regs,
>                          unsigned num_src_regs,
>                          struct tgsi_header *header)
> @@ -665,6 +666,7 @@ tgsi_build_instruction(unsigned opcode,
>      instruction = tgsi_default_instruction();
>      instruction.Opcode = opcode;
>      instruction.Saturate = saturate;
> +   instruction.Precise = precise;
>      instruction.NumDstRegs = num_dst_regs;
>      instruction.NumSrcRegs = num_src_regs;
>   
> @@ -1061,6 +1063,7 @@ tgsi_build_full_instruction(
>   
>      *instruction = tgsi_build_instruction(full_inst->Instruction.Opcode,
>                                            full_inst->Instruction.Saturate,
> +                                         full_inst->Instruction.Precise,
>                                            full_inst->Instruction.NumDstRegs,
>                                            full_inst->Instruction.NumSrcRegs,
>                                            header);
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> index 5bd779728a..56db2252c5 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> @@ -1213,6 +1213,7 @@ struct ureg_emit_insn_result
>   ureg_emit_insn(struct ureg_program *ureg,
>                  unsigned opcode,
>                  boolean saturate,
> +               unsigned precise,
>                  unsigned num_dst,
>                  unsigned num_src)
>   {
> @@ -1226,6 +1227,7 @@ ureg_emit_insn(struct ureg_program *ureg,
>      out[0].insn = tgsi_default_instruction();
>      out[0].insn.Opcode = opcode;
>      out[0].insn.Saturate = saturate;
> +   out[0].insn.Precise = precise;
>      out[0].insn.NumDstRegs = num_dst;
>      out[0].insn.NumSrcRegs = num_src;
>   
> @@ -1354,7 +1356,8 @@ ureg_insn(struct ureg_program *ureg,
>             const struct ureg_dst *dst,
>             unsigned nr_dst,
>             const struct ureg_src *src,
> -          unsigned nr_src )
> +          unsigned nr_src,
> +          unsigned precise )
>   {
>      struct ureg_emit_insn_result insn;
>      unsigned i;
> @@ -1369,6 +1372,7 @@ ureg_insn(struct ureg_program *ureg,
>      insn = ureg_emit_insn(ureg,
>                            opcode,
>                            saturate,
> +                         precise,
>                            nr_dst,
>                            nr_src);
>   
> @@ -1391,7 +1395,8 @@ ureg_tex_insn(struct ureg_program *ureg,
>                 const struct tgsi_texture_offset *texoffsets,
>                 unsigned nr_offset,
>                 const struct ureg_src *src,
> -              unsigned nr_src )
> +              unsigned nr_src,
> +              unsigned precise )

What does `precise' mean for tex instructions?


>   {
>      struct ureg_emit_insn_result insn;
>      unsigned i;
> @@ -1406,6 +1411,7 @@ ureg_tex_insn(struct ureg_program *ureg,
>      insn = ureg_emit_insn(ureg,
>                            opcode,
>                            saturate,
> +                         precise,
>                            nr_dst,
>                            nr_src);
>   
> @@ -1434,7 +1440,8 @@ ureg_memory_insn(struct ureg_program *ureg,
>                    unsigned nr_src,
>                    unsigned qualifier,
>                    unsigned texture,
> -                 unsigned format)
> +                 unsigned format,
> +                 unsigned precise)

Same question. I can't think of a possible meaning, in which case the 
parameter should be dropped.

Cheers,
Nicolai


>   {
>      struct ureg_emit_insn_result insn;
>      unsigned i;
> @@ -1442,6 +1449,7 @@ ureg_memory_insn(struct ureg_program *ureg,
>      insn = ureg_emit_insn(ureg,
>                            opcode,
>                            FALSE,
> +                         precise,
>                            nr_dst,
>                            nr_src);
>   
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> index 54f95ba565..105c85abd5 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> @@ -546,7 +546,8 @@ ureg_insn(struct ureg_program *ureg,
>             const struct ureg_dst *dst,
>             unsigned nr_dst,
>             const struct ureg_src *src,
> -          unsigned nr_src );
> +          unsigned nr_src,
> +          unsigned precise);
>   
>   
>   void
> @@ -559,7 +560,8 @@ ureg_tex_insn(struct ureg_program *ureg,
>                 const struct tgsi_texture_offset *texoffsets,
>                 unsigned nr_offset,
>                 const struct ureg_src *src,
> -              unsigned nr_src );
> +              unsigned nr_src,
> +              unsigned precise);
>   
>   
>   void
> @@ -571,7 +573,8 @@ ureg_memory_insn(struct ureg_program *ureg,
>                    unsigned nr_src,
>                    unsigned qualifier,
>                    unsigned texture,
> -                 unsigned format);
> +                 unsigned format,
> +                 unsigned precise);
>   
>   /***********************************************************************
>    * Internal instruction helpers, don't call these directly:
> @@ -586,6 +589,7 @@ struct ureg_emit_insn_result
>   ureg_emit_insn(struct ureg_program *ureg,
>                  unsigned opcode,
>                  boolean saturate,
> +               unsigned precise,
>                  unsigned num_dst,
>                  unsigned num_src);
>   
> @@ -632,6 +636,7 @@ static inline void ureg_##op( struct ureg_program *ureg )       \
>                            opcode,                                \
>                            FALSE,                                 \
>                            0,                                     \
> +                         0,                                     \
>                            0);                                    \
>      ureg_fixup_insn_size( ureg, insn.insn_token );               \
>   }
> @@ -646,6 +651,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
>                            opcode,                                \
>                            FALSE,                                 \
>                            0,                                     \
> +                         0,                                     \
>                            1);                                    \
>      ureg_emit_src( ureg, src );                                  \
>      ureg_fixup_insn_size( ureg, insn.insn_token );               \
> @@ -661,6 +667,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
>                            opcode,                                \
>                            FALSE,                                 \
>                            0,                                     \
> +                         0,                                     \
>                            0);                                    \
>      ureg_emit_label( ureg, insn.extended_token, label_token );   \
>      ureg_fixup_insn_size( ureg, insn.insn_token );               \
> @@ -677,6 +684,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
>                            opcode,                                \
>                            FALSE,                                 \
>                            0,                                     \
> +                         0,                                     \
>                            1);                                    \
>      ureg_emit_label( ureg, insn.extended_token, label_token );   \
>      ureg_emit_src( ureg, src );                                  \
> @@ -694,6 +702,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            0);                                            \
>      ureg_emit_dst( ureg, dst );                                          \
> @@ -713,6 +722,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            1);                                            \
>      ureg_emit_dst( ureg, dst );                                          \
> @@ -733,6 +743,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            2);                                            \
>      ureg_emit_dst( ureg, dst );                                          \
> @@ -756,6 +767,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            2);                                            \
>      ureg_emit_texture( ureg, insn.extended_token, target,                \
> @@ -780,6 +792,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            3);                                            \
>      ureg_emit_dst( ureg, dst );                                          \
> @@ -806,6 +819,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            4);                                            \
>      ureg_emit_texture( ureg, insn.extended_token, target,                \
> diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c b/src/gallium/auxiliary/util/u_simple_shaders.c
> index 5874d0e9aa..79331b5638 100644
> --- a/src/gallium/auxiliary/util/u_simple_shaders.c
> +++ b/src/gallium/auxiliary/util/u_simple_shaders.c
> @@ -954,7 +954,7 @@ util_make_geometry_passthrough_shader(struct pipe_context *pipe,
>      }
>   
>      /* EMIT IMM[0] */
> -   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1);
> +   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1, 0);
>   
>      /* END */
>      ureg_END(ureg);
> diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c
> index 40fb6be88f..f405090811 100644
> --- a/src/gallium/state_trackers/nine/nine_shader.c
> +++ b/src/gallium/state_trackers/nine/nine_shader.c
> @@ -1879,7 +1879,7 @@ DECL_SPECIAL(IFC)
>       struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
>       src[0] = tx_src_param(tx, &tx->insn.src[0]);
>       src[1] = tx_src_param(tx, &tx->insn.src[1]);
> -    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
> +    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
>       ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
>       return D3D_OK;
>   }
> @@ -1897,7 +1897,7 @@ DECL_SPECIAL(BREAKC)
>       struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
>       src[0] = tx_src_param(tx, &tx->insn.src[0]);
>       src[1] = tx_src_param(tx, &tx->insn.src[1]);
> -    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
> +    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
>       ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
>       ureg_BRK(tx->ureg);
>       tx_endcond(tx);
> @@ -3029,7 +3029,7 @@ NineTranslateInstruction_Generic(struct shader_translator *tx)
>   
>       ureg_insn(tx->ureg, tx->insn.info->opcode,
>                 dst, tx->insn.ndst,
> -              src, tx->insn.nsrc);
> +              src, tx->insn.nsrc, 0);
>       return D3D_OK;
>   }
>   
> diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> index 338ced56ed..e0a6ff7131 100644
> --- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
> +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> @@ -105,18 +105,18 @@ apply_swizzle(struct st_translate *t,
>         imm[0] = src;
>         imm[1] = ureg_imm4f(t->ureg, 1.0f, 1.0f, 0.0f, 0.0f);
>         imm[2] = ureg_imm4f(t->ureg, 0.0f, 0.0f, 1.0f, 1.0f);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3, 0);
>   
>         if (swizzle == GL_SWIZZLE_STR_DR_ATI) {
>            imm[0] = ureg_scalar(src, TGSI_SWIZZLE_Z);
>         } else {
>            imm[0] = ureg_scalar(src, TGSI_SWIZZLE_W);
>         }
> -      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1, 0);
>   
>         imm[0] = ureg_src(tmp[0]);
>         imm[1] = ureg_src(tmp[1]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2, 0);
>   
>         return ureg_src(tmp[0]);
>      }
> @@ -170,35 +170,35 @@ prepare_argument(struct st_translate *t, const unsigned argId,
>         src = ureg_scalar(src, TGSI_SWIZZLE_W);
>         break;
>      }
> -   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1);
> +   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1, 0);
>   
>      if (srcReg->argMod & GL_COMP_BIT_ATI) {
>         struct ureg_src modsrc[2];
>         modsrc[0] = ureg_imm1f(t->ureg, 1.0f);
>         modsrc[1] = ureg_negate(ureg_src(arg));
>   
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>      }
>      if (srcReg->argMod & GL_BIAS_BIT_ATI) {
>         struct ureg_src modsrc[2];
>         modsrc[0] = ureg_src(arg);
>         modsrc[1] = ureg_imm1f(t->ureg, -0.5f);
>   
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>      }
>      if (srcReg->argMod & GL_2X_BIT_ATI) {
>         struct ureg_src modsrc[2];
>         modsrc[0] = ureg_src(arg);
>         modsrc[1] = ureg_src(arg);
>   
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>      }
>      if (srcReg->argMod & GL_NEGATE_BIT_ATI) {
>         struct ureg_src modsrc[2];
>         modsrc[0] = ureg_src(arg);
>         modsrc[1] = ureg_imm1f(t->ureg, -1.0f);
>   
> -      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2, 0);
>      }
>      return  ureg_src(arg);
>   }
> @@ -217,25 +217,25 @@ emit_special_inst(struct st_translate *t, const struct instruction_desc *desc,
>         tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI + 2); /* re-purpose a3 */
>         src[0] = ureg_imm1f(t->ureg, 0.5f);
>         src[1] = ureg_negate(args[2]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2, 0);
>         src[0] = ureg_src(tmp[0]);
>         src[1] = args[0];
>         src[2] = args[1];
> -      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
>      } else if (!strcmp(desc->name, "CND0")) {
>         src[0] = args[2];
>         src[1] = args[1];
>         src[2] = args[0];
> -      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
>      } else if (!strcmp(desc->name, "DOT2_ADD")) {
>         /* note: DP2A is not implemented in most pipe drivers */
>         tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI); /* re-purpose a1 */
>         src[0] = args[0];
>         src[1] = args[1];
> -      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2, 0);
>         src[0] = ureg_src(tmp[0]);
>         src[1] = ureg_scalar(args[2], TGSI_SWIZZLE_Z);
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2, 0);
>      }
>   }
>   
> @@ -249,7 +249,7 @@ emit_arith_inst(struct st_translate *t,
>         return;
>      }
>   
> -   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount);
> +   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount, 0);
>   }
>   
>   static void
> @@ -292,7 +292,7 @@ emit_dstmod(struct st_translate *t,
>      if (dstMod & GL_SATURATE_BIT_ATI) {
>         dst = ureg_saturate(dst);
>      }
> -   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2);
> +   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2, 0);
>   }
>   
>   /**
> @@ -334,9 +334,9 @@ compile_setupinst(struct st_translate *t,
>         src[1] = t->samplers[r];
>         /* the texture target is still unknown, it will be fixed in the draw call */
>         ureg_tex_insn(t->ureg, TGSI_OPCODE_TEX, dst, 1, TGSI_TEXTURE_2D,
> -                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2);
> +                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2, 0);
>      } else if (texinst->Opcode == ATI_FRAGMENT_SHADER_PASS_OP) {
> -      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
>      }
>   
>      t->regs_written[t->current_pass][r] = true;
> @@ -408,11 +408,11 @@ finalize_shader(struct st_translate *t, unsigned numPasses)
>         /* copy the result into the OUT slot */
>         dst[0] = t->outputs[t->outputMapping[FRAG_RESULT_COLOR]];
>         src[0] = ureg_src(t->temps[0]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
>      }
>   
>      /* signal the end of the program */
> -   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0);
> +   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0, 0);
>   }
>   
>   /**
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 19f90f21fe..ecd9f9f280 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -5900,7 +5900,7 @@ compile_tgsi_instruction(struct st_translate *t,
>      case TGSI_OPCODE_IF:
>      case TGSI_OPCODE_UIF:
>         assert(num_dst == 0);
> -      ureg_insn(ureg, inst->op, NULL, 0, src, num_src);
> +      ureg_insn(ureg, inst->op, NULL, 0, src, num_src, inst->precise);
>         return;
>   
>      case TGSI_OPCODE_TEX:
> @@ -5935,7 +5935,7 @@ compile_tgsi_instruction(struct st_translate *t,
>                       tex_target,
>                       st_translate_texture_type(inst->tex_type),
>                       texoffsets, inst->tex_offset_num_offset,
> -                    src, num_src);
> +                    src, num_src, inst->precise);
>         return;
>   
>      case TGSI_OPCODE_RESQ:
> @@ -5966,7 +5966,7 @@ compile_tgsi_instruction(struct st_translate *t,
>         assert(src[0].File != TGSI_FILE_NULL);
>         ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
>                          inst->buffer_access,
> -                       tex_target, inst->image_format);
> +                       tex_target, inst->image_format, inst->precise);
>         break;
>   
>      case TGSI_OPCODE_STORE:
> @@ -5984,19 +5984,19 @@ compile_tgsi_instruction(struct st_translate *t,
>         assert(dst[0].File != TGSI_FILE_NULL);
>         ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
>                          inst->buffer_access,
> -                       tex_target, inst->image_format);
> +                       tex_target, inst->image_format, inst->precise);
>         break;
>   
>      case TGSI_OPCODE_SCS:
>         dst[0] = ureg_writemask(dst[0], TGSI_WRITEMASK_XY);
> -      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src);
> +      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->precise);
>         break;
>   
>      default:
>         ureg_insn(ureg,
>                   inst->op,
>                   dst, num_dst,
> -                src, num_src);
> +                src, num_src, inst->precise);
>         break;
>      }
>   }
> diff --git a/src/mesa/state_tracker/st_mesa_to_tgsi.c b/src/mesa/state_tracker/st_mesa_to_tgsi.c
> index 984ff92130..f11013c116 100644
> --- a/src/mesa/state_tracker/st_mesa_to_tgsi.c
> +++ b/src/mesa/state_tracker/st_mesa_to_tgsi.c
> @@ -558,7 +558,7 @@ compile_instruction(
>                                                  inst->TexShadow ),
>                        TGSI_RETURN_TYPE_FLOAT,
>                        NULL, 0,
> -                     src, num_src );
> +                     src, num_src, 0 );
>         return;
>   
>      case OPCODE_SCS:
> @@ -566,7 +566,7 @@ compile_instruction(
>         ureg_insn( ureg,
>                    translate_opcode( inst->Opcode ),
>                    dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0 );
>         break;
>   
>      case OPCODE_XPD:
> @@ -574,7 +574,7 @@ compile_instruction(
>         ureg_insn( ureg,
>                    translate_opcode( inst->Opcode ),
>                    dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0 );
>         break;
>   
>      case OPCODE_RSQ:
> @@ -593,7 +593,7 @@ compile_instruction(
>         ureg_insn( ureg,
>                    translate_opcode( inst->Opcode ),
>                    dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0);
>         break;
>      }
>   }
> diff --git a/src/mesa/state_tracker/st_pbo.c b/src/mesa/state_tracker/st_pbo.c
> index 303c8535b2..3dff1609e8 100644
> --- a/src/mesa/state_tracker/st_pbo.c
> +++ b/src/mesa/state_tracker/st_pbo.c
> @@ -528,7 +528,7 @@ create_fs(struct st_context *st, bool download, enum pipe_texture_target target,
>         op[0] = ureg_src(temp0);
>         op[1] = ureg_src(temp1);
>         ureg_memory_insn(ureg, TGSI_OPCODE_STORE, &out, 1, op, 2, 0,
> -                             TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE);
> +                             TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE, 0);
>   
>         ureg_release_temporary(ureg, temp1);
>      } else {
> 


-- 
Learn how the world really is,
but never forget how it should be.
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Mesa-dev] [RFC 3/9] st/glsl_to_tgsi: handle precise modifier
       [not found]   ` <20170611184239.7204-4-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-06-12 10:41     ` Nicolai Hähnle
  0 siblings, 0 replies; 19+ messages in thread
From: Nicolai Hähnle @ 2017-06-12 10:41 UTC (permalink / raw)
  To: Karol Herbst, mesa-dev
  Cc: nouveau

On 11.06.2017 20:42, Karol Herbst wrote:
> all subexpression inside an ir_assignment needs to be tagged as precise.
> 
> Signed-off-by: Karol Herbst <karolherbst@gmail.com>
> ---
>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 80 ++++++++++++++++++++++++------
>   1 file changed, 65 insertions(+), 15 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index c5d2e0fcd2..19f90f21fe 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -87,6 +87,13 @@ static int swizzle_for_type(const glsl_type *type, int component = 0)
>      return swizzle;
>   }
>   
> +static unsigned is_precise(const ir_variable *ir)
> +{
> +   if (!ir)
> +      return 0;
> +   return ir->data.precise || ir->data.invariant;
> +}
> +
>   /**
>    * This struct is a corresponding struct to TGSI ureg_src.
>    */
> @@ -296,6 +303,7 @@ public:
>      ir_instruction *ir;
>   
>      unsigned op:8; /**< TGSI opcode */
> +   unsigned precise:1;
>      unsigned saturate:1;
>      unsigned is_64bit_expanded:1;
>      unsigned sampler_base:5;
> @@ -435,6 +443,7 @@ public:
>      bool have_fma;
>      bool use_shared_memory;
>      bool has_tex_txf_lz;
> +   unsigned precise;
>   
>      variable_storage *find_variable_storage(ir_variable *var);
>   
> @@ -505,13 +514,29 @@ public:
>                                         st_src_reg src0 = undef_src,
>                                         st_src_reg src1 = undef_src,
>                                         st_src_reg src2 = undef_src,
> -                                      st_src_reg src3 = undef_src);
> +                                      st_src_reg src3 = undef_src,
> +                                      unsigned precise = 0);
>   
>      glsl_to_tgsi_instruction *emit_asm(ir_instruction *ir, unsigned op,
>                                         st_dst_reg dst, st_dst_reg dst1,
>                                         st_src_reg src0 = undef_src,
>                                         st_src_reg src1 = undef_src,
>                                         st_src_reg src2 = undef_src,
> +                                      st_src_reg src3 = undef_src,
> +                                      unsigned precise = 0);
> +
> +   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
> +                                      st_dst_reg dst = undef_dst,
> +                                      st_src_reg src0 = undef_src,
> +                                      st_src_reg src1 = undef_src,
> +                                      st_src_reg src2 = undef_src,
> +                                      st_src_reg src3 = undef_src);
> +
> +   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
> +                                      st_dst_reg dst, st_dst_reg dst1,
> +                                      st_src_reg src0 = undef_src,
> +                                      st_src_reg src1 = undef_src,
> +                                      st_src_reg src2 = undef_src,
>                                         st_src_reg src3 = undef_src);

Yeah, I don't like those overloads and the way they force you to add 
artificial casts for disambiguation.

I'd suggest embracing the global precise flag: drop the precise 
parameter from emit_asm, and just source the bit from this->precise.

Please make precise a bool, and add a comment explaining that it's a 
flag for whether the currently evaluated expression should be precise.
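
A minimal sketch of that suggestion, not the actual Mesa code. Visitor,
Instruction and visit_assignment are hypothetical, heavily simplified
stand-ins for glsl_to_tgsi_visitor, glsl_to_tgsi_instruction and
visit(ir_assignment *); only the handling of the flag is meant to carry over.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for glsl_to_tgsi_instruction.
struct Instruction {
   unsigned op;
   bool precise;
};

// Hypothetical stand-in for glsl_to_tgsi_visitor.
class Visitor {
public:
   /* Whether the expression currently being evaluated must be precise. */
   bool precise = false;
   std::vector<Instruction> instructions;

   // No 'precise' parameter and no ir_expression overloads:
   // every emitted instruction takes the bit from this->precise.
   Instruction &emit_asm(unsigned op)
   {
      instructions.push_back(Instruction{op, this->precise});
      return instructions.back();
   }

   // Mirrors the shape of visit(ir_assignment *): the flag is set only
   // while the right-hand side is being evaluated.
   void visit_assignment(bool lhs_is_precise, unsigned rhs_op)
   {
      this->precise = lhs_is_precise;
      emit_asm(rhs_op);   // stands in for ir->rhs->accept(this)
      this->precise = false;
   }
};
```

With this shape the existing emit_asm(NULL, ...) call sites would not need
the (ir_instruction *) casts, since there is only one set of overloads left
to choose from.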

Cheers,
Nicolai


>      unsigned get_opcode(unsigned op,
> @@ -650,7 +675,8 @@ glsl_to_tgsi_instruction *
>   glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>                                  st_dst_reg dst, st_dst_reg dst1,
>                                  st_src_reg src0, st_src_reg src1,
> -                               st_src_reg src2, st_src_reg src3)
> +                               st_src_reg src2, st_src_reg src3,
> +                               unsigned precise)
>   {
>      glsl_to_tgsi_instruction *inst = new(mem_ctx) glsl_to_tgsi_instruction();
>      int num_reladdr = 0, i, j;
> @@ -691,6 +717,7 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>      STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);
>   
>      inst->op = op;
> +   inst->precise = precise;
>      inst->info = tgsi_get_opcode_info(op);
>      inst->dst[0] = dst;
>      inst->dst[1] = dst1;
> @@ -881,9 +908,28 @@ glsl_to_tgsi_instruction *
>   glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>                                  st_dst_reg dst,
>                                  st_src_reg src0, st_src_reg src1,
> +                               st_src_reg src2, st_src_reg src3,
> +                               unsigned precise)
> +{
> +   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, precise);
> +}
> +
> +glsl_to_tgsi_instruction *
> +glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
> +                               st_dst_reg dst,
> +                               st_src_reg src0, st_src_reg src1,
> +                               st_src_reg src2, st_src_reg src3)
> +{
> +   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, this->precise);
> +}
> +
> +glsl_to_tgsi_instruction *
> +glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
> +                               st_dst_reg dst, st_dst_reg dst1,
> +                               st_src_reg src0, st_src_reg src1,
>                                  st_src_reg src2, st_src_reg src3)
>   {
> -   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3);
> +   return emit_asm(ir, op, dst, dst1, src0, src1, src2, src3, this->precise);
>   }
>   
>   /**
> @@ -1116,7 +1162,7 @@ glsl_to_tgsi_visitor::emit_arl(ir_instruction *ir,
>      if (dst.index >= this->num_address_regs)
>         this->num_address_regs = dst.index + 1;
>   
> -   emit_asm(NULL, op, dst, src0);
> +   emit_asm((ir_instruction *)NULL, op, dst, src0);
>   }
>   
>   int
> @@ -1406,11 +1452,11 @@ glsl_to_tgsi_visitor::visit(ir_variable *ir)
>   void
>   glsl_to_tgsi_visitor::visit(ir_loop *ir)
>   {
> -   emit_asm(NULL, TGSI_OPCODE_BGNLOOP);
> +   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BGNLOOP);
>   
>      visit_exec_list(&ir->body_instructions, this);
>   
> -   emit_asm(NULL, TGSI_OPCODE_ENDLOOP);
> +   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ENDLOOP);
>   }
>   
>   void
> @@ -1418,10 +1464,10 @@ glsl_to_tgsi_visitor::visit(ir_loop_jump *ir)
>   {
>      switch (ir->mode) {
>      case ir_loop_jump::jump_break:
> -      emit_asm(NULL, TGSI_OPCODE_BRK);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BRK);
>         break;
>      case ir_loop_jump::jump_continue:
> -      emit_asm(NULL, TGSI_OPCODE_CONT);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_CONT);
>         break;
>      }
>   }
> @@ -2703,7 +2749,7 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>               st_dst_reg dst = st_dst_reg(get_temp(var->type));
>               st_src_reg src = st_src_reg(PROGRAM_OUTPUT, decl->mesa_index,
>                                           var->type, component, decl->array_id);
> -            emit_asm(NULL, TGSI_OPCODE_FBFETCH, dst, src);
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_FBFETCH, dst, src);
>               entry = new(mem_ctx) variable_storage(var, dst.file, dst.index,
>                                                     dst.array_id);
>            } else {
> @@ -3148,7 +3194,10 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
>      st_dst_reg l;
>      st_src_reg r;
>   
> +   /* all generated instructions need to be flaged as precise */
> +   this->precise = is_precise(ir->lhs->variable_referenced());
>      ir->rhs->accept(this);
> +   this->precise = 0;
>      r = this->result;
>   
>      l = get_assignment_lhs(ir->lhs, this, &dst_component);
> @@ -3233,7 +3282,8 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
>          */
>         glsl_to_tgsi_instruction *inst, *new_inst;
>         inst = (glsl_to_tgsi_instruction *)this->instructions.get_tail();
> -      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3]);
> +      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3],
> +                          is_precise(ir->lhs->variable_referenced()));
>         new_inst->saturate = inst->saturate;
>         inst->dead_mask = inst->dst[0].writemask;
>      } else {
> @@ -4072,16 +4122,16 @@ glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference *tail,
>   
>            deref_arr->array_index->accept(this);
>            if (*array_elements != 1)
> -            emit_asm(NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
>            else
> -            emit_asm(NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
>   
>            if (indirect->file == PROGRAM_UNDEFINED)
>               *indirect = temp_reg;
>            else {
>               temp_dst = st_dst_reg(*indirect);
>               temp_dst.writemask = 1;
> -            emit_asm(NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
>            }
>         } else
>            *index += array_index->value.u[0] * *array_elements;
> @@ -4141,7 +4191,7 @@ glsl_to_tgsi_visitor::canonicalize_gather_offset(st_src_reg offset)
>         st_src_reg tmp = get_temp(glsl_type::ivec2_type);
>         st_dst_reg tmp_dst = st_dst_reg(tmp);
>         tmp_dst.writemask = WRITEMASK_XY;
> -      emit_asm(NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
>         return tmp;
>      }
>   
> @@ -6777,7 +6827,7 @@ get_mesa_program_tgsi(struct gl_context *ctx,
>      v->renumber_registers();
>   
>      /* Write the END instruction. */
> -   v->emit_asm(NULL, TGSI_OPCODE_END);
> +   v->emit_asm((ir_instruction *)NULL, TGSI_OPCODE_END);
>   
>      if (ctx->_Shader->Flags & GLSL_DUMP) {
>         _mesa_log("\n");
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
       [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-06-11 18:42   ` [RFC 8/9] nv50/ir: disable mul+add to mad for precise instructions Karol Herbst
@ 2017-06-12 10:42   ` Nicolai Hähnle
  2017-06-12 23:57     ` Roland Scheidegger
  7 siblings, 1 reply; 19+ messages in thread
From: Nicolai Hähnle @ 2017-06-12 10:42 UTC (permalink / raw)
  To: Karol Herbst, mesa-dev
  Cc: nouveau

On 11.06.2017 20:42, Karol Herbst wrote:
> Running Tomb Raider on Nouveau I found some flicker caused by ignoring precise
> modifiers on variables inside Nouveau.
>
> This series add precise/invariant handling to TGSI, which can be then used by
> drivers to disable certain unsafe optimisations which may otherwise alter
> calculations, which depend on having the same result across shaders.

It's kind of amazing that we got this far without doing this. On the 
radeonsi side, it's probably related to how conservative LLVM is.

But this series is a good idea, since it might allow us to become more 
aggressive with optimizations in radeonsi as well.


> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and 4.5
> 
> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to apply the
> precise flag on instruction emited in ir_assignment->rhs->accept(); but I found
> no other easy way to handle this. Maybe somebody of you has a better idea?

Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2:

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>


> 
> Karol Herbst (9):
>    tgsi: add precise flag to tgsi_instruction
>    tgsi/dump: print _PRECISE modifier on Instrutions
>    st/glsl_to_tgsi: handle precise modifier
>    tgsi: populate precise
>    tgsi/text: parse _PRECISE modifier
>    nv50/ir: add precise field to Instruction
>    nv50/ir/tgsi: handle precise for most ALU instructions
>    nv50/ir: disable mul+add to mad for precise instructions
>    nv50/ir/tgsi: split mad to mul+add
> 
>   src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
>   src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
>   src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
>   src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
>   src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
>   src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
>   src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
>   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
>   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
>   src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
>   src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
>   src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
>   src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
>   src/mesa/state_tracker/st_pbo.c                    |  2 +-
>   15 files changed, 172 insertions(+), 59 deletions(-)
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/9] Add precise/invariant semantics to TGSI
  2017-06-12 10:42   ` [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI Nicolai Hähnle
@ 2017-06-12 23:57     ` Roland Scheidegger
  2017-06-13  0:01       ` Roland Scheidegger
       [not found]       ` <8a99cdc6-c415-1423-dd1b-13a09f902288-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  0 siblings, 2 replies; 19+ messages in thread
From: Roland Scheidegger @ 2017-06-12 23:57 UTC (permalink / raw)
  To: Nicolai Hähnle, Karol Herbst, mesa-dev; +Cc: nouveau

This looks like the right idea to me too. It may sound a bit weird to do
that per instruction, but d3d11 does that as well. (Some d3d versions
just have a global flag basically forbidding or allowing any such fast
math optimizations in the assembly, but I'm not actually sure everybody
honors that without tessellation...)

For 1/9:
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

2/9 has a typo in the commit short log ("Instrutions").

FWIW surely on nv50 you could keep a single mad instruction for umad
(sad maybe too?). (I'm actually wondering if the hw really can't do
unfused float multiply+add as a single instruction but I know next to
nothing about nvidia hw...)

Roland

On 12.06.2017 at 12:42, Nicolai Hähnle wrote:
> On 11.06.2017 20:42, Karol Herbst wrote:
>> Running Tomb Raider on Nouveau I found some flicker caused by ignoring
>> precise
>> modifiers on variables inside Nouveau.
>>
>> This series adds precise/invariant handling to TGSI, which can then be
>> used by
>> drivers to disable certain unsafe optimisations which may otherwise alter
>> calculations that depend on having the same result across shaders.
> 
> It's kind of amazing that we got this far without doing this. On the
> radeonsi side, it's probably related to how conservative LLVM is.
> 
> But this series is a good idea, since it might allow us to become more
> aggressive with optimizations in radeonsi as well.
> 
> 
>> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and
>> 4.5
>>
>> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to
>> apply the
>> precise flag on instructions emitted in ir_assignment->rhs->accept();
>> but I found
>> no other easy way to handle this. Maybe one of you has a better
>> idea?
> 
> Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2:
> 
> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
> 
> 
>>
>> Karol Herbst (9):
>>    tgsi: add precise flag to tgsi_instruction
>>    tgsi/dump: print _PRECISE modifier on Instrutions
>>    st/glsl_to_tgsi: handle precise modifier
>>    tgsi: populate precise
>>    tgsi/text: parse _PRECISE modifier
>>    nv50/ir: add precise field to Instruction
>>    nv50/ir/tgsi: handle precise for most ALU instructions
>>    nv50/ir: disable mul+add to mad for precise instructions
>>    nv50/ir/tgsi: split mad to mul+add
>>
>>   src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
>>   src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
>>   src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
>>   src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
>>   src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
>>   src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
>>   src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
>>   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
>>   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
>>   src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
>>   src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
>>   src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
>>   src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
>>   src/mesa/state_tracker/st_pbo.c                    |  2 +-
>>   15 files changed, 172 insertions(+), 59 deletions(-)
>>
> 
> 

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


* Re: [RFC 0/9] Add precise/invariant semantics to TGSI
  2017-06-12 23:57     ` Roland Scheidegger
@ 2017-06-13  0:01       ` Roland Scheidegger
       [not found]       ` <8a99cdc6-c415-1423-dd1b-13a09f902288-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  1 sibling, 0 replies; 19+ messages in thread
From: Roland Scheidegger @ 2017-06-13  0:01 UTC (permalink / raw)
  To: Nicolai Hähnle, Karol Herbst, mesa-dev; +Cc: nouveau

On 13.06.2017 at 01:57, Roland Scheidegger wrote:
> This looks like the right idea to me too. It may sound a bit weird to do
> that per instruction, but d3d11 does that as well. (Some d3d versions
> just have a global flag basically forbidding or allowing any such fast
> math optimizations in the assembly, but I'm not actually sure everybody
> honors that without tessellation...)
> 
> For 1/9:
> Reviewed-by: Roland Scheidegger <sroland@vmware.com>

I forgot to mention: could you add a few words on this to the gallium docs
(source/tgsi.rst)? I'm not sure where; maybe under Modifiers or some
such.

Roland

> 
> 2/9 has a typo in the commit short log ("Instrutions").
> 
> FWIW surely on nv50 you could keep a single mad instruction for umad
> (sad maybe too?). (I'm actually wondering if the hw really can't do
> unfused float multiply+add as a single instruction but I know next to
> nothing about nvidia hw...)
> 
> Roland
> 
> On 12.06.2017 at 12:42, Nicolai Hähnle wrote:
>> On 11.06.2017 20:42, Karol Herbst wrote:
>>> Running Tomb Raider on Nouveau I found some flicker caused by ignoring
>>> precise
>>> modifiers on variables inside Nouveau.
>>>
>>> This series adds precise/invariant handling to TGSI, which can then be
>>> used by
>>> drivers to disable certain unsafe optimisations which may otherwise alter
>>> calculations that depend on having the same result across shaders.
>>
>> It's kind of amazing that we got this far without doing this. On the
>> radeonsi side, it's probably related to how conservative LLVM is.
>>
>> But this series is a good idea, since it might allow us to become more
>> aggressive with optimizations in radeonsi as well.
>>
>>
>>> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and
>>> 4.5
>>>
>>> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to
>>> apply the
>>> precise flag on instructions emitted in ir_assignment->rhs->accept();
>>> but I found
>>> no other easy way to handle this. Maybe one of you has a better
>>> idea?
>>
>> Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2:
>>
>> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
>>
>>
>>>
>>> Karol Herbst (9):
>>>    tgsi: add precise flag to tgsi_instruction
>>>    tgsi/dump: print _PRECISE modifier on Instrutions
>>>    st/glsl_to_tgsi: handle precise modifier
>>>    tgsi: populate precise
>>>    tgsi/text: parse _PRECISE modifier
>>>    nv50/ir: add precise field to Instruction
>>>    nv50/ir/tgsi: handle precise for most ALU instructions
>>>    nv50/ir: disable mul+add to mad for precise instructions
>>>    nv50/ir/tgsi: split mad to mul+add
>>>
>>>   src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
>>>   src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
>>>   src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
>>>   src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
>>>   src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
>>>   src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
>>>   src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
>>>   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
>>>   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
>>>   src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
>>>   src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
>>>   src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
>>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
>>>   src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
>>>   src/mesa/state_tracker/st_pbo.c                    |  2 +-
>>>   15 files changed, 172 insertions(+), 59 deletions(-)
>>>
>>
>>
> 


* Re: [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
       [not found]       ` <8a99cdc6-c415-1423-dd1b-13a09f902288-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2017-06-13  0:05         ` Ilia Mirkin
       [not found]           ` <CAKb7UviKiLbRG+z4paq8=-6epuWtKSxty2DHw6SQ1LZ+ULQgmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Ilia Mirkin @ 2017-06-13  0:05 UTC (permalink / raw)
  To: Roland Scheidegger
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Nicolai Hähnle,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland@vmware.com> wrote:
> FWIW surely on nv50 you could keep a single mad instruction for umad
> (sad maybe too?). (I'm actually wondering if the hw really can't do
> unfused float multiply+add as a single instruction but I know next to
> nothing about nvidia hw...)

The compiler should reassociate a mul + add into a mad where possible.
In practice, though, IMAD is super-slow... allegedly slower than
IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is
faster, but we haven't figured out how to operate it yet. I'm not aware
of a muladd version of fma on fermi and newer (GL 4.0). The tesla
series does have a floating point mul+add (but no fma).

* Re: [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
       [not found]           ` <CAKb7UviKiLbRG+z4paq8=-6epuWtKSxty2DHw6SQ1LZ+ULQgmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-13  0:33             ` Roland Scheidegger
       [not found]               ` <6ffa13fd-c90b-bda6-b243-13c4857346f7-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Roland Scheidegger @ 2017-06-13  0:33 UTC (permalink / raw)
  To: Ilia Mirkin
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Nicolai Hähnle,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 13.06.2017 at 02:05, Ilia Mirkin wrote:
> On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland@vmware.com> wrote:
>> FWIW surely on nv50 you could keep a single mad instruction for umad
>> (sad maybe too?). (I'm actually wondering if the hw really can't do
>> unfused float multiply+add as a single instruction but I know next to
>> nothing about nvidia hw...)
> 
> The compiler should reassociate a mul + add into a mad where possible.
> In practice, though, IMAD is super-slow... allegedly slower than
> IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is
> faster, but we haven't figured out how to operate it yet. I'm not aware
> of a muladd version of fma on fermi and newer (GL 4.0). The tesla
> series does have a floating point mul+add (but no fma).
> 

Interesting. Radeons seem to always have an unfused mad. Pre-GCN parts
apparently only have a 32-bit fma on parts that support double precision.
The same restriction is stated for GCN parts in the ISA docs, which
obviously doesn't make sense, but I have no idea if the fma is full speed...

Roland

* Re: [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
       [not found]               ` <6ffa13fd-c90b-bda6-b243-13c4857346f7-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
@ 2017-06-13 15:06                 ` Marek Olšák
  0 siblings, 0 replies; 19+ messages in thread
From: Marek Olšák @ 2017-06-13 15:06 UTC (permalink / raw)
  To: Roland Scheidegger
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Nicolai Hähnle,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Tue, Jun 13, 2017 at 2:33 AM, Roland Scheidegger <sroland@vmware.com> wrote:
> On 13.06.2017 at 02:05, Ilia Mirkin wrote:
>> On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland@vmware.com> wrote:
>>> FWIW surely on nv50 you could keep a single mad instruction for umad
>>> (sad maybe too?). (I'm actually wondering if the hw really can't do
>>> unfused float multiply+add as a single instruction but I know next to
>>> nothing about nvidia hw...)
>>
>> The compiler should reassociate a mul + add into a mad where possible.
>> In practice, though, IMAD is super-slow... allegedly slower than
>> IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is
>> faster, but we haven't figured out how to operate it yet. I'm not aware
>> of a muladd version of fma on fermi and newer (GL 4.0). The tesla
>> series does have a floating point mul+add (but no fma).
>>
>
> Interesting. Radeons seem to always have an unfused mad. Pre-GCN parts
> apparently only have a 32-bit fma on parts that support double precision.
> The same restriction is stated for GCN parts in the ISA docs, which
> obviously doesn't make sense, but I have no idea if the fma is full speed...

fma is full-rate on Tahiti and Hawaii and quarter-rate on other GCN chips. FP64
opcodes are always 2x or 4x slower than fma_f32.

Marek

end of thread, other threads:[~2017-06-13 15:06 UTC | newest]

Thread overview: 19+ messages
2017-06-11 18:42 [RFC 0/9] Add precise/invariant semantics to TGSI Karol Herbst
2017-06-11 18:42 ` [RFC 3/9] st/glsl_to_tgsi: handle precise modifier Karol Herbst
     [not found]   ` <20170611184239.7204-4-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-06-12 10:41     ` [Mesa-dev] " Nicolai Hähnle
2017-06-11 18:42 ` [RFC 9/9] nv50/ir/tgsi: split mad to mul+add Karol Herbst
     [not found] ` <20170611184239.7204-1-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-06-11 18:42   ` [RFC 1/9] tgsi: add precise flag to tgsi_instruction Karol Herbst
2017-06-11 18:42   ` [RFC 2/9] tgsi/dump: print _PRECISE modifier on Instrutions Karol Herbst
2017-06-11 18:42   ` [RFC 4/9] tgsi: populate precise Karol Herbst
     [not found]     ` <20170611184239.7204-5-karolherbst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-06-12 10:33       ` [Mesa-dev] " Nicolai Hähnle
2017-06-11 18:42   ` [RFC 5/9] tgsi/text: parse _PRECISE modifier Karol Herbst
2017-06-12 10:31     ` Nicolai Hähnle
2017-06-11 18:42   ` [RFC 6/9] nv50/ir: add precise field to Instruction Karol Herbst
2017-06-11 18:42   ` [RFC 7/9] nv50/ir/tgsi: handle precise for most ALU instructions Karol Herbst
2017-06-11 18:42   ` [RFC 8/9] nv50/ir: disable mul+add to mad for precise instructions Karol Herbst
2017-06-12 10:42   ` [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI Nicolai Hähnle
2017-06-12 23:57     ` Roland Scheidegger
2017-06-13  0:01       ` Roland Scheidegger
     [not found]       ` <8a99cdc6-c415-1423-dd1b-13a09f902288-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2017-06-13  0:05         ` [Mesa-dev] " Ilia Mirkin
     [not found]           ` <CAKb7UviKiLbRG+z4paq8=-6epuWtKSxty2DHw6SQ1LZ+ULQgmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-13  0:33             ` Roland Scheidegger
     [not found]               ` <6ffa13fd-c90b-bda6-b243-13c4857346f7-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
2017-06-13 15:06                 ` Marek Olšák
