* [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
@ 2015-08-19 1:49 Ilia Mirkin
From: Ilia Mirkin @ 2015-08-19 1:49 UTC (permalink / raw)
To: mesa-dev; +Cc: nouveau
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 66 +++++++++++++++-------
1 file changed, 46 insertions(+), 20 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3841c33..b0e74f0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1023,27 +1023,53 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
case OP_AND:
{
- CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp();
- if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount() > 1)
- return;
- if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
- return;
- if (imm0.reg.data.f32 != 1.0)
- return;
- if (i->getSrc(t)->getInsn()->dType != TYPE_U32)
- return;
+ Instruction *src = i->getSrc(t)->getInsn();
+ ImmediateValue imm1;
+ if (imm0.reg.data.u32 == 0) {
+ i->op = OP_MOV;
+ i->setSrc(0, new_ImmediateValue(prog, 0u));
+ i->src(0).mod = Modifier(0);
+ i->setSrc(1, NULL);
+ } else if (imm0.reg.data.u32 == ~0U) {
+ i->op = i->src(t).mod.getOp();
+ if (t) {
+ i->setSrc(0, i->getSrc(t));
+ i->src(0).mod = i->src(t).mod;
+ }
+ i->setSrc(1, NULL);
+ } else if (src->asCmp()) {
+ CmpInstruction *cmp = src->asCmp();
+ if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount() > 1)
+ return;
+ if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
+ return;
+ if (imm0.reg.data.f32 != 1.0)
+ return;
+ if (cmp->dType != TYPE_U32)
+ return;
- i->getSrc(t)->getInsn()->dType = TYPE_F32;
- if (i->src(t).mod != Modifier(0)) {
- assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
- i->src(t).mod = Modifier(0);
- cmp->setCond = inverseCondCode(cmp->setCond);
- }
- i->op = OP_MOV;
- i->setSrc(s, NULL);
- if (t) {
- i->setSrc(0, i->getSrc(t));
- i->setSrc(t, NULL);
+ cmp->dType = TYPE_F32;
+ if (i->src(t).mod != Modifier(0)) {
+ assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
+ i->src(t).mod = Modifier(0);
+ cmp->setCond = inverseCondCode(cmp->setCond);
+ }
+ i->op = OP_MOV;
+ i->setSrc(s, NULL);
+ if (t) {
+ i->setSrc(0, i->getSrc(t));
+ i->setSrc(t, NULL);
+ }
+ } else if (prog->getTarget()->isOpSupported(OP_EXTBF, TYPE_U32) &&
+ src->op == OP_SHR &&
+ src->src(1).getImmediate(imm1) &&
+ i->src(t).mod == Modifier(0) &&
+ util_is_power_of_two(imm0.reg.data.u32 + 1)) {
+ // low byte = offset, high byte = width
+ uint32_t ext = (util_last_bit(imm0.reg.data.u32) << 8) | imm1.reg.data.u32;
+ i->op = OP_EXTBF;
+ i->setSrc(0, src->getSrc(0));
+ i->setSrc(1, new_ImmediateValue(prog, ext));
}
}
break;
--
2.4.6
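A minimal standalone C++ model of this rewrite may help. It is not nouveau code, and every name in it is hypothetical. `extbf_u32` evaluates an unsigned EXTBF whose immediate carries the bit offset in the low byte and the width in the next byte up, following the `// low byte = offset, high byte = width` comment in the patch. `fold_and_shr` reproduces the "mask + 1 is a power of two" test and the immediate encoding. `last_bit` is assumed to behave like Mesa's `util_last_bit`, and the model assumes that source bits past bit 31 read as zero, as PTX's `bfe.u32` specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for Mesa's util_last_bit(): the 1-based position of
// the most significant set bit, or 0 when no bit is set.
static unsigned last_bit(uint32_t v) {
   unsigned n = 0;
   while (v) {
      ++n;
      v >>= 1;
   }
   return n;
}

// Model of an unsigned EXTBF. ext carries the bit offset in bits 0-7 and the
// field width in bits 8-15; source bits past bit 31 are assumed to read as 0.
static uint32_t extbf_u32(uint32_t x, uint32_t ext) {
   unsigned offset = ext & 0xff;
   unsigned width = (ext >> 8) & 0xff;
   if (offset >= 32 || width == 0)
      return 0;
   uint64_t shifted = static_cast<uint64_t>(x) >> offset;
   uint64_t mask = width >= 32 ? 0xffffffffull : (1ull << width) - 1;
   return static_cast<uint32_t>(shifted & mask);
}

// AND(SHR(x, shift), mask) -> EXTBF(x, ext): the mask must be 2^n - 1, and
// its bit length becomes the width. 0 and ~0 are handled by earlier branches
// of the patch (MOV 0 and MOV/NOT), so they are rejected here.
static std::optional<uint32_t> fold_and_shr(uint32_t mask, uint32_t shift) {
   assert(shift < 32); // this model expects the offset to fit in one byte
   uint32_t m1 = mask + 1;
   if (mask == 0 || mask == ~0u || (m1 & (m1 - 1)) != 0)
      return std::nullopt;
   return (last_bit(mask) << 8) | shift;
}
```

For example, `fold_and_shr(0xff, 8)` gives `0x0808`, and `extbf_u32(0x12345678, 0x0808)` returns `0x56`, the same value as `(0x12345678 >> 8) & 0xff`.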
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
* [PATCH 2/2] nvc0/ir: detect i2f/i2i which operate on specific bytes/words
@ 2015-08-19 1:49 Ilia Mirkin
From: Ilia Mirkin @ 2015-08-19 1:49 UTC (permalink / raw)
To: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can do this sort of
thing directly, so detect the cases they can handle.
This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
---
.../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 1 +
.../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 2 +
.../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 4 ++
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 79 ++++++++++++++++++++--
4 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index f06056f..8f15429 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -933,6 +933,7 @@ CodeEmitterGK110::emitCVT(const Instruction *i)
code[0] |= typeSizeofLog2(dType) << 10;
code[0] |= typeSizeofLog2(i->sType) << 12;
+ code[1] |= i->subOp << 12;
if (isSignedIntType(dType))
code[0] |= 0x4000;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index ef5c87d..6e22788 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -818,6 +818,7 @@ CodeEmitterGM107::emitI2F()
emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
emitCC (0x2f);
emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+ emitField(0x29, 2, insn->subOp);
emitRND (0x27, rnd, -1);
emitField(0x0d, 1, isSignedType(insn->sType));
emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
@@ -850,6 +851,7 @@ CodeEmitterGM107::emitI2I()
emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
emitCC (0x2f);
emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+ emitField(0x29, 2, insn->subOp);
emitField(0x0d, 1, isSignedType(insn->sType));
emitField(0x0c, 1, isSignedType(insn->dType));
emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5703712..6bf5219 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1020,6 +1020,10 @@ CodeEmitterNVC0::emitCVT(Instruction *i)
code[0] |= util_logbase2(typeSizeof(dType)) << 20;
code[0] |= util_logbase2(typeSizeof(i->sType)) << 23;
+ // for 8/16 source types, the byte/word is in subOp. word 1 is
+ // represented as 2.
+ code[1] |= i->subOp << 0x17;
+
if (sat)
code[0] |= 0x20;
if (abs)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index b0e74f0..e37420c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1312,7 +1312,8 @@ private:
void handleRCP(Instruction *);
void handleSLCT(Instruction *);
void handleLOGOP(Instruction *);
- void handleCVT(Instruction *);
+ void handleCVT_NEG(Instruction *);
+ void handleCVT_EXTBF(Instruction *);
void handleSUCLAMP(Instruction *);
BuildUtil bld;
@@ -1563,12 +1564,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
// nv50:
// F2I(NEG(I2F(ABS(SET))))
void
-AlgebraicOpt::handleCVT(Instruction *cvt)
+AlgebraicOpt::handleCVT_NEG(Instruction *cvt)
{
+ Instruction *insn = cvt->getSrc(0)->getInsn();
if (cvt->sType != TYPE_F32 ||
cvt->dType != TYPE_S32 || cvt->src(0).mod != Modifier(0))
return;
- Instruction *insn = cvt->getSrc(0)->getInsn();
if (!insn || insn->op != OP_NEG || insn->dType != TYPE_F32)
return;
if (insn->src(0).mod != Modifier(0))
@@ -1598,6 +1599,74 @@ AlgebraicOpt::handleCVT(Instruction *cvt)
delete_Instruction(prog, cvt);
}
+// Some shaders extract packed bytes out of words and convert them to
+// e.g. float. The Fermi+ CVT instruction can extract those directly, as can
+// nv50 for word sizes.
+//
+// CVT(EXTBF(x, byte/word))
+// CVT(AND(bytemask, x))
+// CVT(AND(bytemask, SHR(x, 8/16/24)))
+void
+AlgebraicOpt::handleCVT_EXTBF(Instruction *cvt)
+{
+ Instruction *insn = cvt->getSrc(0)->getInsn();
+ ImmediateValue imm0, imm1;
+ Value *arg = NULL;
+ unsigned width, offset;
+ if ((cvt->sType != TYPE_U32 && cvt->sType != TYPE_S32) || !insn)
+ return;
+ if (insn->op == OP_EXTBF && insn->src(1).getImmediate(imm0)) {
+ width = (imm0.reg.data.u32 >> 8) & 0xff;
+ offset = imm0.reg.data.u32 & 0xff;
+ arg = insn->getSrc(0);
+
+ if (width != 8 && width != 16)
+ return;
+ if (width == 8 && offset & 0x7)
+ return;
+ if (width == 16 && offset & 0xf)
+ return;
+ } else if (insn->op == OP_AND) {
+ int s;
+ if (insn->src(0).getImmediate(imm0))
+ s = 0;
+ else if (insn->src(1).getImmediate(imm0))
+ s = 1;
+ else
+ return;
+
+ if (imm0.reg.data.u32 == 0xff)
+ width = 8;
+ else if (imm0.reg.data.u32 == 0xffff)
+ width = 16;
+ else
+ return;
+
+ arg = insn->getSrc(!s);
+ Instruction *shift = arg->getInsn();
+ offset = 0;
+ if (shift && shift->op == OP_SHR &&
+ shift->src(1).getImmediate(imm1) &&
+ ((width == 8 && (imm1.reg.data.u32 & 0x7) == 0) ||
+ (width == 16 && (imm1.reg.data.u32 & 0xf) == 0))) {
+ arg = shift->getSrc(0);
+ offset = imm1.reg.data.u32;
+ }
+ }
+
+ if (!arg)
+ return;
+
+ if (width == 8) {
+ cvt->sType = cvt->sType == TYPE_U32 ? TYPE_U8 : TYPE_S8;
+ } else {
+ assert(width == 16);
+ cvt->sType = cvt->sType == TYPE_U32 ? TYPE_U16 : TYPE_S16;
+ }
+ cvt->setSrc(0, arg);
+ cvt->subOp = offset >> 3;
+}
+
// SUCLAMP dst, (ADD b imm), k, 0 -> SUCLAMP dst, b, k, imm (if imm fits s6)
void
AlgebraicOpt::handleSUCLAMP(Instruction *insn)
@@ -1668,7 +1737,9 @@ AlgebraicOpt::visit(BasicBlock *bb)
handleLOGOP(i);
break;
case OP_CVT:
- handleCVT(i);
+ handleCVT_NEG(i);
+ if (prog->getTarget()->isOpSupported(OP_EXTBF, TYPE_U32))
+ handleCVT_EXTBF(i);
break;
case OP_SUCLAMP:
handleSUCLAMP(i);
--
2.4.6
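The matching rules in handleCVT_EXTBF() can be modeled outside the compiler. The C++ below is a hypothetical sketch, not nouveau code: `SrcType`, `NarrowedCvt`, `narrow_cvt`, `from_extbf` and `from_and` are invented names. It keeps the patch's width and alignment checks and its `subOp = offset >> 3` byte-index encoding. Two choices are specific to this sketch: it rejects selections that would run past bit 31, and it takes signedness from the extracted field itself, treating anything produced by an AND mask as unsigned because the mask clears the upper bits.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the narrowed CVT source types.
enum class SrcType { U8, S8, U16, S16 };

struct NarrowedCvt {
   SrcType sType;   // narrowed source type for the CVT
   unsigned subOp;  // index of the first selected byte (offset >> 3)
};

// Only naturally aligned bytes (width 8) or words (width 16) can be folded
// into the CVT; the bound check keeps the selection inside 32 bits.
static std::optional<NarrowedCvt> narrow_cvt(unsigned width, unsigned offset,
                                             bool signedField) {
   if (width != 8 && width != 16)
      return std::nullopt;
   if (offset % width != 0 || offset + width > 32)
      return std::nullopt;
   // offset is at most 24 here, so the byte index fits a 2-bit field such as
   // the one the GM107 emitter writes with emitField(0x29, 2, subOp).
   assert((offset >> 3) < 4);
   SrcType t = width == 8 ? (signedField ? SrcType::S8 : SrcType::U8)
                          : (signedField ? SrcType::S16 : SrcType::U16);
   // subOp counts bytes, so word 1 (offset 16) becomes 2.
   return NarrowedCvt{t, offset >> 3};
}

// CVT(EXTBF(x, imm)): offset in bits 0-7 of imm, width in bits 8-15.
static std::optional<NarrowedCvt> from_extbf(uint32_t imm, bool signedField) {
   return narrow_cvt((imm >> 8) & 0xff, imm & 0xff, signedField);
}

// CVT(AND(mask, x)) or CVT(AND(mask, SHR(x, shift))); pass shift = 0 when
// there is no SHR. An SHR whose amount is not a multiple of the width is not
// folded, so the CVT then reads byte/word 0 of the shifted value.
static std::optional<NarrowedCvt> from_and(uint32_t mask, unsigned shift) {
   unsigned width;
   if (mask == 0xff)
      width = 8;
   else if (mask == 0xffff)
      width = 16;
   else
      return std::nullopt;
   unsigned offset = (shift % width == 0) ? shift : 0;
   return narrow_cvt(width, offset, false); // masked fields are unsigned
}
```

For example, `from_and(0xff, 16)` yields `U8` with `subOp` 2 (byte 2), while `from_extbf((16u << 8) | 8u, false)` is rejected because a 16-bit field at offset 8 is not word aligned.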
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau
* Re: [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
@ 2015-08-19 1:57 Matt Turner
From: Matt Turner @ 2015-08-19 1:57 UTC (permalink / raw)
To: Ilia Mirkin
Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> Some shaders appear to extract bits using shift/and combos. Detect
> (some) of those and convert to EXTBF instead.
What is EXTBF? Extract byte to float?
I ask because Unigine Heaven has shaders that pack 3x byte-integers
into one component of a vec4 and extract them with shifts/ands and
convert them to floats, and i965 could do the extraction and
conversion in a single instruction. I'm curious if this is the same
thing you're optimizing.
I thought about adding an extract_byte(src, byte_num) operation, but
i965's copy propagation caused me some headache and I shelved it.
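The packing described here can be illustrated with a short C++ sketch. It assumes the component is read as a 32-bit unsigned integer holding bytes 0, 1 and 2 (as with the texelFetch() mentioned later in the thread); `pack3` and `unpack3_to_float` are hypothetical names. Each output costs a shift, an AND and an int-to-float conversion, which is the sequence the two nvc0 patches fold into a single byte-selecting I2F.

```cpp
#include <cassert>
#include <cstdint>

// Stores three 8-bit values in bytes 0, 1 and 2 of one 32-bit word.
static uint32_t pack3(unsigned b0, unsigned b1, unsigned b2) {
   assert(b0 < 256 && b1 < 256 && b2 < 256); // each value must fit in a byte
   return b0 | (b1 << 8) | (b2 << 16);
}

// Shift/AND extraction followed by an int-to-float conversion, once per byte.
static void unpack3_to_float(uint32_t packed, float out[3]) {
   for (unsigned i = 0; i < 3; ++i)
      out[i] = static_cast<float>((packed >> (8 * i)) & 0xffu);
}
```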
* Re: [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
@ 2015-08-19 1:58 Matt Turner
From: Matt Turner @ 2015-08-19 1:58 UTC (permalink / raw)
To: Ilia Mirkin
Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On Tue, Aug 18, 2015 at 6:57 PM, Matt Turner <mattst88@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extract them with shifts/ands and
> convert them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
Well, I apparently just needed to read your second patch's commit
message to confirm my suspicions.
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.
* Re: [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
@ 2015-08-19 2:00 Ilia Mirkin
From: Ilia Mirkin @ 2015-08-19 2:00 UTC (permalink / raw)
To: Matt Turner
Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On Tue, Aug 18, 2015 at 9:57 PM, Matt Turner <mattst88@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
Extract Bitfield.
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extract them with shifts/ands and
> convert them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
>
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.
Yes, I think it's the same shader... it's doing a texelFetch() and
then grabbing bytes 0, 1, 2 off that.
The generated shader code after the second patch does:
/*05d0*/ TLD.LL.P R0, R24, 0x0, 2D, 0x3;
/*05d8*/ TEXDEPBAR 0x0;
/*05e0*/ I2F.F32.U8 R2, R1;
/*05e8*/ FFMA.FTZ R2, R2, R15, R19;
/*05f0*/ I2F.F32.U8 R8, R1.B1;
/*05f8*/ FFMA.FTZ R8, R8, R15, R19;
/*0608*/ I2F.F32.U8 R1, R1.B2;
I'll let you guess what these things mean. TLD = texelfetch :)
-ilia
* Re: [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
@ 2015-08-20 16:13 Eric Anholt
From: Eric Anholt @ 2015-08-20 16:13 UTC (permalink / raw)
To: Matt Turner, Ilia Mirkin; +Cc: mesa-dev, nouveau
Matt Turner <mattst88@gmail.com> writes:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extract them with shifts/ands and
> convert them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
>
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.
I could use this one, as int, uint, and unorm unpacks. Right now for
int/uint I'm recognizing the pattern in vc4_program.c (in a branch).
I'd be interested in writing the NIR bits if others are interested in
having this.
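The three byte-unpack flavours mentioned here (int, uint and unorm) can be sketched in plain C++. This is not NIR or vc4 code: the function names are hypothetical, and the unorm variant assumes the usual byte / 255 normalization.

```cpp
#include <cassert>
#include <cstdint>

// uint flavour: zero-extend byte byte_index (0 = least significant).
static uint32_t extract_u8(uint32_t x, unsigned byte_index) {
   assert(byte_index < 4);
   return (x >> (8 * byte_index)) & 0xffu;
}

// int flavour: sign-extend the selected byte (bit 7 is the sign bit).
static int32_t extract_i8(uint32_t x, unsigned byte_index) {
   int32_t v = static_cast<int32_t>(extract_u8(x, byte_index));
   return v >= 128 ? v - 256 : v;
}

// unorm flavour: map 0..255 onto 0.0..1.0.
static float extract_unorm8(uint32_t x, unsigned byte_index) {
   return static_cast<float>(extract_u8(x, byte_index)) / 255.0f;
}
```

The three differ only in how the selected byte is widened, which is why a single extract_byte-style operation with a type variant could plausibly cover all of them.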