From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ilia Mirkin Subject: Re: [PATCH v2] nv50/ir: optimize shl(a, 0) to a Date: Sat, 29 Apr 2017 20:28:26 -0400 Message-ID: References: <20170429220934.3355-1-karolherbst@gmail.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1396970365==" Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: mesa-dev-bounces@lists.freedesktop.org Sender: "mesa-dev" To: Karol Herbst Cc: nouveau@lists.freedesktop.org, mesa-dev@lists.freedesktop.org List-Id: nouveau.vger.kernel.org --===============1396970365== Content-Type: multipart/alternative; boundary=001a1140a35a95db9b054e5762d2 --001a1140a35a95db9b054e5762d2 Content-Type: text/plain; charset=UTF-8 Maybe in a separate change. I'd want to double check on all gens. I think the thing I suggested is sufficient. On Apr 29, 2017 8:09 PM, "Karol Herbst" wrote: 2017-04-30 0:28 GMT+02:00 Ilia Mirkin : > On Sat, Apr 29, 2017 at 6:09 PM, Karol Herbst wrote: >> helps two alien isolation shaders >> >> shader-db: >> total instructions in shared programs : 4251497 -> 4251494 (-0.00%) >> total gprs used in shared programs : 513962 -> 513962 (0.00%) >> total local used in shared programs : 29797 -> 29797 (0.00%) >> total bytes used in shared programs : 38960264 -> 38960232 (-0.00%) >> >> local gpr inst bytes >> helped 0 0 2 2 >> hurt 0 0 0 0 >> >> v2: handle potential mods on src0 >> >> Signed-off-by: Karol Herbst >> Reviewed-by: Samuel Pitoiset >> Reviewed-by: Ilia Mirkin >> --- >> src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> index 015def0391..82da0d3e48 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> @@ -1284,6 +1284,13 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) >> >> case OP_SHL: >> { >> + if (s == 1 && imm0.isInteger(0)) { >> + i->op = i->src(0).mod.getOp(); >> + if (i->op != OP_CVT) >> + i->src(0).mod = 0; > > Is this necessary? Presumably if the op != 0, then op == OP_CVT... > yeah, no idea. I just thought I do it right when I actually depend on the getOp magic. But we can't emit any mods to begin with, so maybe we should just drop the mod handling and be done with it? >> + i->setSrc(1, NULL); >> + break; >> + } >> if (s != 1 || i->src(0).mod != Modifier(0)) >> break; >> // try to concatenate shifts >> -- >> 2.12.2 >> --001a1140a35a95db9b054e5762d2 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Maybe in a separate change. I'd want to double check = on all gens. I think the thing I suggested is sufficient.

On Apr 29, 2017 8:09 PM, &q= uot;Karol Herbst" <karolhe= rbst@gmail.com> wrote:
2017-04-30 0:28 GMT+02:00 Ilia Mirkin <imirkin@alum.mit.edu>:
> On Sat, Apr 29, 2017 at 6:09 PM, Karol Herbst <karolherbst@gmail.com> wrote:
>> helps two alien isolation shaders
>>
>> shader-db:
>> total instructions in shared programs : 4251497 -> 4251494 (-0.= 00%)
>> total gprs used in shared programs=C2=A0 =C2=A0 : 513962 -> 513= 962 (0.00%)
>> total local used in shared programs=C2=A0 =C2=A0: 29797 -> 2979= 7 (0.00%)
>> total bytes used in shared programs=C2=A0 =C2=A0: 38960264 -> 3= 8960232 (-0.00%)
>>
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0local= =C2=A0 =C2=A0 =C2=A0 =C2=A0 gpr=C2=A0 =C2=A0 =C2=A0 =C2=A0inst=C2=A0 =C2=A0= =C2=A0 bytes
>>=C2=A0 =C2=A0 =C2=A0helped=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= 0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A02=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0hurt=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00
>>
>> v2: handle potential mods on src0
>>
>> Signed-off-by: Karol Herbst <karolherbst@gmail.com>
>> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
>> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
>> ---
>>=C2=A0 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cp= p | 7 +++++++
>>=C2=A0 1 file changed, 7 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peep= hole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> index 015def0391..82da0d3e48 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cp= p
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cp= p
>> @@ -1284,6 +1284,13 @@ ConstantFolding::opnd(Instruction *i, = ImmediateValue &imm0, int s)
>>
>>=C2=A0 =C2=A0 =C2=A0case OP_SHL:
>>=C2=A0 =C2=A0 =C2=A0{
>> +=C2=A0 =C2=A0 =C2=A0 if (s =3D=3D 1 && imm0.isInteger(0))= {
>> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0i->op =3D i->src(0).mod.g= etOp();
>> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (i->op !=3D OP_CVT)
>> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 i->src(0).mod =3D 0;=
>
> Is this necessary? Presumably if the op !=3D 0, then op =3D=3D OP_CVT.= ..
>

yeah, no idea. I just thought I do it right when I actually depend on=
the getOp magic. But we can't emit any mods to begin with, so maybe we<= br> should just drop the mod handling and be done with it?

>> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0i->setSrc(1, NULL);
>> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0break;
>> +=C2=A0 =C2=A0 =C2=A0 }
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (s !=3D 1 || i->src(0).mod !=3D M= odifier(0))
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0break;
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 // try to concatenate shifts
>> --
>> 2.12.2
>>

--001a1140a35a95db9b054e5762d2-- --===============1396970365== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KbWVzYS1kZXYg bWFpbGluZyBsaXN0Cm1lc2EtZGV2QGxpc3RzLmZyZWVkZXNrdG9wLm9yZwpodHRwczovL2xpc3Rz LmZyZWVkZXNrdG9wLm9yZy9tYWlsbWFuL2xpc3RpbmZvL21lc2EtZGV2Cg== --===============1396970365==--