All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nouveau: codegen: Take src swizzle into account on loads
@ 2016-04-07 13:27 Hans de Goede
       [not found] ` <1460035648-3804-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2016-04-07 13:27 UTC (permalink / raw)
  To: Ilia Mirkin, Samuel Pitoiset; +Cc: mesa-dev, nouveau

The llvm TGSI backend does things like:

LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0].x

Expecting the data at address TEMP[0].x to get loaded to
TEMP[0].y. Before this commit the data at TEMP[0].x + 4 would be
loaded instead. This commit fixes this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 557608e..cc51f5a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -2279,12 +2279,16 @@ Converter::handleLOAD(Value *dst0[4])
 
          Value *off = fetchSrc(1, c);
          Symbol *sym;
+         uint32_t src0_component_offset = tgsi.getSrc(0).getSwizzle(c) * 4;
+
          if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) {
             off = NULL;
             sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
-                          tgsi.getSrc(1).getValueU32(0, info) + 4 * c);
+                          tgsi.getSrc(1).getValueU32(0, info) +
+                          src0_component_offset);
          } else {
-            sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c);
+            sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
+                          src0_component_offset);
          }
 
          Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off);
-- 
2.7.3

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] nouveau: codegen: Take src swizzle into account on loads
       [not found] ` <1460035648-3804-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-04-07 13:58   ` Ilia Mirkin
       [not found]     ` <CAKb7Uvhf=6TngukZe976+fX2zibE6U4D05zYOUEj_F5mCVKOBA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ilia Mirkin @ 2016-04-07 13:58 UTC (permalink / raw)
  To: Hans de Goede
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

That's wrong. The spec for the instruction needs to be clarified...
The current nouveau impl is correct - only the .x of the address
should be loaded, with up to 16 bytes read into the destination.

On Thu, Apr 7, 2016 at 9:27 AM, Hans de Goede <hdegoede@redhat.com> wrote:
> The llvm TGSI backend does things like:
>
> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0].x
>
> Expecting the data at address TEMP[0].x to get loaded to
> TEMP[0].y. Before this commit the data at TEMP[0].x + 4 would be
> loaded instead. This commit fixes this.
>
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> index 557608e..cc51f5a 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> @@ -2279,12 +2279,16 @@ Converter::handleLOAD(Value *dst0[4])
>
>           Value *off = fetchSrc(1, c);
>           Symbol *sym;
> +         uint32_t src0_component_offset = tgsi.getSrc(0).getSwizzle(c) * 4;
> +
>           if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) {
>              off = NULL;
>              sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
> -                          tgsi.getSrc(1).getValueU32(0, info) + 4 * c);
> +                          tgsi.getSrc(1).getValueU32(0, info) +
> +                          src0_component_offset);
>           } else {
> -            sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c);
> +            sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
> +                          src0_component_offset);
>           }
>
>           Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off);
> --
> 2.7.3
>
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nouveau: codegen: Take src swizzle into account on loads
       [not found]     ` <CAKb7Uvhf=6TngukZe976+fX2zibE6U4D05zYOUEj_F5mCVKOBA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-04-08  9:27       ` Hans de Goede
       [not found]         ` <5707797B.4040708-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2016-04-08  9:27 UTC (permalink / raw)
  To: Ilia Mirkin
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi,

On 07-04-16 15:58, Ilia Mirkin wrote:
> That's wrong.

It used to work with the old RES[] code and if one cannot specify
a source swizzle, then how can I do something like

LOAD TEMP[0].y, MEMORY[0], address

And get the data at absolute global memory address "address" into TEMP[0].y ?

This is a must-have for llvm to be able to generate working TGSI code,
I do not see any way around this.

AFAIK this is exactly what src-swizzling is for. Also note that
this commit does not change anything if no src-swizzling is specified,
in that case things work exactly as before.

 > The spec for the instruction needs to be clarified...
> The current nouveau impl is correct - only the .x of the address
> should be loaded, with up to 16 bytes read into the destination.

Ah note this is not about swizzling on the address, that indeed
makes no sense given how the addressing works for BUFFERS / MEMORY,
no this is about adding a swizlling postfix to the buffer / memory
resource specification, for example:

LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]

See the swizzling is done on the resource, not on the address, so
the swizzling specifies swizzling of the up to 16 bytes read from
address, it does not influence the address handling at all.

I now see I made an error in my commit msg, it gives the following
example:

LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0].x

This clearly is wrong, the last TEMP[0].x is not even valid TGSI,
the correct example would be:

LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]

Regards,

Hans


> On Thu, Apr 7, 2016 at 9:27 AM, Hans de Goede <hdegoede@redhat.com> wrote:
>> The llvm TGSI backend does things like:
>>
>>
>>
>> Expecting the data at address TEMP[0].x to get loaded to
>> TEMP[0].y. Before this commit the data at TEMP[0].x + 4 would be
>> loaded instead. This commit fixes this.
>>
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>>   src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> index 557608e..cc51f5a 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> @@ -2279,12 +2279,16 @@ Converter::handleLOAD(Value *dst0[4])
>>
>>            Value *off = fetchSrc(1, c);
>>            Symbol *sym;
>> +         uint32_t src0_component_offset = tgsi.getSrc(0).getSwizzle(c) * 4;
>> +
>>            if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) {
>>               off = NULL;
>>               sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
>> -                          tgsi.getSrc(1).getValueU32(0, info) + 4 * c);
>> +                          tgsi.getSrc(1).getValueU32(0, info) +
>> +                          src0_component_offset);
>>            } else {
>> -            sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c);
>> +            sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
>> +                          src0_component_offset);
>>            }
>>
>>            Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off);
>> --
>> 2.7.3
>>
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nouveau: codegen: Take src swizzle into account on loads
       [not found]         ` <5707797B.4040708-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-04-08 13:56           ` Ilia Mirkin
  2016-04-08 15:02           ` Ilia Mirkin
  1 sibling, 0 replies; 10+ messages in thread
From: Ilia Mirkin @ 2016-04-08 13:56 UTC (permalink / raw)
  To: Hans de Goede
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


[-- Attachment #1.1: Type: text/plain, Size: 3596 bytes --]

Ah, I may have read over your commit too hastily. Will have another look.
On Apr 8, 2016 5:27 AM, "Hans de Goede" <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Hi,
>
> On 07-04-16 15:58, Ilia Mirkin wrote:
>
>> That's wrong.
>>
>
> It used to work with the old RES[] code and if one cannot specify
> a source swizzle, then how can I do something like
>
> LOAD TEMP[0].y, MEMORY[0], address
>
> And get the data at absolute global memory address "address" into
> TEMP[0].y ?
>
> This is a must-have for llvm to be able to generate working TGSI code,
> I do not see any way around this.
>
> AFAIK this is exactly what src-swizzling is for. Also note that
> this commit does not change anything if no src-swizzling is specified,
> in that case things work exactly as before.
>
> > The spec for the instruction needs to be clarified...
>
>> The current nouveau impl is correct - only the .x of the address
>> should be loaded, with up to 16 bytes read into the destination.
>>
>
> Ah note this is not about swizzling on the address, that indeed
> makes no sense given how the addressing works for BUFFERS / MEMORY,
> no this is about adding a swizlling postfix to the buffer / memory
> resource specification, for example:
>
> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]
>
> See the swizzling is done on the resource, not on the address, so
> the swizzling specifies swizzling of the up to 16 bytes read from
> address, it does not influence the address handling at all.
>
> I now see I made an error in my commit msg, it gives the following
> example:
>
> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0].x
>
> This clearly is wrong, the last TEMP[0].x is not even valid TGSI,
> the correct example would be:
>
> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]
>
> Regards,
>
> Hans
>
>
> On Thu, Apr 7, 2016 at 9:27 AM, Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>
>>> The llvm TGSI backend does things like:
>>>
>>>
>>>
>>> Expecting the data at address TEMP[0].x to get loaded to
>>> TEMP[0].y. Before this commit the data at TEMP[0].x + 4 would be
>>> loaded instead. This commit fixes this.
>>>
>>> Signed-off-by: Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> ---
>>>   src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>>> index 557608e..cc51f5a 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>>> @@ -2279,12 +2279,16 @@ Converter::handleLOAD(Value *dst0[4])
>>>
>>>            Value *off = fetchSrc(1, c);
>>>            Symbol *sym;
>>> +         uint32_t src0_component_offset = tgsi.getSrc(0).getSwizzle(c)
>>> * 4;
>>> +
>>>            if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) {
>>>               off = NULL;
>>>               sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
>>> -                          tgsi.getSrc(1).getValueU32(0, info) + 4 * c);
>>> +                          tgsi.getSrc(1).getValueU32(0, info) +
>>> +                          src0_component_offset);
>>>            } else {
>>> -            sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c);
>>> +            sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
>>> +                          src0_component_offset);
>>>            }
>>>
>>>            Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off);
>>> --
>>> 2.7.3
>>>
>>>

[-- Attachment #1.2: Type: text/html, Size: 4821 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nouveau: codegen: Take src swizzle into account on loads
       [not found]         ` <5707797B.4040708-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-04-08 13:56           ` Ilia Mirkin
@ 2016-04-08 15:02           ` Ilia Mirkin
       [not found]             ` <CAKb7UvjhSJGCH2bLJWc+A8hOUepR8=AnjKjLWTW9ipr44N2NnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Ilia Mirkin @ 2016-04-08 15:02 UTC (permalink / raw)
  To: Hans de Goede
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Apr 8, 2016 at 5:27 AM, Hans de Goede <hdegoede@redhat.com> wrote:
> Hi,
>
> On 07-04-16 15:58, Ilia Mirkin wrote:
>>
>> That's wrong.
>
>
> It used to work with the old RES[] code and if one cannot specify
> a source swizzle, then how can I do something like
>
> LOAD TEMP[0].y, MEMORY[0], address
>
> And get the data at absolute global memory address "address" into TEMP[0].y
> ?
>
> This is a must-have for llvm to be able to generate working TGSI code,
> I do not see any way around this.
>
> AFAIK this is exactly what src-swizzling is for. Also note that
> this commit does not change anything if no src-swizzling is specified,
> in that case things work exactly as before.
>
>> The spec for the instruction needs to be clarified...
>>
>> The current nouveau impl is correct - only the .x of the address
>> should be loaded, with up to 16 bytes read into the destination.
>
>
> Ah note this is not about swizzling on the address, that indeed
> makes no sense given how the addressing works for BUFFERS / MEMORY,
> no this is about adding a swizlling postfix to the buffer / memory
> resource specification, for example:
>
> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]
>
> See the swizzling is done on the resource, not on the address, so
> the swizzling specifies swizzling of the up to 16 bytes read from
> address, it does not influence the address handling at all.
>
> I now see I made an error in my commit msg, it gives the following
> example:
>
> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0].x
>
> This clearly is wrong, the last TEMP[0].x is not even valid TGSI,
> the correct example would be:
>
> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]

I stand by my comment of "working as intended". But that doesn't mean
the intent can't be changed :)

For memory/buffers, LOAD takes the address at TEMP[0].x and loads 16
bytes (4 words), and sticks them into the destination's .xyzw. If you
happen to have a writemask, then only some of those are written out.

It seems that you're trying to add additional meaning to the swizzle
on the "memory" argument. However I don't believe that such a thing is
defined. (And definitely not used anywhere, at least not on purpose.)

Why does this cause you issues with LLVM-generated TGSI?

  -ilia
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nouveau: codegen: Take src swizzle into account on loads
       [not found]             ` <CAKb7UvjhSJGCH2bLJWc+A8hOUepR8=AnjKjLWTW9ipr44N2NnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-04-08 15:28               ` Hans de Goede
       [not found]                 ` <5707CE34.6050904-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2016-04-08 15:28 UTC (permalink / raw)
  To: Ilia Mirkin
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi,

On 08-04-16 17:02, Ilia Mirkin wrote:
> On Fri, Apr 8, 2016 at 5:27 AM, Hans de Goede <hdegoede@redhat.com> wrote:
>> Hi,
>>
>> On 07-04-16 15:58, Ilia Mirkin wrote:
>>>
>>> That's wrong.
>>
>>
>> It used to work with the old RES[] code and if one cannot specify
>> a source swizzle, then how can I do something like
>>
>> LOAD TEMP[0].y, MEMORY[0], address
>>
>> And get the data at absolute global memory address "address" into TEMP[0].y
>> ?
>>
>> This is a must-have for llvm to be able to generate working TGSI code,
>> I do not see any way around this.
>>
>> AFAIK this is exactly what src-swizzling is for. Also note that
>> this commit does not change anything if no src-swizzling is specified,
>> in that case things work exactly as before.
>>
>>> The spec for the instruction needs to be clarified...
>>>
>>> The current nouveau impl is correct - only the .x of the address
>>> should be loaded, with up to 16 bytes read into the destination.
>>
>>
>> Ah note this is not about swizzling on the address, that indeed
>> makes no sense given how the addressing works for BUFFERS / MEMORY,
>> no this is about adding a swizlling postfix to the buffer / memory
>> resource specification, for example:
>>
>> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]
>>
>> See the swizzling is done on the resource, not on the address, so
>> the swizzling specifies swizzling of the up to 16 bytes read from
>> address, it does not influence the address handling at all.
>>
>> I now see I made an error in my commit msg, it gives the following
>> example:
>>
>> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0].x
>>
>> This clearly is wrong, the last TEMP[0].x is not even valid TGSI,
>> the correct example would be:
>>
>> LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]
>
> I stand by my comment of "working as intended". But that doesn't mean
> the intent can't be changed :)
>
> For memory/buffers, LOAD takes the address at TEMP[0].x and loads 16
> bytes (4 words), and sticks them into the destination's .xyzw. If you
> happen to have a writemask, then only some of those are written out.
>
> It seems that you're trying to add additional meaning to the swizzle
> on the "memory" argument. However I don't believe that such a thing is
> defined. (And definitely not used anywhere, at least not on purpose.)
>
> Why does this cause you issues with LLVM-generated TGSI?

When dealing with non vector variables the llvm register allocator
will use TEMP[0].x then TEMP[0].y, etc.

When loading something from a global buffer it will calculate the
address to use, and store that in say TEMP[0].x, so it ends up
generating:

LOAD TEMP[0].y, MEMORY[0], TEMP[0]

Expecting the contents of TEMP[0].y to become the 32 bits of data
to which TEMP[0].x is pointing. But instead it will get the 32 bits of
data at address (TEMP[0].x + 4).

With the old RES[32767] code one could generate the following TGSI:

LOAD TEMP[0].y, RES[32767].xxxx, TEMP[0]

And things would work fine since the .xxxx swizzling postfix would
be honored and when storing to y (the only component set in the dest-mask)
the x component at address (TEMP[0].x) would be loaded, rather then the
y component at (TEMP[0].y)

Note that another approach would be to not increment the address by
a 32 bit word for skipped (not set in destmask) components.

The way I see it either:

1) We see that LOAD does not deal with vectors, but with flat memory,
in which case skipping 4 bytes because x is not set in the destmask
does not make sense, as that is a vector thing todo.

2) LOAD is vector layout aware in which case supporting swizzling
makes sense.

Currently we have a weird hybrid which is rather cumbersome to
work with from a compiler pov.

Regards,

Hans













>
>    -ilia
>
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nouveau: codegen: Take src swizzle into account on loads
       [not found]                 ` <5707CE34.6050904-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-04-08 15:45                   ` Ilia Mirkin
       [not found]                     ` <CAKb7Uvg5c1NorKA3uvnXfToZb+QUwRzZ0T9cUaA5jEuiBma+gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ilia Mirkin @ 2016-04-08 15:45 UTC (permalink / raw)
  To: Hans de Goede
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede <hdegoede@redhat.com> wrote:
> When dealing with non vector variables the llvm register allocator
> will use TEMP[0].x then TEMP[0].y, etc.
>
> When loading something from a global buffer it will calculate the
> address to use, and store that in say TEMP[0].x, so it ends up
> generating:
>
> LOAD TEMP[0].y, MEMORY[0], TEMP[0]
>
> Expecting the contents of TEMP[0].y to become the 32 bits of data
> to which TEMP[0].x is pointing. But instead it will get the 32 bits of
> data at address (TEMP[0].x + 4).
>
> With the old RES[32767] code one could generate the following TGSI:
>
> LOAD TEMP[0].y, RES[32767].xxxx, TEMP[0]
>
> And things would work fine since the .xxxx swizzling postfix would
> be honored and when storing to y (the only component set in the dest-mask)
> the x component at address (TEMP[0].x) would be loaded, rather then the
> y component at (TEMP[0].y)
>
> Note that another approach would be to not increment the address by
> a 32 bit word for skipped (not set in destmask) components.
>
> The way I see it either:
>
> 1) We see that LOAD does not deal with vectors, but with flat memory,
> in which case skipping 4 bytes because x is not set in the destmask
> does not make sense, as that is a vector thing todo.
>
> 2) LOAD is vector layout aware in which case supporting swizzling
> makes sense.
>
> Currently we have a weird hybrid which is rather cumbersome to
> work with from a compiler pov.

And I guess LLVM never ends up generating any of the other "funny"
instructions like LIT and the such. Well, I have no problem adding the
swizzling logic, i.e. the way that LOAD will now work (logically) is
that it will

(a) fetch 4 values from the coordinates provided (4 sequential dwords
from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
the case of images)
(b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument
(c) store that swizzled result into the destination based on the writemask

That would sound reasonable to me, and if I understand correctly, is
option 2 of your proposal. We'd need some docs updates and buy-in from
the other gallium driver developers.

STORE remains unchanged, as the MEMORY/etc is in the destination,
where there is a writemask, which is presently used and will remain
effective.

  -ilia
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nouveau: codegen: Take src swizzle into account on loads
       [not found]                     ` <CAKb7Uvg5c1NorKA3uvnXfToZb+QUwRzZ0T9cUaA5jEuiBma+gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-04-08 16:06                       ` Hans de Goede
  2016-04-08 16:26                         ` [Nouveau] " Hans de Goede
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2016-04-08 16:06 UTC (permalink / raw)
  To: Ilia Mirkin
  Cc: mesa-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi,

On 08-04-16 17:45, Ilia Mirkin wrote:
> On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede <hdegoede@redhat.com> wrote:
>> When dealing with non vector variables the llvm register allocator
>> will use TEMP[0].x then TEMP[0].y, etc.
>>
>> When loading something from a global buffer it will calculate the
>> address to use, and store that in say TEMP[0].x, so it ends up
>> generating:
>>
>> LOAD TEMP[0].y, MEMORY[0], TEMP[0]
>>
>> Expecting the contents of TEMP[0].y to become the 32 bits of data
>> to which TEMP[0].x is pointing. But instead it will get the 32 bits of
>> data at address (TEMP[0].x + 4).
>>
>> With the old RES[32767] code one could generate the following TGSI:
>>
>> LOAD TEMP[0].y, RES[32767].xxxx, TEMP[0]
>>
>> And things would work fine since the .xxxx swizzling postfix would
>> be honored and when storing to y (the only component set in the dest-mask)
>> the x component at address (TEMP[0].x) would be loaded, rather then the
>> y component at (TEMP[0].y)
>>
>> Note that another approach would be to not increment the address by
>> a 32 bit word for skipped (not set in destmask) components.
>>
>> The way I see it either:
>>
>> 1) We see that LOAD does not deal with vectors, but with flat memory,
>> in which case skipping 4 bytes because x is not set in the destmask
>> does not make sense, as that is a vector thing todo.
>>
>> 2) LOAD is vector layout aware in which case supporting swizzling
>> makes sense.
>>
>> Currently we have a weird hybrid which is rather cumbersome to
>> work with from a compiler pov.
>
> And I guess LLVM never ends up generating any of the other "funny"
> instructions like LIT and the such. Well, I have no problem adding the
> swizzling logic, i.e. the way that LOAD will now work (logically) is
> that it will
>
> (a) fetch 4 values from the coordinates provided (4 sequential dwords
> from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
> the case of images)
> (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument
> (c) store that swizzled result into the destination based on the writemask
>
> That would sound reasonable to me, and if I understand correctly, is
> option 2 of your proposal.

Yes that is option 2, and is basically what the patch which started this
thread does. So that would work for me :)

> We'd need some docs updates and buy-in from the other gallium driver developers.

What docs would need updating ? The TGSI docs I'm aware of are at:

http://gallium.readthedocs.org/en/latest/tgsi.html

I assume those have a source in the mesa src somewhere (I've not looked),
but those mostly just look quite incomplete in general when it comes to TGSI
(I've had to revert to figuring what things do from the mesa srcs quite often)

Have I been looking at the wrong docs perhaps ?

Note that them being incomplete is not intended as an excuse to not document
this, I'm all for better documentation.

> STORE remains unchanged, as the MEMORY/etc is in the destination,
> where there is a writemask, which is presently used and will remain
> effective.

Right and note that the first src operand of STORE already takes swizzling
into account, so the proposed option 2 will actually make the 2 more inline.

Regards,

Hans
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads
  2016-04-08 16:06                       ` Hans de Goede
@ 2016-04-08 16:26                         ` Hans de Goede
  2016-04-08 16:34                           ` Ilia Mirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2016-04-08 16:26 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: mesa-dev, nouveau

Hi,

On 08-04-16 18:06, Hans de Goede wrote:
> Hi,
>
> On 08-04-16 17:45, Ilia Mirkin wrote:
>> On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede <hdegoede@redhat.com> wrote:
>>> When dealing with non vector variables the llvm register allocator
>>> will use TEMP[0].x then TEMP[0].y, etc.
>>>
>>> When loading something from a global buffer it will calculate the
>>> address to use, and store that in say TEMP[0].x, so it ends up
>>> generating:
>>>
>>> LOAD TEMP[0].y, MEMORY[0], TEMP[0]
>>>
>>> Expecting the contents of TEMP[0].y to become the 32 bits of data
>>> to which TEMP[0].x is pointing. But instead it will get the 32 bits of
>>> data at address (TEMP[0].x + 4).
>>>
>>> With the old RES[32767] code one could generate the following TGSI:
>>>
>>> LOAD TEMP[0].y, RES[32767].xxxx, TEMP[0]
>>>
>>> And things would work fine since the .xxxx swizzling postfix would
>>> be honored and when storing to y (the only component set in the dest-mask)
>>> the x component at address (TEMP[0].x) would be loaded, rather then the
>>> y component at (TEMP[0].y)
>>>
>>> Note that another approach would be to not increment the address by
>>> a 32 bit word for skipped (not set in destmask) components.
>>>
>>> The way I see it either:
>>>
>>> 1) We see that LOAD does not deal with vectors, but with flat memory,
>>> in which case skipping 4 bytes because x is not set in the destmask
>>> does not make sense, as that is a vector thing todo.
>>>
>>> 2) LOAD is vector layout aware in which case supporting swizzling
>>> makes sense.
>>>
>>> Currently we have a weird hybrid which is rather cumbersome to
>>> work with from a compiler pov.
>>
>> And I guess LLVM never ends up generating any of the other "funny"
>> instructions like LIT and the such. Well, I have no problem adding the
>> swizzling logic, i.e. the way that LOAD will now work (logically) is
>> that it will
>>
>> (a) fetch 4 values from the coordinates provided (4 sequential dwords
>> from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
>> the case of images)
>> (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument
>> (c) store that swizzled result into the destination based on the writemask
>>
>> That would sound reasonable to me, and if I understand correctly, is
>> option 2 of your proposal.
>
> Yes that is option 2, and is basically what the patch which started this
> thread does. So that would work for me :)
>
>> We'd need some docs updates and buy-in from the other gallium driver developers.
>
> What docs would need updating ? The TGSI docs I'm aware of are at:
>
> http://gallium.readthedocs.org/en/latest/tgsi.html
>
> I assume those have a source in the mesa src somewhere (I've not looked),
> but those mostly just look quite incomplete in general when it comes to TGSI
> (I've had to revert to figuring what things do from the mesa srcs quite often)
>
> Have I been looking at the wrong docs perhaps ?
>
> Note that them being incomplete is not intended as an excuse to not document
> this, I'm all for better documentation.
>
>> STORE remains unchanged, as the MEMORY/etc is in the destination,
>> where there is a writemask, which is presently used and will remain
>> effective.
>
> Right and note that the first src operand of STORE already takes swizzling
> into account, so the proposed option 2 will actually make the 2 more inline.

Erm, I mean the 2nd src operand of the store of-course, the actual src.

On a related note, comparing handleLOAD and handleSTORE, this bit in
handleLOAD seems wrong:

          Value *off = fetchSrc(1, c);

I believe that should be:

	 Value *off = fetchSrc(1, 0);

Just like handleSTORE does:

	 off = fetchSrc(0, 0);
	
And always using a 'c' of 0 seems correct here since we are dealing
with an address.

Once I know which docs to update for this, I'll do a v2 of this patch
and add a preparation patch fixing the above to the v2 set.

Regards,

Hans
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads
  2016-04-08 16:26                         ` [Nouveau] " Hans de Goede
@ 2016-04-08 16:34                           ` Ilia Mirkin
  0 siblings, 0 replies; 10+ messages in thread
From: Ilia Mirkin @ 2016-04-08 16:34 UTC (permalink / raw)
  To: Hans de Goede; +Cc: mesa-dev, nouveau

On Fri, Apr 8, 2016 at 12:26 PM, Hans de Goede <hdegoede@redhat.com> wrote:
> Hi,
>
>
> On 08-04-16 18:06, Hans de Goede wrote:
>>
>> Hi,
>>
>> On 08-04-16 17:45, Ilia Mirkin wrote:
>>>
>>> On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede <hdegoede@redhat.com>
>>> wrote:
>>>>
>>>> When dealing with non vector variables the llvm register allocator
>>>> will use TEMP[0].x then TEMP[0].y, etc.
>>>>
>>>> When loading something from a global buffer it will calculate the
>>>> address to use, and store that in say TEMP[0].x, so it ends up
>>>> generating:
>>>>
>>>> LOAD TEMP[0].y, MEMORY[0], TEMP[0]
>>>>
>>>> Expecting the contents of TEMP[0].y to become the 32 bits of data
>>>> to which TEMP[0].x is pointing. But instead it will get the 32 bits of
>>>> data at address (TEMP[0].x + 4).
>>>>
>>>> With the old RES[32767] code one could generate the following TGSI:
>>>>
>>>> LOAD TEMP[0].y, RES[32767].xxxx, TEMP[0]
>>>>
>>>> And things would work fine since the .xxxx swizzling postfix would
>>>> be honored and when storing to y (the only component set in the
>>>> dest-mask)
>>>> the x component at address (TEMP[0].x) would be loaded, rather then the
>>>> y component at (TEMP[0].y)
>>>>
>>>> Note that another approach would be to not increment the address by
>>>> a 32 bit word for skipped (not set in destmask) components.
>>>>
>>>> The way I see it either:
>>>>
>>>> 1) We see that LOAD does not deal with vectors, but with flat memory,
>>>> in which case skipping 4 bytes because x is not set in the destmask
>>>> does not make sense, as that is a vector thing todo.
>>>>
>>>> 2) LOAD is vector layout aware in which case supporting swizzling
>>>> makes sense.
>>>>
>>>> Currently we have a weird hybrid which is rather cumbersome to
>>>> work with from a compiler pov.
>>>
>>>
>>> And I guess LLVM never ends up generating any of the other "funny"
>>> instructions like LIT and the such. Well, I have no problem adding the
>>> swizzling logic, i.e. the way that LOAD will now work (logically) is
>>> that it will
>>>
>>> (a) fetch 4 values from the coordinates provided (4 sequential dwords
>>> from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
>>> the case of images)
>>> (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE
>>> argument
>>> (c) store that swizzled result into the destination based on the
>>> writemask
>>>
>>> That would sound reasonable to me, and if I understand correctly, is
>>> option 2 of your proposal.
>>
>>
>> Yes that is option 2, and is basically what the patch which started this
>> thread does. So that would work for me :)
>>
>>> We'd need some docs updates and buy-in from the other gallium driver
>>> developers.
>>
>>
>> What docs would need updating ? The TGSI docs I'm aware of are at:
>>
>> http://gallium.readthedocs.org/en/latest/tgsi.html
>>
>> I assume those have a source in the mesa src somewhere (I've not looked),
>> but those mostly just look quite incomplete in general when it comes to
>> TGSI
>> (I've had to revert to figuring what things do from the mesa srcs quite
>> often)
>>
>> Have I been looking at the wrong docs perhaps ?
>>
>> Note that them being incomplete is not intended as an excuse to not
>> document
>> this, I'm all for better documentation.
>>
>>> STORE remains unchanged, as the MEMORY/etc is in the destination,
>>> where there is a writemask, which is presently used and will remain
>>> effective.
>>
>>
>> Right and note that the first src operand of STORE already takes swizzling
>> into account, so the proposed option 2 will actually make the 2 more
>> inline.
>
>
> Erm, I mean the 2nd src operand of the store of-course, the actual src.
>
> On a related note, comparing handleLOAD and handleSTORE, this bit in
> handleLOAD seems wrong:
>
>          Value *off = fetchSrc(1, c);
>
> I believe that should be:
>
>          Value *off = fetchSrc(1, 0);

Yep, that's wrong. I think I was waffling back and forth early on in
the lifetime of the patchset about how it would work, and one of the
options was to read each dword from a separate offset. (I think I
started implementing atomic buffers well over a year ago, only to be
stymied by the memory window issue and give up for a long time.) I
eventually came to realize that was insanity.

>
> Just like handleSTORE does:
>
>          off = fetchSrc(0, 0);
>
> And always using a 'c' of 0 seems correct here since we are dealing
> with an address.
>
> Once I know which docs to update for this, I'll do a v2 of this patch
> and add a preparation patch fixing the above to the v2 set.

src/gallium/docs/source/tgsi.rst

There are push hooks which make readthedocs.org re-pull from the mesa
repo on every push so that things are up to date (well, it takes a few
minutes to regenerate the html).

  -ilia
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-04-08 16:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-07 13:27 [PATCH] nouveau: codegen: Take src swizzle into account on loads Hans de Goede
     [not found] ` <1460035648-3804-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-04-07 13:58   ` Ilia Mirkin
     [not found]     ` <CAKb7Uvhf=6TngukZe976+fX2zibE6U4D05zYOUEj_F5mCVKOBA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-08  9:27       ` Hans de Goede
     [not found]         ` <5707797B.4040708-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-04-08 13:56           ` Ilia Mirkin
2016-04-08 15:02           ` Ilia Mirkin
     [not found]             ` <CAKb7UvjhSJGCH2bLJWc+A8hOUepR8=AnjKjLWTW9ipr44N2NnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-08 15:28               ` Hans de Goede
     [not found]                 ` <5707CE34.6050904-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-04-08 15:45                   ` Ilia Mirkin
     [not found]                     ` <CAKb7Uvg5c1NorKA3uvnXfToZb+QUwRzZ0T9cUaA5jEuiBma+gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-08 16:06                       ` Hans de Goede
2016-04-08 16:26                         ` [Nouveau] " Hans de Goede
2016-04-08 16:34                           ` Ilia Mirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.