All of lore.kernel.org
 help / color / mirror / Atom feed
* Some llvm questions (for tgsi backend)
@ 2016-01-11 11:07 Hans de Goede
       [not found] ` <56938CE2.1010705-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Hans de Goede @ 2016-01-11 11:07 UTC (permalink / raw)
  To: Tom Stellard, Francisco Jerez, Samuel Pitoiset,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi,

After a few distractions I'm back to work on the llvm tgsi backend. I've
added clang integration and I can now compile a simple opencl program
to something which sort of looks like tgsi.

You can find my latest work on this here:
http://cgit.freedesktop.org/~jwrdegoede/llvm
http://cgit.freedesktop.org/~jwrdegoede/clang
(the latter may still need to sync)

I've a little test program of which I have 3 versions now,
1 raw gallium calls + a tgsi kernel
2 opencl calls to clover + a tgsi kernel
3 opencl calls to clover + an opencl kernel

1 and 2 have been tested on a kepler card, 3 has been
tested with pocl. My goal for this week is to get
the tgsi backend to produce code which I can copy
and paste into 2 and then have it working on a kepler card.

The test program looks like this:

__kernel void test_kern(__global uint *vals, __global uint *buf)
{
   uint id = get_global_id(0);

   buf[32 * id] -= vals[id];
}

The llvm ir looks like this:

bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl

; ModuleID = '/home/hans/foo.cl'
target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
target triple = "tgsi--"

; Function Attrs: nounwind
define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 addrspace(1)* nocapture %buf) #0 {
entry:
   %call = tail call i32 @_Z13get_global_idj(i32 0) #2
   %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
   %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
   %mul = shl i32 %call, 5
   %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
   %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
   %sub = sub i32 %1, %0
   store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
   ret void
}

declare i32 @_Z13get_global_idj(i32) #1

attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }

!opencl.kernels = !{!0}
!llvm.ident = !{!6}

!0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3, !4, !5}
!1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
!2 = !{!"kernel_arg_access_qual", !"none", !"none"}
!3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
!4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
!5 = !{!"kernel_arg_type_qual", !"", !""}
!6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"}
!7 = !{!8, !8, i64 0}
!8 = !{!"int", !9, i64 0}
!9 = !{!"omnipotent char", !10, i64 0}
!10 = !{!"Simple C/C++ TBAA"}

And the "tgsi" looks like this:

         .text
         .file   "/home/hans/foo.cl"
         .globl  test_kern
test_kern:
         BGNSUB
         MOVis TEMP1x, 0
         CAL _Z13get_global_idj
         SHLs TEMP1y, TEMP1x, 7
         LOADiis TEMP1z, [4]
         UADDs TEMP1y, TEMP1z, TEMP1y
         SHLs TEMP1x, TEMP1x, 2
         LOADiis TEMP1z, [0]
         UADDs TEMP1x, TEMP1z, TEMP1x
         LOADgis TEMP1x, [TEMP1x]
         INEGs TEMP1x, TEMP1x
         LOADgis TEMP1z, [TEMP1y]
         UADDs TEMP1x, TEMP1x, TEMP1z
         STOREgis [TEMP1y], TEMP1x
         RET
         ENDSUB

Working tgsi for this would look like this:

COMP
DCL SV[0], THREAD_ID[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 0, 0, 0, 0 }
IMM UINT32 { 4, 0, 0, 0 }
IMM UINT32 { 128, 0, 0, 0 }

BGNSUB
        LOAD TEMP[0].xy, RINPUT, IMM[0]
        UMUL TEMP[1].x, SV[0], IMM[1]
        UADD TEMP[0].x, TEMP[0], TEMP[1]
        UMUL TEMP[1].x, SV[0], IMM[2]
        UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
        LOAD TEMP[1].x, RGLOBAL, TEMP[0]
        LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
        UADD TEMP[1].x, TEMP[0], -TEMP[1]
        STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
        RET
ENDSUB;

So my questions (I'm still quite green when it comes to llvm):

1) As you can see a proper tgsi program needs a header
to declare which registers (etc) it is using, in which
class-method should I implement this ?

2) Immediates need to be declared with a specific
value and then addressed as IMM[x], how would I go about
this ?

3) The get_global_id call needs to be translated into
simply using the SV[0] "register", how would I go about
this ?

4) The global and input load / stores are not handled
correctly, I see that the LOAD instructions get postfixed
with a i reps. g for input / global how would I go about
modifying the code emitter (AsmPrinter?) to change "LOADi"
into "LOAD <dest> RINPUT <offset>"?

5) Talking about the lowecase suffixes to the instructions,
these should not be part of the output, how do I filter these?

6) And finally, the current llvm-tgsi output uses e.g.
TEMP1y where as for the destination it should use TEMP[1].y
and for the sources it should use TEMP[1].xxxx (so include
proper swizzling info).

Lots of questions, sorry about that. Feel free to point me
to some relvant parts of the docs, I've tried to find answers
myself but I've gotten a bit lost in the docs.

Regards,

Hans
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Some llvm questions (for tgsi backend)
       [not found] ` <56938CE2.1010705-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-01-11 17:04   ` Ilia Mirkin
       [not found]     ` <CAKb7UvhYznsKLeVwe7jUpeQmSVTBsvFk4A5nExWV9P=OgD-nWw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-01-11 17:10   ` Tom Stellard
  1 sibling, 1 reply; 6+ messages in thread
From: Ilia Mirkin @ 2016-01-11 17:04 UTC (permalink / raw)
  To: Hans de Goede; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Tom Stellard

On Mon, Jan 11, 2016 at 6:07 AM, Hans de Goede <hdegoede@redhat.com> wrote:
> Hi,
>
> After a few distractions I'm back to work on the llvm tgsi backend. I've
> added clang integration and I can now compile a simple opencl program
> to something which sort of looks like tgsi.
>
> You can find my latest work on this here:
> http://cgit.freedesktop.org/~jwrdegoede/llvm
> http://cgit.freedesktop.org/~jwrdegoede/clang
> (the latter may still need to sync)
>
> I've a little test program of which I have 3 versions now,
> 1 raw gallium calls + a tgsi kernel
> 2 opencl calls to clover + a tgsi kernel
> 3 opencl calls to clover + an opencl kernel
>
> 1 and 2 have been tested on a kepler card, 3 has been
> tested with pocl. My goal for this week is to get
> the tgsi backend to produce code which I can copy
> and paste into 2 and then have it working on a kepler card.
>
> The test program looks like this:
>
> __kernel void test_kern(__global uint *vals, __global uint *buf)
> {
>   uint id = get_global_id(0);
>
>   buf[32 * id] -= vals[id];
> }
>
> The llvm ir looks like this:
>
> bin/clang -x cl -c -emit-llvm -target tgsi-- -include
> /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl
>
> ; ModuleID = '/home/hans/foo.cl'
> target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
> target triple = "tgsi--"
>
> ; Function Attrs: nounwind
> define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32
> addrspace(1)* nocapture %buf) #0 {
> entry:
>   %call = tail call i32 @_Z13get_global_idj(i32 0) #2
>   %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
>   %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
>   %mul = shl i32 %call, 5
>   %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
>   %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>   %sub = sub i32 %1, %0
>   store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>   ret void
> }
>
> declare i32 @_Z13get_global_idj(i32) #1
>
> attributes #0 = { nounwind "disable-tail-calls"="false"
> "less-precise-fpmad"="false" "no-frame-pointer-elim"="true"
> "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false"
> "no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
> "unsafe-fp-math"="false" "use-soft-float"="false" }
> attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false"
> "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
> "no-infs-fp-math"="false" "no-nans-fp-math"="false"
> "stack-protector-buffer-size"="8" "unsafe-fp-math"="false"
> "use-soft-float"="false" }
> attributes #2 = { nounwind }
>
> !opencl.kernels = !{!0}
> !llvm.ident = !{!6}
>
> !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3,
> !4, !5}
> !1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
> !2 = !{!"kernel_arg_access_qual", !"none", !"none"}
> !3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
> !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
> !5 = !{!"kernel_arg_type_qual", !"", !""}
> !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git
> 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git
> 7a311143550c6fc01aa5000049825ecc09787440)"}
> !7 = !{!8, !8, i64 0}
> !8 = !{!"int", !9, i64 0}
> !9 = !{!"omnipotent char", !10, i64 0}
> !10 = !{!"Simple C/C++ TBAA"}
>
> And the "tgsi" looks like this:
>
>         .text
>         .file   "/home/hans/foo.cl"
>         .globl  test_kern
> test_kern:
>         BGNSUB
>         MOVis TEMP1x, 0
>         CAL _Z13get_global_idj
>         SHLs TEMP1y, TEMP1x, 7
>         LOADiis TEMP1z, [4]
>         UADDs TEMP1y, TEMP1z, TEMP1y
>         SHLs TEMP1x, TEMP1x, 2
>         LOADiis TEMP1z, [0]
>         UADDs TEMP1x, TEMP1z, TEMP1x
>         LOADgis TEMP1x, [TEMP1x]
>         INEGs TEMP1x, TEMP1x
>         LOADgis TEMP1z, [TEMP1y]
>         UADDs TEMP1x, TEMP1x, TEMP1z
>         STOREgis [TEMP1y], TEMP1x
>         RET
>         ENDSUB
>
> Working tgsi for this would look like this:
>
> COMP
> DCL SV[0], THREAD_ID[0]
> DCL TEMP[0], LOCAL
> DCL TEMP[1], LOCAL
> IMM UINT32 { 0, 0, 0, 0 }
> IMM UINT32 { 4, 0, 0, 0 }
> IMM UINT32 { 128, 0, 0, 0 }
>
> BGNSUB
>        LOAD TEMP[0].xy, RINPUT, IMM[0]
>        UMUL TEMP[1].x, SV[0], IMM[1]
>        UADD TEMP[0].x, TEMP[0], TEMP[1]
>        UMUL TEMP[1].x, SV[0], IMM[2]
>        UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
>        LOAD TEMP[1].x, RGLOBAL, TEMP[0]
>        LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
>        UADD TEMP[1].x, TEMP[0], -TEMP[1]
>        STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
>        RET
> ENDSUB;
>
> So my questions (I'm still quite green when it comes to llvm):
>
> 1) As you can see a proper tgsi program needs a header
> to declare which registers (etc) it is using, in which
> class-method should I implement this ?
>
> 2) Immediates need to be declared with a specific
> value and then addressed as IMM[x], how would I go about
> this ?
>
> 3) The get_global_id call needs to be translated into
> simply using the SV[0] "register", how would I go about
> this ?
>
> 4) The global and input load / stores are not handled
> correctly, I see that the LOAD instructions get postfixed
> with a i reps. g for input / global how would I go about
> modifying the code emitter (AsmPrinter?) to change "LOADi"
> into "LOAD <dest> RINPUT <offset>"?
>
> 5) Talking about the lowecase suffixes to the instructions,
> these should not be part of the output, how do I filter these?
>
> 6) And finally, the current llvm-tgsi output uses e.g.
> TEMP1y where as for the destination it should use TEMP[1].y
> and for the sources it should use TEMP[1].xxxx (so include
> proper swizzling info).
>
> Lots of questions, sorry about that. Feel free to point me
> to some relvant parts of the docs, I've tried to find answers
> myself but I've gotten a bit lost in the docs.

You may consider emitting binary TGSI. It's a semi-fluid format, but
doesn't change too often (and usually does so in backwards-compatible
ways).

BTW, note that I recently got rid of TGSI_FILE_RESOURCE in favor of
BUFFER and IMAGE register files (which in turn correlate to
->set_shader_buffers and ->set_shader_images). We need this for the
various GL extensions (ssbo, atomic, images). Not sure how that
integrates with what OpenCL needs.

  -ilia
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Some llvm questions (for tgsi backend)
       [not found] ` <56938CE2.1010705-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2016-01-11 17:04   ` Ilia Mirkin
@ 2016-01-11 17:10   ` Tom Stellard
       [not found]     ` <20160111171008.GA27306-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Tom Stellard @ 2016-01-11 17:10 UTC (permalink / raw)
  To: Hans de Goede; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote:
> Hi,
> 
> After a few distractions I'm back to work on the llvm tgsi backend. I've
> added clang integration and I can now compile a simple opencl program
> to something which sort of looks like tgsi.
> 
> You can find my latest work on this here:
> http://cgit.freedesktop.org/~jwrdegoede/llvm
> http://cgit.freedesktop.org/~jwrdegoede/clang
> (the latter may still need to sync)
> 
> I've a little test program of which I have 3 versions now,
> 1 raw gallium calls + a tgsi kernel
> 2 opencl calls to clover + a tgsi kernel
> 3 opencl calls to clover + an opencl kernel
> 
> 1 and 2 have been tested on a kepler card, 3 has been
> tested with pocl. My goal for this week is to get
> the tgsi backend to produce code which I can copy
> and paste into 2 and then have it working on a kepler card.
> 
> The test program looks like this:
> 
> __kernel void test_kern(__global uint *vals, __global uint *buf)
> {
>   uint id = get_global_id(0);
> 
>   buf[32 * id] -= vals[id];
> }
> 
> The llvm ir looks like this:
> 
> bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl
> 
> ; ModuleID = '/home/hans/foo.cl'
> target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
> target triple = "tgsi--"
> 
> ; Function Attrs: nounwind
> define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 addrspace(1)* nocapture %buf) #0 {
> entry:
>   %call = tail call i32 @_Z13get_global_idj(i32 0) #2
>   %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
>   %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
>   %mul = shl i32 %call, 5
>   %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
>   %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>   %sub = sub i32 %1, %0
>   store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>   ret void
> }
> 
> declare i32 @_Z13get_global_idj(i32) #1
> 
> attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
> attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
> attributes #2 = { nounwind }
> 
> !opencl.kernels = !{!0}
> !llvm.ident = !{!6}
> 
> !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3, !4, !5}
> !1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
> !2 = !{!"kernel_arg_access_qual", !"none", !"none"}
> !3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
> !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
> !5 = !{!"kernel_arg_type_qual", !"", !""}
> !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"}
> !7 = !{!8, !8, i64 0}
> !8 = !{!"int", !9, i64 0}
> !9 = !{!"omnipotent char", !10, i64 0}
> !10 = !{!"Simple C/C++ TBAA"}
> 
> And the "tgsi" looks like this:
> 
>         .text
>         .file   "/home/hans/foo.cl"
>         .globl  test_kern
> test_kern:
>         BGNSUB
>         MOVis TEMP1x, 0
>         CAL _Z13get_global_idj
>         SHLs TEMP1y, TEMP1x, 7
>         LOADiis TEMP1z, [4]
>         UADDs TEMP1y, TEMP1z, TEMP1y
>         SHLs TEMP1x, TEMP1x, 2
>         LOADiis TEMP1z, [0]
>         UADDs TEMP1x, TEMP1z, TEMP1x
>         LOADgis TEMP1x, [TEMP1x]
>         INEGs TEMP1x, TEMP1x
>         LOADgis TEMP1z, [TEMP1y]
>         UADDs TEMP1x, TEMP1x, TEMP1z
>         STOREgis [TEMP1y], TEMP1x
>         RET
>         ENDSUB
> 
> Working tgsi for this would look like this:
> 
> COMP
> DCL SV[0], THREAD_ID[0]
> DCL TEMP[0], LOCAL
> DCL TEMP[1], LOCAL
> IMM UINT32 { 0, 0, 0, 0 }
> IMM UINT32 { 4, 0, 0, 0 }
> IMM UINT32 { 128, 0, 0, 0 }
> 
> BGNSUB
>        LOAD TEMP[0].xy, RINPUT, IMM[0]
>        UMUL TEMP[1].x, SV[0], IMM[1]
>        UADD TEMP[0].x, TEMP[0], TEMP[1]
>        UMUL TEMP[1].x, SV[0], IMM[2]
>        UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
>        LOAD TEMP[1].x, RGLOBAL, TEMP[0]
>        LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
>        UADD TEMP[1].x, TEMP[0], -TEMP[1]
>        STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
>        RET
> ENDSUB;
> 
> So my questions (I'm still quite green when it comes to llvm):
> 
> 1) As you can see a proper tgsi program needs a header
> to declare which registers (etc) it is using, in which
> class-method should I implement this ?
> 

These kinds of things should be emitted by the TGSI implementation of the
AsmPrinter class: http://llvm.org/docs/doxygen/html/classllvm_1_1AsmPrinter.html
You probably want to use EmitStartOfAsmFile() for the headers.

> 2) Immediates need to be declared with a specific
> value and then addressed as IMM[x], how would I go about
> this ?
> 

I would recommend adding a pass that replaced immediates in the code with
IMM file regitser accesses.

> 3) The get_global_id call needs to be translated into
> simply using the SV[0] "register", how would I go about
> this ?
> 

get_global_id should be implemented in libclc with tgsi specific intrinsics
that read from the system value registers.

> 4) The global and input load / stores are not handled
> correctly, I see that the LOAD instructions get postfixed
> with a i reps. g for input / global how would I go about
> modifying the code emitter (AsmPrinter?) to change "LOADi"
> into "LOAD <dest> RINPUT <offset>"?
> 

I don't understand exactly why you want to change this.  Are you trying to
make the assembly string for input loads and global loads look the same?

> 5) Talking about the lowecase suffixes to the instructions,
> these should not be part of the output, how do I filter these?
> 
> 6) And finally, the current llvm-tgsi output uses e.g.
> TEMP1y where as for the destination it should use TEMP[1].y
> and for the sources it should use TEMP[1].xxxx (so include
> proper swizzling info).
> 

You just need to change how the registers are printed in the
TGSIRegisterInfo.td file which should fix this.

> Lots of questions, sorry about that. Feel free to point me
> to some relvant parts of the docs, I've tried to find answers
> myself but I've gotten a bit lost in the docs.
> 

Ask more if you have them.

-Tom

> Regards,
> 
> Hans
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Some llvm questions (for tgsi backend)
       [not found]     ` <CAKb7UvhYznsKLeVwe7jUpeQmSVTBsvFk4A5nExWV9P=OgD-nWw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-01-11 18:17       ` Tom Stellard
  0 siblings, 0 replies; 6+ messages in thread
From: Tom Stellard @ 2016-01-11 18:17 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Mon, Jan 11, 2016 at 12:04:40PM -0500, Ilia Mirkin wrote:
> On Mon, Jan 11, 2016 at 6:07 AM, Hans de Goede <hdegoede@redhat.com> wrote:
> > Hi,
> >
> > After a few distractions I'm back to work on the llvm tgsi backend. I've
> > added clang integration and I can now compile a simple opencl program
> > to something which sort of looks like tgsi.
> >
> > You can find my latest work on this here:
> > http://cgit.freedesktop.org/~jwrdegoede/llvm
> > http://cgit.freedesktop.org/~jwrdegoede/clang
> > (the latter may still need to sync)
> >
> > I've a little test program of which I have 3 versions now,
> > 1 raw gallium calls + a tgsi kernel
> > 2 opencl calls to clover + a tgsi kernel
> > 3 opencl calls to clover + an opencl kernel
> >
> > 1 and 2 have been tested on a kepler card, 3 has been
> > tested with pocl. My goal for this week is to get
> > the tgsi backend to produce code which I can copy
> > and paste into 2 and then have it working on a kepler card.
> >
> > The test program looks like this:
> >
> > __kernel void test_kern(__global uint *vals, __global uint *buf)
> > {
> >   uint id = get_global_id(0);
> >
> >   buf[32 * id] -= vals[id];
> > }
> >
> > The llvm ir looks like this:
> >
> > bin/clang -x cl -c -emit-llvm -target tgsi-- -include
> > /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl
> >
> > ; ModuleID = '/home/hans/foo.cl'
> > target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
> > target triple = "tgsi--"
> >
> > ; Function Attrs: nounwind
> > define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32
> > addrspace(1)* nocapture %buf) #0 {
> > entry:
> >   %call = tail call i32 @_Z13get_global_idj(i32 0) #2
> >   %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
> >   %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
> >   %mul = shl i32 %call, 5
> >   %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
> >   %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
> >   %sub = sub i32 %1, %0
> >   store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
> >   ret void
> > }
> >
> > declare i32 @_Z13get_global_idj(i32) #1
> >
> > attributes #0 = { nounwind "disable-tail-calls"="false"
> > "less-precise-fpmad"="false" "no-frame-pointer-elim"="true"
> > "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false"
> > "no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
> > "unsafe-fp-math"="false" "use-soft-float"="false" }
> > attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false"
> > "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
> > "no-infs-fp-math"="false" "no-nans-fp-math"="false"
> > "stack-protector-buffer-size"="8" "unsafe-fp-math"="false"
> > "use-soft-float"="false" }
> > attributes #2 = { nounwind }
> >
> > !opencl.kernels = !{!0}
> > !llvm.ident = !{!6}
> >
> > !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3,
> > !4, !5}
> > !1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
> > !2 = !{!"kernel_arg_access_qual", !"none", !"none"}
> > !3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
> > !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
> > !5 = !{!"kernel_arg_type_qual", !"", !""}
> > !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git
> > 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git
> > 7a311143550c6fc01aa5000049825ecc09787440)"}
> > !7 = !{!8, !8, i64 0}
> > !8 = !{!"int", !9, i64 0}
> > !9 = !{!"omnipotent char", !10, i64 0}
> > !10 = !{!"Simple C/C++ TBAA"}
> >
> > And the "tgsi" looks like this:
> >
> >         .text
> >         .file   "/home/hans/foo.cl"
> >         .globl  test_kern
> > test_kern:
> >         BGNSUB
> >         MOVis TEMP1x, 0
> >         CAL _Z13get_global_idj
> >         SHLs TEMP1y, TEMP1x, 7
> >         LOADiis TEMP1z, [4]
> >         UADDs TEMP1y, TEMP1z, TEMP1y
> >         SHLs TEMP1x, TEMP1x, 2
> >         LOADiis TEMP1z, [0]
> >         UADDs TEMP1x, TEMP1z, TEMP1x
> >         LOADgis TEMP1x, [TEMP1x]
> >         INEGs TEMP1x, TEMP1x
> >         LOADgis TEMP1z, [TEMP1y]
> >         UADDs TEMP1x, TEMP1x, TEMP1z
> >         STOREgis [TEMP1y], TEMP1x
> >         RET
> >         ENDSUB
> >
> > Working tgsi for this would look like this:
> >
> > COMP
> > DCL SV[0], THREAD_ID[0]
> > DCL TEMP[0], LOCAL
> > DCL TEMP[1], LOCAL
> > IMM UINT32 { 0, 0, 0, 0 }
> > IMM UINT32 { 4, 0, 0, 0 }
> > IMM UINT32 { 128, 0, 0, 0 }
> >
> > BGNSUB
> >        LOAD TEMP[0].xy, RINPUT, IMM[0]
> >        UMUL TEMP[1].x, SV[0], IMM[1]
> >        UADD TEMP[0].x, TEMP[0], TEMP[1]
> >        UMUL TEMP[1].x, SV[0], IMM[2]
> >        UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
> >        LOAD TEMP[1].x, RGLOBAL, TEMP[0]
> >        LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
> >        UADD TEMP[1].x, TEMP[0], -TEMP[1]
> >        STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
> >        RET
> > ENDSUB;
> >
> > So my questions (I'm still quite green when it comes to llvm):
> >
> > 1) As you can see a proper tgsi program needs a header
> > to declare which registers (etc) it is using, in which
> > class-method should I implement this ?
> >
> > 2) Immediates need to be declared with a specific
> > value and then addressed as IMM[x], how would I go about
> > this ?
> >
> > 3) The get_global_id call needs to be translated into
> > simply using the SV[0] "register", how would I go about
> > this ?
> >
> > 4) The global and input load / stores are not handled
> > correctly, I see that the LOAD instructions get postfixed
> > with a i reps. g for input / global how would I go about
> > modifying the code emitter (AsmPrinter?) to change "LOADi"
> > into "LOAD <dest> RINPUT <offset>"?
> >
> > 5) Talking about the lowecase suffixes to the instructions,
> > these should not be part of the output, how do I filter these?
> >
> > 6) And finally, the current llvm-tgsi output uses e.g.
> > TEMP1y where as for the destination it should use TEMP[1].y
> > and for the sources it should use TEMP[1].xxxx (so include
> > proper swizzling info).
> >
> > Lots of questions, sorry about that. Feel free to point me
> > to some relvant parts of the docs, I've tried to find answers
> > myself but I've gotten a bit lost in the docs.
> 
> You may consider emitting binary TGSI. It's a semi-fluid format, but
> doesn't change too often (and usually does so in backwards-compatible
> ways).
> 

Yes, if you aren't already emitting binary, you should start doing this.  It will
make things much easier.

-Tom

> BTW, note that I recently got rid of TGSI_FILE_RESOURCE in favor of
> BUFFER and IMAGE register files (which in turn correlate to
> ->set_shader_buffers and ->set_shader_images). We need this for the
> various GL extensions (ssbo, atomic, images). Not sure how that
> integrates with what OpenCL needs.
> 
>   -ilia
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Some llvm questions (for tgsi backend)
       [not found]     ` <20160111171008.GA27306-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org>
@ 2016-01-12 14:53       ` Hans de Goede
       [not found]         ` <56951356.2040005-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Hans de Goede @ 2016-01-12 14:53 UTC (permalink / raw)
  To: Tom Stellard; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Tom,

Thanks for taking the time to answer this.

On 11-01-16 18:10, Tom Stellard wrote:
> On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> After a few distractions I'm back to work on the llvm tgsi backend. I've
>> added clang integration and I can now compile a simple opencl program
>> to something which sort of looks like tgsi.
>>
>> You can find my latest work on this here:
>> http://cgit.freedesktop.org/~jwrdegoede/llvm
>> http://cgit.freedesktop.org/~jwrdegoede/clang
>> (the latter may still need to sync)
>>
>> I've a little test program of which I have 3 versions now,
>> 1 raw gallium calls + a tgsi kernel
>> 2 opencl calls to clover + a tgsi kernel
>> 3 opencl calls to clover + an opencl kernel
>>
>> 1 and 2 have been tested on a kepler card, 3 has been
>> tested with pocl. My goal for this week is to get
>> the tgsi backend to produce code which I can copy
>> and paste into 2 and then have it working on a kepler card.
>>
>> The test program looks like this:
>>
>> __kernel void test_kern(__global uint *vals, __global uint *buf)
>> {
>>    uint id = get_global_id(0);
>>
>>    buf[32 * id] -= vals[id];
>> }
>>
>> The llvm ir looks like this:
>>
>> bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl
>>
>> ; ModuleID = '/home/hans/foo.cl'
>> target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
>> target triple = "tgsi--"
>>
>> ; Function Attrs: nounwind
>> define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 addrspace(1)* nocapture %buf) #0 {
>> entry:
>>    %call = tail call i32 @_Z13get_global_idj(i32 0) #2
>>    %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
>>    %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
>>    %mul = shl i32 %call, 5
>>    %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
>>    %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>>    %sub = sub i32 %1, %0
>>    store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
>>    ret void
>> }
>>
>> declare i32 @_Z13get_global_idj(i32) #1
>>
>> attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
>> attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
>> attributes #2 = { nounwind }
>>
>> !opencl.kernels = !{!0}
>> !llvm.ident = !{!6}
>>
>> !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3, !4, !5}
>> !1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
>> !2 = !{!"kernel_arg_access_qual", !"none", !"none"}
>> !3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
>> !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
>> !5 = !{!"kernel_arg_type_qual", !"", !""}
>> !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"}
>> !7 = !{!8, !8, i64 0}
>> !8 = !{!"int", !9, i64 0}
>> !9 = !{!"omnipotent char", !10, i64 0}
>> !10 = !{!"Simple C/C++ TBAA"}
>>
>> And the "tgsi" looks like this:
>>
>>          .text
>>          .file   "/home/hans/foo.cl"
>>          .globl  test_kern
>> test_kern:
>>          BGNSUB
>>          MOVis TEMP1x, 0
>>          CAL _Z13get_global_idj
>>          SHLs TEMP1y, TEMP1x, 7
>>          LOADiis TEMP1z, [4]
>>          UADDs TEMP1y, TEMP1z, TEMP1y
>>          SHLs TEMP1x, TEMP1x, 2
>>          LOADiis TEMP1z, [0]
>>          UADDs TEMP1x, TEMP1z, TEMP1x
>>          LOADgis TEMP1x, [TEMP1x]
>>          INEGs TEMP1x, TEMP1x
>>          LOADgis TEMP1z, [TEMP1y]
>>          UADDs TEMP1x, TEMP1x, TEMP1z
>>          STOREgis [TEMP1y], TEMP1x
>>          RET
>>          ENDSUB
>>
>> Working tgsi for this would look like this:
>>
>> COMP
>> DCL SV[0], THREAD_ID[0]
>> DCL TEMP[0], LOCAL
>> DCL TEMP[1], LOCAL
>> IMM UINT32 { 0, 0, 0, 0 }
>> IMM UINT32 { 4, 0, 0, 0 }
>> IMM UINT32 { 128, 0, 0, 0 }
>>
>> BGNSUB
>>         LOAD TEMP[0].xy, RINPUT, IMM[0]
>>         UMUL TEMP[1].x, SV[0], IMM[1]
>>         UADD TEMP[0].x, TEMP[0], TEMP[1]
>>         UMUL TEMP[1].x, SV[0], IMM[2]
>>         UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
>>         LOAD TEMP[1].x, RGLOBAL, TEMP[0]
>>         LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
>>         UADD TEMP[1].x, TEMP[0], -TEMP[1]
>>         STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
>>         RET
>> ENDSUB;
>>
>> So my questions (I'm still quite green when it comes to llvm):
>>
>> 1) As you can see a proper tgsi program needs a header
>> to declare which registers (etc) it is using, in which
>> class-method should I implement this ?
>>
>
> These kinds of things should be emitted by the TGSI implementation of the
> AsmPrinter class: http://llvm.org/docs/doxygen/html/classllvm_1_1AsmPrinter.html
> You probably want to use EmitStartOfAsmFile() for the headers.

Ok, is there an easy way to suppress the generation of:

         .text
         .file   "/home/hans/foo.cl"
         .globl  test_kern

?

It seems I need to tackle all 3 of those separately ?

>> 2) Immediates need to be declared with a specific
>> value and then addressed as IMM[x], how would I go about
>> this ?
>>
>
> I would recommend adding a pass that replaced immediates in the code with
> IMM file regitser accesses.

Thinking more about this, I believe I can use the existing const
mechanism for this (and then add a const generation pass to AsmPrinter).

How do I tell llvm to never generate instructions using immediates
and to instead always use the const address space for this ?

>> 3) The get_global_id call needs to be translated into
>> simply using the SV[0] "register", how would I go about
>> this ?
>>
>
> get_global_id should be implemented in libclc with tgsi specific intrinsics
> that read from the system value registers.

I see I will investigate this further.

>> 4) The global and input load / stores are not handled
>> correctly, I see that the LOAD instructions get postfixed
>> with a i reps. g for input / global how would I go about
>> modifying the code emitter (AsmPrinter?) to change "LOADi"
>> into "LOAD <dest> RINPUT <offset>"?
>>
>
> I don't understand exactly why you want to change this.  Are you trying to
> make the assembly string for input loads and global loads look the same?

Currently the following code is generated:

LOADiis TEMP[1].z, [0]

This should be:

LOAD TEMP[1].z, RINPUT, 0

So I need to add handling to generate the RINPUT here. I'm wondering
what the best place is for this. Do I simply intercept the LOAD in
LowerMachineInstrToMCInst and add an extra operand there ?


>> 5) Talking about the lowecase suffixes to the instructions,
>> these should not be part of the output, how do I filter these?
>>
>> 6) And finally, the current llvm-tgsi output uses e.g.
>> TEMP1y where as for the destination it should use TEMP[1].y
>> and for the sources it should use TEMP[1].xxxx (so include
>> proper swizzling info).
>>
>
> You just need to change how the registers are printed in the
> TGSIRegisterInfo.td file which should fix this.

OK, so I've managed (easy) to change things so that now I get
(i.e.) :

UADDs TEMP[1].x, TEMP[1].x, TEMP[1].z

This is however not correct TGSI, correct would be:

UADD TEMP[1].x, TEMP[1].xxxx, TEMP[1].zzzz

Which may be shortend to:

UADD TEMP[1].x, TEMP[1], TEMP[1].zzzz

TGSI will automatically use the dest vector component from
the sources, and if you want to use another component you
need to specify a swizzle order, which is the 4 components
in an alternate order.

I'm not quite sure yet how to deal with this.

Regards,

Hans
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Some llvm questions (for tgsi backend)
       [not found]         ` <56951356.2040005-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-01-12 15:08           ` Tom Stellard
  0 siblings, 0 replies; 6+ messages in thread
From: Tom Stellard @ 2016-01-12 15:08 UTC (permalink / raw)
  To: Hans de Goede; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Tue, Jan 12, 2016 at 03:53:10PM +0100, Hans de Goede wrote:
> Hi Tom,
> 
> Thanks for taking the time to answer this.
> 
> On 11-01-16 18:10, Tom Stellard wrote:
> > On Mon, Jan 11, 2016 at 12:07:14PM +0100, Hans de Goede wrote:
> >> Hi,
> >>
> >> After a few distractions I'm back to work on the llvm tgsi backend. I've
> >> added clang integration and I can now compile a simple opencl program
> >> to something which sort of looks like tgsi.
> >>
> >> You can find my latest work on this here:
> >> http://cgit.freedesktop.org/~jwrdegoede/llvm
> >> http://cgit.freedesktop.org/~jwrdegoede/clang
> >> (the latter may still need to sync)
> >>
> >> I've a little test program of which I have 3 versions now,
> >> 1 raw gallium calls + a tgsi kernel
> >> 2 opencl calls to clover + a tgsi kernel
> >> 3 opencl calls to clover + an opencl kernel
> >>
> >> 1 and 2 have been tested on a kepler card, 3 has been
> >> tested with pocl. My goal for this week is to get
> >> the tgsi backend to produce code which I can copy
> >> and paste into 2 and then have it working on a kepler card.
> >>
> >> The test program looks like this:
> >>
> >> __kernel void test_kern(__global uint *vals, __global uint *buf)
> >> {
> >>    uint id = get_global_id(0);
> >>
> >>    buf[32 * id] -= vals[id];
> >> }
> >>
> >> The llvm ir looks like this:
> >>
> >> bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl
> >>
> >> ; ModuleID = '/home/hans/foo.cl'
> >> target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
> >> target triple = "tgsi--"
> >>
> >> ; Function Attrs: nounwind
> >> define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 addrspace(1)* nocapture %buf) #0 {
> >> entry:
> >>    %call = tail call i32 @_Z13get_global_idj(i32 0) #2
> >>    %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
> >>    %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
> >>    %mul = shl i32 %call, 5
> >>    %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
> >>    %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
> >>    %sub = sub i32 %1, %0
> >>    store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
> >>    ret void
> >> }
> >>
> >> declare i32 @_Z13get_global_idj(i32) #1
> >>
> >> attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
> >> attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
> >> attributes #2 = { nounwind }
> >>
> >> !opencl.kernels = !{!0}
> >> !llvm.ident = !{!6}
> >>
> >> !0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3, !4, !5}
> >> !1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
> >> !2 = !{!"kernel_arg_access_qual", !"none", !"none"}
> >> !3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
> >> !4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
> >> !5 = !{!"kernel_arg_type_qual", !"", !""}
> >> !6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"}
> >> !7 = !{!8, !8, i64 0}
> >> !8 = !{!"int", !9, i64 0}
> >> !9 = !{!"omnipotent char", !10, i64 0}
> >> !10 = !{!"Simple C/C++ TBAA"}
> >>
> >> And the "tgsi" looks like this:
> >>
> >>          .text
> >>          .file   "/home/hans/foo.cl"
> >>          .globl  test_kern
> >> test_kern:
> >>          BGNSUB
> >>          MOVis TEMP1x, 0
> >>          CAL _Z13get_global_idj
> >>          SHLs TEMP1y, TEMP1x, 7
> >>          LOADiis TEMP1z, [4]
> >>          UADDs TEMP1y, TEMP1z, TEMP1y
> >>          SHLs TEMP1x, TEMP1x, 2
> >>          LOADiis TEMP1z, [0]
> >>          UADDs TEMP1x, TEMP1z, TEMP1x
> >>          LOADgis TEMP1x, [TEMP1x]
> >>          INEGs TEMP1x, TEMP1x
> >>          LOADgis TEMP1z, [TEMP1y]
> >>          UADDs TEMP1x, TEMP1x, TEMP1z
> >>          STOREgis [TEMP1y], TEMP1x
> >>          RET
> >>          ENDSUB
> >>
> >> Working tgsi for this would look like this:
> >>
> >> COMP
> >> DCL SV[0], THREAD_ID[0]
> >> DCL TEMP[0], LOCAL
> >> DCL TEMP[1], LOCAL
> >> IMM UINT32 { 0, 0, 0, 0 }
> >> IMM UINT32 { 4, 0, 0, 0 }
> >> IMM UINT32 { 128, 0, 0, 0 }
> >>
> >> BGNSUB
> >>         LOAD TEMP[0].xy, RINPUT, IMM[0]
> >>         UMUL TEMP[1].x, SV[0], IMM[1]
> >>         UADD TEMP[0].x, TEMP[0], TEMP[1]
> >>         UMUL TEMP[1].x, SV[0], IMM[2]
> >>         UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
> >>         LOAD TEMP[1].x, RGLOBAL, TEMP[0]
> >>         LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
> >>         UADD TEMP[1].x, TEMP[0], -TEMP[1]
> >>         STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]
> >>         RET
> >> ENDSUB;
> >>
> >> So my questions (I'm still quite green when it comes to llvm):
> >>
> >> 1) As you can see a proper tgsi program needs a header
> >> to declare which registers (etc) it is using, in which
> >> class-method should I implement this ?
> >>
> >
> > These kinds of things should be emitted by the TGSI implementation of the
> > AsmPrinter class: http://llvm.org/docs/doxygen/html/classllvm_1_1AsmPrinter.html
> > You probably want to use EmitStartOfAsmFile() for the headers.
> 
> Ok, is there an easy way to suppress the generation of:
> 
>          .text
>          .file   "/home/hans/foo.cl"
>          .globl  test_kern
> 
> ?
> 

.file can be avoided pretty easily.  You'll have to look through the MC
code to see what bit you need to set to disable it.  However, why
do you not want to emit these?  I think the easiest think to do is
keep the ELF format and just have clover extract the binary code
from the .text section.

See gallium/drivers/radeon/radeon_elf_util.c

> It seems I need to tackle all 3 of those separately ?
> 
> >> 2) Immediates need to be declared with a specific
> >> value and then addressed as IMM[x], how would I go about
> >> this ?
> >>
> >
> > I would recommend adding a pass that replaced immediates in the code with
> > IMM file regitser accesses.
> 
> Thinking more about this, I believe I can use the existing const
> mechanism for this (and then add a const generation pass to AsmPrinter).
> 

I would recommend not doing this kind of thing in the AsmPrinter if you
can avoid it.  I really think you should be able to do this in a
MachineInstr pass.

> How do I tell llvm to never generate instructions using immediates
> and to instead always use the const address space for this ?
> 

LLVM usually won't put immediates into instructions unless you have a
TableGen pattern that tells it to do this.  I suspect you probably have
some patterns like this.

> >> 3) The get_global_id call needs to be translated into
> >> simply using the SV[0] "register", how would I go about
> >> this ?
> >>
> >
> > get_global_id should be implemented in libclc with tgsi specific intrinsics
> > that read from the system value registers.
> 
> I see I will investigate this further.
> 
> >> 4) The global and input load / stores are not handled
> >> correctly, I see that the LOAD instructions get postfixed
> >> with a i reps. g for input / global how would I go about
> >> modifying the code emitter (AsmPrinter?) to change "LOADi"
> >> into "LOAD <dest> RINPUT <offset>"?
> >>
> >
> > I don't understand exactly why you want to change this.  Are you trying to
> > make the assembly string for input loads and global loads look the same?
> 
> Currently the following code is generated:
> 
> LOADiis TEMP[1].z, [0]
> 
> This should be:
> 
> LOAD TEMP[1].z, RINPUT, 0
> 
> So I need to add handling to generate the RINPUT here. I'm wondering
> what the best place is for this. Do I simply intercept the LOAD in
> LowerMachineInstrToMCInst and add an extra operand there ?
> 

It seems like you just need to change your instruction definitions an/or
assembly printing here.

> 
> >> 5) Talking about the lowecase suffixes to the instructions,
> >> these should not be part of the output, how do I filter these?
> >>
> >> 6) And finally, the current llvm-tgsi output uses e.g.
> >> TEMP1y where as for the destination it should use TEMP[1].y
> >> and for the sources it should use TEMP[1].xxxx (so include
> >> proper swizzling info).
> >>
> >
> > You just need to change how the registers are printed in the
> > TGSIRegisterInfo.td file which should fix this.
> 
> OK, so I've managed (easy) to change things so that now I get
> (i.e.) :
> 
> UADDs TEMP[1].x, TEMP[1].x, TEMP[1].z
> 
> This is however not correct TGSI, correct would be:
> 

Are you sure the above isn't correct?

> UADD TEMP[1].x, TEMP[1].xxxx, TEMP[1].zzzz
> 
> Which may be shortend to:
> 
> UADD TEMP[1].x, TEMP[1], TEMP[1].zzzz
> 
> TGSI will automatically use the dest vector component from
> the sources, and if you want to use another component you
> need to specify a swizzle order, which is the 4 components
> in an alternate order.
> 
> I'm not quite sure yet how to deal with this.
> 

These kinds of things can be complicated, I think this is a
good reason to start emitting binary code from LLVM rather than
text.  You can always clean up the assembly format later.

-Tom

> Regards,
> 
> Hans
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-01-12 15:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-11 11:07 Some llvm questions (for tgsi backend) Hans de Goede
     [not found] ` <56938CE2.1010705-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-01-11 17:04   ` Ilia Mirkin
     [not found]     ` <CAKb7UvhYznsKLeVwe7jUpeQmSVTBsvFk4A5nExWV9P=OgD-nWw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-01-11 18:17       ` Tom Stellard
2016-01-11 17:10   ` Tom Stellard
     [not found]     ` <20160111171008.GA27306-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org>
2016-01-12 14:53       ` Hans de Goede
     [not found]         ` <56951356.2040005-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-01-12 15:08           ` Tom Stellard

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.