* [Bug 96881] ViennaCL fails dense_blas-bench-opencl benchmark with doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
From: bugzilla-daemon @ 2016-07-10 15:20 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=96881

            Bug ID: 96881
           Summary: ViennaCL fails dense_blas-bench-opencl benchmark with
                    doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
           Product: Mesa
           Version: 11.2
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/Gallium/r600
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: ubizjak@gmail.com
        QA Contact: dri-devel@lists.freedesktop.org

The dense_blas-bench-opencl benchmark from the ViennaCL suite fails as soon as
it reaches the double-precision kernels on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0):

$ ./dense_blas-bench-opencl 

----------------------------------------------
               Device Info
----------------------------------------------

Name:                AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
Vendor:              AMD
Type:                GPU 
Available:           1
Max Compute Units:   10
Max Work Group Size: 256
Global Mem Size:     1073741824
Local Mem Size:      32768
Local Mem Type:      1
Host Unified Memory: 1


Benchmark : BLAS
----------------
sCOPY : 64.3 GB/s
sAXPY : 95.4 GB/s
sDOT : 85.3 GB/s
sGEMV-N : 20.8 GB/s
sGEMV-T : 44.3 GB/s
sGEMM-NN : 126 GFLOPs/s
sGEMM-NT : 87.6 GFLOPs/s
sGEMM-TN : 90.5 GFLOPs/s
sGEMM-TT : 72.3 GFLOPs/s
----
Build Status = -2 ( Err = -11 )
Log: unsupported call to function __subdf3 in av_cpu
Sources: #pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void av_cpu( 
  __global double * vec1, 
  uint4 size1, 
...

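It looks like DFmode (double) instructions are not enabled correctly in LLVM
for targets that report the cl_khr_fp64 extension. __subdf3 is the soft-float
library routine for double subtraction, so the subtraction in av_cpu appears to
have been lowered to a library call instead of a native instruction.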

clinfo reports:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 MESA 11.2.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 MESA 11.2.2
  Driver Version                                  11.2.2
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               10
  Max clock frequency                             850MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
    ...
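
The mismatch between the build log and the clinfo output above can be checked
mechanically. The following is only a minimal sketch, not part of ViennaCL or
Mesa: the function name diagnose, the log pattern, and the libcall table are
assumptions made for illustration. The table lists the standard soft-float
runtime names for double arithmetic, and the extension string passed in is
assumed to come from the device's extension list.

```python
import re

# Soft-float runtime routines for double precision (libgcc / compiler-rt naming).
SOFT_FLOAT_DOUBLE = {
    "__adddf3": "double addition",
    "__subdf3": "double subtraction",
    "__muldf3": "double multiplication",
    "__divdf3": "double division",
}

# Assumed log format, taken from the single "Log:" line in this report.
LOG_PATTERN = re.compile(r"unsupported call to function (\w+) in (\w+)")


def diagnose(build_log, device_extensions):
    """Return one finding per soft-float double libcall named in the build log."""
    advertises_fp64 = "cl_khr_fp64" in device_extensions.split()
    findings = []
    for func, kernel in LOG_PATTERN.findall(build_log):
        op = SOFT_FLOAT_DOUBLE.get(func)
        if op is None:
            continue  # not a double soft-float routine, ignore it
        message = f"{kernel}: {op} lowered to libcall {func}"
        if advertises_fp64:
            message += " although the device advertises cl_khr_fp64"
        findings.append(message)
    return findings


if __name__ == "__main__":
    log = "Log: unsupported call to function __subdf3 in av_cpu"
    for line in diagnose(log, "cl_khr_fp64"):
        print(line)
```

A finding that ends in "although the device advertises cl_khr_fp64" is the
situation described in this report: the extension is advertised, but the
compiler still falls back to a library call for double arithmetic.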


