* [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
@ 2016-12-19  9:19 Tvrtko Ursulin
  2016-12-19  9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
  2016-12-19  9:47 ` [PATCH] " Joonas Lahtinen
  0 siblings, 2 replies; 5+ messages in thread
From: Tvrtko Ursulin @ 2016-12-19  9:19 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

For some reason GCC 6.2.1 unrolls the memcpy calls to and from the
stack buffer here in per-byte fashion, repeatedly reloading offset
constants. The result looks horrible, for example:

      ...
     fdc:       48 b8 41 00 00 00 00    movabs rax,0xffff880000000041
     fe3:       88 ff ff
     fe6:       44 88 74 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r14b
     feb:       48 b8 42 00 00 00 00    movabs rax,0xffff880000000042
     ff2:       88 ff ff
     ff5:       44 88 6c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r13b
     ffa:       48 b8 43 00 00 00 00    movabs rax,0xffff880000000043
    1001:       88 ff ff
    1004:       44 88 64 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r12b
    1009:       48 b8 44 00 00 00 00    movabs rax,0xffff880000000044
    1010:       88 ff ff
    1013:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1017:       48 b8 45 00 00 00 00    movabs rax,0xffff880000000045
    101e:       88 ff ff
    1021:       44 88 5c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r11b
    1026:       48 b8 46 00 00 00 00    movabs rax,0xffff880000000046
    102d:       88 ff ff
    1030:       44 88 54 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r10b
    1035:       48 b8 47 00 00 00 00    movabs rax,0xffff880000000047
    103c:       88 ff ff
    103f:       44 88 4c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],r9b
    1044:       0f b6 5d d0             movzx  ebx,BYTE PTR [rbp-0x30]
    1048:       48 b8 48 00 00 00 00    movabs rax,0xffff880000000048
    104f:       88 ff ff
    1052:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1056:       48 b8 49 00 00 00 00    movabs rax,0xffff880000000049
    105d:       88 ff ff
    1060:       40 88 7c 06 80          mov    BYTE PTR [rsi+rax*1-0x80],dil
    1065:       0f b6 5d cf             movzx  ebx,BYTE PTR [rbp-0x31]
    1069:       48 b8 4a 00 00 00 00    movabs rax,0xffff88000000004a
    1070:       88 ff ff
    1073:       88 5c 06 80             mov    BYTE PTR [rsi+rax*1-0x80],bl
    1077:       0f b6 7d ce             movzx  edi,BYTE PTR [rbp-0x32]
    107b:       48 b8 4b 00 00 00 00    movabs rax,0xffff88000000004b
      ...

So change the code slightly, which makes GCC generate more reasonable
code like:
  ...
 bf1:   48 89 78 b8             mov    QWORD PTR [rax-0x48],rdi
 bf5:   4c 89 60 c0             mov    QWORD PTR [rax-0x40],r12
 bf9:   48 89 58 c8             mov    QWORD PTR [rax-0x38],rbx
 bfd:   4c 89 58 d0             mov    QWORD PTR [rax-0x30],r11
 c01:   4c 89 50 d8             mov    QWORD PTR [rax-0x28],r10
  ...

Which saves 2087 bytes of code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_fence_reg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index e03983973252..d665d2e74641 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
 	vaddr = kmap(page);
 
 	for (i = 0; i < PAGE_SIZE; i += 128) {
-		memcpy(temp, &vaddr[i], 64);
+		memcpy(&temp[0], &vaddr[i], 64);
 		memcpy(&vaddr[i], &vaddr[i + 64], 64);
-		memcpy(&vaddr[i + 64], temp, 64);
+		memcpy(&vaddr[i + 64], &temp[0], 64);
 	}
 
 	kunmap(page);
-- 
2.7.4
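
A rough size estimate can be read off the two listings in the commit
message. This is only a sketch: the helper names are hypothetical, and it
counts store instructions only. It assumes every byte store costs the same
15 bytes as the first group shown (movabs at 0xfdc, next group at 0xfeb),
although some groups in the listing are 14 bytes and there are extra movzx
reloads, so it does not reproduce the quoted 2087 bytes exactly.

```c
#include <assert.h>

/* Code bytes occupied by one instruction group: the distance between two
 * consecutive group start addresses in the objdump listing. */
static unsigned int group_size(unsigned int start, unsigned int next)
{
	assert(next > start);
	return next - start;
}

/* Code bytes of store groups needed to write copy_len bytes when each
 * group writes chunk bytes and occupies size bytes of code. */
static unsigned int store_code_bytes(unsigned int copy_len,
				     unsigned int chunk, unsigned int size)
{
	assert(chunk != 0 && copy_len % chunk == 0);
	return copy_len / chunk * size;
}

/* Per-byte form: movabs + byte mov, group at 0xfdc, next group at 0xfeb. */
static unsigned int bytewise_estimate(void)
{
	return store_code_bytes(64, 1, group_size(0xfdc, 0xfeb));
}

/* Qword form: one mov at 0xbf1, next instruction at 0xbf5. */
static unsigned int qword_estimate(void)
{
	return store_code_bytes(64, 8, group_size(0xbf1, 0xbf5));
}
```

Under these assumptions the stores for one 64-byte copy take 960 bytes of
code in the per-byte form against 32 bytes in the qword form. Two of the
three memcpy calls involve the stack buffer, so the store side alone would
account for roughly 1856 bytes of the difference.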

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* ✗ Fi.CI.BAT: warning for drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
  2016-12-19  9:19 [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page Tvrtko Ursulin
@ 2016-12-19  9:46 ` Patchwork
  2016-12-19  9:47 ` [PATCH] " Joonas Lahtinen
  1 sibling, 0 replies; 5+ messages in thread
From: Patchwork @ 2016-12-19  9:46 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
URL   : https://patchwork.freedesktop.org/series/16981/
State : warning

== Summary ==

Series 16981v1 drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
https://patchwork.freedesktop.org/api/1.0/series/16981/revisions/1/mbox/

Test drv_module_reload:
        Subgroup basic-reload-inject:
                dmesg-warn -> PASS       (fi-ilk-650)
Test kms_flip:
        Subgroup basic-flip-vs-dpms:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup basic-flip-vs-modeset:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup basic-flip-vs-wf_vblank:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup basic-plain-flip:
                pass       -> DMESG-WARN (fi-bxt-j4205)
Test kms_frontbuffer_tracking:
        Subgroup basic:
                pass       -> DMESG-WARN (fi-bxt-j4205)
Test kms_pipe_crc_basic:
        Subgroup bad-nb-words-1:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup bad-nb-words-3:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup bad-pipe:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup bad-source:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup hang-read-crc-pipe-a:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup hang-read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup hang-read-crc-pipe-c:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup nonblocking-crc-pipe-a:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup nonblocking-crc-pipe-a-frame-sequence:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup nonblocking-crc-pipe-b:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup nonblocking-crc-pipe-b-frame-sequence:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup nonblocking-crc-pipe-c:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup nonblocking-crc-pipe-c-frame-sequence:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup read-crc-pipe-a:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup read-crc-pipe-a-frame-sequence:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup read-crc-pipe-b-frame-sequence:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup read-crc-pipe-c:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup read-crc-pipe-c-frame-sequence:
                pass       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup suspend-read-crc-pipe-a:
                skip       -> DMESG-WARN (fi-bxt-j4205)
        Subgroup suspend-read-crc-pipe-b:
                skip       -> PASS       (fi-bxt-j4205)
        Subgroup suspend-read-crc-pipe-c:
                skip       -> PASS       (fi-bxt-j4205)
Test pm_backlight:
        Subgroup basic-brightness:
                skip       -> PASS       (fi-bxt-j4205)

fi-bdw-5557u     total:247  pass:233  dwarn:0   dfail:0   fail:0   skip:14 
fi-bsw-n3050     total:247  pass:208  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-j4205     total:247  pass:201  dwarn:25  dfail:0   fail:0   skip:21 
fi-bxt-t5700     total:247  pass:220  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-j1900     total:247  pass:220  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:247  pass:216  dwarn:0   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:247  pass:228  dwarn:0   dfail:0   fail:0   skip:19 
fi-hsw-4770r     total:247  pass:228  dwarn:0   dfail:0   fail:0   skip:19 
fi-ilk-650       total:247  pass:195  dwarn:0   dfail:0   fail:0   skip:52 
fi-ivb-3520m     total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-ivb-3770      total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-kbl-7500u     total:247  pass:226  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6260u     total:247  pass:234  dwarn:0   dfail:0   fail:0   skip:13 
fi-skl-6700hq    total:247  pass:227  dwarn:0   dfail:0   fail:0   skip:20 
fi-skl-6700k     total:247  pass:224  dwarn:3   dfail:0   fail:0   skip:20 
fi-skl-6770hq    total:247  pass:234  dwarn:0   dfail:0   fail:0   skip:13 
fi-snb-2520m     total:247  pass:216  dwarn:0   dfail:0   fail:0   skip:31 
fi-snb-2600      total:247  pass:215  dwarn:0   dfail:0   fail:0   skip:32 

2a932d085375c80a1bbb332799db3df9738e8eba drm-tip: 2016y-12m-18d-16h-31m-05s UTC integration manifest
aa4cfbd drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3322/

* Re: [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
  2016-12-19  9:19 [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page Tvrtko Ursulin
  2016-12-19  9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
@ 2016-12-19  9:47 ` Joonas Lahtinen
  2016-12-19 10:32   ` Jani Nikula
  1 sibling, 1 reply; 5+ messages in thread
From: Joonas Lahtinen @ 2016-12-19  9:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

On ma, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
>  	vaddr = kmap(page);
>  
>  	for (i = 0; i < PAGE_SIZE; i += 128) {
> -		memcpy(temp, &vaddr[i], 64);
> +		memcpy(&temp[0], &vaddr[i], 64);
>  		memcpy(&vaddr[i], &vaddr[i + 64], 64);
> -		memcpy(&vaddr[i + 64], temp, 64);
> +		memcpy(&vaddr[i + 64], &temp[0], 64);

This reeks badly of a GCC bug. So I would not apply it, as next time the
bug could go in the other direction.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
  2016-12-19  9:47 ` [PATCH] " Joonas Lahtinen
@ 2016-12-19 10:32   ` Jani Nikula
  2016-12-20  9:48     ` Tvrtko Ursulin
  0 siblings, 1 reply; 5+ messages in thread
From: Jani Nikula @ 2016-12-19 10:32 UTC (permalink / raw)
  To: Joonas Lahtinen, Tvrtko Ursulin, Intel-gfx

On Mon, 19 Dec 2016, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> On ma, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
>> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
>>  	vaddr = kmap(page);
>>  
>>  	for (i = 0; i < PAGE_SIZE; i += 128) {
>> -		memcpy(temp, &vaddr[i], 64);
>> +		memcpy(&temp[0], &vaddr[i], 64);
>>  		memcpy(&vaddr[i], &vaddr[i + 64], 64);
>> -		memcpy(&vaddr[i + 64], temp, 64);
>> +		memcpy(&vaddr[i + 64], &temp[0], 64);
>
> This reeks badly of a GCC bug. So I would not apply it, as next time the
> bug could go in the other direction.

Agreed. Please file a bug over at https://gcc.gnu.org/bugs/

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
  2016-12-19 10:32   ` Jani Nikula
@ 2016-12-20  9:48     ` Tvrtko Ursulin
  0 siblings, 0 replies; 5+ messages in thread
From: Tvrtko Ursulin @ 2016-12-20  9:48 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin, Intel-gfx


On 19/12/2016 10:32, Jani Nikula wrote:
> On Mon, 19 Dec 2016, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
>> On ma, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
>>> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>>> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
>>>  	vaddr = kmap(page);
>>>
>>>  	for (i = 0; i < PAGE_SIZE; i += 128) {
>>> -		memcpy(temp, &vaddr[i], 64);
>>> +		memcpy(&temp[0], &vaddr[i], 64);
>>>  		memcpy(&vaddr[i], &vaddr[i + 64], 64);
>>> -		memcpy(&vaddr[i + 64], temp, 64);
>>> +		memcpy(&vaddr[i + 64], &temp[0], 64);
>>
>> This reeks badly of a GCC bug. So I would not apply it, as next time the
>> bug could go in the other direction.
>
> Agreed. Please file a bug over at https://gcc.gnu.org/bugs/

Bug filed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78869

Potentially useful code generation explorer picked up from #gcc: 
https://godbolt.org/g/XNioHs
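
For comparing the two spellings outside the kernel, a minimal userspace
sketch follows. It is not the contents of the bug report or of the explorer
link above. The function names, the `char temp[64]` declaration and the
4096-byte page size are assumptions made for illustration. Both functions
must behave identically, since `temp` and `&temp[0]` denote the same
address, so any difference in the generated code comes from the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed stand-in for PAGE_SIZE */

/* Original spelling: the array name decays to a pointer. */
static void swizzle_decay(unsigned char *vaddr, size_t len)
{
	unsigned char temp[64];
	size_t i;

	assert(len % 128 == 0);
	for (i = 0; i < len; i += 128) {
		memcpy(temp, &vaddr[i], 64);
		memcpy(&vaddr[i], &vaddr[i + 64], 64);
		memcpy(&vaddr[i + 64], temp, 64);
	}
}

/* Patched spelling: explicit address of the first element. */
static void swizzle_addr(unsigned char *vaddr, size_t len)
{
	unsigned char temp[64];
	size_t i;

	assert(len % 128 == 0);
	for (i = 0; i < len; i += 128) {
		memcpy(&temp[0], &vaddr[i], 64);
		memcpy(&vaddr[i], &vaddr[i + 64], 64);
		memcpy(&vaddr[i + 64], &temp[0], 64);
	}
}

/* Returns 0 if both spellings agree, the data actually moved, and
 * swizzling a second time restores the original contents. */
static int swizzle_selfcheck(void)
{
	static unsigned char a[SKETCH_PAGE_SIZE], b[SKETCH_PAGE_SIZE];
	static unsigned char ref[SKETCH_PAGE_SIZE];
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		ref[i] = (unsigned char)(i * 7 + 3);
	memcpy(a, ref, sizeof(a));
	memcpy(b, ref, sizeof(b));

	swizzle_decay(a, sizeof(a));
	swizzle_addr(b, sizeof(b));
	if (memcmp(a, b, sizeof(a)) != 0)
		return 1; /* the two spellings disagree */
	if (memcmp(a, ref, sizeof(a)) == 0)
		return 2; /* nothing was swapped */

	swizzle_addr(a, sizeof(a));
	return memcmp(a, ref, sizeof(a)) != 0 ? 3 : 0;
}
```

Compiling a file like this at the kernel's optimisation level and
disassembling the two functions side by side is one way to check whether a
given compiler version still shows the difference.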

Regards,

Tvrtko
