* [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
@ 2016-12-19 9:19 Tvrtko Ursulin
2016-12-19 9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
2016-12-19 9:47 ` [PATCH] " Joonas Lahtinen
0 siblings, 2 replies; 5+ messages in thread
From: Tvrtko Ursulin @ 2016-12-19 9:19 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
For some reason GCC 6.2.1 here unrolls the memcpy calls to and from the
on-stack temporary byte by byte, reloading an offset constant before
every store. The result looks horrible, for example:
...
fdc: 48 b8 41 00 00 00 00 movabs rax,0xffff880000000041
fe3: 88 ff ff
fe6: 44 88 74 06 80 mov BYTE PTR [rsi+rax*1-0x80],r14b
feb: 48 b8 42 00 00 00 00 movabs rax,0xffff880000000042
ff2: 88 ff ff
ff5: 44 88 6c 06 80 mov BYTE PTR [rsi+rax*1-0x80],r13b
ffa: 48 b8 43 00 00 00 00 movabs rax,0xffff880000000043
1001: 88 ff ff
1004: 44 88 64 06 80 mov BYTE PTR [rsi+rax*1-0x80],r12b
1009: 48 b8 44 00 00 00 00 movabs rax,0xffff880000000044
1010: 88 ff ff
1013: 88 5c 06 80 mov BYTE PTR [rsi+rax*1-0x80],bl
1017: 48 b8 45 00 00 00 00 movabs rax,0xffff880000000045
101e: 88 ff ff
1021: 44 88 5c 06 80 mov BYTE PTR [rsi+rax*1-0x80],r11b
1026: 48 b8 46 00 00 00 00 movabs rax,0xffff880000000046
102d: 88 ff ff
1030: 44 88 54 06 80 mov BYTE PTR [rsi+rax*1-0x80],r10b
1035: 48 b8 47 00 00 00 00 movabs rax,0xffff880000000047
103c: 88 ff ff
103f: 44 88 4c 06 80 mov BYTE PTR [rsi+rax*1-0x80],r9b
1044: 0f b6 5d d0 movzx ebx,BYTE PTR [rbp-0x30]
1048: 48 b8 48 00 00 00 00 movabs rax,0xffff880000000048
104f: 88 ff ff
1052: 88 5c 06 80 mov BYTE PTR [rsi+rax*1-0x80],bl
1056: 48 b8 49 00 00 00 00 movabs rax,0xffff880000000049
105d: 88 ff ff
1060: 40 88 7c 06 80 mov BYTE PTR [rsi+rax*1-0x80],dil
1065: 0f b6 5d cf movzx ebx,BYTE PTR [rbp-0x31]
1069: 48 b8 4a 00 00 00 00 movabs rax,0xffff88000000004a
1070: 88 ff ff
1073: 88 5c 06 80 mov BYTE PTR [rsi+rax*1-0x80],bl
1077: 0f b6 7d ce movzx edi,BYTE PTR [rbp-0x32]
107b: 48 b8 4b 00 00 00 00 movabs rax,0xffff88000000004b
...
So change the code slightly, which makes GCC generate more reasonable
code, like:
...
bf1: 48 89 78 b8 mov QWORD PTR [rax-0x48],rdi
bf5: 4c 89 60 c0 mov QWORD PTR [rax-0x40],r12
bf9: 48 89 58 c8 mov QWORD PTR [rax-0x38],rbx
bfd: 4c 89 58 d0 mov QWORD PTR [rax-0x30],r11
c01: 4c 89 50 d8 mov QWORD PTR [rax-0x28],r10
...
Which saves 2087 bytes of code.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem_fence_reg.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index e03983973252..d665d2e74641 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
vaddr = kmap(page);
for (i = 0; i < PAGE_SIZE; i += 128) {
- memcpy(temp, &vaddr[i], 64);
+ memcpy(&temp[0], &vaddr[i], 64);
memcpy(&vaddr[i], &vaddr[i + 64], 64);
- memcpy(&vaddr[i + 64], temp, 64);
+ memcpy(&vaddr[i + 64], &temp[0], 64);
}
kunmap(page);
--
2.7.4
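
For readers unfamiliar with the loop being patched: it swaps the two
64-byte halves of every 128-byte block in the page. A minimal user-space
sketch of the same loop follows. The name swizzle_page and the fixed
4096-byte DEMO_PAGE_SIZE are stand-ins assumed for illustration (the
kernel function works on a kmap'ed struct page); this is not the driver
code itself.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: 4 KiB pages, as is typical on x86. */
#define DEMO_PAGE_SIZE 4096

/*
 * Swap the two 64-byte halves of each 128-byte block in a
 * page-sized buffer, mirroring the loop in the patch context.
 */
void swizzle_page(unsigned char *vaddr)
{
	unsigned char temp[64];
	size_t i;

	for (i = 0; i < DEMO_PAGE_SIZE; i += 128) {
		memcpy(temp, &vaddr[i], 64);            /* save first half   */
		memcpy(&vaddr[i], &vaddr[i + 64], 64);  /* second -> first   */
		memcpy(&vaddr[i + 64], temp, 64);       /* saved -> second   */
	}
}
```

If the buffer is filled so that byte i holds i / 64, one call leaves the
values 1, 0, 3, 2 at offsets 0, 64, 128 and 192, and a second call
restores the original order.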
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✗ Fi.CI.BAT: warning for drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
2016-12-19 9:19 [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page Tvrtko Ursulin
@ 2016-12-19 9:46 ` Patchwork
2016-12-19 9:47 ` [PATCH] " Joonas Lahtinen
1 sibling, 0 replies; 5+ messages in thread
From: Patchwork @ 2016-12-19 9:46 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
URL : https://patchwork.freedesktop.org/series/16981/
State : warning
== Summary ==
Series 16981v1 drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
https://patchwork.freedesktop.org/api/1.0/series/16981/revisions/1/mbox/
Test drv_module_reload:
Subgroup basic-reload-inject:
dmesg-warn -> PASS (fi-ilk-650)
Test kms_flip:
Subgroup basic-flip-vs-dpms:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup basic-flip-vs-modeset:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup basic-flip-vs-wf_vblank:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup basic-plain-flip:
pass -> DMESG-WARN (fi-bxt-j4205)
Test kms_frontbuffer_tracking:
Subgroup basic:
pass -> DMESG-WARN (fi-bxt-j4205)
Test kms_pipe_crc_basic:
Subgroup bad-nb-words-1:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup bad-nb-words-3:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup bad-pipe:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup bad-source:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup hang-read-crc-pipe-a:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup hang-read-crc-pipe-b:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup hang-read-crc-pipe-c:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup nonblocking-crc-pipe-a:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup nonblocking-crc-pipe-a-frame-sequence:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup nonblocking-crc-pipe-b:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup nonblocking-crc-pipe-b-frame-sequence:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup nonblocking-crc-pipe-c:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup nonblocking-crc-pipe-c-frame-sequence:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup read-crc-pipe-a:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup read-crc-pipe-a-frame-sequence:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup read-crc-pipe-b:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup read-crc-pipe-b-frame-sequence:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup read-crc-pipe-c:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup read-crc-pipe-c-frame-sequence:
pass -> DMESG-WARN (fi-bxt-j4205)
Subgroup suspend-read-crc-pipe-a:
skip -> DMESG-WARN (fi-bxt-j4205)
Subgroup suspend-read-crc-pipe-b:
skip -> PASS (fi-bxt-j4205)
Subgroup suspend-read-crc-pipe-c:
skip -> PASS (fi-bxt-j4205)
Test pm_backlight:
Subgroup basic-brightness:
skip -> PASS (fi-bxt-j4205)
fi-bdw-5557u total:247 pass:233 dwarn:0 dfail:0 fail:0 skip:14
fi-bsw-n3050 total:247 pass:208 dwarn:0 dfail:0 fail:0 skip:39
fi-bxt-j4205 total:247 pass:201 dwarn:25 dfail:0 fail:0 skip:21
fi-bxt-t5700 total:247 pass:220 dwarn:0 dfail:0 fail:0 skip:27
fi-byt-j1900 total:247 pass:220 dwarn:0 dfail:0 fail:0 skip:27
fi-byt-n2820 total:247 pass:216 dwarn:0 dfail:0 fail:0 skip:31
fi-hsw-4770 total:247 pass:228 dwarn:0 dfail:0 fail:0 skip:19
fi-hsw-4770r total:247 pass:228 dwarn:0 dfail:0 fail:0 skip:19
fi-ilk-650 total:247 pass:195 dwarn:0 dfail:0 fail:0 skip:52
fi-ivb-3520m total:247 pass:226 dwarn:0 dfail:0 fail:0 skip:21
fi-ivb-3770 total:247 pass:226 dwarn:0 dfail:0 fail:0 skip:21
fi-kbl-7500u total:247 pass:226 dwarn:0 dfail:0 fail:0 skip:21
fi-skl-6260u total:247 pass:234 dwarn:0 dfail:0 fail:0 skip:13
fi-skl-6700hq total:247 pass:227 dwarn:0 dfail:0 fail:0 skip:20
fi-skl-6700k total:247 pass:224 dwarn:3 dfail:0 fail:0 skip:20
fi-skl-6770hq total:247 pass:234 dwarn:0 dfail:0 fail:0 skip:13
fi-snb-2520m total:247 pass:216 dwarn:0 dfail:0 fail:0 skip:31
fi-snb-2600 total:247 pass:215 dwarn:0 dfail:0 fail:0 skip:32
2a932d085375c80a1bbb332799db3df9738e8eba drm-tip: 2016y-12m-18d-16h-31m-05s UTC integration manifest
aa4cfbd drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3322/
* Re: [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
2016-12-19 9:19 [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page Tvrtko Ursulin
2016-12-19 9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
@ 2016-12-19 9:47 ` Joonas Lahtinen
2016-12-19 10:32 ` Jani Nikula
1 sibling, 1 reply; 5+ messages in thread
From: Joonas Lahtinen @ 2016-12-19 9:47 UTC (permalink / raw)
To: Tvrtko Ursulin, Intel-gfx
On Mon, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
> vaddr = kmap(page);
>
> for (i = 0; i < PAGE_SIZE; i += 128) {
> - memcpy(temp, &vaddr[i], 64);
> + memcpy(&temp[0], &vaddr[i], 64);
> memcpy(&vaddr[i], &vaddr[i + 64], 64);
> - memcpy(&vaddr[i + 64], temp, 64);
> + memcpy(&vaddr[i + 64], &temp[0], 64);
This reeks badly of a GCC bug, so I would not apply it: next time the
bug could go in the other direction.
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
* Re: [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
2016-12-19 9:47 ` [PATCH] " Joonas Lahtinen
@ 2016-12-19 10:32 ` Jani Nikula
2016-12-20 9:48 ` Tvrtko Ursulin
0 siblings, 1 reply; 5+ messages in thread
From: Jani Nikula @ 2016-12-19 10:32 UTC (permalink / raw)
To: Joonas Lahtinen, Tvrtko Ursulin, Intel-gfx
On Mon, 19 Dec 2016, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
> On Mon, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
>> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
>> vaddr = kmap(page);
>>
>> for (i = 0; i < PAGE_SIZE; i += 128) {
>> - memcpy(temp, &vaddr[i], 64);
>> + memcpy(&temp[0], &vaddr[i], 64);
>> memcpy(&vaddr[i], &vaddr[i + 64], 64);
>> - memcpy(&vaddr[i + 64], temp, 64);
>> + memcpy(&vaddr[i + 64], &temp[0], 64);
>
> This reeks badly of a GCC bug, so I would not apply it: next time the
> bug could go in the other direction.
Agreed. Please file a bug over at https://gcc.gnu.org/bugs/
BR,
Jani.
--
Jani Nikula, Intel Open Source Technology Center
* Re: [PATCH] drm/i915: More reasonable memcpy unroll in i915_gem_swizzle_page
2016-12-19 10:32 ` Jani Nikula
@ 2016-12-20 9:48 ` Tvrtko Ursulin
0 siblings, 0 replies; 5+ messages in thread
From: Tvrtko Ursulin @ 2016-12-20 9:48 UTC (permalink / raw)
To: Jani Nikula, Joonas Lahtinen, Tvrtko Ursulin, Intel-gfx
On 19/12/2016 10:32, Jani Nikula wrote:
> On Mon, 19 Dec 2016, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:
>> On Mon, 2016-12-19 at 09:19 +0000, Tvrtko Ursulin wrote:
>>> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>>> @@ -631,9 +631,9 @@ i915_gem_swizzle_page(struct page *page)
>>> vaddr = kmap(page);
>>>
>>> for (i = 0; i < PAGE_SIZE; i += 128) {
>>> - memcpy(temp, &vaddr[i], 64);
>>> + memcpy(&temp[0], &vaddr[i], 64);
>>> memcpy(&vaddr[i], &vaddr[i + 64], 64);
>>> - memcpy(&vaddr[i + 64], temp, 64);
>>> + memcpy(&vaddr[i + 64], &temp[0], 64);
>>
>> This reeks badly of a GCC bug, so I would not apply it: next time the
>> bug could go in the other direction.
>
> Agreed. Please file a bug over at https://gcc.gnu.org/bugs/
Bug filed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78869
Potentially useful code generation explorer picked up from #gcc:
https://godbolt.org/g/XNioHs
Regards,
Tvrtko
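
Because temp is an array, passing temp or &temp[0] to memcpy yields the
same pointer value, so the two spellings mean the same thing in C and
any difference in generated code comes from the compiler. A reduced,
stand-alone sketch for comparing the two is below. The function names
swap64_name and swap64_first_elem are hypothetical and are not taken
from the bug report or the kernel. Compiling with something like
gcc -O2 -S and diffing the two functions is one way to look at the
output; whether a given GCC version or flag set reproduces the per-byte
expansion seen in the kernel build is not guaranteed.

```c
#include <string.h>

/*
 * Swap two 64-byte blocks, naming the temporary array directly.
 * In this context `temp` decays to a pointer to its first element.
 */
void swap64_name(unsigned char *a, unsigned char *b)
{
	unsigned char temp[64];

	memcpy(temp, a, 64);
	memcpy(a, b, 64);
	memcpy(b, temp, 64);
}

/* Same swap, spelling the pointer as &temp[0] as the patch does. */
void swap64_first_elem(unsigned char *a, unsigned char *b)
{
	unsigned char temp[64];

	memcpy(&temp[0], a, 64);
	memcpy(a, b, 64);
	memcpy(b, &temp[0], 64);
}
```

Both functions must leave the buffers in the same state for any input,
which is why a size difference between them points at the optimizer
rather than at the source.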