* Performance drop using deinterlace_vaapi on 5.19-rcX
@ 2022-06-18 16:13 Thomas Voegtle
2022-06-20 11:32 ` Christian König
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Voegtle @ 2022-06-18 16:13 UTC (permalink / raw)
To: Christian König, Daniel Vetter, amd-gfx; +Cc: linux-kernel
Hello,
I noticed a performance drop when encoding an MPEG file to an H.264 video
using the VAAPI filter deinterlace_vaapi on a Haswell i5-4570 with Linux
5.19-rc1.
A 10-minute video normally takes 41s to convert; with 5.19-rc1
it takes about 2m 36s.
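As a rough check of the size of the regression, the two timings above can be compared directly. The sketch below assumes 2m 36s is converted to 156 seconds; the helper name `slowdown` is made up for illustration and is not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# slowdown BASELINE_SECONDS REGRESSED_SECONDS
# Prints how many times slower the regressed run is, to two decimals.
# The function name is hypothetical; only the timings come from the report.
slowdown() {
    awk -v base="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", slow / base }'
}

# 41 s for the normal run versus 2 min 36 s (156 s) on 5.19-rc1
slowdown 41 156
```

The ratio comes out at a little over 3.8, so the conversion takes almost four times as long.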
My ffmpeg line is:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi -i test.vdr -vf 'deinterlace_vaapi' \
  -c:v h264_vaapi
Removing the option deinterlace_vaapi shows no difference in performance
between 5.18 and 5.19-rcX.
I bisected this down to:
commit 047a1b877ed48098bed71fcfb1d4891e1b54441d
Author: Christian König <christian.koenig@amd.com>
Date: Tue Nov 23 09:33:07 2021 +0100
dma-buf & drm/amdgpu: remove dma_resv workaround
and wasn't able to revert this one on top of 5.19-rcX.
I tried the predecessor commit:
commit 73511edf8b196e6f1ccda0fdf294ff57aa2dc9db (HEAD)
Author: Christian König <christian.koenig@amd.com>
Date: Tue Nov 9 11:08:18 2021 +0100
dma-buf: specify usage while adding fences to dma_resv obj v7
which is fine.
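A bisection like this one can be driven by the conversion time alone. The following is a minimal sketch, not the exact procedure used here: the helper name `classify_run` and the 90 s threshold are assumptions (the threshold was picked only because it lies between the 41 s and 156 s timings), and the tags v5.18 and v5.19-rc1 are assumed to be the good and bad endpoints.

```shell
#!/bin/sh
# classify_run ELAPSED_SECONDS [THRESHOLD_SECONDS]
# Prints "good" or "bad" so the result can be passed to git bisect.
# Whole seconds only. The 90 s default threshold is an assumption that
# sits between the 41 s and 156 s timings in the report.
classify_run() {
    elapsed=$1
    threshold=${2:-90}
    if [ "$elapsed" -le "$threshold" ]; then
        echo good
    else
        echo bad
    fi
}

# Typical manual session (run inside a kernel git tree):
#   git bisect start
#   git bisect bad v5.19-rc1
#   git bisect good v5.18
#   ...build, boot, time the ffmpeg run, then for example:
#   git bisect "$(classify_run 156)"
```

Each step still needs a kernel build and a reboot, so the helper only removes the judgment call about whether a given timing counts as regressed.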
I am using ffmpeg 5.0.1 with libva 2.10.0 and the Intel VAAPI driver 2.4.1.
Best regards,
Thomas
* Re: Performance drop using deinterlace_vaapi on 5.19-rcX
2022-06-18 16:13 Performance drop using deinterlace_vaapi on 5.19-rcX Thomas Voegtle
@ 2022-06-20 11:32 ` Christian König
2022-06-20 11:40 ` Thomas Voegtle
0 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2022-06-20 11:32 UTC (permalink / raw)
To: Thomas Voegtle, Christian König, Daniel Vetter, amd-gfx
Hi Thomas,
[moving vger to bcc]
Hmm, sounds like something isn't running in parallel any more.
We usually don't test the multimedia engines for this, but we do test
gfx+compute, so I'm really wondering what is going wrong here.
Could you run some tests for me? In addition, I'm going to raise
the issue with our multimedia guys later today.
Thanks for the info,
Christian.
* Re: Performance drop using deinterlace_vaapi on 5.19-rcX
2022-06-20 11:32 ` Christian König
@ 2022-06-20 11:40 ` Thomas Voegtle
2022-06-20 13:26 ` Christian König
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Voegtle @ 2022-06-20 11:40 UTC (permalink / raw)
To: Christian König; +Cc: Daniel Vetter, amd-gfx
On Mon, 20 Jun 2022, Christian König wrote:
> Hi Thomas,
>
> [moving vger to bcc]
>
> mhm, sounds like something isn't running in parallel any more.
>
> We usually don't test the multimedia engines for this but we do test
> gfx+compute, so I'm really wondering what goes wrong here.
>
> Could you run some tests for me? Additional to that I'm going to raise that
> issue with our multimedia guys later today.
Yes, I can run some tests for you. Which tests?
Thomas
* Re: Performance drop using deinterlace_vaapi on 5.19-rcX
2022-06-20 11:40 ` Thomas Voegtle
@ 2022-06-20 13:26 ` Christian König
2022-06-20 14:31 ` Thomas Voegtle
0 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2022-06-20 13:26 UTC (permalink / raw)
To: Thomas Voegtle; +Cc: Daniel Vetter, amd-gfx
Am 20.06.22 um 13:40 schrieb Thomas Voegtle:
> On Mon, 20 Jun 2022, Christian König wrote:
>
>> Hi Thomas,
>>
>> [moving vger to bcc]
>>
>> mhm, sounds like something isn't running in parallel any more.
>>
>> We usually don't test the multimedia engines for this but we do test
>> gfx+compute, so I'm really wondering what goes wrong here.
>>
>> Could you run some tests for me? Additional to that I'm going to
>> raise that issue with our multimedia guys later today.
>
> Yes, I can run some tests for you. Which tests?
Try this as root:
echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
cat /sys/kernel/debug/tracing/trace_pipe > trace.log
Then start the encoding in another shell. After it has completed, cancel
the cat with Ctrl+C and save the log file.
Do this once with the old kernel and once with the new one.
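The same steps can be wrapped in a small helper so that the events are also switched off again afterwards. This is only a sketch: the function name `set_fence_events` and the per-kernel log file name are made up for illustration, while the tracing paths are the ones given above.

```shell
#!/bin/sh
# set_fence_events VALUE [TRACING_DIR]
# Writes VALUE (1 = enable, 0 = disable) to the two dma_fence event switches.
# TRACING_DIR defaults to the debugfs path used in the commands above.
set_fence_events() {
    value=$1
    dir=${2:-/sys/kernel/debug/tracing}
    for ev in dma_fence_init dma_fence_signaled; do
        echo "$value" > "$dir/events/dma_fence/$ev/enable" || return 1
    done
}

# Usage as root, one capture per kernel:
#   set_fence_events 1
#   cat /sys/kernel/debug/tracing/trace_pipe > "trace-$(uname -r).log"   # stop with Ctrl+C
#   set_fence_events 0
```

Naming the log after `uname -r` is just one way to keep the two captures apart.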
Regards,
Christian.
* Re: Performance drop using deinterlace_vaapi on 5.19-rcX
2022-06-20 13:26 ` Christian König
@ 2022-06-20 14:31 ` Thomas Voegtle
2022-06-20 15:28 ` [Intel-gfx] " Christian König
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Voegtle @ 2022-06-20 14:31 UTC (permalink / raw)
To: Christian König; +Cc: Daniel Vetter, amd-gfx
On Mon, 20 Jun 2022, Christian König wrote:
> Am 20.06.22 um 13:40 schrieb Thomas Voegtle:
>> On Mon, 20 Jun 2022, Christian König wrote:
>>
>>> Hi Thomas,
>>>
>>> [moving vger to bcc]
>>>
>>> mhm, sounds like something isn't running in parallel any more.
>>>
>>> We usually don't test the multimedia engines for this but we do test
>>> gfx+compute, so I'm really wondering what goes wrong here.
>>>
>>> Could you run some tests for me? Additional to that I'm going to raise
>>> that issue with our multimedia guys later today.
>>
>> Yes, I can run some tests for you. Which tests?
>
> Try this as root:
>
> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
> cat /sys/kernel/debug/tracing/trace_pipe > trace.log
>
> Then start the encoding in another shell, after it completed cancel the cat
> with cntr+c and save the log file.
>
> Do this one with the old kernel and once with the new one.
https://32h.de/tv/5.18.0-i5-trace.log.bz2
https://32h.de/tv/5.19.0-rc3-i5-trace.log.bz2
I hope I have done this correctly.
Were all the necessary tracing options switched on?
I want to add that this is a headless machine. No monitor connected.
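A quick first sanity check on two such logs is to count how often each traced event appears in them. The helper below is a hypothetical sketch (the name `count_event` is not from this thread); it assumes the two files have been downloaded under the names used in the URLs above and that each trace line contains the event name as plain text.

```shell
#!/bin/sh
# count_event EVENT_NAME
# Reads a trace log on stdin and prints how many lines mention EVENT_NAME.
count_event() {
    grep -c -F -- "$1"
}

# Usage with the downloaded logs (file names assumed from the URLs above):
#   bzcat 5.18.0-i5-trace.log.bz2     | count_event dma_fence_init
#   bzcat 5.19.0-rc3-i5-trace.log.bz2 | count_event dma_fence_init
#   bzcat 5.19.0-rc3-i5-trace.log.bz2 | count_event dma_fence_signaled
```

Counts alone do not explain a slowdown, but a log with zero matches for either event would indicate that the tracing was not switched on.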
* Re: [Intel-gfx] Performance drop using deinterlace_vaapi on 5.19-rcX
2022-06-20 14:31 ` Thomas Voegtle
@ 2022-06-20 15:28 ` Christian König
2022-06-20 17:28 ` Daniel Vetter
0 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2022-06-20 15:28 UTC (permalink / raw)
To: Thomas Voegtle; +Cc: Daniel Vetter, Intel Graphics Development
Hi Thomas,
Am 20.06.22 um 16:31 schrieb Thomas Voegtle:
> On Mon, 20 Jun 2022, Christian König wrote:
>
>> Am 20.06.22 um 13:40 schrieb Thomas Voegtle:
>>> On Mon, 20 Jun 2022, Christian König wrote:
>>>
>>>> Hi Thomas,
>>>>
>>>> [moving vger to bcc]
>>>>
>>>> mhm, sounds like something isn't running in parallel any more.
>>>>
>>>> We usually don't test the multimedia engines for this but we do test
>>>> gfx+compute, so I'm really wondering what goes wrong here.
>>>>
>>>> Could you run some tests for me? Additional to that I'm going to
>>>> raise
>>>> that issue with our multimedia guys later today.
>>>
>>> Yes, I can run some tests for you. Which tests?
>>
>> Try this as root:
>>
>> echo 1 >
>> /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
>> echo 1 >
>> /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
>> cat /sys/kernel/debug/tracing/trace_pipe > trace.log
>>
>> Then start the encoding in another shell, after it completed cancel
>> the cat with cntr+c and save the log file.
>>
>> Do this one with the old kernel and once with the new one.
>
>
> https://32h.de/tv/5.18.0-i5-trace.log.bz2
> https://32h.de/tv/5.19.0-rc3-i5-trace.log.bz2
>
>
> I hope I have done this correctly.
> All necessary tracing things switched on?
Yeah, that looks like what I wanted to see.
>
> I want to add that this is a headless machine. No monitor connected.
>
I've just realized that you aren't even using an AMD GPU for
transcoding, so I have no idea why removing the AMD-specific workaround
would cause a performance problem for i915.
It must somehow be related to i915 now adding some additional
synchronization between submissions.
Adding the Intel mailing list; maybe somebody has a better idea.
Regards,
Christian.
* Re: [Intel-gfx] Performance drop using deinterlace_vaapi on 5.19-rcX
2022-06-20 15:28 ` [Intel-gfx] " Christian König
@ 2022-06-20 17:28 ` Daniel Vetter
2022-06-22 7:00 ` Tvrtko Ursulin
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Vetter @ 2022-06-20 17:28 UTC (permalink / raw)
To: Christian König; +Cc: Thomas Voegtle, Intel Graphics Development
On Mon, 20 Jun 2022 at 17:28, Christian König <christian.koenig@amd.com> wrote:
>
> Hi Thomas,
>
> Am 20.06.22 um 16:31 schrieb Thomas Voegtle:
> > On Mon, 20 Jun 2022, Christian König wrote:
> >
> >> Am 20.06.22 um 13:40 schrieb Thomas Voegtle:
> >>> On Mon, 20 Jun 2022, Christian König wrote:
> >>>
> >>>> Hi Thomas,
> >>>>
> >>>> [moving vger to bcc]
> >>>>
> >>>> mhm, sounds like something isn't running in parallel any more.
> >>>>
> >>>> We usually don't test the multimedia engines for this but we do test
> >>>> gfx+compute, so I'm really wondering what goes wrong here.
> >>>>
> >>>> Could you run some tests for me? Additional to that I'm going to
> >>>> raise
> >>>> that issue with our multimedia guys later today.
> >>>
> >>> Yes, I can run some tests for you. Which tests?
> >>
> >> Try this as root:
> >>
> >> echo 1 >
> >> /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
> >> echo 1 >
> >> /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
> >> cat /sys/kernel/debug/tracing/trace_pipe > trace.log
> >>
> >> Then start the encoding in another shell, after it completed cancel
> >> the cat with cntr+c and save the log file.
> >>
> >> Do this one with the old kernel and once with the new one.
> >
> >
> > > https://32h.de/tv/5.18.0-i5-trace.log.bz2
> > > https://32h.de/tv/5.19.0-rc3-i5-trace.log.bz2
> >
> >
> > I hope I have done this correctly.
> > All necessary tracing things switched on?
>
> Yeah, that looks like what I wanted to see.
>
> >
> > I want to add that this is a headless machine. No monitor connected.
> >
>
> I've just realized that you aren't even using any AMD GPU for
> transcoding, so I have no idea why removing the AMD specific workaround
> can cause a performance problem for i915.
>
> It must be somehow related to i915 now adding some additional
> synchronization in between submissions.
>
> Adding the Intel mailing list, maybe somebody has a better idea.
Only thing I can spot is that we now pile up USAGE_WRITE fences, but
beforehand they got replaced. Also the deinterlace stuff means libva
uses render engine, so this kinda fits - without using the render
engine it's just a single engine, and hence you should never have
multiple write fences (not logically, but hsw is a ringbuffer and i915
doesn't have a ringbuffer scheduler, so it's all in-order anyway and
hence not possible to change something).
This would mean that i915 is doing something silly (well not obeying
the old dma_resv rules that any new exclusive fence must be a strict
superset of all currently attached fences), which it totally is doing
with the EXEC_OBJECT_ASYNC flag. But libva doesn't use that.
So tbh I have no idea, but maybe a quick hack that tosses any old
USAGE_WRITE fence, like the old dma_resv_add_excl_fence did, would shed
some light?
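The difference between replacing and piling up write fences can be pictured with a deliberately simplified toy model. Everything in it is an assumption made for illustration: the function names, the fence names, and the representation of a reservation object as a plain list. It does not mirror real kernel data structures or the i915 code.

```shell
#!/bin/sh
# Toy model only: a "reservation object" is a space-separated list of write
# fence names. Nothing here mirrors real kernel data structures.
# add_write_fence MODE CURRENT_LIST NEW_FENCE
#   MODE=replace : the new write fence replaces the list (the old behaviour
#                  as described above)
#   MODE=pileup  : the new write fence is appended to the list
add_write_fence() {
    mode=$1; current=$2; new=$3
    case $mode in
        replace) echo "$new" ;;
        pileup)  echo "${current:+$current }$new" ;;
        *)       return 1 ;;
    esac
}

# count_fences LIST: number of fences a later submission would have to consider
count_fences() {
    set -- $1
    echo $#
}

for mode in replace pileup; do
    list=""
    for f in render-1 video-1 render-2; do
        list=$(add_write_fence "$mode" "$list" "$f")
    done
    echo "$mode: $(count_fences "$list") write fence(s) attached"
done
```

In this model the replace mode always leaves one fence attached, while the pileup mode keeps all three, which is the kind of extra dependency the suggested quick hack would remove.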
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [Intel-gfx] Performance drop using deinterlace_vaapi on 5.19-rcX
2022-06-20 17:28 ` Daniel Vetter
@ 2022-06-22 7:00 ` Tvrtko Ursulin
0 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2022-06-22 7:00 UTC (permalink / raw)
To: Daniel Vetter, Christian König
Cc: Thomas Voegtle, Intel Graphics Development
On 20/06/2022 18:28, Daniel Vetter wrote:
> On Mon, 20 Jun 2022 at 17:28, Christian König <christian.koenig@amd.com> wrote:
>>
>> Hi Thomas,
>>
>> Am 20.06.22 um 16:31 schrieb Thomas Voegtle:
>>> On Mon, 20 Jun 2022, Christian König wrote:
>>>
>>>> Am 20.06.22 um 13:40 schrieb Thomas Voegtle:
>>>>> On Mon, 20 Jun 2022, Christian König wrote:
>>>>>
>>>>>> Hi Thomas,
>>>>>>
>>>>>> [moving vger to bcc]
>>>>>>
>>>>>> mhm, sounds like something isn't running in parallel any more.
>>>>>>
>>>>>> We usually don't test the multimedia engines for this but we do test
>>>>>> gfx+compute, so I'm really wondering what goes wrong here.
>>>>>>
>>>>>> Could you run some tests for me? Additional to that I'm going to
>>>>>> raise
>>>>>> that issue with our multimedia guys later today.
>>>>>
>>>>> Yes, I can run some tests for you. Which tests?
>>>>
>>>> Try this as root:
>>>>
>>>> echo 1 >
>>>> /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
>>>> echo 1 >
>>>> /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
>>>> cat /sys/kernel/debug/tracing/trace_pipe > trace.log
>>>>
>>>> Then start the encoding in another shell, after it completed cancel
>>>> the cat with cntr+c and save the log file.
>>>>
>>>> Do this one with the old kernel and once with the new one.
>>>
>>>
>>> https://32h.de/tv/5.18.0-i5-trace.log.bz2
>>> https://32h.de/tv/5.19.0-rc3-i5-trace.log.bz2
>>>
>>>
>>> I hope I have done this correctly.
>>> All necessary tracing things switched on?
>>
>> Yeah, that looks like what I wanted to see.
>>
>>>
>>> I want to add that this is a headless machine. No monitor connected.
>>>
>>
>> I've just realized that you aren't even using any AMD GPU for
>> transcoding, so I have no idea why removing the AMD specific workaround
>> can cause a performance problem for i915.
>>
>> It must be somehow related to i915 now adding some additional
>> synchronization in between submissions.
>>
>> Adding the Intel mailing list, maybe somebody has a better idea.
>
> Only thing I can spot is that we now pile up USAGE_WRITE fences, but
> beforehand they got replaced. Also the deinterlace stuff means libva
> uses render engine, so this kinda fits - without using the render
> engine it's just a single engine, and hence you should never have
> multiple write fences (not logically, but hsw is a ringbuffer and i915
> doesn't have a ringbuffer scheduler, so it's all in-order anyway and
> hence not possible to change something).
>
> This would mean that i915 is doing something silly (well not obeying
> the old dma_resv rules that any new exclusive fence must be a strict
> superset of all currently attached fences), which it totally is doing
> with the EXEC_OBJECT_ASYNC flag. But libva doesn't use that.
>
> So tbh I have no idea, but maybe a quick hack that tosses any old
> USAGE_WRITE fence like the old dma_resv_add_excl_fence did would sched
> some light?
I did not see the original email, but having found it in the archives
(https://lore.kernel.org/lkml/0249066a-2e95-c21d-d16a-fba08c633c0b@lio96.de/),
a ~3.8x slowdown is pretty bad.
Thomas, could you please file a bug, using
https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
for instructions? It can then be handled and prioritized from there.
Regards,
Tvrtko