* Performance drop using deinterlace_vaapi on 5.19-rcX
@ 2022-06-18 16:13 Thomas Voegtle
  2022-06-20 11:32 ` Christian König
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Voegtle @ 2022-06-18 16:13 UTC (permalink / raw)
  To: Christian König, Daniel Vetter, amd-gfx; +Cc: linux-kernel



Hello,

I noticed a performance drop when encoding an MPEG file to an H.264 video
using the vaapi option deinterlace_vaapi on a Haswell i5-4570 with Linux
5.19-rc1.

A 10-minute video normally takes 41s to convert; with 5.19-rc1
it now takes about 2m 36s.

My ffmpeg line is:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi -i test.vdr -vf 'deinterlace_vaapi' \
  -c:v h264_vaapi

Without the deinterlace_vaapi option, there is no difference in performance
between 5.18 and 5.19-rcX.


I bisected this down to:

commit 047a1b877ed48098bed71fcfb1d4891e1b54441d
Author: Christian König <christian.koenig@amd.com>
Date:   Tue Nov 23 09:33:07 2021 +0100

     dma-buf & drm/amdgpu: remove dma_resv workaround


and wasn't able to revert this one on top of 5.19-rcX.

I tried the predecessor commit:

commit 73511edf8b196e6f1ccda0fdf294ff57aa2dc9db (HEAD)
Author: Christian König <christian.koenig@amd.com>
Date:   Tue Nov 9 11:08:18 2021 +0100

     dma-buf: specify usage while adding fences to dma_resv obj v7

which is fine.
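
Because each bisect step here means booting a different kernel, the good/bad decision has to be made by hand after every boot. A minimal sketch of a timing check that could help with that decision follows. The 90-second threshold (LIMIT) and the helper name within_limit are assumptions chosen for illustration; they only need to separate the 41s and 2m 36s runs reported above.

```shell
#!/bin/sh
# Sketch: classify one kernel as "good" or "bad" by wall-clock time.
# LIMIT is an assumed threshold that lies between the 41s and 2m 36s runs.
LIMIT="${LIMIT:-90}"

# Run the given command and succeed only if it finishes within LIMIT seconds.
within_limit() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true   # only the elapsed time matters here
    end=$(date +%s)
    elapsed=$((end - start))
    echo "elapsed: ${elapsed}s (limit ${LIMIT}s)"
    [ "$elapsed" -le "$LIMIT" ]
}
```

After booting a candidate kernel, the encode command can be passed to within_limit, and the result reported with `git bisect good` or `git bisect bad` accordingly.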

I am using ffmpeg 5.0.1 with libva 2.10.0 and the Intel VAAPI driver 2.4.1.


  Best regards,

     Thomas


* Re: Performance drop using deinterlace_vaapi on 5.19-rcX
  2022-06-18 16:13 Performance drop using deinterlace_vaapi on 5.19-rcX Thomas Voegtle
@ 2022-06-20 11:32 ` Christian König
  2022-06-20 11:40   ` Thomas Voegtle
  0 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2022-06-20 11:32 UTC (permalink / raw)
  To: Thomas Voegtle, Christian König, Daniel Vetter, amd-gfx

Hi Thomas,

[moving vger to bcc]

Hmm, it sounds like something isn't running in parallel any more.

We usually don't test the multimedia engines for this, but we do test
gfx+compute, so I'm really wondering what goes wrong here.

Could you run some tests for me? In addition, I'm going to raise
this issue with our multimedia guys later today.

Thanks for the info,
Christian.




* Re: Performance drop using deinterlace_vaapi on 5.19-rcX
  2022-06-20 11:32 ` Christian König
@ 2022-06-20 11:40   ` Thomas Voegtle
  2022-06-20 13:26     ` Christian König
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Voegtle @ 2022-06-20 11:40 UTC (permalink / raw)
  To: Christian König; +Cc: Daniel Vetter, amd-gfx


On Mon, 20 Jun 2022, Christian König wrote:

> Hi Thomas,
>
> [moving vger to bcc]
>
> mhm, sounds like something isn't running in parallel any more.
>
> We usually don't test the multimedia engines for this but we do test 
> gfx+compute, so I'm really wondering what goes wrong here.
>
> Could you run some tests for me? Additional to that I'm going to raise that 
> issue with our multimedia guys later today.

Yes, I can run some tests for you. Which tests?


       Thomas


* Re: Performance drop using deinterlace_vaapi on 5.19-rcX
  2022-06-20 11:40   ` Thomas Voegtle
@ 2022-06-20 13:26     ` Christian König
  2022-06-20 14:31       ` Thomas Voegtle
  0 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2022-06-20 13:26 UTC (permalink / raw)
  To: Thomas Voegtle; +Cc: Daniel Vetter, amd-gfx

On 20.06.22 at 13:40, Thomas Voegtle wrote:
> On Mon, 20 Jun 2022, Christian König wrote:
>
>> Hi Thomas,
>>
>> [moving vger to bcc]
>>
>> mhm, sounds like something isn't running in parallel any more.
>>
>> We usually don't test the multimedia engines for this but we do test 
>> gfx+compute, so I'm really wondering what goes wrong here.
>>
>> Could you run some tests for me? Additional to that I'm going to 
>> raise that issue with our multimedia guys later today.
>
> Yes, I can run some tests for you. Which tests?

Try this as root:

echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
cat /sys/kernel/debug/tracing/trace_pipe > trace.log

Then start the encoding in another shell. After it has completed, cancel the
cat with Ctrl+C and save the log file.

Do this once with the old kernel and once with the new one.

Regards,
Christian.

>
>
>       Thomas



* Re: Performance drop using deinterlace_vaapi on 5.19-rcX
  2022-06-20 13:26     ` Christian König
@ 2022-06-20 14:31       ` Thomas Voegtle
  2022-06-20 15:28         ` [Intel-gfx] " Christian König
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Voegtle @ 2022-06-20 14:31 UTC (permalink / raw)
  To: Christian König; +Cc: Daniel Vetter, amd-gfx


On Mon, 20 Jun 2022, Christian König wrote:

> On 20.06.22 at 13:40, Thomas Voegtle wrote:
>>  On Mon, 20 Jun 2022, Christian König wrote:
>>
>>>  Hi Thomas,
>>>
>>>  [moving vger to bcc]
>>>
>>>  mhm, sounds like something isn't running in parallel any more.
>>>
>>>  We usually don't test the multimedia engines for this but we do test
>>>  gfx+compute, so I'm really wondering what goes wrong here.
>>>
>>>  Could you run some tests for me? Additional to that I'm going to raise
>>>  that issue with our multimedia guys later today.
>>
>>  Yes, I can run some tests for you. Which tests?
>
> Try this as root:
>
> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
> cat /sys/kernel/debug/tracing/trace_pipe > trace.log
>
> Then start the encoding in another shell. After it has completed, cancel the
> cat with Ctrl+C and save the log file.
>
> Do this once with the old kernel and once with the new one.


    https://32h.de/tv/5.18.0-i5-trace.log.bz2
    https://32h.de/tv/5.19.0-rc3-i5-trace.log.bz2


I hope I have done this correctly.
Were all the necessary tracing events switched on?

I want to add that this is a headless machine. No monitor connected.
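
A rough first comparison of the two logs is to count how many fences were created and signaled under each kernel. The sketch below assumes the usual ftrace text layout, in which every record carries its event name followed by a colon; the function name count_fence_events is made up for illustration.

```shell
#!/bin/sh
# Sketch: count dma_fence events in a trace log (plain text or .bz2).
count_fence_events() {
    case "$1" in
        *.bz2) bzcat "$1" ;;
        *)     cat "$1" ;;
    esac | awk '
        /dma_fence_init:/     { n_init++ }
        /dma_fence_signaled:/ { n_sig++ }
        END { printf "init=%d signaled=%d\n", n_init, n_sig }'
}

# Usage with the two files linked above, once downloaded:
#   count_fence_events 5.18.0-i5-trace.log.bz2
#   count_fence_events 5.19.0-rc3-i5-trace.log.bz2
```

Equal counts with a much longer time span between the first and the last record would be consistent with the same work being serialized, but the counts alone do not prove that.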



* Re: [Intel-gfx] Performance drop using deinterlace_vaapi on 5.19-rcX
  2022-06-20 14:31       ` Thomas Voegtle
@ 2022-06-20 15:28         ` Christian König
  2022-06-20 17:28           ` Daniel Vetter
  0 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2022-06-20 15:28 UTC (permalink / raw)
  To: Thomas Voegtle; +Cc: Daniel Vetter, Intel Graphics Development

Hi Thomas,

On 20.06.22 at 16:31, Thomas Voegtle wrote:
> On Mon, 20 Jun 2022, Christian König wrote:
>
>> On 20.06.22 at 13:40, Thomas Voegtle wrote:
>>>  On Mon, 20 Jun 2022, Christian König wrote:
>>>
>>>>  Hi Thomas,
>>>>
>>>>  [moving vger to bcc]
>>>>
>>>>  mhm, sounds like something isn't running in parallel any more.
>>>>
>>>>  We usually don't test the multimedia engines for this but we do test
>>>>  gfx+compute, so I'm really wondering what goes wrong here.
>>>>
>>>>  Could you run some tests for me? Additional to that I'm going to raise
>>>>  that issue with our multimedia guys later today.
>>>
>>>  Yes, I can run some tests for you. Which tests?
>>
>> Try this as root:
>>
>> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
>> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
>> cat /sys/kernel/debug/tracing/trace_pipe > trace.log
>>
>> Then start the encoding in another shell. After it has completed, cancel
>> the cat with Ctrl+C and save the log file.
>>
>> Do this once with the old kernel and once with the new one.
>
>
> https://32h.de/tv/5.18.0-i5-trace.log.bz2
> https://32h.de/tv/5.19.0-rc3-i5-trace.log.bz2
>
>
> I hope I have done this correctly.
> All necessary tracing things switched on?

Yeah, that looks like what I wanted to see.

>
> I want to add that this is a headless machine. No monitor connected.
>

I've just realized that you aren't even using any AMD GPU for 
transcoding, so I have no idea why removing the AMD specific workaround 
can cause a performance problem for i915.

It must be somehow related to i915 now adding some additional 
synchronization in between submissions.

Adding the Intel mailing list, maybe somebody has a better idea.

Regards,
Christian.


* Re: [Intel-gfx] Performance drop using deinterlace_vaapi on 5.19-rcX
  2022-06-20 15:28         ` [Intel-gfx] " Christian König
@ 2022-06-20 17:28           ` Daniel Vetter
  2022-06-22  7:00             ` Tvrtko Ursulin
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Vetter @ 2022-06-20 17:28 UTC (permalink / raw)
  To: Christian König; +Cc: Thomas Voegtle, Intel Graphics Development

On Mon, 20 Jun 2022 at 17:28, Christian König <christian.koenig@amd.com> wrote:
>
> Hi Thomas,
>
> On 20.06.22 at 16:31, Thomas Voegtle wrote:
> > On Mon, 20 Jun 2022, Christian König wrote:
> >
> >> On 20.06.22 at 13:40, Thomas Voegtle wrote:
> >>>  On Mon, 20 Jun 2022, Christian König wrote:
> >>>
> >>>>  Hi Thomas,
> >>>>
> >>>>  [moving vger to bcc]
> >>>>
> >>>>  mhm, sounds like something isn't running in parallel any more.
> >>>>
> >>>>  We usually don't test the multimedia engines for this but we do test
> >>>>  gfx+compute, so I'm really wondering what goes wrong here.
> >>>>
> >>>>  Could you run some tests for me? Additional to that I'm going to raise
> >>>>  that issue with our multimedia guys later today.
> >>>
> >>>  Yes, I can run some tests for you. Which tests?
> >>
> >> Try this as root:
> >>
> >> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
> >> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
> >> cat /sys/kernel/debug/tracing/trace_pipe > trace.log
> >>
> >> Then start the encoding in another shell. After it has completed, cancel
> >> the cat with Ctrl+C and save the log file.
> >>
> >> Do this once with the old kernel and once with the new one.
> >
> >
> > https://32h.de/tv/5.18.0-i5-trace.log.bz2
> > https://32h.de/tv/5.19.0-rc3-i5-trace.log.bz2
> >
> >
> > I hope I have done this correctly.
> > All necessary tracing things switched on?
>
> Yeah, that looks like what I wanted to see.
>
> >
> > I want to add that this is a headless machine. No monitor connected.
> >
>
> I've just realized that you aren't even using any AMD GPU for
> transcoding, so I have no idea why removing the AMD specific workaround
> can cause a performance problem for i915.
>
> It must be somehow related to i915 now adding some additional
> synchronization in between submissions.
>
> Adding the Intel mailing list, maybe somebody has a better idea.

The only thing I can spot is that we now pile up USAGE_WRITE fences, whereas
beforehand they got replaced. Also, the deinterlace stuff means libva
uses the render engine, so this kinda fits: without the render
engine it's just a single engine, and hence you should never have
multiple write fences (not logically, but hsw is a ringbuffer and i915
doesn't have a ringbuffer scheduler, so it's all in-order anyway and
hence not possible to change something).

This would mean that i915 is doing something silly (well, not obeying
the old dma_resv rule that any new exclusive fence must be a strict
superset of all currently attached fences), which it totally is doing
with the EXEC_OBJECT_ASYNC flag. But libva doesn't use that.

So tbh I have no idea, but maybe a quick hack that tosses any old
USAGE_WRITE fence, like the old dma_resv_add_excl_fence did, would shed
some light?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [Intel-gfx] Performance drop using deinterlace_vaapi on 5.19-rcX
  2022-06-20 17:28           ` Daniel Vetter
@ 2022-06-22  7:00             ` Tvrtko Ursulin
  0 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2022-06-22  7:00 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Thomas Voegtle, Intel Graphics Development


On 20/06/2022 18:28, Daniel Vetter wrote:
> On Mon, 20 Jun 2022 at 17:28, Christian König <christian.koenig@amd.com> wrote:
>>
>> Hi Thomas,
>>
>> On 20.06.22 at 16:31, Thomas Voegtle wrote:
>>> On Mon, 20 Jun 2022, Christian König wrote:
>>>
>>>> On 20.06.22 at 13:40, Thomas Voegtle wrote:
>>>>>   On Mon, 20 Jun 2022, Christian König wrote:
>>>>>
>>>>>>   Hi Thomas,
>>>>>>
>>>>>>   [moving vger to bcc]
>>>>>>
>>>>>>   mhm, sounds like something isn't running in parallel any more.
>>>>>>
>>>>>>   We usually don't test the multimedia engines for this but we do test
>>>>>>   gfx+compute, so I'm really wondering what goes wrong here.
>>>>>>
>>>>>>   Could you run some tests for me? Additional to that I'm going to raise
>>>>>>   that issue with our multimedia guys later today.
>>>>>
>>>>>   Yes, I can run some tests for you. Which tests?
>>>>
>>>> Try this as root:
>>>>
>>>> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_init/enable
>>>> echo 1 > /sys/kernel/debug/tracing/events/dma_fence/dma_fence_signaled/enable
>>>> cat /sys/kernel/debug/tracing/trace_pipe > trace.log
>>>>
>>>> Then start the encoding in another shell. After it has completed, cancel
>>>> the cat with Ctrl+C and save the log file.
>>>>
>>>> Do this once with the old kernel and once with the new one.
>>>
>>>
>>> https://32h.de/tv/5.18.0-i5-trace.log.bz2
>>> https://32h.de/tv/5.19.0-rc3-i5-trace.log.bz2
>>>
>>>
>>> I hope I have done this correctly.
>>> All necessary tracing things switched on?
>>
>> Yeah, that looks like what I wanted to see.
>>
>>>
>>> I want to add that this is a headless machine. No monitor connected.
>>>
>>
>> I've just realized that you aren't even using any AMD GPU for
>> transcoding, so I have no idea why removing the AMD specific workaround
>> can cause a performance problem for i915.
>>
>> It must be somehow related to i915 now adding some additional
>> synchronization in between submissions.
>>
>> Adding the Intel mailing list, maybe somebody has a better idea.
> 
> Only thing I can spot is that we now pile up USAGE_WRITE fences, but
> beforehand they got replaced. Also the deinterlace stuff means libva
> uses render engine, so this kinda fits - without using the render
> engine it's just a single engine, and hence you should never have
> multiple write fences (not logically, but hsw is a ringbuffer and i915
> doesn't have a ringbuffer scheduler, so it's all in-order anyway and
> hence not possible to change something).
> 
> This would mean that i915 is doing something silly (well not obeying
> the old dma_resv rules that any new exclusive fence must be a strict
> superset of all currently attached fences), which it totally is doing
> with the EXEC_OBJECT_ASYNC flag. But libva doesn't use that.
> 
> So tbh I have no idea, but maybe a quick hack that tosses any old
> USAGE_WRITE fence like the old dma_resv_add_excl_fence did would shed
> some light?

I did not see the original email, but having found it in the archives
(https://lore.kernel.org/lkml/0249066a-2e95-c21d-d16a-fba08c633c0b@lio96.de/),
a ~3.8x slowdown is pretty bad.
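
The ~3.8x figure follows from the two timings in the original report (41s and 2m 36s, i.e. 156s). A small sketch of the arithmetic, with helper names to_seconds and slowdown chosen only for illustration:

```shell
#!/bin/sh
# Sketch: turn durations such as "41s" or "2m 36s" into seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk '{
        t = 0
        for (i = 1; i <= NF; i++) {
            n = $i + 0                  # numeric prefix of the field
            if ($i ~ /m$/) t += 60 * n  # minutes
            else           t += n       # seconds
        }
        print t
    }'
}

# Ratio of the second duration to the first, rounded to one decimal place.
slowdown() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", b / a }'
}

slowdown "41s" "2m 36s"   # 156 / 41, prints 3.8
```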

Thomas, could you please file a bug? The instructions are at
https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
It can get handled and prioritized from there.

Regards,

Tvrtko

