* [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
From: Mikhail Gavrilov @ 2022-12-07 14:44 UTC
  To: Deucher, Alexander, Chen, Guchun, James.Zhu, amd-gfx list

Hi,

I found a commit that causes problems with VAAPI hardware decoding on
certain video files.
To reproduce the issue, mesa must be built with the h264 hardware
encoder enabled, and the attached file must be played in the vlc
player.
Before kernel 5.16 this only led to an artifact in the form of a green
bar at the top of the screen; starting from 5.17 the GPU began to
freeze.
In 6.0 the GPU freeze is fixed, but the kernel itself freezes when
certain actions are performed, and the vlc application cannot be
terminated in any way.

The kernel log looks like this:
[  976.184187] amdgpu 0000:03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32785, for process vlc pid 9905 thread vlc:cs0 pid 9956)
[  976.184205] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800106b53000 from client 0x12 (VMC)
[  976.184210] amdgpu 0000:03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00141651
[  976.184213] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: VCN0 (0xb)
[  976.184216] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[  976.184219] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[  976.184222] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[  976.184225] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[  976.184228] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
[  976.184234] amdgpu 0000:03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32785, for process vlc pid 9905 thread vlc:cs0 pid 9956)
[  976.184238] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800106b52000 from client 0x12 (VMC)
[  976.184242] amdgpu 0000:03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[  976.184245] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
[  976.184248] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[  976.184251] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[  976.184253] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[  976.184256] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[  976.184259] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[  976.184264] amdgpu 0000:03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32785, for process vlc pid 9905 thread vlc:cs0 pid 9956)
[  976.184268] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800106b53000 from client 0x12 (VMC)
[  976.184271] amdgpu 0000:03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[  976.184273] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x0)
[  976.184276] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[  976.184279] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[  976.184281] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[  976.184284] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[  976.184286] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
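
For anyone comparing their own log against the one above, a small
filter can reduce each fault record to its address and UTCL2 client.
This is only a sketch: summarize_faults is a hypothetical helper name,
and it assumes unwrapped kernel log lines (as printed by dmesg) in the
same amdgpu format as above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): read kernel log lines on
# stdin and print "<fault address> <UTCL2 client>" for each fault record.
# Assumes each log message sits on one line, as dmesg prints it.
summarize_faults() {
    awk '
        /in page starting at/ {
            # the address is the field right after the word "address"
            for (i = 1; i < NF; i++)
                if ($i == "address") addr = $(i + 1)
        }
        /Faulty UTCL2 client ID:/ {
            # strip everything up to and including the label
            sub(/.*Faulty UTCL2 client ID: */, "")
            print addr, $0
        }
    '
}
```

Possible usage (reading the kernel log may need root): dmesg | summarize_faults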


The problematic commit is:
commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 (HEAD)
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Aug 9 11:22:20 2021 -0400

    drm/amdgpu: handle VCN instances when harvesting (v2)

    There may be multiple instances and only one is harvested.

    v2: fix typo in commit message

    Fixes: 83a0b8639185 ("drm/amdgpu: add judgement when add ip blocks (v2)")
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1673
    Reviewed-by: Guchun Chen <guchun.chen@amd.com>
    Reviewed-by: James Zhu <James.Zhu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org


Thanks!

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: test_sample_480_2.mp4 --]
[-- Type: video/mp4, Size: 127816 bytes --]


* Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
From: Alex Deucher @ 2022-12-07 14:58 UTC
  To: Mikhail Gavrilov
  Cc: Deucher, Alexander, James.Zhu, amd-gfx list, Chen, Guchun

On Wed, Dec 7, 2022 at 9:44 AM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> Hi,
>
> I found a commit that on certain video files leads to problems with
> VAAPI hardware decoding.
> Reproducing the issue requires mesa to be built with the h264 hardware
> encoder enabled and the attached file to be playable in the vlc
> player.
> Before kernel 5.16 this only led to an artifact in the form of a green
> bar at the top of the screen, then starting from 5.17 the GPU began to
> freeze.
> In 6.0, the problem with GPU freezing is solved, but the kernel itself
> freezes when certain actions are performed. And the vlc application
> cannot be terminated in any way.

What GPU do you have, and what entries do you have in
/sys/class/drm/card0/device/ip_discovery/die/0/UVD for the device?
Specifically, the harvest settings for each instance if there are
multiple instances.  If you had an RX 6700 you might have been using
software rendering prior to commit
7cbe08a930a132d84b4cf79953b00b074ec7a2a7.
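
One way to collect the requested harvest settings is sketched below.
list_harvest is a hypothetical helper, and the layout it expects (one
subdirectory per instance, each with a "harvest" file) is an assumption
about the ip_discovery sysfs tree; kernels that do not expose
ip_discovery will simply hit the error branch.

```shell
#!/bin/sh
# Hypothetical helper: print the harvest attribute of every instance
# found under one ip_discovery hardware-ID directory.
# Assumption: each instance is a subdirectory containing a "harvest" file.
list_harvest() {
    base=${1:-/sys/class/drm/card0/device/ip_discovery/die/0/UVD}
    if [ ! -d "$base" ]; then
        echo "no ip_discovery entries at $base" >&2
        return 1
    fi
    for inst in "$base"/*/; do
        # skip anything that is not a readable instance directory
        [ -r "${inst}harvest" ] || continue
        printf '%s harvest=%s\n' "$(basename "$inst")" "$(cat "${inst}harvest")"
    done
}
```

Called without arguments it reads the card0 path named above; a
different card can be passed as the first argument.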

Alex



* Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
From: Mikhail Gavrilov @ 2022-12-07 20:43 UTC
  To: Alex Deucher; +Cc: Deucher, Alexander, James.Zhu, amd-gfx list, Chen, Guchun

On Wed, Dec 7, 2022 at 7:58 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
>
> What GPU do you have and what entries do you have in
> sys/class/drm/card0/device/ip_discovery/die/0/UVD for the device?

I bisected the issue on the Radeon 6800M.

The parent commit of 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 is
46dd2965bdd1c5a4f6499c73ff32e636fa8f9769.
On both commits the ip_discovery directory is absent:
# ls /sys/class/drm/card0/device/ | grep ip
# ls /sys/class/drm/card1/device/ | grep ip

But from the verbose output I see that on
7cbe08a930a132d84b4cf79953b00b074ec7a2a7 the player uses acceleration:
$ vlc -v Downloads/test_sample_480_2.mp4
VLC media player 3.0.18 Vetinari (revision )
[0000561f72097520] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
[00007fa224001190] mp4 demux warning: elst box found
[00007fa224001190] mp4 demux warning: STTS table of 1 entries
[00007fa224001190] mp4 demux warning: CTTS table of 78 entries
[00007fa224001190] mp4 demux warning: elst box found
[00007fa224001190] mp4 demux warning: STTS table of 1 entries
[00007fa224001190] mp4 demux warning: elst old=0 new=1
[00007fa224d19010] faad decoder warning: decoded zero sample
[00007fa224001190] mp4 demux warning: elst old=0 new=1
[00007fa214007030] gl gl: Initialized libplacebo v4.208.0 (API v208)
libva info: VA-API version 1.16.0
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
[00007fa214007030] glconv_vaapi_x11 gl error: vaInitialize: unknown libva error
libva info: VA-API version 1.16.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_16
libva info: va_openDriver() returns 0
[00007fa224c0b3a0] avcodec decoder: Using Mesa Gallium driver 23.0.0-devel for AMD Radeon RX 6800M (navi22, LLVM 15.0.4, DRM 3.42, 5.14.0-rc4-14-7cbe08a930a132d84b4cf79953b00b074ec7a2a7+) for hardware decoding
[h264 @ 0x7fa224c3fa40] Using deprecated struct vaapi_context in decode.
[0000561f72174de0] pulse audio output warning: starting late (-9724 us)

And on commit 46dd2965bdd1c5a4f6499c73ff32e636fa8f9769 it did not use
acceleration:
$ vlc -v Downloads/test_sample_480_2.mp4
VLC media player 3.0.18 Vetinari (revision )
[000055f61ad35520] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
[00007fc7e8001190] mp4 demux warning: elst box found
[00007fc7e8001190] mp4 demux warning: STTS table of 1 entries
[00007fc7e8001190] mp4 demux warning: CTTS table of 78 entries
[00007fc7e8001190] mp4 demux warning: elst box found
[00007fc7e8001190] mp4 demux warning: STTS table of 1 entries
[00007fc7e8001190] mp4 demux warning: elst old=0 new=1
[00007fc7e8d19010] faad decoder warning: decoded zero sample
[00007fc7e8001190] mp4 demux warning: elst old=0 new=1
[00007fc7d8007030] gl gl: Initialized libplacebo v4.208.0 (API v208)
libva info: VA-API version 1.16.0
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
[00007fc7d8007030] glconv_vaapi_x11 gl error: vaInitialize: unknown libva error
libva info: VA-API version 1.16.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_16
libva info: va_openDriver() returns 0
[00007fc7d40b3260] vaapi generic error: profile(7) is not supported
[00007fc7d8a089c0] gl gl: Initialized libplacebo v4.208.0 (API v208)
Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object file: No such file or directory
Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object file: No such file or directory
[00007fc7d89e4f80] gl gl: Initialized libplacebo v4.208.0 (API v208)
[000055f61ae12de0] pulse audio output warning: starting late (-13537 us)
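
The two logs above differ in one telling line: "for hardware decoding"
in the first, "profile(7) is not supported" in the second. A rough way
to check that automatically is sketched below. decode_path is a
hypothetical helper, and matching on these two strings is an assumption
based only on the VLC 3.0.18 output shown here.

```shell
#!/bin/sh
# Hypothetical helper: classify a `vlc -v` log read from stdin by the
# two marker strings seen in the logs above.
decode_path() {
    log=$(cat)
    case $log in
        # avcodec announced a hardware decoder
        *"for hardware"*) echo hardware ;;
        # VAAPI rejected the stream profile, so no acceleration was used
        *"vaapi generic error: profile("*"is not supported"*) echo unaccelerated ;;
        *) echo unknown ;;
    esac
}
```

Possible usage: vlc -v Downloads/test_sample_480_2.mp4 2>&1 | decode_path
(the result is printed once vlc exits).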

So my bisect didn't make sense :(
Anyway, can you reproduce the issue with the attached sample file and
vlc on a fresh kernel (6.1-rc8)?

Thanks!

-- 
Best Regards,
Mike Gavrilov.


* Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
From: Alex Deucher @ 2022-12-07 20:54 UTC
  To: Mikhail Gavrilov, Leo Liu, Thong Thai
  Cc: Deucher, Alexander, James.Zhu, amd-gfx list, Chen, Guchun

+ Leo, Thong



* Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
From: Leo Liu @ 2022-12-09 14:37 UTC
  To: Alex Deucher, Mikhail Gavrilov, Thong Thai
  Cc: Deucher, Alexander, James.Zhu, amd-gfx list, Chen, Guchun

Please try the latest AMDGPU driver:

https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next/



* Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
From: Mikhail Gavrilov @ 2023-02-17  6:09 UTC
  To: Leo Liu
  Cc: Chen, Guchun, Thong Thai, amd-gfx list, Deucher, Alexander,
	Alex Deucher, James.Zhu

On Fri, Dec 9, 2022 at 7:37 PM Leo Liu <leo.liu@amd.com> wrote:
>
> Please try the latest AMDGPU driver:
>
> https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next/
>

Sorry Leo, I missed your message.
This issue is still present in 6.2-rc8.

My first message was mistaken.

> Before kernel 5.16 this only led to an artifact in the form of
> a green bar at the top of the screen, then starting from 5.17
> the GPU began to freeze.

The real behaviour before 5.18:
- vlc could play the video, with small artifacts in the form of a green
bar on top of the video
- after playback, the vlc process exited correctly

On 5.18 this behaviour changed:
- vlc shows a black screen instead of playing the video
- after playback, the process does not exit
- if I try to kill the vlc process with 'kill -9', vlc becomes a zombie
process and many other processes start to hang (the following lines
appear in the kernel log after 2 minutes)

INFO: task vlc:sh8:5248 blocked for more than 122 seconds.
      Tainted: G        W    L   --------  ---  5.18.0-60.fc37.x86_64+debug #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:vlc:sh8         state:D stack:13616 pid: 5248 ppid:  1934 flags:0x00004006
Call Trace:
 <TASK>
 __schedule+0x492/0x1650
 ? _raw_spin_unlock_irqrestore+0x40/0x60
 ? debug_check_no_obj_freed+0x12d/0x250
 schedule+0x4e/0xb0
 schedule_timeout+0xe1/0x120
 ? lock_release+0x215/0x460
 ? trace_hardirqs_on+0x1a/0xf0
 ? _raw_spin_unlock_irqrestore+0x40/0x60
 dma_fence_default_wait+0x197/0x240
 ? __bpf_trace_dma_fence+0x10/0x10
 dma_fence_wait_timeout+0x229/0x260
 drm_sched_entity_fini+0x101/0x270 [gpu_sched]
 amdgpu_vm_fini+0x2b5/0x460 [amdgpu]
 ? idr_destroy+0x70/0xb0
 ? mutex_destroy+0x1e/0x50
 amdgpu_driver_postclose_kms+0x1ec/0x2c0 [amdgpu]
 drm_file_free.part.0+0x20d/0x260
 drm_release+0x6a/0x120
 __fput+0xab/0x270
 task_work_run+0x5c/0xa0
 do_exit+0x394/0xc40
 ? rcu_read_lock_sched_held+0x10/0x70
 do_group_exit+0x33/0xb0
 get_signal+0xbbc/0xbc0
 arch_do_signal_or_restart+0x30/0x770
 ? do_futex+0xfd/0x190
 ? __x64_sys_futex+0x63/0x190
 exit_to_user_mode_prepare+0x172/0x270
 syscall_exit_to_user_mode+0x16/0x50
 do_syscall_64+0x67/0x80
 ? do_syscall_64+0x67/0x80
 ? rcu_read_lock_sched_held+0x10/0x70
 ? trace_hardirqs_on_prepare+0x5e/0x110
 ? do_syscall_64+0x67/0x80
 ? rcu_read_lock_sched_held+0x10/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82c2364529
RSP: 002b:00007f8210ff8c00 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f82c2364529
RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f823022542c
RBP: 00007f8210ff8c30 R08: 0000000000000000 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 00007f823022542c
 </TASK>
INFO: lockdep is turned off.

I bisected this issue, and the first bad commit is:

❯ git bisect bad
5f3854f1f4e211f494018160b348a1c16e58013f is the first bad commit
commit 5f3854f1f4e211f494018160b348a1c16e58013f
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Thu Mar 24 18:04:00 2022 -0400

    drm/amdgpu: add more cases to noretry=1

    Port current list from amd-staging-drm-next.

    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 +++
 1 file changed, 3 insertions(+)

Unfortunately I couldn't simply revert this commit on 6.2-rc8 to check,
because the revert leads to conflicts.

Alex, as the author of this commit, could you help me with it?


-- 
Best Regards,
Mike Gavrilov.


* Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
From: Alex Deucher @ 2023-02-17 15:29 UTC
  To: Mikhail Gavrilov
  Cc: Chen, Guchun, Thong Thai, amd-gfx list, Deucher, Alexander,
	James.Zhu, Leo Liu

On Fri, Feb 17, 2023 at 1:10 AM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
> [...]
>
> Unfortunately I couldn't simply revert this commit on 6.2-rc8 for
> checking, because it leads to conflicts.
>
> Alex, you as author of this commit could help me with it?

Append amdgpu.noretry=0 to the kernel command line in grub.
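
A sketch of how that could be done and checked. The grubby call in the
comment is an assumption about a Fedora-style setup (the thread does not
say how the command line is edited), and cmdline_param is a hypothetical
helper for verifying the result after a reboot.

```shell
#!/bin/sh
# On Fedora-style systems the parameter can be added with grubby
# (assumption, run as root, then reboot):
#   grubby --update-kernel=ALL --args="amdgpu.noretry=0"

# Hypothetical helper: print the value of one parameter from a kernel
# command line read on stdin; prints nothing if the parameter is absent.
# The name is used as a regex, so "." matches any character.
cmdline_param() {
    tr ' ' '\n' | sed -n "s/^$1=//p" | tail -n 1
}
```

Possible usage after rebooting: cmdline_param amdgpu.noretry < /proc/cmdline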

Alex



* Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
From: Mikhail Gavrilov @ 2023-02-17 21:50 UTC
  To: Alex Deucher
  Cc: Chen, Guchun, Thong Thai, amd-gfx list, Deucher, Alexander,
	James.Zhu, Leo Liu

[-- Attachment #1: Type: text/plain, Size: 5273 bytes --]

On Fri, Feb 17, 2023 at 8:30 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Fri, Feb 17, 2023 at 1:10 AM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > On Fri, Dec 9, 2022 at 7:37 PM Leo Liu <leo.liu@amd.com> wrote:
> > >
> > > Please try the latest AMDGPU driver:
> > >
> > > https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next/
> > >
> >
> > > Sorry Leo, I missed your message.
> > > This issue is still present in 6.2-rc8.
> >
> > In my first message I was mistaken.
> >
> > > Before kernel 5.16 this only led to an artifact in the form of
> > > a green bar at the top of the screen, then starting from 5.17
> > > the GPU began to freeze.
> >
> > > The real behaviour before 5.18:
> > > - vlc could play the video with small artifacts in the form of a green
> > > bar at the top of the video
> > > - after playback, the vlc process exited correctly
> > >
> > > In 5.18 this behaviour changed:
> > > - vlc shows a black screen instead of playing the video
> > > - after playback, the process does not exit
> > > - if I try to kill the vlc process with 'kill -9', vlc becomes a zombie
> > > process and many other processes start to hang (the following lines
> > > appear in the kernel log after 2 minutes)
> >
> > INFO: task vlc:sh8:5248 blocked for more than 122 seconds.
> >       Tainted: G        W    L   --------  ---  5.18.0-60.fc37.x86_64+debug #1
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:vlc:sh8         state:D stack:13616 pid: 5248 ppid:  1934 flags:0x00004006
> > Call Trace:
> >  <TASK>
> >  __schedule+0x492/0x1650
> >  ? _raw_spin_unlock_irqrestore+0x40/0x60
> >  ? debug_check_no_obj_freed+0x12d/0x250
> >  schedule+0x4e/0xb0
> >  schedule_timeout+0xe1/0x120
> >  ? lock_release+0x215/0x460
> >  ? trace_hardirqs_on+0x1a/0xf0
> >  ? _raw_spin_unlock_irqrestore+0x40/0x60
> >  dma_fence_default_wait+0x197/0x240
> >  ? __bpf_trace_dma_fence+0x10/0x10
> >  dma_fence_wait_timeout+0x229/0x260
> >  drm_sched_entity_fini+0x101/0x270 [gpu_sched]
> >  amdgpu_vm_fini+0x2b5/0x460 [amdgpu]
> >  ? idr_destroy+0x70/0xb0
> >  ? mutex_destroy+0x1e/0x50
> >  amdgpu_driver_postclose_kms+0x1ec/0x2c0 [amdgpu]
> >  drm_file_free.part.0+0x20d/0x260
> >  drm_release+0x6a/0x120
> >  __fput+0xab/0x270
> >  task_work_run+0x5c/0xa0
> >  do_exit+0x394/0xc40
> >  ? rcu_read_lock_sched_held+0x10/0x70
> >  do_group_exit+0x33/0xb0
> >  get_signal+0xbbc/0xbc0
> >  arch_do_signal_or_restart+0x30/0x770
> >  ? do_futex+0xfd/0x190
> >  ? __x64_sys_futex+0x63/0x190
> >  exit_to_user_mode_prepare+0x172/0x270
> >  syscall_exit_to_user_mode+0x16/0x50
> >  do_syscall_64+0x67/0x80
> >  ? do_syscall_64+0x67/0x80
> >  ? rcu_read_lock_sched_held+0x10/0x70
> >  ? trace_hardirqs_on_prepare+0x5e/0x110
> >  ? do_syscall_64+0x67/0x80
> >  ? rcu_read_lock_sched_held+0x10/0x70
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f82c2364529
> > RSP: 002b:00007f8210ff8c00 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> > RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f82c2364529
> > RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f823022542c
> > RBP: 00007f8210ff8c30 R08: 0000000000000000 R09: 00000000ffffffff
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000000 R14: 0000000000000001 R15: 00007f823022542c
> >  </TASK>
> > INFO: lockdep is turned off.
> >
> > I bisected this issue, and the problematic commit is
> >
> > ❯ git bisect bad
> > 5f3854f1f4e211f494018160b348a1c16e58013f is the first bad commit
> > commit 5f3854f1f4e211f494018160b348a1c16e58013f
> > Author: Alex Deucher <alexander.deucher@amd.com>
> > Date:   Thu Mar 24 18:04:00 2022 -0400
> >
> >     drm/amdgpu: add more cases to noretry=1
> >
> >     Port current list from amd-staging-drm-next.
> >
> >     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > Unfortunately, I couldn't simply revert this commit on 6.2-rc8 to
> > check, because the revert leads to conflicts.
> >
> > Alex, as the author of this commit, could you help me with it?
>
> append amdgpu.noretry=0 to the kernel command line in grub.

Thanks, I tested "amdgpu.noretry=0": after the page fault occurs, vlc
can still play the video, with minor artifacts.

So I have some questions:

1. Why were retries disabled by default if they are still needed for
recoverable page faults? As Christian answered me earlier here:
https://lore.kernel.org/all/f253ff1f-3c5c-c785-1272-e4fe69a366ec@amd.com/T/#m73a0a6eb7b2531eacf24fd498e8d2eec675f05a6

Page faults (not to be confused with a kernel panic) are an absolutely
normal phenomenon for a buggy userspace. And if they are "normal", I
would prefer that they not affect system reliability. But as we can
see, they lead to zombie processes followed by a hang.

2. If recoverable page faults are not an option, is it possible to
fix this issue some other way?

P.S. I also see page faults in other scenarios (for example, when
playing "Division 2" or "The Callisto Protocol"; I attached my kernel
log to show this), but they do not lead to zombie processes.

-- 
Best Regards,
Mike Gavrilov.
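
To compare the fatal and non-fatal cases mentioned above, a kernel log
can be summarized per process. This is a rough sketch, not taken from
the thread: the function name summarize is hypothetical, and the two
regular expressions are modelled only on the amdgpu page-fault and
hung-task lines quoted in this thread. It assumes unwrapped dmesg
output, with one log entry per line.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread (an assumption:
# other kernel versions may format these messages differently).
FAULT_RE = re.compile(r"page fault \(.*?for process (\S+) pid (\d+)")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def summarize(log_text):
    """Count amdgpu page faults per (process, pid) and list hung-task reports."""
    faults = Counter()
    hung = []
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            faults[(m.group(1), int(m.group(2)))] += 1
            continue
        m = HUNG_RE.search(line)
        if m:
            # (task name, pid, seconds blocked)
            hung.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return faults, hung
```

A process that shows up in the fault counter but never in the hung-task
list corresponds to the non-fatal case; one that appears in both matches
the vlc behaviour reported here.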

[-- Attachment #2: dmesg.tar.xz --]
[-- Type: application/x-xz, Size: 35188 bytes --]

* Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.
  2023-02-17 21:50             ` Mikhail Gavrilov
@ 2023-02-27 21:10               ` Alex Deucher
  0 siblings, 0 replies; 9+ messages in thread
From: Alex Deucher @ 2023-02-27 21:10 UTC (permalink / raw)
  To: Mikhail Gavrilov, Kuehling, Felix
  Cc: Chen, Guchun, Thong Thai, amd-gfx list, Deucher, Alexander,
	James.Zhu, Leo Liu

+ Felix


On Fri, Feb 17, 2023 at 4:50 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Fri, Feb 17, 2023 at 8:30 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Fri, Feb 17, 2023 at 1:10 AM Mikhail Gavrilov
> > <mikhail.v.gavrilov@gmail.com> wrote:
> > >
> > > On Fri, Dec 9, 2022 at 7:37 PM Leo Liu <leo.liu@amd.com> wrote:
> > > >
> > > > Please try the latest AMDGPU driver:
> > > >
> > > > https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next/
> > > >
> > >
> > > > Sorry Leo, I missed your message.
> > > > This issue is still present in 6.2-rc8.
> > >
> > > In my first message I was mistaken.
> > >
> > > > Before kernel 5.16 this only led to an artifact in the form of
> > > > a green bar at the top of the screen, then starting from 5.17
> > > > the GPU began to freeze.
> > >
> > > > The real behaviour before 5.18:
> > > > - vlc could play the video with small artifacts in the form of a green
> > > > bar at the top of the video
> > > > - after playback, the vlc process exited correctly
> > > >
> > > > In 5.18 this behaviour changed:
> > > > - vlc shows a black screen instead of playing the video
> > > > - after playback, the process does not exit
> > > > - if I try to kill the vlc process with 'kill -9', vlc becomes a zombie
> > > > process and many other processes start to hang (the following lines
> > > > appear in the kernel log after 2 minutes)
> > >
> > > INFO: task vlc:sh8:5248 blocked for more than 122 seconds.
> > >       Tainted: G        W    L   --------  ---  5.18.0-60.fc37.x86_64+debug #1
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > task:vlc:sh8         state:D stack:13616 pid: 5248 ppid:  1934 flags:0x00004006
> > > Call Trace:
> > >  <TASK>
> > >  __schedule+0x492/0x1650
> > >  ? _raw_spin_unlock_irqrestore+0x40/0x60
> > >  ? debug_check_no_obj_freed+0x12d/0x250
> > >  schedule+0x4e/0xb0
> > >  schedule_timeout+0xe1/0x120
> > >  ? lock_release+0x215/0x460
> > >  ? trace_hardirqs_on+0x1a/0xf0
> > >  ? _raw_spin_unlock_irqrestore+0x40/0x60
> > >  dma_fence_default_wait+0x197/0x240
> > >  ? __bpf_trace_dma_fence+0x10/0x10
> > >  dma_fence_wait_timeout+0x229/0x260
> > >  drm_sched_entity_fini+0x101/0x270 [gpu_sched]
> > >  amdgpu_vm_fini+0x2b5/0x460 [amdgpu]
> > >  ? idr_destroy+0x70/0xb0
> > >  ? mutex_destroy+0x1e/0x50
> > >  amdgpu_driver_postclose_kms+0x1ec/0x2c0 [amdgpu]
> > >  drm_file_free.part.0+0x20d/0x260
> > >  drm_release+0x6a/0x120
> > >  __fput+0xab/0x270
> > >  task_work_run+0x5c/0xa0
> > >  do_exit+0x394/0xc40
> > >  ? rcu_read_lock_sched_held+0x10/0x70
> > >  do_group_exit+0x33/0xb0
> > >  get_signal+0xbbc/0xbc0
> > >  arch_do_signal_or_restart+0x30/0x770
> > >  ? do_futex+0xfd/0x190
> > >  ? __x64_sys_futex+0x63/0x190
> > >  exit_to_user_mode_prepare+0x172/0x270
> > >  syscall_exit_to_user_mode+0x16/0x50
> > >  do_syscall_64+0x67/0x80
> > >  ? do_syscall_64+0x67/0x80
> > >  ? rcu_read_lock_sched_held+0x10/0x70
> > >  ? trace_hardirqs_on_prepare+0x5e/0x110
> > >  ? do_syscall_64+0x67/0x80
> > >  ? rcu_read_lock_sched_held+0x10/0x70
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > RIP: 0033:0x7f82c2364529
> > > RSP: 002b:00007f8210ff8c00 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> > > RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f82c2364529
> > > RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f823022542c
> > > RBP: 00007f8210ff8c30 R08: 0000000000000000 R09: 00000000ffffffff
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > R13: 0000000000000000 R14: 0000000000000001 R15: 00007f823022542c
> > >  </TASK>
> > > INFO: lockdep is turned off.
> > >
> > > I bisected this issue, and the problematic commit is
> > >
> > > ❯ git bisect bad
> > > 5f3854f1f4e211f494018160b348a1c16e58013f is the first bad commit
> > > commit 5f3854f1f4e211f494018160b348a1c16e58013f
> > > Author: Alex Deucher <alexander.deucher@amd.com>
> > > Date:   Thu Mar 24 18:04:00 2022 -0400
> > >
> > >     drm/amdgpu: add more cases to noretry=1
> > >
> > >     Port current list from amd-staging-drm-next.
> > >
> > >     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > >
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > Unfortunately, I couldn't simply revert this commit on 6.2-rc8 to
> > > check, because the revert leads to conflicts.
> > >
> > > Alex, as the author of this commit, could you help me with it?
> >
> > append amdgpu.noretry=0 to the kernel command line in grub.
>
> Thanks, I tested "amdgpu.noretry=0": after the page fault occurs, vlc
> can still play the video, with minor artifacts.
>
> So I have some questions:
>
> 1. Why were retries disabled by default if they are still needed for
> recoverable page faults? As Christian answered me earlier here:
> https://lore.kernel.org/all/f253ff1f-3c5c-c785-1272-e4fe69a366ec@amd.com/T/#m73a0a6eb7b2531eacf24fd498e8d2eec675f05a6
>

You don't actually want retry page faults, because for gfx apps,
nothing is going to page in the missing pages.  The retry stuff is for
demand paging type scenarios and only certain GPUs (GFX9-based)
actually support the necessary semantics to make this work.  Even then
it would only be useful in APIs which support demand paging.  Right
now GFX APIs don't really do this.

> Page faults (not to be confused with a kernel panic) are an absolutely
> normal phenomenon for a buggy userspace. And if they are "normal", I
> would prefer that they not affect system reliability. But as we can
> see, they lead to zombie processes followed by a hang.
>

If you don't retry the fault, the kernel reports the fault, but the
engine should continue.  Reads will return 0 and writes will be
dropped.  So it shouldn't hang unless the page fault causes some
deadlock in the engine itself (e.g., due to the bogus data returned).

> 2. If recoverable page faults are not an option, is it possible to
> fix this issue some other way?

I think this is probably a bug somewhere in mesa: either the UMD has
the alignment wrong, or some dependency between GFX and VCN is not
completing because of the page fault.

>
> P.S. I also see page faults in other scenarios (for example, when
> playing "Division 2" or "The Callisto Protocol"; I attached my kernel
> log to show this), but they do not lead to zombie processes.

Right, that is the expected behavior when the fault is non-fatal.

Alex
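
The no-retry behaviour described above (the fault is reported, reads
return 0, writes are dropped, and the engine keeps going) can be
pictured with a toy model. This is purely illustrative and is not how
amdgpu or the hardware is implemented; the class ToyNoRetryMemory and
all of its fields are hypothetical names invented for this sketch.

```python
class ToyNoRetryMemory:
    """Toy model only: illustrates no-retry fault semantics, nothing more."""

    def __init__(self, mapped_pages, page_size=4096):
        self.page_size = page_size
        self.mapped = set(mapped_pages)  # page numbers that have a valid mapping
        self.data = {}                   # address -> stored value
        self.faults = []                 # (kind, address) entries, standing in for the kernel log

    def _is_mapped(self, addr):
        return addr // self.page_size in self.mapped

    def read(self, addr):
        if not self._is_mapped(addr):
            self.faults.append(("read", addr))  # fault is reported...
            return 0                            # ...and the read returns 0
        return self.data.get(addr, 0)

    def write(self, addr, value):
        if not self._is_mapped(addr):
            self.faults.append(("write", addr))  # fault is reported...
            return                               # ...and the write is dropped
        self.data[addr] = value
```

In this picture the caller never blocks on a fault; it just continues
with zeroed or missing data, which is why a hang would have to come from
how the engine reacts to that bogus data rather than from the fault
itself.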

end of thread, other threads:[~2023-02-27 21:10 UTC | newest]

Thread overview: 9+ messages
2022-12-07 14:44 [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding Mikhail Gavrilov
2022-12-07 14:58 ` Alex Deucher
2022-12-07 20:43   ` Mikhail Gavrilov
2022-12-07 20:54     ` Alex Deucher
2022-12-09 14:37       ` Leo Liu
2023-02-17  6:09         ` Mikhail Gavrilov
2023-02-17 15:29           ` Alex Deucher
2023-02-17 21:50             ` Mikhail Gavrilov
2023-02-27 21:10               ` Alex Deucher
