linux-kernel.vger.kernel.org archive mirror
* [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
@ 2021-01-10 22:26 Mikhail Gavrilov
  2021-01-11  9:03 ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2021-01-10 22:26 UTC
  To: amd-gfx list, dri-devel, Linux List Kernel Mailing,
	Harry Wentland, Christian König, Deucher, Alexander

Hi folks,
today I started testing kernel 5.11 and saw that the kernel log was
flooded with BUG messages:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0
INFO: lockdep is turned off.
CPU: 15 PID: 266 Comm: kswapd0 Tainted: G        W        --------- ---  5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0x8b/0xb0
 ___might_sleep.cold+0xb6/0xc6
 vm_unmap_aliases+0x21/0x40
 change_page_attr_set_clr+0x9e/0x190
 set_memory_wb+0x2f/0x80
 ttm_pool_free_page+0x28/0x90 [ttm]
 ttm_pool_shrink+0x45/0xb0 [ttm]
 ttm_pool_shrinker_scan+0xa/0x20 [ttm]
 do_shrink_slab+0x177/0x3a0
 shrink_slab+0x9c/0x290
 shrink_node+0x2e6/0x700
 balance_pgdat+0x2f5/0x650
 kswapd+0x21d/0x4d0
 ? do_wait_intr_irq+0xd0/0xd0
 ? balance_pgdat+0x650/0x650
 kthread+0x13a/0x150
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

But the most unpleasant thing is that after a while the monitor turns
off and does not turn back on until I restart.
This is accompanied by an entry in the kernel log:

amdgpu 0000:0b:00.0: amdgpu: 00000000ff7d8b94 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

$ grep "Failed to pin framebuffer with error" -Rn .
./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:5816:			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);

$ git blame -L 5811,5821 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
Blaming lines:   0% (11/9167), done.
5d43be0ccbc2f (Christian König 2017-10-26 18:06:23 +0200 5811) 		domain = AMDGPU_GEM_DOMAIN_VRAM;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5812)
7b7c6c81b3a37 (Junwei Zhang    2018-06-25 12:51:14 +0800 5813) 	r = amdgpu_bo_pin(rbo, domain);
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5814) 	if (unlikely(r != 0)) {
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5815) 		if (r != -ERESTARTSYS)
30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5816) 			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
0f257b09531b4 (Chunming Zhou   2019-05-07 19:45:31 +0800 5817) 		ttm_eu_backoff_reservation(&ticket, &list);
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5818) 		return r;
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5819) 	}
e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5820)
bb812f1ea87dd (Junwei Zhang    2018-06-25 13:32:24 +0800 5821) 	r = amdgpu_ttm_alloc_gart(&rbo->tbo);

Does anyone know how to fix it?

Full kernel logs are here:
[1] https://pastebin.com/fLasjDHX
[2] https://pastebin.com/g3wR2r9e

--
Best Regards,
Mike Gavrilov.


* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-10 22:26 [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12 Mikhail Gavrilov
@ 2021-01-11  9:03 ` Christian König
  2021-01-11 14:01   ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2021-01-11  9:03 UTC
  To: Mikhail Gavrilov, amd-gfx list, dri-devel,
	Linux List Kernel Mailing, Harry Wentland, Deucher, Alexander

Hi Mikhail

On 10.01.21 at 23:26, Mikhail Gavrilov wrote:
> Hi folks,
> today I started testing kernel 5.11 and saw that the kernel log was
> flooded with BUG messages:
> BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0
> INFO: lockdep is turned off.
> CPU: 15 PID: 266 Comm: kswapd0 Tainted: G        W        --------- ---  5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
> Call Trace:
>   dump_stack+0x8b/0xb0
>   ___might_sleep.cold+0xb6/0xc6
>   vm_unmap_aliases+0x21/0x40
>   change_page_attr_set_clr+0x9e/0x190
>   set_memory_wb+0x2f/0x80
>   ttm_pool_free_page+0x28/0x90 [ttm]
>   ttm_pool_shrink+0x45/0xb0 [ttm]
>   ttm_pool_shrinker_scan+0xa/0x20 [ttm]
>   do_shrink_slab+0x177/0x3a0
>   shrink_slab+0x9c/0x290
>   shrink_node+0x2e6/0x700
>   balance_pgdat+0x2f5/0x650
>   kswapd+0x21d/0x4d0
>   ? do_wait_intr_irq+0xd0/0xd0
>   ? balance_pgdat+0x650/0x650
>   kthread+0x13a/0x150
>   ? __kthread_bind_mask+0x60/0x60
>   ret_from_fork+0x22/0x30

I'm probably responsible for this. Need to double check why we try to 
allocate memory while freeing some.

> But the most unpleasant thing is that after a while the monitor turns
> off and does not turn back on until I restart.
> This is accompanied by an entry in the kernel log:
>
> amdgpu 0000:0b:00.0: amdgpu: 00000000ff7d8b94 pin failed
> [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

-12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by the 
problem above, maybe something completely unrelated.
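For readers decoding such messages: kernel functions report failure as a negative errno value, so "error -12" from amdgpu_bo_pin() is -ENOMEM. The user-space sketch below shows that decoding; the helper names are hypothetical, and it assumes Linux errno numbering, where ENOMEM is 12.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn a kernel-style return code (0 or -errno)
 * into a readable description via the C library's strerror(). */
const char *describe_kernel_error(int r)
{
	if (r >= 0)
		return "success";
	return strerror(-r);	/* negate to get the positive errno */
}

/* Hypothetical helper: does this return code mean "out of memory"? */
int is_out_of_memory(int r)
{
	return r == -ENOMEM;
}
```

For example, printf("%d: %s\n", -12, describe_kernel_error(-12)) prints "-12: Cannot allocate memory" with glibc; other C libraries may word the message differently.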

I will take a look.

Thanks,
Christian.

>
> $ grep "Failed to pin framebuffer with error" -Rn .
> ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:5816:			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
>
> $ git blame -L 5811,5821 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> Blaming lines:   0% (11/9167), done.
> 5d43be0ccbc2f (Christian König 2017-10-26 18:06:23 +0200 5811) 		domain = AMDGPU_GEM_DOMAIN_VRAM;
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5812)
> 7b7c6c81b3a37 (Junwei Zhang    2018-06-25 12:51:14 +0800 5813) 	r = amdgpu_bo_pin(rbo, domain);
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5814) 	if (unlikely(r != 0)) {
> 30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5815) 		if (r != -ERESTARTSYS)
> 30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5816) 			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
> 0f257b09531b4 (Chunming Zhou   2019-05-07 19:45:31 +0800 5817) 		ttm_eu_backoff_reservation(&ticket, &list);
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5818) 		return r;
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5819) 	}
> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5820)
> bb812f1ea87dd (Junwei Zhang    2018-06-25 13:32:24 +0800 5821) 	r = amdgpu_ttm_alloc_gart(&rbo->tbo);
>
> Does anyone know how to fix it?
>
> Full kernel logs are here:
> [1] https://pastebin.com/fLasjDHX
> [2] https://pastebin.com/g3wR2r9e
>
> --
> Best Regards,
> Mike Gavrilov.



* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-11  9:03 ` Christian König
@ 2021-01-11 14:01   ` Christian König
  2021-01-11 19:23     ` Mikhail Gavrilov
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2021-01-11 14:01 UTC
  To: Mikhail Gavrilov, amd-gfx list, dri-devel,
	Linux List Kernel Mailing, Harry Wentland, Deucher, Alexander

On 11.01.21 at 10:03, Christian König wrote:
> Hi Mikhail
>
> On 10.01.21 at 23:26, Mikhail Gavrilov wrote:
>> Hi folks,
>> today I started testing kernel 5.11 and saw that the kernel log was
>> flooded with BUG messages:
>> BUG: sleeping function called from invalid context at mm/vmalloc.c:1756
>> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 266, name: kswapd0
>> INFO: lockdep is turned off.
>> CPU: 15 PID: 266 Comm: kswapd0 Tainted: G        W        --------- ---  5.11.0-0.rc2.20210108gitf5e6c330254a.119.fc34.x86_64 #1
>> Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
>> Call Trace:
>>   dump_stack+0x8b/0xb0
>>   ___might_sleep.cold+0xb6/0xc6
>>   vm_unmap_aliases+0x21/0x40
>>   change_page_attr_set_clr+0x9e/0x190
>>   set_memory_wb+0x2f/0x80
>>   ttm_pool_free_page+0x28/0x90 [ttm]
>>   ttm_pool_shrink+0x45/0xb0 [ttm]
>>   ttm_pool_shrinker_scan+0xa/0x20 [ttm]
>>   do_shrink_slab+0x177/0x3a0
>>   shrink_slab+0x9c/0x290
>>   shrink_node+0x2e6/0x700
>>   balance_pgdat+0x2f5/0x650
>>   kswapd+0x21d/0x4d0
>>   ? do_wait_intr_irq+0xd0/0xd0
>>   ? balance_pgdat+0x650/0x650
>>   kthread+0x13a/0x150
>>   ? __kthread_bind_mask+0x60/0x60
>>   ret_from_fork+0x22/0x30
>
> I'm probably responsible for this. Need to double check why we try to 
> allocate memory while freeing some.

Changing the page table attributes while releasing memory might sleep. 
So we can't use a spinlock here.

Thanks for the report, a patch to fix this is on the mailing list now.
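A user-space analogue of the locking rule above, as a sketch only: a page pool whose shrink path frees pages while holding the pool lock, where freeing may block. In the kernel, set_memory_wb() can sleep, and on a non-RT kernel holding a spinlock disables preemption (hence "in_atomic(): 1" in the trace), so sleeping there is a bug; a mutex permits it. All names are hypothetical, nanosleep() stands in for the blocking call, and this is not the actual TTM patch ("make the pool shrinker lock a mutex"), whose diff is not shown in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define POOL_MAX 64

/* Hypothetical pool; names are illustrative only. */
struct page_pool {
	pthread_mutex_t lock;	/* sleeping lock: blocking while held is fine */
	void *pages[POOL_MAX];
	unsigned int count;
};

/* Stand-in for a free path that may sleep, like set_memory_wb(). */
static void free_page_may_block(void *page)
{
	struct timespec ts = { 0, 100000 };	/* 100 microseconds */

	nanosleep(&ts, NULL);
	free(page);
}

void pool_init(struct page_pool *pool)
{
	pthread_mutex_init(&pool->lock, NULL);
	pool->count = 0;
}

int pool_add(struct page_pool *pool, void *page)
{
	int ret = -1;

	pthread_mutex_lock(&pool->lock);
	if (pool->count < POOL_MAX) {
		pool->pages[pool->count++] = page;
		ret = 0;
	}
	pthread_mutex_unlock(&pool->lock);
	return ret;
}

/* Free up to nr_to_scan pages and return how many were freed. The
 * blocking free runs with the lock held; in kernel code that is only
 * legal with a sleeping lock, and doing it under a spinlock is what
 * the "sleeping function called from invalid context" BUG flags. */
unsigned int pool_shrink(struct page_pool *pool, unsigned int nr_to_scan)
{
	unsigned int freed = 0;

	pthread_mutex_lock(&pool->lock);
	while (freed < nr_to_scan && pool->count > 0) {
		free_page_may_block(pool->pages[--pool->count]);
		freed++;
	}
	pthread_mutex_unlock(&pool->lock);
	return freed;
}
```

On glibc older than 2.34, build with -pthread.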

>> But the most unpleasant thing is that after a while the monitor turns
>> off and does not turn back on until I restart.
>> This is accompanied by an entry in the kernel log:
>>
>> amdgpu 0000:0b:00.0: amdgpu: 00000000ff7d8b94 pin failed
>> [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
>
> -12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by 
> the problem above, maybe something completely unrelated.
>
> I will take a look.

That looks like a completely unrelated memory leak to me.

Probably best if you open up a bug report for this.

Thanks,
Christian.

>
> Thanks,
> Christian.
>
>>
>> $ grep "Failed to pin framebuffer with error" -Rn .
>> ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:5816:			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
>>
>> $ git blame -L 5811,5821 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> Blaming lines:   0% (11/9167), done.
>> 5d43be0ccbc2f (Christian König 2017-10-26 18:06:23 +0200 5811) 		domain = AMDGPU_GEM_DOMAIN_VRAM;
>> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5812)
>> 7b7c6c81b3a37 (Junwei Zhang    2018-06-25 12:51:14 +0800 5813) 	r = amdgpu_bo_pin(rbo, domain);
>> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5814) 	if (unlikely(r != 0)) {
>> 30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5815) 		if (r != -ERESTARTSYS)
>> 30b7c6147d18d (Harry Wentland  2017-10-26 15:35:14 -0400 5816) 			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
>> 0f257b09531b4 (Chunming Zhou   2019-05-07 19:45:31 +0800 5817) 		ttm_eu_backoff_reservation(&ticket, &list);
>> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5818) 		return r;
>> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5819) 	}
>> e7b07ceef2a65 (Harry Wentland  2017-08-10 13:29:07 -0400 5820)
>> bb812f1ea87dd (Junwei Zhang    2018-06-25 13:32:24 +0800 5821) 	r = amdgpu_ttm_alloc_gart(&rbo->tbo);
>>
>> Does anyone know how to fix it?
>>
>> Full kernel logs are here:
>> [1] https://pastebin.com/fLasjDHX
>> [2] https://pastebin.com/g3wR2r9e
>>
>> -- 
>> Best Regards,
>> Mike Gavrilov.
>



* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-11 14:01   ` Christian König
@ 2021-01-11 19:23     ` Mikhail Gavrilov
  2021-01-11 20:45       ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2021-01-11 19:23 UTC
  To: Christian König
  Cc: amd-gfx list, dri-devel, Linux List Kernel Mailing,
	Harry Wentland, Deucher, Alexander

On Mon, 11 Jan 2021 at 19:01, Christian König <christian.koenig@amd.com> wrote:

> Changing the page table attributes while releasing memory might sleep.
> So we can't use a spinlock here.
>
> Thanks for the report, a patch to fix this is on the mailing list now.

Can you also look at the first trace?
It has the same error message, "sleeping function called from invalid
context", and a lot of [amdgpu] code.

BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 501, name: systemd-udevd
1 lock held by systemd-udevd/501:
 #0: ffff978e0278d258 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x3b/0xb0
CPU: 25 PID: 501 Comm: systemd-udevd Not tainted 5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0x8b/0xb0
 ___might_sleep.cold+0xb6/0xc6
 ? dcn30_clock_source_create+0x34/0xb0 [amdgpu]
 kmem_cache_alloc_trace+0x204/0x230
 dcn30_clock_source_create+0x34/0xb0 [amdgpu]
 dcn30_create_resource_pool+0x1d9/0x13a0 [amdgpu]
 ? rcu_read_lock_sched_held+0x3f/0x80
 ? trace_kmalloc+0xb2/0xe0
 ? __kmalloc+0x191/0x280
 ? dc_create_resource_pool+0x110/0x1d0 [amdgpu]
 dc_create_resource_pool+0x110/0x1d0 [amdgpu]
 dc_create+0x205/0x790 [amdgpu]
 ? trace_kmalloc+0xb2/0xe0
 ? kmem_cache_alloc_trace+0x174/0x230
 amdgpu_dm_init.isra.0+0x1b9/0x250 [amdgpu]
 ? dev_vprintk_emit+0x171/0x195
 ? dev_printk_emit+0x3e/0x40
 dm_hw_init+0xe/0x20 [amdgpu]
 amdgpu_device_init.cold+0x179f/0x1afd [amdgpu]
 ? pci_conf1_read+0xa4/0x100
 amdgpu_driver_load_kms+0x68/0x280 [amdgpu]
 amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
 local_pci_probe+0x42/0x80
 pci_device_probe+0xd9/0x1a0
 really_probe+0x205/0x460
 driver_probe_device+0xe1/0x150
 device_driver_attach+0xa8/0xb0
 __driver_attach+0x8c/0x150
 ? device_driver_attach+0xb0/0xb0
 ? device_driver_attach+0xb0/0xb0
 bus_for_each_dev+0x67/0x90
 bus_add_driver+0x12e/0x1f0
 driver_register+0x8f/0xe0
 ? 0xffffffffc0d9c000
 do_one_initcall+0x67/0x320
 ? rcu_read_lock_sched_held+0x3f/0x80
 ? trace_kmalloc+0xb2/0xe0
 ? kmem_cache_alloc_trace+0x174/0x230
 do_init_module+0x5c/0x270
 __do_sys_init_module+0x130/0x190
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f363661deee
Code: 48 8b 0d 85 1f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffeb7191588 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 0000561b94563170 RCX: 00007f363661deee
RDX: 0000561b94579df0 RSI: 0000000000b8a356 RDI: 00007f3633b9e010
RBP: 00007f3633b9e010 R08: 0000561b94565240 R09: 00007ffeb718d786
R10: 0000561ef5ef1595 R11: 0000000000000246 R12: 0000561b94579df0
R13: 0000561b9457a3e0 R14: 0000000000000000 R15: 0000561b94576530
[drm] Display Core initialized with v3.2.116!
[drm] DMUB hardware initialized: version=0x02000001
usb 1-3.2: new high-speed USB device number 5 using xhci_hcd
[drm] REG_WAIT timeout 1us * 100000 tries - mpc2_assert_idle_mpcc line:480

> > -12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by
> > the problem above, maybe something completely unrelated.
> >
> > I will take a look.
>
> That looks like a completely unrelated memory leak to me.
>
> Probably best if you open up a bug report for this.

Yes, the monitor still turns off after applying the patch "make the pool
shrinker lock a mutex".
Still, the patch fixed the flood of "BUG: sleeping function called
from invalid context at mm/vmalloc.c:1756" messages, so the kernel
log became cleaner.
Now the monitor-off issue looks like this in the logs:

DMA-API: cacheline tracking ENOMEM, dma-debug disabled
amdgpu 0000:0b:00.0: amdgpu: 000000006b791523 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
BUG: kernel NULL pointer dereference, address: 0000000000000060
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 20 PID: 3780 Comm: brave:cs0 Tainted: G        W        --------- ---  5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
RIP: 0010:ttm_tt_swapin+0x34/0x1b0 [ttm]
Code: 55 41 54 55 53 48 83 ec 10 48 8b 47 20 48 89 44 24 08 48 85 c0
0f 84 86 01 00 00 48 8b 44 24 08 49 89 fc 4c 8b a8 e0 01 00 00 <41> 8b
45 60 89 44 24 04 8b 47 0c 85 c0 0f 84 df 00 00 00 31 db 65
RSP: 0018:ffffa7400532b9c0 EFLAGS: 00010286
RAX: ffff978e2ae25800 RBX: ffff97910ec12058 RCX: ffff978e12caac70
RDX: 0000000080000010 RSI: 0000000000000000 RDI: ffff97912c3d99c0
RBP: ffff97912c3d99c0 R08: 0000000000000000 R09: 0000000070b3a000
R10: 0000000000000002 R11: 0000000000000000 R12: ffff97912c3d99c0
R13: 0000000000000000 R14: ffffa7400532ba90 R15: ffff978e182c6350
FS:  00007f070bb1b640(0000) GS:ffff979509200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 00000001f0cd2000 CR4: 0000000000350ee0
Call Trace:
 ttm_tt_populate+0xa9/0xe0 [ttm]
 ttm_bo_handle_move_mem+0x142/0x180 [ttm]
 ttm_bo_validate+0x12e/0x1c0 [ttm]
 amdgpu_cs_bo_validate+0x82/0x190 [amdgpu]
 amdgpu_cs_list_validate+0x105/0x150 [amdgpu]
 amdgpu_cs_ioctl+0x80a/0x1f10 [amdgpu]
 ? trace_hardirqs_off_caller+0x21/0xd0
 ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
 drm_ioctl_kernel+0x8c/0xe0 [drm]
 drm_ioctl+0x20f/0x3c0 [drm]
 ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
 ? selinux_file_ioctl+0x147/0x200
 ? lock_acquired+0x1fa/0x380
 ? lock_release+0x1e9/0x400
 ? trace_hardirqs_on+0x1b/0xe0
 amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
 __x64_sys_ioctl+0x82/0xb0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f0725633f8b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c
c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d b5 be 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f070bb19ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f070bb19f40 RCX: 00007f0725633f8b
RDX: 00007f070bb19f40 RSI: 00000000c0186444 RDI: 000000000000001b
RBP: 00000000c0186444 R08: 00007f070bb1a540 R09: 00007f070bb19f20
R10: 0000000000000000 R11: 0000000000000246 R12: 00002b89a7bdb088
R13: 000000000000001b R14: 0000000000000000 R15: 00000000fffffffd
Modules linked in: snd_seq_dummy snd_hrtimer uinput rfcomm nft_objref
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
iptable_filter cmac bnep zstd sunrpc vfat fat uas usb_storage
hid_logitech_hidpp hid_logitech_dj mt76x2u mt76x2_common mt76x02_usb
mt76_usb mt76x02_lib gspca_zc3xx mt76 gspca_main snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel
snd_intel_dspcfg intel_rapl_msr soundwire_intel joydev
intel_rapl_common soundwire_generic_allocation iwlmvm snd_soc_core
uvcvideo edac_mce_amd videobuf2_vmalloc videobuf2_memops snd_compress
snd_usb_audio kvm_amd videobuf2_v4l2 snd_pcm_dmaengine
 snd_usbmidi_lib soundwire_cadence videobuf2_common btusb mac80211
snd_rawmidi snd_hda_codec videodev kvm snd_hda_core ac97_bus snd_hwdep
btrtl libarc4 snd_seq btbcm btintel snd_seq_device irqbypass xpad
bluetooth mc snd_pcm iwlwifi rapl ff_memless eeepc_wmi asus_wmi
snd_timer sparse_keymap ecdh_generic video wmi_bmof ecc pcspkr snd
sp5100_tco cfg80211 k10temp soundcore i2c_piix4 rfkill acpi_cpufreq
binfmt_misc ip_tables amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched
drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel cec drm
ghash_clmulni_intel ccp igb nvme dca i2c_algo_bit xhci_pci nvme_core
xhci_pci_renesas wmi pinctrl_amd fuse
CR2: 0000000000000060
---[ end trace b0dd767146d85401 ]---
RIP: 0010:ttm_tt_swapin+0x34/0x1b0 [ttm]
Code: 55 41 54 55 53 48 83 ec 10 48 8b 47 20 48 89 44 24 08 48 85 c0
0f 84 86 01 00 00 48 8b 44 24 08 49 89 fc 4c 8b a8 e0 01 00 00 <41> 8b
45 60 89 44 24 04 8b 47 0c 85 c0 0f 84 df 00 00 00 31 db 65
RSP: 0018:ffffa7400532b9c0 EFLAGS: 00010286
RAX: ffff978e2ae25800 RBX: ffff97910ec12058 RCX: ffff978e12caac70
RDX: 0000000080000010 RSI: 0000000000000000 RDI: ffff97912c3d99c0
RBP: ffff97912c3d99c0 R08: 0000000000000000 R09: 0000000070b3a000
R10: 0000000000000002 R11: 0000000000000000 R12: ffff97912c3d99c0
R13: 0000000000000000 R14: ffffa7400532ba90 R15: ffff978e182c6350
FS:  00007f070bb1b640(0000) GS:ffff979509200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 00000001f0cd2000 CR4: 0000000000350ee0
BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 3780, name: brave:cs0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last  enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff8c0d9abb>] copy_process+0x8fb/0x1de0
softirqs last  enabled at (0): [<ffffffff8c0d9abb>] copy_process+0x8fb/0x1de0
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 20 PID: 3780 Comm: brave:cs0 Tainted: G      D W        --------- ---  5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0x8b/0xb0
 ___might_sleep.cold+0xb6/0xc6
 exit_signals+0x1c/0x2d0
 do_exit+0xcd/0xc20
 ? __x64_sys_ioctl+0x82/0xb0
 rewind_stack_do_exit+0x17/0x20
RIP: 0033:0x7f0725633f8b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c
c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d b5 be 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f070bb19ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f070bb19f40 RCX: 00007f0725633f8b
RDX: 00007f070bb19f40 RSI: 00000000c0186444 RDI: 000000000000001b
RBP: 00000000c0186444 R08: 00007f070bb1a540 R09: 00007f070bb19f20
R10: 0000000000000000 R11: 0000000000000246 R12: 00002b89a7bdb088
R13: 000000000000001b R14: 0000000000000000 R15: 00000000fffffffd
GpuWatchdog[3635]: segfault at 0 ip 000055a8db6e3429 sp 00007fc593e4d420 error 6 in gitkraken[55a8d7d97000+5cb7000]
Code: 00 79 09 48 8b 7d c0 e8 85 f6 bd fe c7 45 c0 aa aa aa aa 0f ae
f0 41 8b 84 24 e0 00 00 00 89 45 c0 48 8d 7d c0 e8 e7 96 6b fc <c7> 04
25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e

You said I should open a bug report. Do you mean on
https://bugzilla.kernel.org?
I thought the mailing lists were better, because bug reports on
bugzilla.kernel.org often stay open for several years without
attention.

Full kernel logs are here:
[1] https://pastebin.com/w64H4b8w

--
Best Regards,
Mike Gavrilov.


* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-11 19:23     ` Mikhail Gavrilov
@ 2021-01-11 20:45       ` Christian König
  2021-01-11 21:51         ` Mikhail Gavrilov
  2021-01-14  0:22         ` Mikhail Gavrilov
  0 siblings, 2 replies; 13+ messages in thread
From: Christian König @ 2021-01-11 20:45 UTC
  To: Mikhail Gavrilov
  Cc: amd-gfx list, dri-devel, Linux List Kernel Mailing,
	Harry Wentland, Deucher, Alexander

Hi Mike,

On 11.01.21 at 20:23, Mikhail Gavrilov wrote:
> On Mon, 11 Jan 2021 at 19:01, Christian König <christian.koenig@amd.com> wrote:
>
>> Changing the page table attributes while releasing memory might sleep.
>> So we can't use a spinlock here.
>>
>> Thanks for the report, a patch to fix this is on the mailing list now.
> Can you also look at the first trace?

Unfortunately not, that's DC stuff. The easiest way is to open a bug
report and assign it to our DC team.

> It has the same error message, "sleeping function called from invalid
> context", and a lot of [amdgpu] code.

[SNIP]

>>> -12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by
>>> the problem above, maybe something completely unrelated.
>>>
>>> I will take a look.
>> That looks like a completely unrelated memory leak to me.
>>
>> Probably best if you open up a bug report for this.
> Yes, the monitor still turns off after applying the patch "make the pool
> shrinker lock a mutex".
> Still, the patch fixed the flood of "BUG: sleeping function called
> from invalid context at mm/vmalloc.c:1756" messages, so the kernel
> log became cleaner.

At least some progress. Any objections if I add your e-mail address as a
Tested-by tag?

> Now the monitor-off issue looks like this in the logs:
>
> DMA-API: cacheline tracking ENOMEM, dma-debug disabled
> amdgpu 0000:0b:00.0: amdgpu: 000000006b791523 pin failed
> [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
> BUG: kernel NULL pointer dereference, address: 0000000000000060
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP NOPTI
> CPU: 20 PID: 3780 Comm: brave:cs0 Tainted: G        W        --------- ---  5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
> RIP: 0010:ttm_tt_swapin+0x34/0x1b0 [ttm]
> Code: 55 41 54 55 53 48 83 ec 10 48 8b 47 20 48 89 44 24 08 48 85 c0
> 0f 84 86 01 00 00 48 8b 44 24 08 49 89 fc 4c 8b a8 e0 01 00 00 <41> 8b
> 45 60 89 44 24 04 8b 47 0c 85 c0 0f 84 df 00 00 00 31 db 65
> RSP: 0018:ffffa7400532b9c0 EFLAGS: 00010286
> RAX: ffff978e2ae25800 RBX: ffff97910ec12058 RCX: ffff978e12caac70
> RDX: 0000000080000010 RSI: 0000000000000000 RDI: ffff97912c3d99c0
> RBP: ffff97912c3d99c0 R08: 0000000000000000 R09: 0000000070b3a000
> R10: 0000000000000002 R11: 0000000000000000 R12: ffff97912c3d99c0
> R13: 0000000000000000 R14: ffffa7400532ba90 R15: ffff978e182c6350
> FS:  00007f070bb1b640(0000) GS:ffff979509200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000060 CR3: 00000001f0cd2000 CR4: 0000000000350ee0
> Call Trace:
>   ttm_tt_populate+0xa9/0xe0 [ttm]
>   ttm_bo_handle_move_mem+0x142/0x180 [ttm]
>   ttm_bo_validate+0x12e/0x1c0 [ttm]

I can take a look at this one here. Looks like some missing error 
handling when allocating memory.
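The oops above has the usual shape of a NULL base pointer plus a field offset: CR2 is 0x60, R13 is 0, and the bytes marked "<41> 8b 45 60" in the Code: line decode to mov eax, [r13+0x60]. That is consistent with the missing-error-handling guess, though it does not by itself show which allocation failed. A minimal user-space sketch of the pattern and of the check that avoids it; the structures and field names are hypothetical and do not reproduce TTM's real layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backing-store object with a field at offset 0x60. */
struct backing_store {
	char reserved[0x60];
	unsigned int gfp_mask;	/* offsetof(struct backing_store, gfp_mask) == 0x60 */
};

/* Hypothetical buffer object; store stays NULL if allocation failed. */
struct buffer_object {
	struct backing_store *store;
};

/* Reading bo->store->gfp_mask while store is NULL would touch address
 * 0 + 0x60, i.e. "NULL pointer dereference, address: 0000000000000060". */

int bo_alloc_store(struct buffer_object *bo)
{
	bo->store = calloc(1, sizeof(*bo->store));
	return bo->store ? 0 : -ENOMEM;
}

/* Checked accessor: report -ENOMEM instead of dereferencing NULL. */
int bo_get_gfp_mask(const struct buffer_object *bo, unsigned int *mask)
{
	if (bo->store == NULL)
		return -ENOMEM;
	*mask = bo->store->gfp_mask;
	return 0;
}
```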

Can you decode which line ttm_tt_swapin+0x34 points to?

[SNIP]

> You said I should open a bug report. Do you mean on
> https://bugzilla.kernel.org?
> I thought the mailing lists were better, because bug reports on
> bugzilla.kernel.org often stay open for several years without
> attention.

Please use this one here: 
https://gitlab.freedesktop.org/drm/amd/-/issues/new

If you can't find the DC guys offhand in the assignee list, just assign
it to me and I will forward it.

But what you have in your logs so far are only unrelated symptoms, the 
root of the problem is that somebody is leaking memory.

What you could also do is enable kmemleak, and maybe try some
bleeding-edge branch like drm-misc-fixes or Alex's
amd-staging-drm-next branch.
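For reference, kmemleak (CONFIG_DEBUG_KMEMLEAK=y) is driven through debugfs: writing "scan" to /sys/kernel/debug/kmemleak requests an immediate scan, and reading the file lists suspected leaks, which is what "echo scan > /sys/kernel/debug/kmemleak; cat /sys/kernel/debug/kmemleak" does from a root shell. A small C sketch of the same two steps, assuming debugfs is mounted at /sys/kernel/debug and the program runs as root; the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define KMEMLEAK_PATH "/sys/kernel/debug/kmemleak"

/* Hypothetical helper: trigger a kmemleak scan via `path`, then copy
 * the current leak report to `out`. Returns 0 on success, -1 on error
 * (e.g. kmemleak not built in, debugfs not mounted, not root). */
int kmemleak_scan_and_dump(const char *path, FILE *out)
{
	char buf[4096];
	size_t n;
	FILE *f;

	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fputs("scan\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)	/* the buffered write reaches the kernel here */
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, out);
	fclose(f);
	return 0;
}
```

Calling kmemleak_scan_and_dump(KMEMLEAK_PATH, stdout) as root prints the same report the shell commands would.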

Thanks for the help,
Christian.


* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-11 20:45       ` Christian König
@ 2021-01-11 21:51         ` Mikhail Gavrilov
  2021-01-14  0:22         ` Mikhail Gavrilov
  1 sibling, 0 replies; 13+ messages in thread
From: Mikhail Gavrilov @ 2021-01-11 21:51 UTC
  To: Christian König
  Cc: amd-gfx list, dri-devel, Linux List Kernel Mailing,
	Harry Wentland, Deucher, Alexander

Hi Christian,

On Tue, 12 Jan 2021 at 01:45, Christian König <christian.koenig@amd.com> wrote:
>
> Hi Mike,
>
> Unfortunately not, that's DC stuff. The easiest way is to open a bug
> report and assign it to our DC team.
Ok

> At least some progress. Any objections if I add your e-mail address as a
> Tested-by tag?
No objections, feel free to add me.

> I can take a look at this one here. Looks like some missing error
> handling when allocating memory.
> Can you decode which line ttm_tt_swapin+0x34 points to?
$ /usr/src/kernels/`uname -r`/scripts/faddr2line /lib/debug/lib/modules/`uname -r`/kernel/drivers/gpu/drm/ttm/ttm.ko.debug ttm_tt_swapin+0x34
ttm_tt_swapin+0x34/0xd0:
mapping_gfp_mask at /usr/src/debug/kernel-20210108gitf5e6c330254a/linux-5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64/./include/linux/pagemap.h:105 (discriminator 2)
(inlined by) ttm_tt_swapin at /usr/src/debug/kernel-20210108gitf5e6c330254a/linux-5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64/drivers/gpu/drm/ttm/ttm_tt.c:210 (discriminator 2)
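In a trace entry such as ttm_tt_swapin+0x34/0x1b0, the first number is the instruction's offset from the start of the function and the second is the function's size; faddr2line uses the module's debug info to map that offset to a source line, as shown above. The helper below only splits such an entry into its parts; its name and struct are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of "func+0xOFFSET/0xSIZE". */
struct trace_loc {
	char func[128];
	unsigned long offset;	/* bytes from the start of func */
	unsigned long size;	/* total size of func in bytes */
};

/* Returns 0 on success, -1 if `s` is not in func+off/size form. */
int parse_trace_loc(const char *s, struct trace_loc *loc)
{
	const char *plus = strrchr(s, '+');
	size_t len;

	if (plus == NULL || plus == s)
		return -1;
	len = (size_t)(plus - s);
	if (len >= sizeof(loc->func))
		return -1;
	memcpy(loc->func, s, len);
	loc->func[len] = '\0';
	/* %lx accepts an optional 0x prefix, like strtoul() with base 16. */
	if (sscanf(plus + 1, "%lx/%lx", &loc->offset, &loc->size) != 2)
		return -1;
	return 0;
}
```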

$ cat -s -n /usr/src/debug/kernel-20210108gitf5e6c330254a/linux-5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64/drivers/gpu/drm/ttm/ttm_tt.c | head -220 | tail -20
   201      struct page *from_page;
   202      struct page *to_page;
   203      gfp_t gfp_mask;
   204      int i, ret;
   205
   206      swap_storage = ttm->swap_storage;
   207      BUG_ON(swap_storage == NULL);
   208
   209      swap_space = swap_storage->f_mapping;
   210      gfp_mask = mapping_gfp_mask(swap_space);
   211
   212      for (i = 0; i < ttm->num_pages; ++i) {
   213          from_page = shmem_read_mapping_page_gfp(swap_space, i,
   214                              gfp_mask);
   215          if (IS_ERR(from_page)) {
   216              ret = PTR_ERR(from_page);
   217              goto out_err;
   218          }
   219          to_page = ttm->pages[i];
   220          if (unlikely(to_page == NULL)) {

> Please use this one here:
> https://gitlab.freedesktop.org/drm/amd/-/issues/new
>
> If you can't find the DC guys offhand in the assignee list, just assign
> it to me and I will forward it.
https://gitlab.freedesktop.org/drm/amd/-/issues/1439
Ok, let's continue there.

--
Best Regards,
Mike Gavrilov.


* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-11 20:45       ` Christian König
  2021-01-11 21:51         ` Mikhail Gavrilov
@ 2021-01-14  0:22         ` Mikhail Gavrilov
  2021-01-14 13:56           ` Christian König
  1 sibling, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2021-01-14  0:22 UTC (permalink / raw)
  To: Christian König
  Cc: amd-gfx list, dri-devel, Linux List Kernel Mailing,
	Harry Wentland, Deucher, Alexander

On Tue, 12 Jan 2021 at 01:45, Christian König <christian.koenig@amd.com> wrote:
>
> But what you have in your logs so far are only unrelated symptoms, the
> root of the problem is that somebody is leaking memory.
>
> What you could do as well is to try to enable kmemleak

I captured some memleaks.
Do they contain any useful information?

[1] https://pastebin.com/n0FE7Hsu
[2] https://pastebin.com/MUX55L1k
[3] https://pastebin.com/a3FT7DVG
[4] https://pastebin.com/1ALvJKz7

--
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-14  0:22         ` Mikhail Gavrilov
@ 2021-01-14 13:56           ` Christian König
  2021-01-14 14:06             ` Daniel Vetter
  2021-01-14 22:43             ` Mikhail Gavrilov
  0 siblings, 2 replies; 13+ messages in thread
From: Christian König @ 2021-01-14 13:56 UTC (permalink / raw)
  To: Mikhail Gavrilov, Christian König
  Cc: Deucher, Alexander, Harry Wentland, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

On 14.01.21 at 01:22, Mikhail Gavrilov wrote:
> On Tue, 12 Jan 2021 at 01:45, Christian König <christian.koenig@amd.com> wrote:
>> But what you have in your logs so far are only unrelated symptoms, the
>> root of the problem is that somebody is leaking memory.
>>
>> What you could do as well is to try to enable kmemleak
> I captured some memleaks.
> Do they contain any useful information?

Unfortunately not offhand.

I also don't see any bug reports from other people, and I can't reproduce
the last TTM backtrace you sent on my side.

Do you have any local modifications or special setup in your system? 
Like bpf scripts or something like that?

Christian.

>
> [1] https://pastebin.com/n0FE7Hsu
> [2] https://pastebin.com/MUX55L1k
> [3] https://pastebin.com/a3FT7DVG
> [4] https://pastebin.com/1ALvJKz7
>
> --
> Best Regards,
> Mike Gavrilov.
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-14 13:56           ` Christian König
@ 2021-01-14 14:06             ` Daniel Vetter
  2021-01-14 22:43             ` Mikhail Gavrilov
  1 sibling, 0 replies; 13+ messages in thread
From: Daniel Vetter @ 2021-01-14 14:06 UTC (permalink / raw)
  To: Christian König
  Cc: Mikhail Gavrilov, Deucher, Alexander, amd-gfx list, dri-devel,
	Linux List Kernel Mailing

On Thu, Jan 14, 2021 at 2:56 PM Christian König
<christian.koenig@amd.com> wrote:
>
> On 14.01.21 at 01:22, Mikhail Gavrilov wrote:
> > On Tue, 12 Jan 2021 at 01:45, Christian König <christian.koenig@amd.com> wrote:
> >> But what you have in your logs so far are only unrelated symptoms, the
> >> root of the problem is that somebody is leaking memory.
> >>
> >> What you could do as well is to try to enable kmemleak
> > I captured some memleaks.
> > Do they contain any useful information?
>
> Unfortunately not offhand.
>
> I also don't see any bug reports from other people, and I can't reproduce
> the last TTM backtrace you sent on my side.
>
> Do you have any local modifications or special setup in your system?
> Like bpf scripts or something like that?

There's another bug report (for rcar-du, bisected to a switch to
using more CMA helpers) about leaking mmaps, which keep too many
framebuffers alive, so maybe we have gained a refcount leak somewhere
recently. But it could also be totally unrelated.
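To make the suspected mechanism concrete: if each mapping holds one reference to the buffer and some unmap path forgets to drop it, the final put never reaches zero and the memory stays allocated. The following is a toy userspace model under that assumption; `struct toy_fb` and the `toy_*` functions are hypothetical and are not the DRM framebuffer or GEM API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy refcounted framebuffer; illustrative only. */
struct toy_fb {
    int refcount;
    bool freed;
};

static void toy_fb_get(struct toy_fb *fb) { fb->refcount++; }

static void toy_fb_put(struct toy_fb *fb)
{
    assert(fb->refcount > 0);
    if (--fb->refcount == 0)
        fb->freed = true; /* a real driver would release the backing memory here */
}

/* Simulates `maps` mmap()/munmap() cycles, assuming each mapping holds one
 * reference. With `leak` set, the unmap path forgets its put. */
static void toy_mmap_cycles(struct toy_fb *fb, int maps, bool leak)
{
    for (int i = 0; i < maps; ++i) {
        toy_fb_get(fb);     /* mapping takes a reference */
        if (!leak)
            toy_fb_put(fb); /* unmapping drops it again */
    }
}
```

With balanced get/put the owner's final put frees the buffer; with the leak, five cycles leave five stray references and the buffer stays alive, which over time looks exactly like memory pressure from "too many fb".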
-Daniel



>
> Christian.
>
> >
> > [1] https://pastebin.com/n0FE7Hsu
> > [2] https://pastebin.com/MUX55L1k
> > [3] https://pastebin.com/a3FT7DVG
> > [4] https://pastebin.com/1ALvJKz7
> >
> > --
> > Best Regards,
> > Mike Gavrilov.
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-14 13:56           ` Christian König
  2021-01-14 14:06             ` Daniel Vetter
@ 2021-01-14 22:43             ` Mikhail Gavrilov
  2021-01-20  0:59               ` Mikhail Gavrilov
  1 sibling, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2021-01-14 22:43 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, Harry Wentland, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

On Thu, 14 Jan 2021 at 18:56, Christian König <christian.koenig@amd.com> wrote:
> Unfortunately not offhand.
>
> I also don't see any bug reports from other people, and I can't reproduce
> the last TTM backtrace you sent on my side.

Because only the most desperate users install kernels with debug
options enabled and then load the system by opening a huge number of
programs and tabs, you shouldn't be surprised that I'm the only one
reporting this.
This is what my desktop looks like every day: https://imgur.com/a/Kxlmrem

> Do you have any local modifications or special setup in your system?
> Like bpf scripts or something like that?

No, I didn't write any BPF scripts, but it looks like my distribution,
Fedora Rawhide, loads some BPF programs by default out of the box:

# bpftool prog
20: cgroup_device  tag 40ddf486530245f5  gpl
    loaded_at 2021-01-15T01:30:04+0500  uid 0
    xlated 504B  jited 309B  memlock 4096B
21: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:04+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
22: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:04+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
23: cgroup_device  tag ca8e50a3c7fb034b  gpl
    loaded_at 2021-01-15T01:30:05+0500  uid 0
    xlated 496B  jited 307B  memlock 4096B
24: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:05+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
25: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:05+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
26: cgroup_device  tag be31ae23198a0378  gpl
    loaded_at 2021-01-15T01:30:13+0500  uid 0
    xlated 464B  jited 288B  memlock 4096B
27: cgroup_device  tag ee0e253c78993a24  gpl
    loaded_at 2021-01-15T01:30:13+0500  uid 0
    xlated 416B  jited 255B  memlock 4096B
28: cgroup_device  tag 438c5618576e5b0c  gpl
    loaded_at 2021-01-15T01:30:13+0500  uid 0
    xlated 568B  jited 354B  memlock 4096B
29: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:13+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
30: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:13+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
31: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:13+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
32: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:13+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
33: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:14+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
34: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:14+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
35: cgroup_device  tag ee0e253c78993a24  gpl
    loaded_at 2021-01-15T01:30:14+0500  uid 0
    xlated 416B  jited 255B  memlock 4096B
38: cgroup_device  tag 3a0ef5414c2f6fca  gpl
    loaded_at 2021-01-15T01:30:14+0500  uid 0
    xlated 744B  jited 447B  memlock 4096B
39: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:14+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
40: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:14+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
41: cgroup_device  tag ee0e253c78993a24  gpl
    loaded_at 2021-01-15T01:30:18+0500  uid 0
    xlated 416B  jited 255B  memlock 4096B
42: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:18+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B
43: cgroup_skb  tag 6deef7357e7b4530  gpl
    loaded_at 2021-01-15T01:30:18+0500  uid 0
    xlated 64B  jited 54B  memlock 4096B

I caught another couple of leaks, but nothing new:
https://pastebin.com/2EgvYJdz

[1] do_detailed_mode+0x7c1/0x13d0 [drm]
[2] drm_mode_duplicate+0x45/0x220 [drm]
[3] do_seccomp+0x215/0x2280
[4] __vmalloc_node_range+0x464/0x7b0
[5] bpf_prog_alloc_no_stats+0xa2/0x2b0
[6] bpf_prog_store_orig_filter+0x7b/0x1c0
[7] kmemdup+0x1a/0x40

Does the following trace message concern anyone?
==================================================================
BUG: KASAN: slab-out-of-bounds in
kfd_create_crat_image_virtual+0x12d2/0x1380 [amdgpu]
Read of size 1 at addr ffff88812a6b4181 by task systemd-udevd/491

CPU: 20 PID: 491 Comm: systemd-udevd Not tainted
5.11.0-0.rc3.20210114git65f0d2414b70.125.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0xae/0xe5
 print_address_description.constprop.0+0x18/0x160
 ? kfd_create_crat_image_virtual+0x12d2/0x1380 [amdgpu]
 kasan_report.cold+0x7f/0x10e
 ? kfd_create_crat_image_virtual+0x12d2/0x1380 [amdgpu]
 kfd_create_crat_image_virtual+0x12d2/0x1380 [amdgpu]
 ? kfd_create_crat_image_acpi+0x340/0x340 [amdgpu]
 ? __raw_spin_lock_init+0x39/0x110
 kfd_topology_init+0x2ac/0x400 [amdgpu]
 ? kfd_create_topology_device+0x320/0x320 [amdgpu]
 ? __class_register+0x2ad/0x430
 ? __class_create+0xc5/0x130
 kgd2kfd_init+0x95/0xf0 [amdgpu]
 amdgpu_amdkfd_init+0x7f/0xb0 [amdgpu]
 ? smuio_v11_0_update_rom_clock_gating+0x1d0/0x1d0 [amdgpu]
 ? record_print_text.cold+0x11/0x11
 ? kmem_cache_create_usercopy+0x25c/0x310
 amdgpu_init+0x59/0x1000 [amdgpu]
 ? 0xffffffffc1f12000
 do_one_initcall+0xfb/0x530
 ? perf_trace_initcall_level+0x3d0/0x3d0
 ? __memset+0x29/0x30
 ? unpoison_range+0x3a/0x60
 do_init_module+0x1ce/0x7a0
 load_module+0x9841/0xa380
 ? module_frob_arch_sections+0x20/0x20
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? sched_clock_cpu+0x18/0x170
 ? irqtime_account_irq+0x44/0x1e0
 ? sched_clock+0x5/0x10
 ? lock_acquire+0x2dd/0x7a0
 ? sched_clock+0x5/0x10
 ? lock_is_held_type+0xb8/0xf0
 ? __do_sys_init_module+0x18b/0x220
 __do_sys_init_module+0x18b/0x220
 ? load_module+0xa380/0xa380
 ? ktime_get_coarse_real_ts64+0x12f/0x160
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc22aecaeee
Code: 48 8b 0d 85 1f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc62d60e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 0000560489080060 RCX: 00007fc22aecaeee
RDX: 0000560489080f70 RSI: 0000000001e2e8f6 RDI: 00007fc226471010
RBP: 00007fc226471010 R08: 000056048907d470 R09: 00007ffc62d5d606
R10: 00005601e94f449d R11: 0000000000000246 R12: 0000560489080f70
R13: 000056048907c9b0 R14: 0000000000000000 R15: 00005604890814e0

Allocated by task 491:
 kasan_save_stack+0x1b/0x40
 ____kasan_kmalloc.constprop.0+0x84/0xa0
 kfd_create_crat_image_virtual+0x13b/0x1380 [amdgpu]
 kfd_topology_init+0x2ac/0x400 [amdgpu]
 kgd2kfd_init+0x95/0xf0 [amdgpu]
 amdgpu_amdkfd_init+0x7f/0xb0 [amdgpu]
 amdgpu_init+0x59/0x1000 [amdgpu]
 do_one_initcall+0xfb/0x530
 do_init_module+0x1ce/0x7a0
 load_module+0x9841/0xa380
 __do_sys_init_module+0x18b/0x220
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff88812a6b4100
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 1 bytes to the right of
 128-byte region [ffff88812a6b4100, ffff88812a6b4180)
The buggy address belongs to the page:
page:00000000edb67e0c refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x12a6b4
flags: 0x17ffffc0000200(slab)
raw: 0017ffffc0000200 ffffea000406a140 0000000500000005 ffff888100041640
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88812a6b4080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88812a6b4100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88812a6b4180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff88812a6b4200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88812a6b4280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint
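The KASAN report above can be checked by hand. The object occupies the 128-byte region [ffff88812a6b4100, ffff88812a6b4180), and the faulting read at ffff88812a6b4181 is 0x81 = 129 bytes from its start, i.e. the equivalent of `buf[129]` on a 128-byte buffer, "1 bytes to the right" of the region's end. Generic KASAN keeps one shadow byte per 8-byte granule, so each 16-byte row in the memory-state dump covers 128 bytes: `00` marks a fully accessible granule and `fc` a kmalloc redzone, which is why the caret points at the first `fc` of the ffff88812a6b4180 row. A small sketch of that arithmetic (helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte describes 8 bytes of memory. */
#define SHADOW_GRANULE 8
/* The memory-state dump prints 16 shadow bytes per row. */
#define SHADOW_BYTES_PER_ROW 16

/* Hypothetical helper: distance of `addr` past the end of [start, start + size),
 * phrased like the report ("N bytes to the right of"); -1 if inside the object. */
static int64_t bytes_to_right(uint64_t start, uint64_t size, uint64_t addr)
{
    uint64_t end = start + size;
    assert(addr >= start);
    if (addr < end)
        return -1;
    return (int64_t)(addr - end);
}

/* Hypothetical helper: column (0-15) of the shadow byte covering `addr`
 * within its dump row, i.e. where the '^' caret is drawn. */
static unsigned shadow_column(uint64_t addr)
{
    uint64_t row_span = (uint64_t)SHADOW_BYTES_PER_ROW * SHADOW_GRANULE; /* 128 */
    return (unsigned)((addr % row_span) / SHADOW_GRANULE);
}
```

For the values in this report, `bytes_to_right(0xffff88812a6b4100, 128, 0xffff88812a6b4181)` gives 1 and `shadow_column(0xffff88812a6b4181)` gives 0, matching the text and the caret position.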


Full kernel log: https://pastebin.com/bUiXRVYw
Kernel build options: https://pastebin.com/v3zsC03i

--
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-14 22:43             ` Mikhail Gavrilov
@ 2021-01-20  0:59               ` Mikhail Gavrilov
  2021-01-21 13:27                 ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2021-01-20  0:59 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, Harry Wentland, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

On Fri, 15 Jan 2021 at 03:43, Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>

In rc4, the number of warnings has dropped dramatically.
There are no more "kasan slab-out-of-bounds" errors and no more "DMA-API device
driver failed to check map error" warnings.
But "sleeping function called from invalid context at
include/linux/sched/mm.h:196" and "BUG: key ffff88810b0d9148 has not
been registered!" are still not fixed.
The second issue is Navi-specific, because it started happening in the 5.10 kernel
after I replaced my Radeon VII with a 6900XT.

1.
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:196
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 500, name: systemd-udevd
1 lock held by systemd-udevd/500:
 #0: ffff888107690258 (&dev->mutex){....}-{3:3}, at:
device_driver_attach+0xa3/0x250
CPU: 9 PID: 500 Comm: systemd-udevd Not tainted
5.11.0-0.rc4.129.fc34.x86_64+debug #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
 dump_stack+0xae/0xe5
 ___might_sleep.cold+0x150/0x17e
 ? dcn30_clock_source_create+0x53/0x110 [amdgpu]
 kmem_cache_alloc_trace+0x23f/0x270
 dcn30_clock_source_create+0x53/0x110 [amdgpu]
 dcn30_create_resource_pool+0x998/0x4890 [amdgpu]
 ? dcn30_calc_max_scaled_time+0x40/0x40 [amdgpu]
 ? lock_is_held_type+0xb8/0xf0
 ? unpoison_range+0x3a/0x60
 ? ____kasan_kmalloc.constprop.0+0x84/0xa0
 ? dc_create_resource_pool+0x26e/0x5e0 [amdgpu]
 dc_create_resource_pool+0x26e/0x5e0 [amdgpu]
 dc_create+0x636/0x1bc0 [amdgpu]
 ? lock_acquire+0x2dd/0x7a0
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0x18/0x170
 ? find_held_lock+0x33/0x110
 ? dc_create_state+0xa0/0xa0 [amdgpu]
 ? lock_downgrade+0x6b0/0x6b0
 ? module_assert_mutex_or_preempt+0x3e/0x70
 ? lock_is_held_type+0xb8/0xf0
 ? unpoison_range+0x3a/0x60
 ? ____kasan_kmalloc.constprop.0+0x84/0xa0
 amdgpu_dm_init.isra.0+0x479/0x640 [amdgpu]
 ? vprintk_emit+0x1c0/0x460
 ? dev_vprintk_emit+0x2d8/0x31a
 ? sched_clock+0x5/0x10
 ? dm_resume+0x13b0/0x13b0 [amdgpu]
 ? dev_attr_show.cold+0x35/0x35
 ? lock_downgrade+0x6b0/0x6b0
 ? dev_printk_emit+0x8c/0xa8
 ? dev_vprintk_emit+0x31a/0x31a
 ? wait_for_completion_io+0x240/0x240
 ? __dev_printk+0x71/0xdf
 ? smu_hw_init.cold+0x16b/0x18a [amdgpu]
 ? smu_suspend+0x240/0x240 [amdgpu]
 ? navi10_ih_irq_init+0xea3/0x2420 [amdgpu]
 dm_hw_init+0xe/0x20 [amdgpu]
 amdgpu_device_init.cold+0x3031/0x4940 [amdgpu]
 ? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
 ? pci_bus_read_config_byte+0x140/0x140
 ? do_pci_enable_device+0x1f8/0x260
 ? pci_find_saved_ext_cap+0x110/0x110
 ? pci_enable_bridge+0xf9/0x1e0
 ? pci_dev_check_d3cold+0x107/0x250
 ? pci_enable_device_flags+0x201/0x340
 amdgpu_driver_load_kms+0x167/0x8a0 [amdgpu]
 amdgpu_pci_probe+0x235/0x360 [amdgpu]
 ? amdgpu_pci_remove+0xd0/0xd0 [amdgpu]
 local_pci_probe+0xd8/0x170
 pci_device_probe+0x318/0x5c0
 ? kernfs_create_link+0x16c/0x230
 ? pci_device_remove+0x1d0/0x1d0
 really_probe+0x224/0xc40
 driver_probe_device+0x1f2/0x380
 device_driver_attach+0x1df/0x250
 __driver_attach+0xf6/0x260
 ? device_driver_attach+0x250/0x250
 bus_for_each_dev+0x114/0x180
 ? subsys_dev_iter_exit+0x10/0x10
 bus_add_driver+0x352/0x570
 driver_register+0x20f/0x390
 ? __pci_register_driver+0x13a/0x210
 ? 0xffffffffc1d8d000
 do_one_initcall+0xfb/0x530
 ? perf_trace_initcall_level+0x3d0/0x3d0
 ? __memset+0x2b/0x30
 ? unpoison_range+0x3a/0x60
 do_init_module+0x1ce/0x7a0
 load_module+0x9841/0xa380
 ? module_frob_arch_sections+0x20/0x20
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? sched_clock_cpu+0x18/0x170
 ? sched_clock+0x5/0x10
 ? lock_acquire+0x2dd/0x7a0
 ? sched_clock+0x5/0x10
 ? lock_is_held_type+0xb8/0xf0
 ? __do_sys_init_module+0x18b/0x220
 __do_sys_init_module+0x18b/0x220
 ? load_module+0xa380/0xa380
 ? ktime_get_coarse_real_ts64+0x12f/0x160
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f2c109da07e
Code: 48 8b 0d f5 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d c2 1d 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc84d33f88 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055b87f8260a0 RCX: 00007f2c109da07e
RDX: 000055b87f834060 RSI: 0000000001e2cbf6 RDI: 00007f2c0b7e0010
RBP: 00007f2c0b7e0010 R08: 000055b87f8281e0 R09: 00007ffc84d30a26
R10: 000055bd2404cc18 R11: 0000000000000246 R12: 000055b87f834060
R13: 000055b87f831ca0 R14: 0000000000000000 R15: 000055b87f832640
[drm] Display Core initialized with v3.2.116!
[drm] DMUB hardware initialized: version=0x02000001
usb 1-3.2: Device not responding to setup address.
usb 1-3.2: device not accepting address 5, error -71
[drm] REG_WAIT timeout 1us * 100000 tries - mpc2_assert_idle_mpcc line:480
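What report 1 checks, roughly: `in_atomic(): 1` means the preempt count was nonzero when dcn30_clock_source_create() called kmem_cache_alloc_trace() with allocation flags that allow blocking, and a call that may sleep is not allowed in that state. Below is a toy userspace model of that rule only, not the kernel's implementation; all `toy_*` names are hypothetical, and the `may_sleep` flag stands in for GFP_KERNEL-style versus GFP_ATOMIC-style allocations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy counter standing in for the kernel's per-task preempt count. */
static int toy_preempt_count;

static void toy_preempt_disable(void) { toy_preempt_count++; }

static void toy_preempt_enable(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;
}

/* A call that may sleep is only legal outside atomic context; the kernel
 * would print "BUG: sleeping function called from invalid context" here. */
static bool toy_might_sleep_ok(void) { return toy_preempt_count == 0; }

/* Hypothetical allocation check: non-blocking allocations are always
 * allowed, blocking ones only when not in atomic context. */
static bool toy_alloc_allowed(bool may_sleep)
{
    return !may_sleep || toy_might_sleep_ok();
}
```

In this model a blocking allocation between `toy_preempt_disable()` and `toy_preempt_enable()` is flagged, while a non-blocking one is not; the real fix depends on why preemption was disabled at that point in the DC code, which the trace alone does not show.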


2.
BUG: key ffff88810b0d9148 has not been registered!
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 25 PID: 500 at kernel/locking/lockdep.c:4618
lockdep_init_map_waits+0x592/0x770
Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 gpu_sched
drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
ghash_clmulni_intel ccp igb nvme dca nvme_core i2c_algo_bit xhci_pci
xhci_pci_renesas wmi pinctrl_amd fuse
CPU: 25 PID: 500 Comm: systemd-udevd Tainted: G        W
--------- ---  5.11.0-0.rc4.129.fc34.x86_64+debug #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
RIP: 0010:lockdep_init_map_waits+0x592/0x770
Code: 08 84 d2 0f 85 d8 01 00 00 8b 3d e1 02 38 04 85 ff 0f 85 7e fc
ff ff 48 c7 c6 e0 04 ca 8e 48 c7 c7 40 fd c9 8e e8 01 8e 23 02 <0f> 0b
e9 64 fc ff ff 48 89 df 44 89 4c 24 0c 44 89 44 24 08 48 89
RSP: 0018:ffffc900029bef88 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000537de7
RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8886f9fe72ab
R10: ffffed10df3fce55 R11: 0000000000000001 R12: ffff88810b0d9148
R13: 0000000000000000 R14: ffffffff8edbda60 R15: ffff88810b0db690
FS:  00007f2c0fdda140(0000) GS:ffff8886f9e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b8800aec68 CR3: 0000000127fd0000 CR4: 0000000000350ee0
Call Trace:
 ? lockdep_hardirqs_on+0x75/0xf0
 __kernfs_create_file+0x102/0x2f0
 sysfs_add_file_mode_ns+0x1af/0x500
 sysfs_create_bin_file+0x100/0x160
 ? lock_is_held_type+0xb8/0xf0
 ? sysfs_add_file_to_group+0x150/0x150
 ? static_obj+0x8a/0xc0
 ? lockdep_init_map_waits+0x2a2/0x770
 hdcp_create_workqueue+0x879/0xb50 [amdgpu]
 amdgpu_dm_init.isra.0.cold+0x7f2/0x374c [amdgpu]
 ? vprintk_emit+0x140/0x460
 ? dev_vprintk_emit+0x2d8/0x31a
 ? sched_clock+0x5/0x10
 ? dm_resume+0x13b0/0x13b0 [amdgpu]
 ? dev_attr_show.cold+0x35/0x35
 ? psp_set_srm+0x250/0x250 [amdgpu]
 ? hdcp_update_display+0x5b0/0x5b0 [amdgpu]
 ? lock_downgrade+0x6b0/0x6b0
 ? dev_printk_emit+0x8c/0xa8
 ? dev_vprintk_emit+0x31a/0x31a
 ? wait_for_completion_io+0x240/0x240
 ? __dev_printk+0x71/0xdf
 ? smu_hw_init.cold+0x16b/0x18a [amdgpu]
 ? smu_suspend+0x240/0x240 [amdgpu]
 ? navi10_ih_irq_init+0xea3/0x2420 [amdgpu]
 dm_hw_init+0xe/0x20 [amdgpu]
 amdgpu_device_init.cold+0x3031/0x4940 [amdgpu]
 ? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
 ? pci_bus_read_config_byte+0x140/0x140
 ? do_pci_enable_device+0x1f8/0x260
 ? pci_find_saved_ext_cap+0x110/0x110
 ? pci_enable_bridge+0xf9/0x1e0
 ? pci_dev_check_d3cold+0x107/0x250
 ? pci_enable_device_flags+0x201/0x340
 amdgpu_driver_load_kms+0x167/0x8a0 [amdgpu]
 amdgpu_pci_probe+0x235/0x360 [amdgpu]
 ? amdgpu_pci_remove+0xd0/0xd0 [amdgpu]
 local_pci_probe+0xd8/0x170
 pci_device_probe+0x318/0x5c0
 ? kernfs_create_link+0x16c/0x230
 ? pci_device_remove+0x1d0/0x1d0
 really_probe+0x224/0xc40
 driver_probe_device+0x1f2/0x380
 device_driver_attach+0x1df/0x250
 __driver_attach+0xf6/0x260
 ? device_driver_attach+0x250/0x250
 bus_for_each_dev+0x114/0x180
 ? subsys_dev_iter_exit+0x10/0x10
 bus_add_driver+0x352/0x570
 driver_register+0x20f/0x390
 ? __pci_register_driver+0x13a/0x210
 ? 0xffffffffc1d8d000
 do_one_initcall+0xfb/0x530
 ? perf_trace_initcall_level+0x3d0/0x3d0
 ? __memset+0x2b/0x30
 ? unpoison_range+0x3a/0x60
 do_init_module+0x1ce/0x7a0
 load_module+0x9841/0xa380
 ? module_frob_arch_sections+0x20/0x20
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? sched_clock_cpu+0x18/0x170
 ? sched_clock+0x5/0x10
 ? lock_acquire+0x2dd/0x7a0
 ? sched_clock+0x5/0x10
 ? lock_is_held_type+0xb8/0xf0
 ? __do_sys_init_module+0x18b/0x220
 __do_sys_init_module+0x18b/0x220
 ? load_module+0xa380/0xa380
 ? ktime_get_coarse_real_ts64+0x12f/0x160
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f2c109da07e
Code: 48 8b 0d f5 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d c2 1d 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc84d33f88 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055b87f8260a0 RCX: 00007f2c109da07e
RDX: 000055b87f834060 RSI: 0000000001e2cbf6 RDI: 00007f2c0b7e0010
RBP: 00007f2c0b7e0010 R08: 000055b87f8281e0 R09: 00007ffc84d30a26
R10: 000055bd2404cc18 R11: 0000000000000246 R12: 000055b87f834060
R13: 000055b87f831ca0 R14: 0000000000000000 R15: 000055b87f832640
irq event stamp: 593331
hardirqs last  enabled at (593331): [<ffffffff8c3602f0>]
console_unlock+0x7c0/0x9a0
hardirqs last disabled at (593330): [<ffffffff8c3601e8>]
console_unlock+0x6b8/0x9a0
softirqs last  enabled at (593162): [<ffffffff8e801112>]
asm_call_irq_on_stack+0x12/0x20
softirqs last disabled at (593157): [<ffffffff8e801112>]
asm_call_irq_on_stack+0x12/0x20
---[ end trace 37dc3a4a3aa1704a ]---

The issue with the monitor switching off still happens too, but the
messages in the log have become more detailed:
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
amdgpu 0000:0b:00.0: amdgpu: 0000000087613007 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
framebuffer with error -12
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!

I hope "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the
buffer list -4!" gives an idea of what happened.
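For decoding these return codes: on Linux, errno 12 is ENOMEM ("Cannot allocate memory") and errno 4 is EINTR ("Interrupted system call"). So the pin failure reports -ENOMEM, while amdgpu_cs_ioctl reports -EINTR, which indicates an interrupted wait (for example, a pending signal) rather than a second allocation failure. A small sketch that formats such codes with the C library's strerror(); `describe_kernel_err` is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render a negative kernel-style return code such as -12
 * as "-12: <libc description>", e.g. for grepping through dmesg output. */
static void describe_kernel_err(int ret, char *buf, size_t len)
{
    assert(ret < 0);
    snprintf(buf, len, "%d: %s", ret, strerror(-ret));
}
```

The exact description strings come from the C library in use, so only the errno numbers themselves should be relied on.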

Full kernel log is here: https://pastebin.com/nX69zgvf

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-20  0:59               ` Mikhail Gavrilov
@ 2021-01-21 13:27                 ` Christian König
  2021-01-25  5:28                   ` Mikhail Gavrilov
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2021-01-21 13:27 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Deucher, Alexander, Harry Wentland, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

I still have no idea what's going on here.

The KASAN messages from the DC code are completely unrelated.

Please add the full dmesg to your bug report.

Christian.

On 20.01.21 at 01:59, Mikhail Gavrilov wrote:
> On Fri, 15 Jan 2021 at 03:43, Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> In rc4, the number of warnings has dropped dramatically.
> There are no more "kasan slab-out-of-bounds" errors and no more "DMA-API device
> driver failed to check map error" warnings.
> But "sleeping function called from invalid context at
> include/linux/sched/mm.h:196" and "BUG: key ffff88810b0d9148 has not
> been registered!" are still not fixed.
> The second issue is Navi-specific, because it started happening in the 5.10 kernel
> after I replaced my Radeon VII with a 6900XT.
>
> 1.
> BUG: sleeping function called from invalid context at
> include/linux/sched/mm.h:196
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 500, name: systemd-udevd
> 1 lock held by systemd-udevd/500:
>   #0: ffff888107690258 (&dev->mutex){....}-{3:3}, at:
> device_driver_attach+0xa3/0x250
> CPU: 9 PID: 500 Comm: systemd-udevd Not tainted
> 5.11.0-0.rc4.129.fc34.x86_64+debug #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 2802 10/21/2020
> Call Trace:
>   dump_stack+0xae/0xe5
>   ___might_sleep.cold+0x150/0x17e
>   ? dcn30_clock_source_create+0x53/0x110 [amdgpu]
>   kmem_cache_alloc_trace+0x23f/0x270
>   dcn30_clock_source_create+0x53/0x110 [amdgpu]
>   dcn30_create_resource_pool+0x998/0x4890 [amdgpu]
>   ? dcn30_calc_max_scaled_time+0x40/0x40 [amdgpu]
>   ? lock_is_held_type+0xb8/0xf0
>   ? unpoison_range+0x3a/0x60
>   ? ____kasan_kmalloc.constprop.0+0x84/0xa0
>   ? dc_create_resource_pool+0x26e/0x5e0 [amdgpu]
>   dc_create_resource_pool+0x26e/0x5e0 [amdgpu]
>   dc_create+0x636/0x1bc0 [amdgpu]
>   ? lock_acquire+0x2dd/0x7a0
>   ? sched_clock+0x5/0x10
>   ? sched_clock_cpu+0x18/0x170
>   ? find_held_lock+0x33/0x110
>   ? dc_create_state+0xa0/0xa0 [amdgpu]
>   ? lock_downgrade+0x6b0/0x6b0
>   ? module_assert_mutex_or_preempt+0x3e/0x70
>   ? lock_is_held_type+0xb8/0xf0
>   ? unpoison_range+0x3a/0x60
>   ? ____kasan_kmalloc.constprop.0+0x84/0xa0
>   amdgpu_dm_init.isra.0+0x479/0x640 [amdgpu]
>   ? vprintk_emit+0x1c0/0x460
>   ? dev_vprintk_emit+0x2d8/0x31a
>   ? sched_clock+0x5/0x10
>   ? dm_resume+0x13b0/0x13b0 [amdgpu]
>   ? dev_attr_show.cold+0x35/0x35
>   ? lock_downgrade+0x6b0/0x6b0
>   ? dev_printk_emit+0x8c/0xa8
>   ? dev_vprintk_emit+0x31a/0x31a
>   ? wait_for_completion_io+0x240/0x240
>   ? __dev_printk+0x71/0xdf
>   ? smu_hw_init.cold+0x16b/0x18a [amdgpu]
>   ? smu_suspend+0x240/0x240 [amdgpu]
>   ? navi10_ih_irq_init+0xea3/0x2420 [amdgpu]
>   dm_hw_init+0xe/0x20 [amdgpu]
>   amdgpu_device_init.cold+0x3031/0x4940 [amdgpu]
>   ? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
>   ? pci_bus_read_config_byte+0x140/0x140
>   ? do_pci_enable_device+0x1f8/0x260
>   ? pci_find_saved_ext_cap+0x110/0x110
>   ? pci_enable_bridge+0xf9/0x1e0
>   ? pci_dev_check_d3cold+0x107/0x250
>   ? pci_enable_device_flags+0x201/0x340
>   amdgpu_driver_load_kms+0x167/0x8a0 [amdgpu]
>   amdgpu_pci_probe+0x235/0x360 [amdgpu]
>   ? amdgpu_pci_remove+0xd0/0xd0 [amdgpu]
>   local_pci_probe+0xd8/0x170
>   pci_device_probe+0x318/0x5c0
>   ? kernfs_create_link+0x16c/0x230
>   ? pci_device_remove+0x1d0/0x1d0
>   really_probe+0x224/0xc40
>   driver_probe_device+0x1f2/0x380
>   device_driver_attach+0x1df/0x250
>   __driver_attach+0xf6/0x260
>   ? device_driver_attach+0x250/0x250
>   bus_for_each_dev+0x114/0x180
>   ? subsys_dev_iter_exit+0x10/0x10
>   bus_add_driver+0x352/0x570
>   driver_register+0x20f/0x390
>   ? __pci_register_driver+0x13a/0x210
>   ? 0xffffffffc1d8d000
>   do_one_initcall+0xfb/0x530
>   ? perf_trace_initcall_level+0x3d0/0x3d0
>   ? __memset+0x2b/0x30
>   ? unpoison_range+0x3a/0x60
>   do_init_module+0x1ce/0x7a0
>   load_module+0x9841/0xa380
>   ? module_frob_arch_sections+0x20/0x20
>   ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
>   ? sched_clock_cpu+0x18/0x170
>   ? sched_clock+0x5/0x10
>   ? lock_acquire+0x2dd/0x7a0
>   ? sched_clock+0x5/0x10
>   ? lock_is_held_type+0xb8/0xf0
>   ? __do_sys_init_module+0x18b/0x220
>   __do_sys_init_module+0x18b/0x220
>   ? load_module+0xa380/0xa380
>   ? ktime_get_coarse_real_ts64+0x12f/0x160
>   do_syscall_64+0x33/0x40
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f2c109da07e
> Code: 48 8b 0d f5 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
> 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 8b 0d c2 1d 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffc84d33f88 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 000055b87f8260a0 RCX: 00007f2c109da07e
> RDX: 000055b87f834060 RSI: 0000000001e2cbf6 RDI: 00007f2c0b7e0010
> RBP: 00007f2c0b7e0010 R08: 000055b87f8281e0 R09: 00007ffc84d30a26
> R10: 000055bd2404cc18 R11: 0000000000000246 R12: 000055b87f834060
> R13: 000055b87f831ca0 R14: 0000000000000000 R15: 000055b87f832640
> [drm] Display Core initialized with v3.2.116!
> [drm] DMUB hardware initialized: version=0x02000001
> usb 1-3.2: Device not responding to setup address.
> usb 1-3.2: device not accepting address 5, error -71
> [drm] REG_WAIT timeout 1us * 100000 tries - mpc2_assert_idle_mpcc line:480
>
>
> 2.
> BUG: key ffff88810b0d9148 has not been registered!
> ------------[ cut here ]------------
> DEBUG_LOCKS_WARN_ON(1)
> WARNING: CPU: 25 PID: 500 at kernel/locking/lockdep.c:4618
> lockdep_init_map_waits+0x592/0x770
> Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 gpu_sched
> drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
> ghash_clmulni_intel ccp igb nvme dca nvme_core i2c_algo_bit xhci_pci
> xhci_pci_renesas wmi pinctrl_amd fuse
> CPU: 25 PID: 500 Comm: systemd-udevd Tainted: G        W
> --------- ---  5.11.0-0.rc4.129.fc34.x86_64+debug #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 2802 10/21/2020
> RIP: 0010:lockdep_init_map_waits+0x592/0x770
> Code: 08 84 d2 0f 85 d8 01 00 00 8b 3d e1 02 38 04 85 ff 0f 85 7e fc
> ff ff 48 c7 c6 e0 04 ca 8e 48 c7 c7 40 fd c9 8e e8 01 8e 23 02 <0f> 0b
> e9 64 fc ff ff 48 89 df 44 89 4c 24 0c 44 89 44 24 08 48 89
> RSP: 0018:ffffc900029bef88 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
> RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000537de7
> RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8886f9fe72ab
> R10: ffffed10df3fce55 R11: 0000000000000001 R12: ffff88810b0d9148
> R13: 0000000000000000 R14: ffffffff8edbda60 R15: ffff88810b0db690
> FS:  00007f2c0fdda140(0000) GS:ffff8886f9e00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055b8800aec68 CR3: 0000000127fd0000 CR4: 0000000000350ee0
> Call Trace:
>   ? lockdep_hardirqs_on+0x75/0xf0
>   __kernfs_create_file+0x102/0x2f0
>   sysfs_add_file_mode_ns+0x1af/0x500
>   sysfs_create_bin_file+0x100/0x160
>   ? lock_is_held_type+0xb8/0xf0
>   ? sysfs_add_file_to_group+0x150/0x150
>   ? static_obj+0x8a/0xc0
>   ? lockdep_init_map_waits+0x2a2/0x770
>   hdcp_create_workqueue+0x879/0xb50 [amdgpu]
>   amdgpu_dm_init.isra.0.cold+0x7f2/0x374c [amdgpu]
>   ? vprintk_emit+0x140/0x460
>   ? dev_vprintk_emit+0x2d8/0x31a
>   ? sched_clock+0x5/0x10
>   ? dm_resume+0x13b0/0x13b0 [amdgpu]
>   ? dev_attr_show.cold+0x35/0x35
>   ? psp_set_srm+0x250/0x250 [amdgpu]
>   ? hdcp_update_display+0x5b0/0x5b0 [amdgpu]
>   ? lock_downgrade+0x6b0/0x6b0
>   ? dev_printk_emit+0x8c/0xa8
>   ? dev_vprintk_emit+0x31a/0x31a
>   ? wait_for_completion_io+0x240/0x240
>   ? __dev_printk+0x71/0xdf
>   ? smu_hw_init.cold+0x16b/0x18a [amdgpu]
>   ? smu_suspend+0x240/0x240 [amdgpu]
>   ? navi10_ih_irq_init+0xea3/0x2420 [amdgpu]
>   dm_hw_init+0xe/0x20 [amdgpu]
>   amdgpu_device_init.cold+0x3031/0x4940 [amdgpu]
>   ? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
>   ? pci_bus_read_config_byte+0x140/0x140
>   ? do_pci_enable_device+0x1f8/0x260
>   ? pci_find_saved_ext_cap+0x110/0x110
>   ? pci_enable_bridge+0xf9/0x1e0
>   ? pci_dev_check_d3cold+0x107/0x250
>   ? pci_enable_device_flags+0x201/0x340
>   amdgpu_driver_load_kms+0x167/0x8a0 [amdgpu]
>   amdgpu_pci_probe+0x235/0x360 [amdgpu]
>   ? amdgpu_pci_remove+0xd0/0xd0 [amdgpu]
>   local_pci_probe+0xd8/0x170
>   pci_device_probe+0x318/0x5c0
>   ? kernfs_create_link+0x16c/0x230
>   ? pci_device_remove+0x1d0/0x1d0
>   really_probe+0x224/0xc40
>   driver_probe_device+0x1f2/0x380
>   device_driver_attach+0x1df/0x250
>   __driver_attach+0xf6/0x260
>   ? device_driver_attach+0x250/0x250
>   bus_for_each_dev+0x114/0x180
>   ? subsys_dev_iter_exit+0x10/0x10
>   bus_add_driver+0x352/0x570
>   driver_register+0x20f/0x390
>   ? __pci_register_driver+0x13a/0x210
>   ? 0xffffffffc1d8d000
>   do_one_initcall+0xfb/0x530
>   ? perf_trace_initcall_level+0x3d0/0x3d0
>   ? __memset+0x2b/0x30
>   ? unpoison_range+0x3a/0x60
>   do_init_module+0x1ce/0x7a0
>   load_module+0x9841/0xa380
>   ? module_frob_arch_sections+0x20/0x20
>   ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
>   ? sched_clock_cpu+0x18/0x170
>   ? sched_clock+0x5/0x10
>   ? lock_acquire+0x2dd/0x7a0
>   ? sched_clock+0x5/0x10
>   ? lock_is_held_type+0xb8/0xf0
>   ? __do_sys_init_module+0x18b/0x220
>   __do_sys_init_module+0x18b/0x220
>   ? load_module+0xa380/0xa380
>   ? ktime_get_coarse_real_ts64+0x12f/0x160
>   do_syscall_64+0x33/0x40
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f2c109da07e
> Code: 48 8b 0d f5 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
> 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 8b 0d c2 1d 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffc84d33f88 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 000055b87f8260a0 RCX: 00007f2c109da07e
> RDX: 000055b87f834060 RSI: 0000000001e2cbf6 RDI: 00007f2c0b7e0010
> RBP: 00007f2c0b7e0010 R08: 000055b87f8281e0 R09: 00007ffc84d30a26
> R10: 000055bd2404cc18 R11: 0000000000000246 R12: 000055b87f834060
> R13: 000055b87f831ca0 R14: 0000000000000000 R15: 000055b87f832640
> irq event stamp: 593331
> hardirqs last  enabled at (593331): [<ffffffff8c3602f0>]
> console_unlock+0x7c0/0x9a0
> hardirqs last disabled at (593330): [<ffffffff8c3601e8>]
> console_unlock+0x6b8/0x9a0
> softirqs last  enabled at (593162): [<ffffffff8e801112>]
> asm_call_irq_on_stack+0x12/0x20
> softirqs last disabled at (593157): [<ffffffff8e801112>]
> asm_call_irq_on_stack+0x12/0x20
> ---[ end trace 37dc3a4a3aa1704a ]---
>
> The issue with the monitor switching off still happens, but the messages
> in the log have become more detailed:
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
> amdgpu 0000:0b:00.0: amdgpu: 0000000087613007 pin failed
> [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
> framebuffer with error -12
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
>
> I hope "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the
> buffer list -4!" gives an idea of what happened.
>
> Full kernel log is here: https://pastebin.com/nX69zgvf
>

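For readers decoding the numbers in the messages above: kernel functions report failure as a negative errno value, so "error -12" from the framebuffer pin corresponds to -ENOMEM and "-4" from amdgpu_cs_ioctl corresponds to -EINTR (Linux numbering). The userspace C sketch below maps those values back to names; the helpers kernel_err_name and describe_kernel_err are hypothetical and are not part of amdgpu or libc. It only decodes the numbers and says nothing about why the pin failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a negative kernel return value to the errno
 * macro name. Only the two codes seen in this thread are listed. */
static const char *kernel_err_name(int ret)
{
    switch (-ret) {
    case ENOMEM:
        return "ENOMEM"; /* 12 on Linux: out of memory */
    case EINTR:
        return "EINTR";  /* 4 on Linux: interrupted by a signal */
    default:
        return "other";
    }
}

/* Hypothetical helper: print "<ret> = -<NAME> (<libc description>)". */
static void describe_kernel_err(int ret)
{
    assert(ret < 0); /* kernel code reports errors as negative errno values */
    printf("%d = -%s (%s)\n", ret, kernel_err_name(ret), strerror(-ret));
}
```

With glibc, describe_kernel_err(-12) prints "-12 = -ENOMEM (Cannot allocate memory)"; other C libraries may word the description differently.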


* Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
  2021-01-21 13:27                 ` Christian König
@ 2021-01-25  5:28                   ` Mikhail Gavrilov
  0 siblings, 0 replies; 13+ messages in thread
From: Mikhail Gavrilov @ 2021-01-25  5:28 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, Harry Wentland, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

On Thu, 21 Jan 2021 at 18:27, Christian König <christian.koenig@amd.com> wrote:
>
> I still have no idea what's going on here.
>
> The KASAN messages from the DC code are completely unrelated.
>
> Please add the full dmesg to your bug report.
>

I did it.
https://gitlab.freedesktop.org/drm/amd/-/issues/1439#note_776267

-- 
Best Regards,
Mike Gavrilov.


end of thread, other threads:[~2021-01-25  5:29 UTC | newest]

Thread overview: 13+ messages
2021-01-10 22:26 [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12 Mikhail Gavrilov
2021-01-11  9:03 ` Christian König
2021-01-11 14:01   ` Christian König
2021-01-11 19:23     ` Mikhail Gavrilov
2021-01-11 20:45       ` Christian König
2021-01-11 21:51         ` Mikhail Gavrilov
2021-01-14  0:22         ` Mikhail Gavrilov
2021-01-14 13:56           ` Christian König
2021-01-14 14:06             ` Daniel Vetter
2021-01-14 22:43             ` Mikhail Gavrilov
2021-01-20  0:59               ` Mikhail Gavrilov
2021-01-21 13:27                 ` Christian König
2021-01-25  5:28                   ` Mikhail Gavrilov
