linux-kernel.vger.kernel.org archive mirror
* 5.11-rc1 TTM list corruption
@ 2020-12-31 10:40 Borislav Petkov
  2021-01-01 14:34 ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2020-12-31 10:40 UTC (permalink / raw)
  To: Christian Koenig, Huang Rui; +Cc: dri-devel, lkml

Hi folks,

I got this when trying to suspend my workstation to disk. It was still
responsive, so I could catch the splat:

[22020.334381] ------------[ cut here ]------------
[22020.339057] list_del corruption. next->prev should be ffffffff8b7a9a40, but was ffff8881020bced0
[22020.347764] WARNING: CPU: 12 PID: 13134 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
[22020.356397] Modules linked in: fuse essiv authenc nft_counter nf_tables libcrc32c nfnetlink loop dm_crypt dm_mod amd64_edac edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic led_class kvm ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_pcm snd_timer irqbypass crct10dif_pclmul snd crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr k10temp soundcore gpio_amdpt gpio_generic acpi_cpufreq radeon aesni_intel glue_helper crypto_simd cryptd pinctrl_amd
[22020.400855] CPU: 12 PID: 13134 Comm: hib.sh Not tainted 5.11.0-rc1+ #2
[22020.400857] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PRO (MS-7B79), BIOS 1.70 01/23/2019
[22020.400858] RIP: 0010:__list_del_entry_valid+0x8a/0x90
[22020.400861] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 78 30 0f 82 e8 24 6c 46 00 0f 0b 31 c0 c3 48 c7 c7 b8 30 0f 82 e8 13 6c 46 00 <0f> 0b 31 c0 c3 cc 48 85 d2 89 f8 74 20 48 8d 0c 16 0f b6 16 48 ff
[22020.400863] RSP: 0018:ffffc90001fbbcf8 EFLAGS: 00010292
[22020.441503] RAX: 0000000000000054 RBX: ffffffff8b7a9a40 RCX: 0000000000000000
[22020.441505] RDX: ffff8887fef26600 RSI: ffff8887fef17450 RDI: ffff8887fef17450
[22020.441505] RBP: 0000000000003f82 R08: ffff8887fef17450 R09: ffffc90001fbbb38
[22020.441506] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[22020.441507] R13: 0000000000000080 R14: 0000000000000480 R15: 000000000000019b
[22020.441508] FS:  00007f51c72f9740(0000) GS:ffff8887fef00000(0000) knlGS:0000000000000000
[22020.490045] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22020.490046] CR2: 00005557afb81018 CR3: 000000012099e000 CR4: 00000000003506e0
[22020.490047] Call Trace:
[22020.490048]  ttm_pool_shrink+0x61/0xd0
[22020.508965]  ttm_pool_shrinker_scan+0xa/0x20
[22020.508966]  shrink_slab.part.0.constprop.0+0x1a1/0x330
[22020.508970]  drop_slab_node+0x37/0x50
[22020.522011]  drop_slab+0x33/0x60
[22020.522012]  drop_caches_sysctl_handler+0x70/0x80
[22020.522015]  proc_sys_call_handler+0x140/0x220
[22020.534286]  new_sync_write+0x10b/0x190
[22020.534289]  vfs_write+0x1b7/0x290
[22020.534291]  ksys_write+0x60/0xe0
[22020.544762]  do_syscall_64+0x33/0x40
[22020.544765]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[22020.553320] RIP: 0033:0x7f51c73eaff3
[22020.553322] Code: 8b 15 a1 ee 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
[22020.553324] RSP: 002b:00007ffd0a748ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[22020.553325] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f51c73eaff3
[22020.553326] RDX: 0000000000000002 RSI: 000056039fd0ee70 RDI: 0000000000000001
[22020.553327] RBP: 000056039fd0ee70 R08: 000000000000000a R09: 0000000000000001
[22020.553327] R10: 000056039fd0e770 R11: 0000000000000246 R12: 0000000000000002
[22020.611218] R13: 00007f51c74bb6a0 R14: 0000000000000002 R15: 00007f51c74bb8a0
[22020.611220] ---[ end trace f7ea94a6ddb98f71 ]---
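
The call trace above reaches ttm_pool_shrink() through a write to the
drop_caches sysctl (the drop_caches_sysctl_handler and drop_slab frames), issued
by the hib.sh process. A minimal sketch of such a write follows. It assumes the
standard /proc/sys/vm/drop_caches file, where mode 2 or 3 asks the kernel to
shrink slab caches; write_drop_caches() is a hypothetical helper name, and this
is not a confirmed reproducer for the corruption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a drop_caches mode (1, 2 or 3) followed by a
 * newline to the given sysctl file. Modes 2 and 3 include slab shrinking,
 * which is the path seen in the trace. Returns 0 on success or a negative
 * errno value on failure.
 */
static int write_drop_caches(const char *path, int mode)
{
	char buf[4];
	ssize_t written;
	int len, fd, err;

	assert(path != NULL);
	if (mode < 1 || mode > 3)
		return -EINVAL;

	len = snprintf(buf, sizeof(buf), "%d\n", mode);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, buf, (size_t)len);
	if (written == len)
		err = 0;
	else if (written < 0)
		err = -errno;
	else
		err = -EIO;	/* short write */

	close(fd);
	return err;
}
```

Calling write_drop_caches("/proc/sys/vm/drop_caches", 2) as root would walk the
registered shrinkers, including the TTM pool shrinker on kernels that have it.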

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 5.11-rc1 TTM list corruption
  2020-12-31 10:40 5.11-rc1 TTM list corruption Borislav Petkov
@ 2021-01-01 14:34 ` Christian König
  2021-01-04 10:58   ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2021-01-01 14:34 UTC (permalink / raw)
  To: Borislav Petkov, Huang Rui; +Cc: dri-devel, lkml

Hi Borislav,

my best guess is that this is a use-after-free.

Going to double check the code, but can you reproduce this issue reliably?
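
For context, the "list_del corruption. next->prev should be ..." message comes
from a consistency check run before a list entry is unlinked. The following is
a minimal userspace sketch of that kind of check, assuming a simplified
struct list_head; list_del_entry_valid() and list_del_checked() are
illustrative names here, not the kernel's implementation. A neighbour that was
freed or re-linked elsewhere while the entry was still on the list is one way
the next->prev test can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert item right after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = item;
	item->next = next;
	item->prev = head;
	head->next = item;
}

/* Return true only if both neighbours still point back at entry. */
static bool list_del_entry_valid(const struct list_head *entry)
{
	assert(entry != NULL);

	if (entry->prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(const void *)entry, (const void *)entry->prev->next);
		return false;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(const void *)entry, (const void *)entry->next->prev);
		return false;
	}
	return true;
}

/* Unlink entry only when the check passes; report whether it was unlinked. */
static bool list_del_checked(struct list_head *entry)
{
	if (!list_del_entry_valid(entry))
		return false;

	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = NULL;
	entry->prev = NULL;
	return true;
}
```

In this sketch, overwriting the prev pointer of an entry's successor makes
list_del_checked() refuse the unlink and print a message of the same shape as
the one in the splat.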

Thanks,
Christian.




* Re: 5.11-rc1 TTM list corruption
  2021-01-01 14:34 ` Christian König
@ 2021-01-04 10:58   ` Borislav Petkov
  2021-01-04 14:48     ` Christian König
  2021-01-05  4:12     ` Huang Rui
  0 siblings, 2 replies; 13+ messages in thread
From: Borislav Petkov @ 2021-01-04 10:58 UTC (permalink / raw)
  To: Christian König; +Cc: Huang Rui, dri-devel, lkml

On Fri, Jan 01, 2021 at 03:34:28PM +0100, Christian König wrote:
> Going to double check the code, but can you reproduce this issue
> reliably?

Lemme find a test box which can trigger it too - the splat happened
on my workstation and I'd like to avoid debugging there for obvious
reasons.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: 5.11-rc1 TTM list corruption
  2021-01-04 10:58   ` Borislav Petkov
@ 2021-01-04 14:48     ` Christian König
  2021-01-05  4:12     ` Huang Rui
  1 sibling, 0 replies; 13+ messages in thread
From: Christian König @ 2021-01-04 14:48 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Huang Rui, dri-devel, lkml

Am 04.01.21 um 11:58 schrieb Borislav Petkov:
> On Fri, Jan 01, 2021 at 03:34:28PM +0100, Christian König wrote:
>> Going to double check the code, but can you reproduce this issue
>> reliably?
> Lemme find a test box which can trigger it too - the splat happened
> on my workstation and I'd like to avoid debugging there for obvious
> reasons.

Please do so since I can't reproduce this problem and double checking 
the source doesn't show anything obvious either.

Thanks,
Christian.




* Re: 5.11-rc1 TTM list corruption
  2021-01-04 10:58   ` Borislav Petkov
  2021-01-04 14:48     ` Christian König
@ 2021-01-05  4:12     ` Huang Rui
  2021-01-05 10:31       ` Borislav Petkov
  1 sibling, 1 reply; 13+ messages in thread
From: Huang Rui @ 2021-01-05  4:12 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Koenig, Christian, dri-devel, lkml

On Mon, Jan 04, 2021 at 06:58:02PM +0800, Borislav Petkov wrote:
> On Fri, Jan 01, 2021 at 03:34:28PM +0100, Christian König wrote:
> > Going to double check the code, but can you reproduce this issue
> > reliably?
> 
> Lemme find a test box which can trigger it too - the splat happened
> on my workstation and I'd like to avoid debugging there for obvious
> reasons.

Hi Boris, Christian,

I am reproducing this issue as well, are you using a Raven board?

Thanks,
Ray



* Re: 5.11-rc1 TTM list corruption
  2021-01-05  4:12     ` Huang Rui
@ 2021-01-05 10:31       ` Borislav Petkov
  2021-01-05 11:08         ` Huang Rui
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-01-05 10:31 UTC (permalink / raw)
  To: Huang Rui; +Cc: Koenig, Christian, dri-devel, lkml

Hi,

On Tue, Jan 05, 2021 at 12:12:13PM +0800, Huang Rui wrote:
> I am reproducing this issue as well, are you using a Raven board?

I have no clue what Raven is. The workstation I triggered it once on, has:

[    7.563968] [drm] radeon kernel modesetting enabled.
[    7.581417] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xE164 0x00).
[    7.609217] [drm] Detected VRAM RAM=2048M, BAR=256M
[    7.614031] [drm] RAM width 64bits DDR
[    7.639665] [drm] radeon: 2048M of VRAM memory ready
[    7.644557] [drm] radeon: 1024M of GTT memory ready.
[    7.649451] [drm] Loading CAICOS Microcode
[    7.653548] [drm] Internal thermal controller without fan control
[    7.661221] [drm] radeon: dpm initialized
[    7.665227] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    7.671821] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[    7.703858] [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
[    7.749689] [drm] radeon: irq initialized.
[    7.769826] [drm] ring test on 0 succeeded in 1 usecs
[    7.774797] [drm] ring test on 3 succeeded in 3 usecs
[    7.955500] [drm] ring test on 5 succeeded in 1 usecs
[    7.960468] [drm] UVD initialized successfully.
[    7.965047] [drm] ib test on ring 0 succeeded in 0 usecs
[    7.970316] [drm] ib test on ring 3 succeeded in 0 usecs
[    8.626877] [drm] ib test on ring 5 succeeded
[    8.631376] [drm] Radeon Display Connectors
[    8.635496] [drm] Connector 0:
[    8.638503] [drm]   HDMI-A-1
[    8.641339] [drm]   HPD2
[    8.643835] [drm]   DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 0x644c
[    8.651102] [drm]   Encoders:
[    8.654022] [drm]     DFP1: INTERNAL_UNIPHY1
[    8.658224] [drm] Connector 1:
[    8.661232] [drm]   DVI-D-1
[    8.663982] [drm]   HPD4
[    8.666479] [drm]   DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c
[    8.673745] [drm]   Encoders:
[    8.676665] [drm]     DFP2: INTERNAL_UNIPHY
[    8.680782] [drm] Connector 2:
[    8.683789] [drm]   VGA-1
[    8.686369] [drm]   DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c
[    8.693636] [drm]   Encoders:
[    8.696555] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    8.788923] [drm] fb mappable at 0xE0363000
[    8.793036] [drm] vram apper at 0xE0000000
[    8.797064] [drm] size 9216000
[    8.800071] [drm] fb depth is 24
[    8.803249] [drm]    pitch is 7680
[    8.807106] fbcon: radeondrmfb (fb0) is primary device
[    8.918927] radeon 0000:1d:00.0: [drm] fb0: radeondrmfb frame buffer device
[    8.938461] [drm] Initialized radeon 2.50.0 20080528 for 0000:1d:00.0 on minor 0

HTH.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: 5.11-rc1 TTM list corruption
  2021-01-05 10:31       ` Borislav Petkov
@ 2021-01-05 11:08         ` Huang Rui
  2021-01-05 11:34           ` Christian König
  2021-01-05 11:43           ` Borislav Petkov
  0 siblings, 2 replies; 13+ messages in thread
From: Huang Rui @ 2021-01-05 11:08 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Koenig, Christian, dri-devel, lkml

On Tue, Jan 05, 2021 at 06:31:38PM +0800, Borislav Petkov wrote:
> Hi,
> 
> On Tue, Jan 05, 2021 at 12:12:13PM +0800, Huang Rui wrote:
> > I am reproducing this issue as well, are you using a Raven board?
> 
> I have no clue what Raven is. The workstation I triggered it once on, has:
> 
> [    7.563968] [drm] radeon kernel modesetting enabled.
> [    7.581417] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xE164 0x00).

Ah, this ASIC is a bit old and still uses the radeon driver, so we didn't
reproduce it with the amdgpu driver. I don't have such an old ASIC at hand.
Could you tell us whether this issue can be reproduced on hardware after SI,
which uses the amdgpu module (not sure whether you have a recent APU or GPU)?

Thanks,
Ray



* Re: 5.11-rc1 TTM list corruption
  2021-01-05 11:08         ` Huang Rui
@ 2021-01-05 11:34           ` Christian König
  2021-01-05 11:43           ` Borislav Petkov
  1 sibling, 0 replies; 13+ messages in thread
From: Christian König @ 2021-01-05 11:34 UTC (permalink / raw)
  To: Huang Rui, Borislav Petkov; +Cc: dri-devel, lkml

Am 05.01.21 um 12:08 schrieb Huang Rui:
> On Tue, Jan 05, 2021 at 06:31:38PM +0800, Borislav Petkov wrote:
>> Hi,
>>
>> On Tue, Jan 05, 2021 at 12:12:13PM +0800, Huang Rui wrote:
>>> I am reproducing this issue as well, are you using a Raven board?
>> I have no clue what Raven is. The workstation I triggered it once on, has:
>>
>> [    7.563968] [drm] radeon kernel modesetting enabled.
>> [    7.581417] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xE164 0x00).
> Ah, this ASIC is a bit old and still uses the radeon driver, so we didn't
> reproduce it with the amdgpu driver. I don't have such an old ASIC at hand.
> Could you tell us whether this issue can be reproduced on hardware after SI,
> which uses the amdgpu module (not sure whether you have a recent APU or GPU)?

Ah! Thanks, Ray, for pointing this out. I have SI-based hardware here.

Going to try this in a few minutes.

Thanks,
Christian.




* Re: 5.11-rc1 TTM list corruption
  2021-01-05 11:08         ` Huang Rui
  2021-01-05 11:34           ` Christian König
@ 2021-01-05 11:43           ` Borislav Petkov
  2021-01-05 12:20             ` Huang Rui
  1 sibling, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-01-05 11:43 UTC (permalink / raw)
  To: Huang Rui; +Cc: Koenig, Christian, dri-devel, lkml

On Tue, Jan 05, 2021 at 07:08:52PM +0800, Huang Rui wrote:
> Ah, this ASIC is a bit old and still uses the radeon driver, so we didn't
> reproduce it with the amdgpu driver. I don't have such an old ASIC at hand.
> Could you tell us whether this issue can be reproduced on hardware after SI,
> which uses the amdgpu module (not sure whether you have a recent APU or GPU)?

The latest I have (I think it is the latest) is:

[    1.826102] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1636 0x17AA:0x5099 0xD1).

and so far that hasn't triggered it. Which makes sense because that
thing uses amdgpu:

[    1.810260] [drm] amdgpu kernel modesetting enabled.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: 5.11-rc1 TTM list corruption
  2021-01-05 11:43           ` Borislav Petkov
@ 2021-01-05 12:20             ` Huang Rui
  2021-01-05 15:40               ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Huang Rui @ 2021-01-05 12:20 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Koenig, Christian, dri-devel, lkml

On Tue, Jan 05, 2021 at 07:43:51PM +0800, Borislav Petkov wrote:
> On Tue, Jan 05, 2021 at 07:08:52PM +0800, Huang Rui wrote:
> > Ah, this ASIC is a bit old and still uses the radeon driver, so we didn't
> > reproduce it with the amdgpu driver. I don't have such an old ASIC at hand.
> > Could you tell us whether this issue can be reproduced on hardware after SI,
> > which uses the amdgpu module (not sure whether you have a recent APU or GPU)?
> 
> The latest I have (I think it is the latest) is:
> 
> [    1.826102] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1636 0x17AA:0x5099 0xD1).
> 
> and so far that hasn't triggered it. Which makes sense because that
> thing uses amdgpu:
> 
> [    1.810260] [drm] amdgpu kernel modesetting enabled.

Yes! Renoir is recent enough for the amdgpu kernel module. :-)
Please let us know if you still encounter the issue.

Thanks,
Ray


* Re: 5.11-rc1 TTM list corruption
  2021-01-05 12:20             ` Huang Rui
@ 2021-01-05 15:40               ` Christian König
  2021-01-06 16:54                 ` David Woodhouse
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2021-01-05 15:40 UTC (permalink / raw)
  To: Huang Rui, Borislav Petkov; +Cc: dri-devel, lkml

Am 05.01.21 um 13:20 schrieb Huang Rui:
> On Tue, Jan 05, 2021 at 07:43:51PM +0800, Borislav Petkov wrote:
>> On Tue, Jan 05, 2021 at 07:08:52PM +0800, Huang Rui wrote:
>>> Ah, this ASIC is a bit old and still uses the radeon driver, so we didn't
>>> reproduce it with the amdgpu driver. I don't have such an old ASIC at hand.
>>> Could you tell us whether this issue can be reproduced on hardware after SI,
>>> which uses the amdgpu module (not sure whether you have a recent APU or GPU)?
>> The latest I have (I think it is the latest) is:
>>
>> [    1.826102] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1636 0x17AA:0x5099 0xD1).
>>
>> and so far that hasn't triggered it. Which makes sense because that
>> thing uses amdgpu:
>>
>> [    1.810260] [drm] amdgpu kernel modesetting enabled.
> Yes! Renoir is recent enough for the amdgpu kernel module. :-)
> Please let us know if you still encounter the issue.

Thanks for the hints guys. You need a rather specific configuration, but 
I can reproduce this now.

Let's see what the problem is here.

Thanks,
Christian.




* Re: 5.11-rc1 TTM list corruption
  2021-01-05 15:40               ` Christian König
@ 2021-01-06 16:54                 ` David Woodhouse
  2021-01-06 17:10                   ` Alex Deucher
  0 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2021-01-06 16:54 UTC (permalink / raw)
  To: Christian König, Huang Rui, Borislav Petkov; +Cc: dri-devel, lkml


On Tue, 2021-01-05 at 16:40 +0100, Christian König wrote:
> Am 05.01.21 um 13:20 schrieb Huang Rui:
> > On Tue, Jan 05, 2021 at 07:43:51PM +0800, Borislav Petkov wrote:
> > > On Tue, Jan 05, 2021 at 07:08:52PM +0800, Huang Rui wrote:
> > > > Ah, this ASIC is a bit old and still uses the radeon driver, so we didn't
> > > > reproduce it with the amdgpu driver. I don't have such an old ASIC at hand.
> > > > Could you tell us whether this issue can be reproduced on hardware after SI,
> > > > which uses the amdgpu module (not sure whether you have a recent APU or GPU)?
> > > 
> > > The latest I have (I think it is the latest) is:
> > > 
> > > [    1.826102] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1636 0x17AA:0x5099 0xD1).
> > > 
> > > and so far that hasn't triggered it. Which makes sense because that
> > > thing uses amdgpu:
> > > 
> > > [    1.810260] [drm] amdgpu kernel modesetting enabled.
> > 
> > Yes! Renoir is recent enough for the amdgpu kernel module. :-)
> > Please let us know if you still encounter the issue.
> 
> Thanks for the hints guys. You need a rather specific configuration, but 
> I can reproduce this now.
> 
> Let's see what the problem is here.

FWIW I'm seeing it here on my workstation too.

[    3.952102] [drm] radeon kernel modesetting enabled.
[    3.952885] checking generic (90000000 300000) vs hw (90000000 10000000)
[    3.952898] fb0: switching to radeondrmfb from EFI VGA
[    3.953665] Console: switching to colour dummy device 80x25
[    3.953696] radeon 0000:03:00.0: vgaarb: deactivate vga console
[    3.953898] [drm] initializing kernel modesetting (CYPRESS 0x1002:0x6898 0x1462:0x8032 0x00).
[    3.953940] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000cbfff window]
[    3.953945] caller pci_map_rom+0x6c/0x1b0 mapping multiple BARs
[    3.953972] ATOM BIOS: 113
[    3.954028] radeon 0000:03:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[    3.954032] radeon 0000:03:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
[    3.954037] [drm] Detected VRAM RAM=1024M, BAR=256M
[    3.954039] [drm] RAM width 256bits DDR
[    3.954087] [TTM] Zone  kernel: Available graphics memory: 16389788 KiB
[    3.954090] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[    3.954105] [drm] radeon: 1024M of VRAM memory ready
[    3.954107] [drm] radeon: 1024M of GTT memory ready.
[    3.954114] [drm] Loading CYPRESS Microcode
[    3.954168] [drm] Internal thermal controller with fan control
[    3.954531] usb 3-1.1.1: New USB device found, idVendor=10d5, idProduct=1234, bcdDevice= 9.02
[    3.954539] usb 3-1.1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    3.958098] hub 3-1.1.1:1.0: USB hub found
[    3.959704] hub 3-1.1.1:1.0: 4 ports detected
[    3.975098] [drm] radeon: dpm initialized
[    3.975159] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    3.976074] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[    3.979669] igb 0000:01:00.0 eno0: renamed from eth0
[    3.993789] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000).
[    3.993912] radeon 0000:03:00.0: WB enabled
[    3.993915] radeon 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00
[    3.993918] radeon 0000:03:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c
[    3.994359] radeon 0000:03:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418
[    3.994531] radeon 0000:03:00.0: radeon: MSI limited to 32-bit
[    3.994563] radeon 0000:03:00.0: radeon: using MSI.
[    3.994581] [drm] radeon: irq initialized.
[    4.011086] [drm] ring test on 0 succeeded in 1 usecs
[    4.011094] [drm] ring test on 3 succeeded in 2 usecs
[    4.030666] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: (null)
[    4.188159] [drm] ring test on 5 succeeded in 1 usecs
[    4.188165] [drm] UVD initialized successfully.
[    4.188326] [drm] ib test on ring 0 succeeded in 0 usecs
[    4.188371] [drm] ib test on ring 3 succeeded in 0 usecs
...
[    4.839982] [drm] ib test on ring 5 succeeded
[    4.841079] [drm] Radeon Display Connectors
[    4.841087] [drm] Connector 0:
[    4.841090] [drm]   DP-1
[    4.841094] [drm]   HPD4
[    4.841097] [drm]   DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c
[    4.841104] [drm]   Encoders:
[    4.841107] [drm]     DFP1: INTERNAL_UNIPHY2
[    4.841111] [drm] Connector 1:
[    4.841114] [drm]   HDMI-A-1
[    4.841118] [drm]   HPD5
[    4.841120] [drm]   DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c
[    4.841127] [drm]   Encoders:
[    4.841130] [drm]     DFP2: INTERNAL_UNIPHY2
[    4.841133] [drm] Connector 2:
[    4.841136] [drm]   DVI-I-1
[    4.841139] [drm]   HPD1
[    4.841142] [drm]   DDC: 0x6450 0x6450 0x6454 0x6454 0x6458 0x6458 0x645c 0x645c
[    4.841149] [drm]   Encoders:
[    4.841151] [drm]     DFP3: INTERNAL_UNIPHY1
[    4.841155] [drm]     CRT2: INTERNAL_KLDSCP_DAC2
[    4.841159] [drm] Connector 3:
[    4.841162] [drm]   DVI-I-2
[    4.841165] [drm]   HPD6
[    4.841168] [drm]   DDC: 0x6470 0x6470 0x6474 0x6474 0x6478 0x6478 0x647c 0x647c
[    4.841174] [drm]   Encoders:
[    4.841177] [drm]     DFP4: INTERNAL_UNIPHY
[    4.841180] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    4.921539] [drm] fb mappable at 0x9034D000
[    4.921547] [drm] vram apper at 0x90000000
[    4.921549] [drm] size 9216000
[    4.921552] [drm] fb depth is 24
[    4.921555] [drm]    pitch is 7680
[    4.921680] fbcon: radeondrmfb (fb0) is primary device
[    4.943121] Console: switching to colour frame buffer device 240x75
[    4.950509] radeon 0000:03:00.0: [drm] fb0: radeondrmfb frame buffer device
[    4.959011] [drm] Initialized radeon 2.50.0 20080528 for 0000:03:00.0 on minor 0


...

[27221.673320] list_del corruption. next->prev should be ffffffffc02e4e40, but was ffff98de96e40ed0
[27221.673355] ------------[ cut here ]------------
[27221.673357] kernel BUG at lib/list_debug.c:54!
[27221.673365] invalid opcode: 0000 [#1] SMP PTI
[27221.673370] CPU: 9 PID: 263 Comm: kswapd0 Tainted: G S        I       5.10.0+ #701
[27221.673373] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[27221.673376] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[27221.673386] Code: c7 c7 08 b7 40 9d e8 77 3f fe ff 0f 0b 48 89 fe 48 c7 c7 98 b7 40 9d e8 66 3f fe ff 0f 0b 48 c7 c7 48 b8 40 9d e8 58 3f fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 08 b8 40 9d e8 44 3f fe ff 0f 0b
[27221.673389] RSP: 0000:ffffac17007f3c20 EFLAGS: 00010286
[27221.673394] RAX: 0000000000000054 RBX: ffffffffc02e4e40 RCX: 0000000000000000
[27221.673396] RDX: ffff98e5df866ba0 RSI: ffff98e5df858ac0 RDI: ffff98e5df858ac0
[27221.673398] RBP: 0000000000000080 R08: 0000000000000000 R09: ffffac17007f3a58
[27221.673401] R10: ffffac17007f3a50 R11: ffffffff9d744ca8 R12: 0000000000000000
[27221.673403] R13: 0000000000000000 R14: 0000000000000084 R15: ffffffffc02e4ba0
[27221.673405] FS:  0000000000000000(0000) GS:ffff98e5df840000(0000) knlGS:0000000000000000
[27221.673408] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27221.673411] CR2: 00000000004fea86 CR3: 000000079a9e4001 CR4: 00000000001726e0
[27221.673414] Call Trace:
[27221.673420]  ttm_pool_shrink+0x53/0xb0 [ttm]
[27221.673433]  ttm_pool_shrinker_scan+0xa/0x20 [ttm]
[27221.673440]  do_shrink_slab+0x145/0x240
[27221.673447]  shrink_slab+0x9c/0x280
[27221.673451]  shrink_node+0x2c2/0x6f0
[27221.673456]  balance_pgdat+0x2ff/0x620
[27221.673461]  kswapd+0x1e6/0x360
[27221.673464]  ? finish_wait+0x80/0x80
[27221.673471]  ? balance_pgdat+0x620/0x620
[27221.673474]  kthread+0x11b/0x140
[27221.673479]  ? __kthread_bind_mask+0x60/0x60
[27221.673483]  ret_from_fork+0x22/0x30
[27221.673491] Modules linked in: vhost_net vhost vhost_iotlb tap xt_MASQUERADE xt_conntrack xt_CHECKSUM ip6t_REJECT ipt_REJECT nf_nat_tftp nft_objref nf_conntrack_tftp nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security tun bridge iptable_nat nf_nat nf_conntrack stp llc nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat intel_rapl_msr intel_rapl_common sb_edac snd_hda_codec_realtek x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp ledtrig_audio snd_hda_codec_hdmi coretemp kvm_intel snd_hda_intel joydev snd_intel_dspcfg apple_mfi_fastcharge snd_hda_codec kvm snd_hda_core iTCO_wdt irqbypass intel_pmc_bxt snd_hwdep iTCO_vendor_support snd_seq ipmi_si rapl snd_seq_device ipmi_devintf intel_cstate
[27221.673569]  snd_pcm mei_me intel_uncore i2c_i801 ipmi_msghandler pcspkr snd_timer i2c_smbus mei snd lpc_ich ioatdma soundcore acpi_power_meter acpi_pad auth_rpcgss binfmt_misc sunrpc ip_tables radeon uas usb_storage drm_ttm_helper ttm drm_kms_helper igb cec crct10dif_pclmul crc32_pclmul crc32c_intel dca drm raid0 ghash_clmulni_intel wmi i2c_algo_bit fuse ecryptfs
[27221.673609] ---[ end trace 98f04a1b0e5570b4 ]---
[27221.726254] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[27221.726277] Code: c7 c7 08 b7 40 9d e8 77 3f fe ff 0f 0b 48 89 fe 48 c7 c7 98 b7 40 9d e8 66 3f fe ff 0f 0b 48 c7 c7 48 b8 40 9d e8 58 3f fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 08 b8 40 9d e8 44 3f fe ff 0f 0b
[27221.726281] RSP: 0000:ffffac17007f3c20 EFLAGS: 00010286
[27221.726284] RAX: 0000000000000054 RBX: ffffffffc02e4e40 RCX: 0000000000000000
[27221.726286] RDX: ffff98e5df866ba0 RSI: ffff98e5df858ac0 RDI: ffff98e5df858ac0
[27221.726288] RBP: 0000000000000080 R08: 0000000000000000 R09: ffffac17007f3a58
[27221.726290] R10: ffffac17007f3a50 R11: ffffffff9d744ca8 R12: 0000000000000000
[27221.726292] R13: 0000000000000000 R14: 0000000000000084 R15: ffffffffc02e4ba0
[27221.726294] FS:  0000000000000000(0000) GS:ffff98e5df840000(0000) knlGS:0000000000000000
[27221.726296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27221.726298] CR2: 00000000004fea86 CR3: 000000079a9e4001 CR4: 00000000001726e0
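
For anyone reading the splat above: as the message itself says, the check that fired compares the next node's prev pointer against the entry being removed. Below is a minimal Python model of that kind of consistency check, only a sketch to illustrate the idea and not the kernel's C code. The names ListHead and list_del_check are hypothetical, and list_add_tail just borrows the kernel helper's name for readability.

```python
class ListHead:
    """Hypothetical stand-in for a circular doubly linked list node."""

    def __init__(self, name):
        self.name = name
        # An empty list head points at itself in both directions.
        self.prev = self
        self.next = self


def list_add_tail(new, head):
    """Link `new` in just before `head`, i.e. at the tail of the list."""
    last = head.prev
    new.prev = last
    new.next = head
    last.next = new
    head.prev = new


def list_del_check(entry):
    """Return None if both neighbours point back at `entry`, else a message."""
    if entry.prev.next is not entry:
        return f"prev->next should be {entry.name}, but was {entry.prev.next.name}"
    if entry.next.prev is not entry:
        return f"next->prev should be {entry.name}, but was {entry.next.prev.name}"
    return None


if __name__ == "__main__":
    head, a, b = ListHead("head"), ListHead("a"), ListHead("b")
    list_add_tail(a, head)
    list_add_tail(b, head)
    print(list_del_check(a))   # consistent list, prints None

    # Simulate corruption: something overwrites b.prev behind the list's back.
    b.prev = ListHead("stray")
    print(list_del_check(a))   # prints: next->prev should be a, but was stray
```

Read that way, the "should be" value in the splat (ffffffffc02e4e40) would be the address of the entry being removed, and the "but was" value (ffff98de96e40ed0) is what the next node's prev pointer actually held at the time.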


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]


* Re: 5.11-rc1 TTM list corruption
  2021-01-06 16:54                 ` David Woodhouse
@ 2021-01-06 17:10                   ` Alex Deucher
  0 siblings, 0 replies; 13+ messages in thread
From: Alex Deucher @ 2021-01-06 17:10 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Christian König, Huang Rui, Borislav Petkov, lkml, dri-devel

On Wed, Jan 6, 2021 at 11:54 AM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Tue, 2021-01-05 at 16:40 +0100, Christian König wrote:
> > Am 05.01.21 um 13:20 schrieb Huang Rui:
> > > On Tue, Jan 05, 2021 at 07:43:51PM +0800, Borislav Petkov wrote:
> > > > On Tue, Jan 05, 2021 at 07:08:52PM +0800, Huang Rui wrote:
> > > > > > Ah, this ASIC is a bit old and still uses the radeon driver, so we
> > > > > > didn't reproduce it with the amdgpu driver. I don't have such an old
> > > > > > ASIC on hand. May we know whether this issue can be reproduced on parts
> > > > > > after SI, which use the amdgpu module (not sure whether you have a
> > > > > > recent APU or GPU)?
> > > >
> > > > The latest I have (I think it is the latest) is:
> > > >
> > > > [    1.826102] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1636 0x17AA:0x5099 0xD1).
> > > >
> > > > and so far that hasn't triggered it. Which makes sense because that
> > > > thing uses amdgpu:
> > > >
> > > > [    1.810260] [drm] amdgpu kernel modesetting enabled.
> > >
> > > > Yes! Renoir is recent enough to use the amdgpu kernel module. :-)
> > > Please let us know if you still encounter the issue.
> >
> > Thanks for the hints guys. You need a rather specific configuration, but
> > I can reproduce this now.
> >
> > Let's see what the problem is here.
>
> FWIW I'm seeing it here on my workstation too.

Should be fixed with this patch set I think:
https://patchwork.freedesktop.org/series/85515/

Alex



Thread overview: 13+ messages
2020-12-31 10:40 5.11-rc1 TTM list corruption Borislav Petkov
2021-01-01 14:34 ` Christian König
2021-01-04 10:58   ` Borislav Petkov
2021-01-04 14:48     ` Christian König
2021-01-05  4:12     ` Huang Rui
2021-01-05 10:31       ` Borislav Petkov
2021-01-05 11:08         ` Huang Rui
2021-01-05 11:34           ` Christian König
2021-01-05 11:43           ` Borislav Petkov
2021-01-05 12:20             ` Huang Rui
2021-01-05 15:40               ` Christian König
2021-01-06 16:54                 ` David Woodhouse
2021-01-06 17:10                   ` Alex Deucher
