* amd_sfh driver causes kernel oops during boot
@ 2023-05-23 17:27 Haochen Tong
  2023-05-24  3:58 ` Bagas Sanjaya
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Haochen Tong @ 2023-05-23 17:27 UTC (permalink / raw)
  To: stable; +Cc: regressions, linux-input, Basavaraj Natikar

Hi,

Since kernel 6.3.0 (and also 6.4rc3), on a ThinkPad Z13 system with Arch 
Linux, I've noticed that the amd_sfh driver spews a lot of stack traces 
during boot. Sometimes it is an oops:

BUG: unable to handle page fault for address: 000000000001000f
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 8 PID: 457 Comm: (udev-worker) Not tainted 6.3.3-arch1-1 #1 
fa7b7e0107004b3021a57a74b951e0a25e7e8584
Hardware name: LENOVO 21D2CTO1WW/21D2CTO1WW, BIOS N3GET47W (1.27 ) 
12/08/2022
RIP: 0010:amd_sfh_get_report+0x1e/0x110 [amd_sfh]
Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 
57 41 56 41 55 41 54 55 53 48 8b 87 60 1d 00 00 48 8b 68 08 <8b> 45 10 
85 c0 0f 84 a9 00 00 00 49 89 fc 41 89 f7 41 89 d6 31 db
RSP: 0018:ffffb164426f3a20 EFLAGS: 00010246
RAX: ffff9b0ae6b7bd00 RBX: ffff9b0ac0f46000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff9b0ac0f46000
RBP: 000000000000ffff R08: ffffb164426f3ab8 R09: ffffb164426f3ab8
R10: 000000000020031b R11: ffff9b0ace40ac00 R12: ffff9b0ace40ac00
R13: 0000000000000002 R14: 0000000000000002 R15: ffff9b0acd213010
FS:  00007fe9ceb82200(0000) GS:ffff9b1122000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000001000f CR3: 000000010940c000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
   <TASK>
   amdtp_hid_request+0x36/0x50 [amd_sfh 
2e3095779aada9fdb1764f08ca578ccb14e41fe4]
   sensor_hub_get_feature+0xad/0x170 [hid_sensor_hub 
d6157999c9d260a1bfa6f27d4a0dc2c3e2c5654e]
   hid_sensor_parse_common_attributes+0x217/0x310 [hid_sensor_iio_common 
07a7935272aa9c7a28193b574580b3e953a64ec4]
   hid_gyro_3d_probe+0x7f/0x2e0 [hid_sensor_gyro_3d 
9f2eb51294a1f0c0315b365f335617cbaef01eab]
   platform_probe+0x44/0xa0
   really_probe+0x19e/0x3e0
   ? __pfx___driver_attach+0x10/0x10
   __driver_probe_device+0x78/0x160
   driver_probe_device+0x1f/0x90
   __driver_attach+0xd2/0x1c0
   bus_for_each_dev+0x88/0xd0
   bus_add_driver+0x116/0x220
   driver_register+0x59/0x100
   ? __pfx_init_module+0x10/0x10 [hid_sensor_gyro_3d 
9f2eb51294a1f0c0315b365f335617cbaef01eab]
   do_one_initcall+0x5d/0x240
   do_init_module+0x4a/0x200
   __do_sys_init_module+0x17f/0x1b0
   do_syscall_64+0x60/0x90
   ? ksys_read+0x6f/0xf0
   ? syscall_exit_to_user_mode+0x1b/0x40
   ? do_syscall_64+0x6c/0x90
   ? exc_page_fault+0x7c/0x180
   entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fe9ce721f9e
Code: 48 8b 0d bd ed 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 
00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 8a ed 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd280dd828 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055b72a37f630 RCX: 00007fe9ce721f9e
RDX: 00007fe9cec7a343 RSI: 00000000000077f8 RDI: 000055b72a56c7f0
RBP: 00007fe9cec7a343 R08: 00000000000077f8 R09: 0000000000000000
R10: 000000000001a0a1 R11: 0000000000000246 R12: 0000000000020000
R13: 000055b72a363b90 R14: 000055b72a37f630 R15: 000055b72a36a070
   </TASK>
Modules linked in: hid_sensor_accel_3d(+) hid_sensor_gyro_3d(+) qrtr 
hid_sensor_trigger snd_sof industrialio_triggered_buffer ath11k_pci(+) 
kfifo_buf snd_sof_utils hid_sensor_iio_common joydev ath11k industrialio 
snd_soc_core mousedev qmi_helpers snd_compress hid_sensor_hub 
snd_hda_scodec_cs35l41_spi ac97_bus snd_hda_codec_realtek(+) 
snd_pcm_dmaengine intel_rapl_msr snd_hda_codec_hdmi 
snd_hda_codec_generic intel_rapl_common mac80211 snd_pci_ps btusb 
snd_rpl_pci_acp6x btrtl snd_hda_intel edac_mce_amd uvcvideo btbcm 
snd_acp_pci snd_intel_dspcfg snd_pci_acp6x videobuf2_vmalloc 
snd_intel_sdw_acpi libarc4 uvc btintel snd_usb_audio(+) snd_pci_acp5x 
videobuf2_memops btmtk snd_hda_codec kvm_amd videobuf2_v4l2 
snd_hda_scodec_cs35l41_i2c snd_usbmidi_lib snd_hda_scodec_cs35l41 
snd_rn_pci_acp3x ucsi_acpi bluetooth videodev snd_hda_core typec_ucsi 
snd_acp_config snd_hda_cs_dsp_ctls wacom(+) hid_multitouch cfg80211 
snd_rawmidi sp5100_tco kvm snd_seq_device cs_dsp videobuf2_common typec 
ecdh_generic snd_soc_acpi
   think_lmi snd_hwdep snd_pcm irqbypass crc16 snd_soc_cs35l41_lib mhi 
thunderbolt firmware_attributes_class snd_pci_acp3x amd_sfh(+) k10temp 
psmouse roles rapl i2c_piix4 mc snd_timer wmi_bmof 
serial_multi_instantiate i2c_hid_acpi acpi_tad i2c_hid amd_pmf amd_pmc 
mac_hid sch_fq tcp_bbr dm_multipath i2c_dev crypto_user fuse loop zram 
ip_tables x_tables xfs libcrc32c crc32c_generic dm_crypt cbc 
encrypted_keys trusted asn1_encoder tee usbhid dm_mod amdgpu 
i2c_algo_bit serio_raw thinkpad_acpi drm_ttm_helper atkbd libps2 
crct10dif_pclmul vivaldi_fmap crc32_pclmul ledtrig_audio crc32c_intel 
polyval_clmulni ttm polyval_generic drm_buddy nvme gf128mul 
platform_profile gpu_sched ghash_clmulni_intel sha512_ssse3 snd 
aesni_intel soundcore drm_display_helper crypto_simd rfkill nvme_core 
xhci_pci cryptd cec ccp xhci_pci_renesas i8042 video nvme_common serio wmi
CR2: 000000000001000f
---[ end trace 0000000000000000 ]---
RIP: 0010:amd_sfh_get_report+0x1e/0x110 [amd_sfh]
Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 
57 41 56 41 55 41 54 55 53 48 8b 87 60 1d 00 00 48 8b 68 08 <8b> 45 10 
85 c0 0f 84 a9 00 00 00 49 89 fc 41 89 f7 41 89 d6 31 db
RSP: 0018:ffffb164426f3a20 EFLAGS: 00010246
RAX: ffff9b0ae6b7bd00 RBX: ffff9b0ac0f46000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff9b0ac0f46000
RBP: 000000000000ffff R08: ffffb164426f3ab8 R09: ffffb164426f3ab8
R10: 000000000020031b R11: ffff9b0ace40ac00 R12: ffff9b0ace40ac00
R13: 0000000000000002 R14: 0000000000000002 R15: ffff9b0acd213010
FS:  00007fe9ceb82200(0000) GS:ffff9b1122000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000001000f CR3: 000000010940c000 CR4: 0000000000750ee0
PKRU: 55555554
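
For readers following the trace: the fault address can be cross-checked against the register dump. The bytes around the <8b> marker in the Code: line appear to decode to three loads, the last of which reads from RBP + 0x10. The short Python sketch below only redoes that arithmetic with the values printed above. The x86-64 decoding in the comments is a hand reading of the bytes and the variable names are made up for illustration, so treat it as a plausibility check rather than a diagnosis.

```python
# Values copied from the register dump in the oops above.
rbp = 0x000000000000ffff   # RBP at the time of the fault
cr2 = 0x000000000001000f   # CR2: the address that could not be read

# Hand decoding of the bytes around the <8b> marker (assumption: x86-64):
#   48 8b 87 60 1d 00 00   mov rax, [rdi+0x1d60]
#   48 8b 68 08            mov rbp, [rax+0x8]
#   8b 45 10               mov eax, [rbp+0x10]   <- faulting instruction
displacement = 0x10

fault_address = rbp + displacement
print(hex(fault_address))
print(fault_address == cr2)
```

The sum matches CR2, which suggests the pointer loaded just before the fault held 0xffff rather than a valid kernel address. Which driver field that corresponds to cannot be told from the dump alone.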

Sometimes it is a list corruption in the same function with a similar stack:

------------[ cut here ]------------
list_add corruption. next is NULL.
WARNING: CPU: 5 PID: 433 at lib/list_debug.c:25 __list_add_valid+0x57/0xa0
...
CPU: 5 PID: 433 Comm: (udev-worker) Not tainted 6.4.0-rc3-1-mainline #1 
b60166e85cb97a6631db26f9dcda0196ed7a0c93
Hardware name: LENOVO 21D2CTO1WW/21D2CTO1WW, BIOS N3GET47W (1.27 ) 
12/08/2022
RIP: 0010:__list_add_valid+0x57/0xa0
Code: 01 00 00 00 c3 cc cc cc cc 48 c7 c7 58 91 e6 9a e8 1e b9 a8 ff 0f 
0b 31 c0 c3 cc cc cc cc 48 c7 c7 80 91 e6 9a e8 09 b9 a8 ff <0f> 0b eb 
e9 48 89 c1 48 c7 c7 a8 91 e6 9a e8 f6 b8 a8 ff 0f 0b eb
RSP: 0018:ffffad9dc0c7bb10 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff92d5a8099448 RCX: 0000000000000027
RDX: ffff92dbe1f61688 RSI: 0000000000000001 RDI: ffff92dbe1f61680
RBP: ffff92d59ea93508 R08: 0000000000000000 R09: ffffad9dc0c7b9a0
R10: 0000000000000003 R11: ffffffff9b6ca808 R12: 0000000000000000
R13: ffff92d5a8099440 R14: ffff92d59ea93760 R15: 0000000000000002
FS:  00007fbaf0262200(0000) GS:ffff92dbe1f40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005651de666000 CR3: 000000011cfee000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
  <TASK>
  amd_sfh_get_report+0xba/0x110 [amd_sfh 
78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
  amdtp_hid_request+0x36/0x50 [amd_sfh 
78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
  sensor_hub_get_feature+0xad/0x170 [hid_sensor_hub 
30e53e2c49ea1702e2482c0b3860e22265679e39]
  hid_sensor_parse_common_attributes+0x217/0x310 [hid_sensor_iio_common 
ed7fba7a4d4147d48156e6a4b2a034ad3fc94350]
  hid_gyro_3d_probe+0x7f/0x2e0 [hid_sensor_gyro_3d 
10978a2cdfc8979f2a7366fcd005e0ea826088eb]
  platform_probe+0x44/0xa0
  really_probe+0x19e/0x3e0
  ? __pfx___driver_attach+0x10/0x10
  __driver_probe_device+0x78/0x160
  driver_probe_device+0x1f/0x90
  __driver_attach+0xd2/0x1c0
  bus_for_each_dev+0x88/0xd0
  bus_add_driver+0x116/0x220
  driver_register+0x59/0x100
  ? __pfx_hid_gyro_3d_platform_driver_init+0x10/0x10 [hid_sensor_gyro_3d 
10978a2cdfc8979f2a7366fcd005e0ea826088eb]
  do_one_initcall+0x5d/0x240
  do_init_module+0x60/0x240
  __do_sys_init_module+0x17f/0x1b0
  do_syscall_64+0x60/0x90
  ? exc_page_fault+0x7f/0x180
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fbaf06c0f9e
Code: 48 8b 0d bd ed 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 
00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 8a ed 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc5ce88528 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 00005651de36dff0 RCX: 00007fbaf06c0f9e
RDX: 00007fbaf0ba9343 RSI: 00000000000079f0 RDI: 00005651de646fe0
RBP: 00007fbaf0ba9343 R08: 00000000000079f0 R09: 0000000000000000
R10: 0000000000019fb1 R11: 0000000000000246 R12: 0000000000020000
R13: 00005651de45fb10 R14: 00005651de36dff0 R15: 00005651de44d5f0
  </TASK>
---[ end trace 0000000000000000 ]---
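
For readers comparing the two traces: each frame has the form symbol+0xOFFSET/0xSIZE, where the offset is the position inside the function and the size is the function's total length. The Python sketch below splits such a frame into its parts. `parse_frame` is a hypothetical helper written for this illustration and is not part of any kernel tool.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" part of a stack frame.
FRAME_RE = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def parse_frame(text):
    """Return (symbol, offset, size) for one frame, or None if nothing matches."""
    match = FRAME_RE.search(text)
    if match is None:
        return None
    return (match["symbol"], int(match["offset"], 16), int(match["size"], 16))

# The two positions inside amd_sfh_get_report reported in this message.
oops_frame = parse_frame("RIP: 0010:amd_sfh_get_report+0x1e/0x110 [amd_sfh]")
warn_frame = parse_frame("amd_sfh_get_report+0xba/0x110 [amd_sfh")

print(oops_frame)
print(warn_frame)
```

Both frames point into the same 0x110-byte function: the page fault hits at +0x1e, near the start, while the list_add warning is reached through a call that returns to +0xba.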

This occurs during almost every boot. When it happens there is usually a 
(udev-worker) process lingering forever, which is unkillable and even 
prevents shutdown.

Looking at past journals it never happened before 6.3 so I believe it is 
a regression.

Relevant device:
63:00.7 Signal processing controller [1180]: Advanced Micro Devices, 
Inc. [AMD] Sensor Fusion Hub [1022:15e4]
         Subsystem: Lenovo Sensor Fusion Hub [17aa:22f1]
         Kernel driver in use: pcie_mp2_amd
         Kernel modules: amd_sfh

I would appreciate it if someone could take a look at this.


Best regards,
Haochen Tong


* Re: amd_sfh driver causes kernel oops during boot
  2023-05-23 17:27 amd_sfh driver causes kernel oops during boot Haochen Tong
@ 2023-05-24  3:58 ` Bagas Sanjaya
  2023-05-24  6:10   ` Haochen Tong
  2023-05-24 10:08 ` Bagas Sanjaya
  2023-07-07  9:37 ` Linux regression tracking #update (Thorsten Leemhuis)
  2 siblings, 1 reply; 21+ messages in thread
From: Bagas Sanjaya @ 2023-05-24  3:58 UTC (permalink / raw)
  To: Haochen Tong, stable; +Cc: regressions, linux-input, Basavaraj Natikar

On Wed, May 24, 2023 at 01:27:57AM +0800, Haochen Tong wrote:
> Hi,
> 
> Since kernel 6.3.0 (and also 6.4rc3), on a ThinkPad Z13 system with Arch
> Linux, I've noticed that the amd_sfh driver spews a lot of stack traces
> during boot. Sometimes it is an oops:

What was the last kernel version that worked before this regression? Do you mean
v6.2?

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

* Re: amd_sfh driver causes kernel oops during boot
  2023-05-24  3:58 ` Bagas Sanjaya
@ 2023-05-24  6:10   ` Haochen Tong
  2023-05-24 10:10     ` Bagas Sanjaya
  0 siblings, 1 reply; 21+ messages in thread
From: Haochen Tong @ 2023-05-24  6:10 UTC (permalink / raw)
  To: Bagas Sanjaya, stable; +Cc: regressions, linux-input, Basavaraj Natikar

Hi,

On 5/24/23 11:58, Bagas Sanjaya wrote:
> On Wed, May 24, 2023 at 01:27:57AM +0800, Haochen Tong wrote:
>> Hi,
>>
>> Since kernel 6.3.0 (and also 6.4rc3), on a ThinkPad Z13 system with Arch
>> Linux, I've noticed that the amd_sfh driver spews a lot of stack traces
>> during boot. Sometimes it is an oops:
> 
> What last kernel version before this regression occurs? Do you mean
> v6.2?
> 

I was using 6.2.12 (Arch Linux distro kernel) before seeing this regression.


Thanks.


* Re: amd_sfh driver causes kernel oops during boot
  2023-05-23 17:27 amd_sfh driver causes kernel oops during boot Haochen Tong
  2023-05-24  3:58 ` Bagas Sanjaya
@ 2023-05-24 10:08 ` Bagas Sanjaya
  2023-07-07  9:37 ` Linux regression tracking #update (Thorsten Leemhuis)
  2 siblings, 0 replies; 21+ messages in thread
From: Bagas Sanjaya @ 2023-05-24 10:08 UTC (permalink / raw)
  To: Haochen Tong, stable
  Cc: Linux Regressions, Linux Input Devices, Basavaraj Natikar,
	Jiri Kosina, Benjamin Tissoires

On Wed, May 24, 2023 at 01:27:57AM +0800, Haochen Tong wrote:
> Hi,
> 
> Since kernel 6.3.0 (and also 6.4rc3), on a ThinkPad Z13 system with Arch
> Linux, I've noticed that the amd_sfh driver spews a lot of stack traces
> during boot. Sometimes it is an oops:
> 
> BUG: unable to handle page fault for address: 000000000001000f
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 8 PID: 457 Comm: (udev-worker) Not tainted 6.3.3-arch1-1 #1
> fa7b7e0107004b3021a57a74b951e0a25e7e8584
> Hardware name: LENOVO 21D2CTO1WW/21D2CTO1WW, BIOS N3GET47W (1.27 )
> 12/08/2022
> RIP: 0010:amd_sfh_get_report+0x1e/0x110 [amd_sfh]
> Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 57
> 41 56 41 55 41 54 55 53 48 8b 87 60 1d 00 00 48 8b 68 08 <8b> 45 10 85 c0 0f
> 84 a9 00 00 00 49 89 fc 41 89 f7 41 89 d6 31 db
> RSP: 0018:ffffb164426f3a20 EFLAGS: 00010246
> RAX: ffff9b0ae6b7bd00 RBX: ffff9b0ac0f46000 RCX: 0000000000000000
> RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff9b0ac0f46000
> RBP: 000000000000ffff R08: ffffb164426f3ab8 R09: ffffb164426f3ab8
> R10: 000000000020031b R11: ffff9b0ace40ac00 R12: ffff9b0ace40ac00
> R13: 0000000000000002 R14: 0000000000000002 R15: ffff9b0acd213010
> FS:  00007fe9ceb82200(0000) GS:ffff9b1122000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000001000f CR3: 000000010940c000 CR4: 0000000000750ee0
> PKRU: 55555554
> Call Trace:
>   <TASK>
>   amdtp_hid_request+0x36/0x50 [amd_sfh
> 2e3095779aada9fdb1764f08ca578ccb14e41fe4]
>   sensor_hub_get_feature+0xad/0x170 [hid_sensor_hub
> d6157999c9d260a1bfa6f27d4a0dc2c3e2c5654e]
>   hid_sensor_parse_common_attributes+0x217/0x310 [hid_sensor_iio_common
> 07a7935272aa9c7a28193b574580b3e953a64ec4]
>   hid_gyro_3d_probe+0x7f/0x2e0 [hid_sensor_gyro_3d
> 9f2eb51294a1f0c0315b365f335617cbaef01eab]
>   platform_probe+0x44/0xa0
>   really_probe+0x19e/0x3e0
>   ? __pfx___driver_attach+0x10/0x10
>   __driver_probe_device+0x78/0x160
>   driver_probe_device+0x1f/0x90
>   __driver_attach+0xd2/0x1c0
>   bus_for_each_dev+0x88/0xd0
>   bus_add_driver+0x116/0x220
>   driver_register+0x59/0x100
>   ? __pfx_init_module+0x10/0x10 [hid_sensor_gyro_3d
> 9f2eb51294a1f0c0315b365f335617cbaef01eab]
>   do_one_initcall+0x5d/0x240
>   do_init_module+0x4a/0x200
>   __do_sys_init_module+0x17f/0x1b0
>   do_syscall_64+0x60/0x90
>   ? ksys_read+0x6f/0xf0
>   ? syscall_exit_to_user_mode+0x1b/0x40
>   ? do_syscall_64+0x6c/0x90
>   ? exc_page_fault+0x7c/0x180
>   entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7fe9ce721f9e
> Code: 48 8b 0d bd ed 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00
> 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d 8a ed 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffd280dd828 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 000055b72a37f630 RCX: 00007fe9ce721f9e
> RDX: 00007fe9cec7a343 RSI: 00000000000077f8 RDI: 000055b72a56c7f0
> RBP: 00007fe9cec7a343 R08: 00000000000077f8 R09: 0000000000000000
> R10: 000000000001a0a1 R11: 0000000000000246 R12: 0000000000020000
> R13: 000055b72a363b90 R14: 000055b72a37f630 R15: 000055b72a36a070
>   </TASK>
> Modules linked in: hid_sensor_accel_3d(+) hid_sensor_gyro_3d(+) qrtr
> hid_sensor_trigger snd_sof industrialio_triggered_buffer ath11k_pci(+)
> kfifo_buf snd_sof_utils hid_sensor_iio_common joydev ath11k industrialio
> snd_soc_core mousedev qmi_helpers snd_compress hid_sensor_hub
> snd_hda_scodec_cs35l41_spi ac97_bus snd_hda_codec_realtek(+)
> snd_pcm_dmaengine intel_rapl_msr snd_hda_codec_hdmi snd_hda_codec_generic
> intel_rapl_common mac80211 snd_pci_ps btusb snd_rpl_pci_acp6x btrtl
> snd_hda_intel edac_mce_amd uvcvideo btbcm snd_acp_pci snd_intel_dspcfg
> snd_pci_acp6x videobuf2_vmalloc snd_intel_sdw_acpi libarc4 uvc btintel
> snd_usb_audio(+) snd_pci_acp5x videobuf2_memops btmtk snd_hda_codec kvm_amd
> videobuf2_v4l2 snd_hda_scodec_cs35l41_i2c snd_usbmidi_lib
> snd_hda_scodec_cs35l41 snd_rn_pci_acp3x ucsi_acpi bluetooth videodev
> snd_hda_core typec_ucsi snd_acp_config snd_hda_cs_dsp_ctls wacom(+)
> hid_multitouch cfg80211 snd_rawmidi sp5100_tco kvm snd_seq_device cs_dsp
> videobuf2_common typec ecdh_generic snd_soc_acpi
>   think_lmi snd_hwdep snd_pcm irqbypass crc16 snd_soc_cs35l41_lib mhi
> thunderbolt firmware_attributes_class snd_pci_acp3x amd_sfh(+) k10temp
> psmouse roles rapl i2c_piix4 mc snd_timer wmi_bmof serial_multi_instantiate
> i2c_hid_acpi acpi_tad i2c_hid amd_pmf amd_pmc mac_hid sch_fq tcp_bbr
> dm_multipath i2c_dev crypto_user fuse loop zram ip_tables x_tables xfs
> libcrc32c crc32c_generic dm_crypt cbc encrypted_keys trusted asn1_encoder
> tee usbhid dm_mod amdgpu i2c_algo_bit serio_raw thinkpad_acpi drm_ttm_helper
> atkbd libps2 crct10dif_pclmul vivaldi_fmap crc32_pclmul ledtrig_audio
> crc32c_intel polyval_clmulni ttm polyval_generic drm_buddy nvme gf128mul
> platform_profile gpu_sched ghash_clmulni_intel sha512_ssse3 snd aesni_intel
> soundcore drm_display_helper crypto_simd rfkill nvme_core xhci_pci cryptd
> cec ccp xhci_pci_renesas i8042 video nvme_common serio wmi
> CR2: 000000000001000f
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:amd_sfh_get_report+0x1e/0x110 [amd_sfh]
> Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 57
> 41 56 41 55 41 54 55 53 48 8b 87 60 1d 00 00 48 8b 68 08 <8b> 45 10 85 c0 0f
> 84 a9 00 00 00 49 89 fc 41 89 f7 41 89 d6 31 db
> RSP: 0018:ffffb164426f3a20 EFLAGS: 00010246
> RAX: ffff9b0ae6b7bd00 RBX: ffff9b0ac0f46000 RCX: 0000000000000000
> RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff9b0ac0f46000
> RBP: 000000000000ffff R08: ffffb164426f3ab8 R09: ffffb164426f3ab8
> R10: 000000000020031b R11: ffff9b0ace40ac00 R12: ffff9b0ace40ac00
> R13: 0000000000000002 R14: 0000000000000002 R15: ffff9b0acd213010
> FS:  00007fe9ceb82200(0000) GS:ffff9b1122000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000001000f CR3: 000000010940c000 CR4: 0000000000750ee0
> PKRU: 55555554
> 
> Sometimes it is a list corruption in the same function with a similar stack:
> 
> ------------[ cut here ]------------
> list_add corruption. next is NULL.
> WARNING: CPU: 5 PID: 433 at lib/list_debug.c:25 __list_add_valid+0x57/0xa0
> ...
> CPU: 5 PID: 433 Comm: (udev-worker) Not tainted 6.4.0-rc3-1-mainline #1
> b60166e85cb97a6631db26f9dcda0196ed7a0c93
> Hardware name: LENOVO 21D2CTO1WW/21D2CTO1WW, BIOS N3GET47W (1.27 )
> 12/08/2022
> RIP: 0010:__list_add_valid+0x57/0xa0
> Code: 01 00 00 00 c3 cc cc cc cc 48 c7 c7 58 91 e6 9a e8 1e b9 a8 ff 0f 0b
> 31 c0 c3 cc cc cc cc 48 c7 c7 80 91 e6 9a e8 09 b9 a8 ff <0f> 0b eb e9 48 89
> c1 48 c7 c7 a8 91 e6 9a e8 f6 b8 a8 ff 0f 0b eb
> RSP: 0018:ffffad9dc0c7bb10 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff92d5a8099448 RCX: 0000000000000027
> RDX: ffff92dbe1f61688 RSI: 0000000000000001 RDI: ffff92dbe1f61680
> RBP: ffff92d59ea93508 R08: 0000000000000000 R09: ffffad9dc0c7b9a0
> R10: 0000000000000003 R11: ffffffff9b6ca808 R12: 0000000000000000
> R13: ffff92d5a8099440 R14: ffff92d59ea93760 R15: 0000000000000002
> FS:  00007fbaf0262200(0000) GS:ffff92dbe1f40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005651de666000 CR3: 000000011cfee000 CR4: 0000000000750ee0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  amd_sfh_get_report+0xba/0x110 [amd_sfh
> 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
>  amdtp_hid_request+0x36/0x50 [amd_sfh
> 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
>  sensor_hub_get_feature+0xad/0x170 [hid_sensor_hub
> 30e53e2c49ea1702e2482c0b3860e22265679e39]
>  hid_sensor_parse_common_attributes+0x217/0x310 [hid_sensor_iio_common
> ed7fba7a4d4147d48156e6a4b2a034ad3fc94350]
>  hid_gyro_3d_probe+0x7f/0x2e0 [hid_sensor_gyro_3d
> 10978a2cdfc8979f2a7366fcd005e0ea826088eb]
>  platform_probe+0x44/0xa0
>  really_probe+0x19e/0x3e0
>  ? __pfx___driver_attach+0x10/0x10
>  __driver_probe_device+0x78/0x160
>  driver_probe_device+0x1f/0x90
>  __driver_attach+0xd2/0x1c0
>  bus_for_each_dev+0x88/0xd0
>  bus_add_driver+0x116/0x220
>  driver_register+0x59/0x100
>  ? __pfx_hid_gyro_3d_platform_driver_init+0x10/0x10 [hid_sensor_gyro_3d
> 10978a2cdfc8979f2a7366fcd005e0ea826088eb]
>  do_one_initcall+0x5d/0x240
>  do_init_module+0x60/0x240
>  __do_sys_init_module+0x17f/0x1b0
>  do_syscall_64+0x60/0x90
>  ? exc_page_fault+0x7f/0x180
>  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7fbaf06c0f9e
> Code: 48 8b 0d bd ed 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00
> 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d 8a ed 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffc5ce88528 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 00005651de36dff0 RCX: 00007fbaf06c0f9e
> RDX: 00007fbaf0ba9343 RSI: 00000000000079f0 RDI: 00005651de646fe0
> RBP: 00007fbaf0ba9343 R08: 00000000000079f0 R09: 0000000000000000
> R10: 0000000000019fb1 R11: 0000000000000246 R12: 0000000000020000
> R13: 00005651de45fb10 R14: 00005651de36dff0 R15: 00005651de44d5f0
>  </TASK>
> ---[ end trace 0000000000000000 ]---
> 
> This occurs during almost every boot. When it happens there is usually a
> (udev-worker) process lingering forever, which is unkillable and even
> prevents shutdown.
> 
> Looking at past journals it never happened before 6.3 so I believe it is a
> regression.
> 
> Relevant device:
> 63:00.7 Signal processing controller [1180]: Advanced Micro Devices, Inc.
> [AMD] Sensor Fusion Hub [1022:15e4]
>         Subsystem: Lenovo Sensor Fusion Hub [17aa:22f1]
>         Kernel driver in use: pcie_mp2_amd
>         Kernel modules: amd_sfh
> 

Thanks for the bug report. I'm adding it to regzbot:

#regzbot ^introduced: v6.2..v6.3
#regzbot title: amd_sfh driver causes kernel oops (udev-worker becomes zombie) on ThinkPad Z13

-- 
An old man doll... just what I always wanted! - Clara

* Re: amd_sfh driver causes kernel oops during boot
  2023-05-24  6:10   ` Haochen Tong
@ 2023-05-24 10:10     ` Bagas Sanjaya
  2023-06-05 11:24       ` Malte Starostik
  2023-06-06  2:39       ` Bagas Sanjaya
  0 siblings, 2 replies; 21+ messages in thread
From: Bagas Sanjaya @ 2023-05-24 10:10 UTC (permalink / raw)
  To: Haochen Tong, stable; +Cc: regressions, linux-input, Basavaraj Natikar

On Wed, May 24, 2023 at 02:10:31PM +0800, Haochen Tong wrote:
> > What last kernel version before this regression occurs? Do you mean
> > v6.2?
> > 
> 
> I was using 6.2.12 (Arch Linux distro kernel) before seeing this regression.

Can you perform bisection to find the culprit that introduces the
regression? Since you're on Arch Linux, see its wiki article [1] for
instructions.

Thanks.

[1]: https://wiki.archlinux.org/title/Bisecting_bugs_with_Git
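
For readers unfamiliar with the technique: bisection is a binary search over the commit history between a known-good and a known-bad kernel. The Python sketch below shows only the search logic on a made-up list of commit labels. `first_bad_commit` and `is_bad` are hypothetical names; in practice the test step means building and booting each candidate kernel, as the wiki article describes.

```python
def first_bad_commit(commits, is_bad):
    """Return (first bad commit, number of tests performed).

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is bad too.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid      # regression is at mid or earlier
        else:
            good = mid     # regression was introduced after mid
    return commits[bad], steps

# Toy history: 16 placeholder commits, with the regression introduced at c11.
history = [f"c{i}" for i in range(16)]
culprit, steps = first_bad_commit(history, lambda c: int(c[1:]) >= 11)
print(culprit, steps)
```

The number of kernels to test grows only with the logarithm of the range, which keeps bisecting a full release cycle feasible. Since the oops was reported on almost every boot rather than every boot, a single clean boot may not be enough to call a commit good.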

-- 
An old man doll... just what I always wanted! - Clara

* Re: amd_sfh driver causes kernel oops during boot
  2023-05-24 10:10     ` Bagas Sanjaya
@ 2023-06-05 11:24       ` Malte Starostik
  2023-06-06  2:36         ` Bagas Sanjaya
  2023-06-06  2:39       ` Bagas Sanjaya
  1 sibling, 1 reply; 21+ messages in thread
From: Malte Starostik @ 2023-06-05 11:24 UTC (permalink / raw)
  To: bagasdotme; +Cc: basavaraj.natikar, linux-input, linux, regressions, stable

Hello,

chiming in here as I'm experiencing what looks like the exact same issue, also
on a Lenovo Z13 notebook, also on Arch:
an oops during startup in task udev-worker, followed by udev-worker blocking all
attempts to suspend or cleanly shut down/reboot the machine. In fact, I first
noticed because the machine surprised me by repeatedly running out of battery
after it had supposedly gone into standby but had actually failed to. Only then
did I notice the error on boot.

bisect result:
904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
commit 904e28c6de083fa4834cdbd0026470ddc30676fc
Merge: a738688177dc 2f7f4efb9411
Author: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Date:   Wed Feb 22 10:44:31 2023 +0100

    Merge branch 'for-6.3/hid-bpf' into for-linus
    
    Initial support of HID-BPF (Benjamin Tissoires)
    
    The history is a little long for this series, as it was intended to be
    sent for v6.2. However some last minute issues forced us to postpone it
    to v6.3.
    
    Conflicts:
    * drivers/hid/i2c-hid/Kconfig:
      commit bf7660dab30d ("HID: stop drivers from selecting CONFIG_HID")
      conflicts with commit 2afac81dd165 ("HID: fix I2C_HID not selected
      when I2C_HID_OF_ELAN is")
      the resolution is simple enough: just drop the "default" and "select"
      lines as the new commit from Arnd is doing


BR Malte

> On Wed, May 24, 2023 at 02:10:31PM +0800, Haochen Tong wrote:
> > > What last kernel version before this regression occurs? Do you mean
> > > v6.2?
> > > 
> > 
> > I was using 6.2.12 (Arch Linux distro kernel) before seeing this regression.
> 
> Can you perform bisection to find the culprit that introduces the
> regression? Since you're on Arch Linux, see its wiki article [1] for
> instructions.
> 
> Thanks.
> 
> [1]: https://wiki.archlinux.org/title/Bisecting_bugs_with_Git




* Re: amd_sfh driver causes kernel oops during boot
  2023-06-05 11:24       ` Malte Starostik
@ 2023-06-06  2:36         ` Bagas Sanjaya
  2023-06-06  6:56           ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 21+ messages in thread
From: Bagas Sanjaya @ 2023-06-06  2:36 UTC (permalink / raw)
  To: Malte Starostik, Benjamin Tissoires
  Cc: basavaraj.natikar, linux-input, linux, regressions, stable

On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
> Hello,
> 
> chiming in here as I'm experiencing what looks like the exact same issue, also 
> on a Lenovo Z13 notebook, also on Arch:
> Oops during startup in task udev-worker followed by udev-worker blocking all 
> attempts to suspend or cleanly shutdown/reboot the machine - in fact I first 
> noticed because the machine surprised with repeatedly running out of battery 
> after it had supposedly been in standby but couldn't. Only then I noticed the 
> error on boot.
> 
> bisect result:
> 904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
> commit 904e28c6de083fa4834cdbd0026470ddc30676fc
> Merge: a738688177dc 2f7f4efb9411
> Author: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> Date:   Wed Feb 22 10:44:31 2023 +0100
> 
>     Merge branch 'for-6.3/hid-bpf' into for-linus

Hmm, this seems like a bad bisect (it points to HID-BPF, which IMO isn't related
to amd_sfh). Can you repeat the bisection?

Anyway, tl;dr:

> A: http://en.wikipedia.org/wiki/Top_post
> Q: Where do I find info about this thing called top-posting?
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> 
> A: No.
> Q: Should I include quotations after my reply?
> 
> http://daringfireball.net/2007/07/on_top

And telling regzbot:

#regzbot introduced: 904e28c6de083f
#regzbot title: HID-BPF feature causes amd_sfh kernel oops during boot and suspend/reboot

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

* Re: amd_sfh driver causes kernel oops during boot
  2023-05-24 10:10     ` Bagas Sanjaya
  2023-06-05 11:24       ` Malte Starostik
@ 2023-06-06  2:39       ` Bagas Sanjaya
  2023-06-06  3:41         ` Haochen Tong
  1 sibling, 1 reply; 21+ messages in thread
From: Bagas Sanjaya @ 2023-06-06  2:39 UTC (permalink / raw)
  To: Haochen Tong, stable, Linux regression tracking (Thorsten Leemhuis)
  Cc: regressions, linux-input, Basavaraj Natikar

On Wed, May 24, 2023 at 05:10:45PM +0700, Bagas Sanjaya wrote:
> On Wed, May 24, 2023 at 02:10:31PM +0800, Haochen Tong wrote:
> > > What last kernel version before this regression occurs? Do you mean
> > > v6.2?
> > > 
> > 
> > I was using 6.2.12 (Arch Linux distro kernel) before seeing this regression.
> 
> Can you perform bisection to find the culprit that introduces the
> regression? Since you're on Arch Linux, see its wiki article [1] for
> instructions.
> 

Haochen, any news on this? Have you done the bisection, and is there any result?
Another reporter has posted a bisection result [1], but it is possibly a bad bisect.

Thanks.

[1]: https://lore.kernel.org/regressions/3250319.ancTxkQ2z5@zen/

-- 
An old man doll... just what I always wanted! - Clara

* Re: amd_sfh driver causes kernel oops during boot
  2023-06-06  2:39       ` Bagas Sanjaya
@ 2023-06-06  3:41         ` Haochen Tong
  0 siblings, 0 replies; 21+ messages in thread
From: Haochen Tong @ 2023-06-06  3:41 UTC (permalink / raw)
  To: Bagas Sanjaya, stable, Linux regression tracking (Thorsten Leemhuis)
  Cc: regressions, linux-input, Basavaraj Natikar

On 6/6/23 10:39, Bagas Sanjaya wrote:
> On Wed, May 24, 2023 at 05:10:45PM +0700, Bagas Sanjaya wrote:
>> On Wed, May 24, 2023 at 02:10:31PM +0800, Haochen Tong wrote:
>>>> What last kernel version before this regression occurs? Do you mean
>>>> v6.2?
>>>>
>>>
>>> I was using 6.2.12 (Arch Linux distro kernel) before seeing this regression.
>>
>> Can you perform bisection to find the culprit that introduces the
>> regression? Since you're on Arch Linux, see its wiki article [1] for
>> instructions.
>>
> 
> Haochen, any news on this? Has the bisection been done and any result?
> Another reporter had concluded possibly bad bisect [1].
> 
> Thanks.
> 
> [1]: https://lore.kernel.org/regressions/3250319.ancTxkQ2z5@zen/
> 

Hi,

Sorry for the late reply. I haven't had enough time for it yet.

I took a look at the git logs, and it doesn't look like the modules
involved in the original stack trace (amd_sfh, hid_sensor_hub,
hid_sensor_iio_common, hid_sensor_gyro_3d) have received any significant
changes between v6.2 and v6.3. IMHO, the bisect done by Malte might
indicate that the issue could be a problem outside of these modules.

Also, I upgraded from 6.3.3 to 6.3.5 a week ago and this issue hasn't
happened so far in 4 reboots. However, there still don't seem to be
any changes regarding these modules, so I'm not sure if it's fixed
elsewhere or I'm just being lucky. It would be nice if someone could
confirm or disprove this.


Thanks,

* Re: amd_sfh driver causes kernel oops during boot
  2023-06-06  2:36         ` Bagas Sanjaya
@ 2023-06-06  6:56           ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-06  8:08             ` Benjamin Tissoires
  2023-06-06  9:53             ` Malte Starostik
  0 siblings, 2 replies; 21+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-06-06  6:56 UTC (permalink / raw)
  To: Bagas Sanjaya, Malte Starostik, Benjamin Tissoires
  Cc: basavaraj.natikar, linux-input, linux, regressions, stable

On 06.06.23 04:36, Bagas Sanjaya wrote:
> On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
>> Hello,
>>
>> chiming in here as I'm experiencing what looks like the exact same issue, also 
>> on a Lenovo Z13 notebook, also on Arch:
>> Oops during startup in task udev-worker followed by udev-worker blocking all 
>> attempts to suspend or cleanly shutdown/reboot the machine - in fact I first 
>> noticed because the machine surprised with repeatedly running out of battery 
>> after it had supposedly been in standby but couldn't. Only then I noticed the 
>> error on boot.
>>
>> bisect result:
>> 904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
>> commit 904e28c6de083fa4834cdbd0026470ddc30676fc
>> Merge: a738688177dc 2f7f4efb9411
>> Author: Benjamin Tissoires <benjamin.tissoires@redhat.com>
>> Date:   Wed Feb 22 10:44:31 2023 +0100
>>
>>     Merge branch 'for-6.3/hid-bpf' into for-linus
> 
> Hmm, seems like bad bisect (bisected to HID-BPF which IMO isn't related
> to amd_sfh). Can you repeat the bisection?

Well, amd_sfh afaics apparently interacts with HID (see trace earlier in
the thread), so it's not that far away. But it's a merge commit, which
is possible, but doesn't happen every day. So a recheck might really be
a good idea.

> Anyway, tl;dr:
> 
>> A: http://en.wikipedia.org/wiki/Top_post
> >> Q: Where do I find info about this thing called top-posting?
> [...]

BTW, I'm not sure if this really is helpful. Teaching this to upcoming
kernel developers is definitely worth it, but I wonder if pushing this
on all reporters might do more harm than good. I also wonder if asking
them a bit more kindly might be wiser (e.g. instead of "Anyway, tl;dr:"
something like "BTW, please do not top-post:" or something like that maybe).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


* Re: amd_sfh driver causes kernel oops during boot
  2023-06-06  6:56           ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-06-06  8:08             ` Benjamin Tissoires
  2023-06-06 15:25               ` Limonciello, Mario
  2023-06-06  9:53             ` Malte Starostik
  1 sibling, 1 reply; 21+ messages in thread
From: Benjamin Tissoires @ 2023-06-06  8:08 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Bagas Sanjaya, Malte Starostik, basavaraj.natikar, linux-input,
	linux, stable, Mario Limonciello


On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> 
> On 06.06.23 04:36, Bagas Sanjaya wrote:
> > On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
> >> Hello,
> >>
> >> chiming in here as I'm experiencing what looks like the exact same issue, also 
> >> on a Lenovo Z13 notebook, also on Arch:
> >> Oops during startup in task udev-worker followed by udev-worker blocking all 
> >> attempts to suspend or cleanly shutdown/reboot the machine - in fact I first 
> >> noticed because the machine surprised with repeatedly running out of battery 
> >> after it had supposedly been in standby but couldn't. Only then I noticed the 
> >> error on boot.
> >>
> >> bisect result:
> >> 904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
> >> commit 904e28c6de083fa4834cdbd0026470ddc30676fc
> >> Merge: a738688177dc 2f7f4efb9411
> >> Author: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> >> Date:   Wed Feb 22 10:44:31 2023 +0100
> >>
> >>     Merge branch 'for-6.3/hid-bpf' into for-linus
> > 
> > Hmm, seems like bad bisect (bisected to HID-BPF which IMO isn't related
> > to amd_sfh). Can you repeat the bisection?
> 
> Well, amd_sfh afaics apparently interacts with HID (see trace earlier in
> the thread), so it's not that far away. But it's a merge commit, which
> is possible, but doesn't happen every day. So a recheck might really be
> a good idea.

Let's not rule out that there is a bad interaction between HID-BPF and
AMD SFH. HID-BPF is able to process any incoming HID event, whether it
comes from AMD SFH, USB, BT, I2C or anything else.

However, looking at the stack trace in the initial report[0], it seems
we are getting the oops/stack traces while we are still in amd_sfh:

list_add corruption. next is NULL.
WARNING: CPU: 5 PID: 433 at lib/list_debug.c:25 __list_add_valid+0x57/0xa0
...
RIP: 0010:__list_add_valid+0x57/0xa0
...
Call Trace:
  <TASK>
  amd_sfh_get_report+0xba/0x110 [amd_sfh 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
...

If HID-BPF were involved, we should see a call to hid_input_report() IMO.
Also AMD SFH calls hid_input_report() in a workqueue, so I would expect
a different stack trace.

I suspect commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors are enabled,
clean up") because the stack trace says that there is a bad list_add,
which could happen if the object is not correctly initialized.

However, that commit was present in v6.2, so it might not be that one.
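
To illustrate why an uninitialized object can produce exactly this warning, here is a minimal userspace sketch. It is not the kernel's list implementation: `struct list_node`, `list_init`, `list_add_is_valid` and `list_add_checked` are hypothetical stand-ins, and the check is a simplified version of what `__list_add_valid` does. The assumption is that a list head which was zeroed (or freed and reused) but never initialized has `next == NULL`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Same idea as INIT_LIST_HEAD(): an empty list points at itself. */
void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Simplified sanity check: refuse to link a new entry between prev and
 * next when the neighbours are missing or do not point at each other.
 */
bool list_add_is_valid(const struct list_node *prev,
                       const struct list_node *next)
{
    if (next == NULL) {
        fprintf(stderr, "list_add corruption. next is NULL.\n");
        return false;
    }
    if (prev == NULL) {
        fprintf(stderr, "list_add corruption. prev is NULL.\n");
        return false;
    }
    if (next->prev != prev || prev->next != next) {
        fprintf(stderr, "list_add corruption. neighbours disagree.\n");
        return false;
    }
    return true;
}

/* Insert entry right after head; returns false if the check rejects it. */
bool list_add_checked(struct list_node *entry, struct list_node *head)
{
    struct list_node *next = head->next;

    if (!list_add_is_valid(head, next))
        return false;

    next->prev = entry;
    entry->next = next;
    entry->prev = head;
    head->next = entry;
    return true;
}
```

Adding to a head set up with `list_init()` succeeds, while adding to a head whose pointers are still NULL is rejected with the "next is NULL" message, which matches the shape of the warning in the report.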

Back to the merge commit: the hid-bpf branch was forked off during the
v6.1 cycle and only later merged into the hid tree. That might be the
reason the bisection lands on it: the AMD SFH code in the hid-bpf branch
is the one from the v6.1 kernel, and once it is merged into the v6.2+
branch, you get different code for that driver.

Cheers,
Benjamin

[0] https://lore.kernel.org/regressions/f40e3897-76f1-2cd0-2d83-e48d87130eab@hexchain.org/#t



* Re: amd_sfh driver causes kernel oops during boot
  2023-06-06  6:56           ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-06  8:08             ` Benjamin Tissoires
@ 2023-06-06  9:53             ` Malte Starostik
  1 sibling, 0 replies; 21+ messages in thread
From: Malte Starostik @ 2023-06-06  9:53 UTC (permalink / raw)
  To: Bagas Sanjaya, Benjamin Tissoires,
	Linux regressions mailing list, regressions
  Cc: basavaraj.natikar, linux-input, linux

On Tuesday, 6 June 2023, 08:56:16 CEST, Linux regression tracking 
(Thorsten Leemhuis) wrote:
> On 06.06.23 04:36, Bagas Sanjaya wrote:
> > On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
> >> chiming in here as I'm experiencing what looks like the exact same issue,
> >> also on a Lenovo Z13 notebook, also on Arch:

> >> bisect result:
> >> 904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
> >> commit 904e28c6de083fa4834cdbd0026470ddc30676fc
> >> Merge: a738688177dc 2f7f4efb9411
> >> Author: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> >> Date:   Wed Feb 22 10:44:31 2023 +0100
> >> 
> >>     Merge branch 'for-6.3/hid-bpf' into for-linus
> > 
> > Hmm, seems like bad bisect (bisected to HID-BPF which IMO isn't related
> > to amd_sfh). Can you repeat the bisection?

I'm digging further. That merge is what git bisect ended at, but admittedly my 
git skills, especially with a large codebase, aren't too advanced.
While at 904e28c6de083fa4834cdbd0026470ddc30676fc, git show only shows the diff 
for tools/testing/selftests/Makefile, which can't really be the culprit. 
However, git diff @~..@ has changes in drivers/hid/amd-sfh-hid/Kconfig (seems 
innocuous, too), and also some changes to drivers/hid/hid-core.c. Nothing 
obvious there either, but at least it's not too far from the trace.

> Well, amd_sfh afaics apparently interacts with HID (see trace earlier in
> the thread), so it's not that far away. But it's a merge commit, which
> is possible, but doesn't happen every day. So a recheck might really be
> a good idea.

I will recheck some more. The Oops only happens with roughly a 30 % chance 
during boot. When it doesn't, there seem to be no other issues until the next 
boot either. I made sure to reboot a few times after each bisect step, and I 
will look deeper into the area.
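
An intermittent failure like this makes each bisect step probabilistic. The following is a rough sketch, assuming boots are independent and using the approximate 30 % figure above; `false_good_probability` and `boots_needed` are hypothetical helper names. It estimates how likely a bad commit is to be marked good after a given number of clean boots.

```c
/*
 * Probability that a commit which really is bad still shows
 * `clean_boots` clean boots in a row, if each boot hits the Oops
 * independently with probability `oops_rate`.
 */
double false_good_probability(double oops_rate, unsigned clean_boots)
{
    double p = 1.0;
    unsigned i;

    for (i = 0; i < clean_boots; i++)
        p *= 1.0 - oops_rate;
    return p;
}

/*
 * Smallest number of clean boots after which the chance of wrongly
 * calling a bad commit "good" is at most `max_false_good`.
 * Returns 0 for inputs that make no sense (rate outside (0, 1],
 * non-positive threshold) or when no boots are needed at all.
 */
unsigned boots_needed(double oops_rate, double max_false_good)
{
    double p = 1.0;
    unsigned n = 0;

    if (oops_rate <= 0.0 || oops_rate > 1.0 || max_false_good <= 0.0)
        return 0;

    while (p > max_false_good) {
        p *= 1.0 - oops_rate;
        n++;
    }
    return n;
}
```

Under these assumptions, three clean boots still leave about a one-in-three chance (0.7^3 = 0.343) of calling a bad commit good, and `boots_needed(0.3, 0.05)` suggests nine boots per step to get that below 5 %. Since a bisection chains many such decisions, a single wrong "good" can be enough to steer it to an unrelated commit.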

> > Anyway, tl;dr:
> >> A: http://en.wikipedia.org/wiki/Top_post
> >> Q: Where do I find info about this thing called top-posting?
> > 
> > [...]
> 
> BTW, I'm not sure if this really is helpful. Teaching this to upcoming
> kernel developers is definitely worth it, but I wonder if pushing this
> on all reporters might do more harm than good. I also wonder if asking
> them a bit more kindly might be wiser (e.g. instead of "Anyway, tl;dr:"
> something like "BTW, please do not top-post:" or something like that maybe).

Thanks, and I agree in general. However, my case was in fact even worse :-) 
I'm totally aware of the badness of top-posting. It happened because I had a 
draft of the reply: I set In-Reply-To from the link in the web archive and 
pasted the previous message from there. A couple of days later, I just pasted the 
result on top and disregarded the existing text.

BR Malte




* Re: amd_sfh driver causes kernel oops during boot
  2023-06-06  8:08             ` Benjamin Tissoires
@ 2023-06-06 15:25               ` Limonciello, Mario
  2023-06-06 22:57                 ` Malte Starostik
  0 siblings, 1 reply; 21+ messages in thread
From: Limonciello, Mario @ 2023-06-06 15:25 UTC (permalink / raw)
  To: Benjamin Tissoires, Linux regressions mailing list
  Cc: Bagas Sanjaya, Malte Starostik, basavaraj.natikar, linux-input,
	linux, stable


On 6/6/2023 3:08 AM, Benjamin Tissoires wrote:
> On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 06.06.23 04:36, Bagas Sanjaya wrote:
>>> On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
>>>> Hello,
>>>>
>>>> chiming in here as I'm experiencing what looks like the exact same issue, also
>>>> on a Lenovo Z13 notebook, also on Arch:
>>>> Oops during startup in task udev-worker followed by udev-worker blocking all
>>>> attempts to suspend or cleanly shutdown/reboot the machine - in fact I first
>>>> noticed because the machine surprised with repeatedly running out of battery
>>>> after it had supposedly been in standby but couldn't. Only then I noticed the
>>>> error on boot.
>>>>
>>>> bisect result:
>>>> 904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
>>>> commit 904e28c6de083fa4834cdbd0026470ddc30676fc
>>>> Merge: a738688177dc 2f7f4efb9411
>>>> Author: Benjamin Tissoires <benjamin.tissoires@redhat.com>
>>>> Date:   Wed Feb 22 10:44:31 2023 +0100
>>>>
>>>>      Merge branch 'for-6.3/hid-bpf' into for-linus
>>> Hmm, seems like bad bisect (bisected to HID-BPF which IMO isn't related
>>> to amd_sfh). Can you repeat the bisection?
>> Well, amd_sfh afaics apparently interacts with HID (see trace earlier in
>> the thread), so it's not that far away. But it's a merge commit, which
>> is possible, but doesn't happen every day. So a recheck might really be
>> a good idea.
> Let's not rule out that there is a bad interaction between HID-BPF and
> AMD SFH. HID-BPF is able to process any incoming HID event, whether it
> comes from AMD SFH, USB, BT, I2C or anything else.
>
> However, looking at the stack trace in the initial report[0], it seems
> we are getting the oops/stack traces while we are still in amd_sfh:
>
> list_add corruption. next is NULL.
> WARNING: CPU: 5 PID: 433 at lib/list_debug.c:25 __list_add_valid+0x57/0xa0
> ...
> RIP: 0010:__list_add_valid+0x57/0xa0
> ...
> Call Trace:
>    <TASK>
>    amd_sfh_get_report+0xba/0x110 [amd_sfh 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
> ...
>
> If HID-BPF were involved, we should see a call to hid_input_report() IMO.
> Also AMD SFH calls hid_input_report() in a workqueue, so I would expect
> a different stack trace.
>
> I have a suspicion on commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors are enabled,
> clean up") because the stack trace says that there is a bad list_add,
> which could happen if the object is not correctly initialized.
>
> However, that commit was present in v6.2, so it might not be that one.
>
> Back to the merge commit: the hid-bpf tree was merged in the hid tree
> while it took its branch during the v6.1 cycle. So that might be the
> reason you get this as a result of bisection because the AMD SFH code in
> the hid-bpf branch is the one from the v6.1 kernel, and when you merge
> it to the v6.2+ branch, you get a different code for that driver.
>
> Cheers,
> Benjamin
>
> [0] https://lore.kernel.org/regressions/f40e3897-76f1-2cd0-2d83-e48d87130eab@hexchain.org/#t
If I'm not mistaken the Z13 doesn't actually have any
sensors connected to SFH.  So I think the suspicion on
7bcfdab3f0c6 and theory this is triggered by HID init makes
a lot of sense.

Can you try this patch?

diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
index d9b7b01900b5..fa693a5224c6 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
@@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
                         devm_kfree(dev, cl_data->report_descr[i]);
                 }
                 dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", cl_data->is_any_sensor_enabled);
+               cl_data->num_hid_devices = 0;
                 return -EOPNOTSUPP;
         }
         schedule_delayed_work(&cl_data->work_buffer, msecs_to_jiffies(AMD_SFH_IDLE_LOOP));
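
The idea behind resetting the count can be sketched in plain userspace C. This is not the driver's code: `struct client_data`, `client_cleanup` and `count_dangling_slots` are hypothetical names, and the assumption is that later code loops over the device slots up to a stored count. If cleanup releases the slots but leaves the count untouched, that loop still visits slots that no longer hold valid objects.

```c
#include <stddef.h>

#define MAX_DEVICES 4

/* Hypothetical per-sensor object. */
struct sensor_dev {
    int report_id;
};

/* Hypothetical client state: a count plus that many live slots. */
struct client_data {
    unsigned num_devices;
    struct sensor_dev *devices[MAX_DEVICES];
};

/*
 * Teardown on the "no sensors enabled" path. The slots are cleared
 * (their storage is assumed to be released elsewhere); reset_count
 * selects whether the count is cleared as well.
 */
void client_cleanup(struct client_data *cl, int reset_count)
{
    unsigned i;

    for (i = 0; i < MAX_DEVICES; i++)
        cl->devices[i] = NULL;
    if (reset_count)
        cl->num_devices = 0;
}

/*
 * Number of slots that a later "for each device" loop would visit
 * even though they no longer point at a valid object.
 */
unsigned count_dangling_slots(const struct client_data *cl)
{
    unsigned i, dangling = 0;

    for (i = 0; i < cl->num_devices && i < MAX_DEVICES; i++)
        if (cl->devices[i] == NULL)
            dangling++;
    return dangling;
}
```

With two devices registered, cleanup without the reset leaves two dangling slots, and cleanup with the reset leaves none. Whether this is the actual cause of the Oops is a separate question that only testing on affected hardware can answer.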



* Re: amd_sfh driver causes kernel oops during boot
  2023-06-06 15:25               ` Limonciello, Mario
@ 2023-06-06 22:57                 ` Malte Starostik
  2023-06-20 13:20                   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 21+ messages in thread
From: Malte Starostik @ 2023-06-06 22:57 UTC (permalink / raw)
  To: Benjamin Tissoires, Linux regressions mailing list, Limonciello, Mario
  Cc: Bagas Sanjaya, basavaraj.natikar, linux-input, linux, stable

On Tuesday, 6 June 2023, 17:25:13 CEST, Limonciello, Mario wrote:
> On 6/6/2023 3:08 AM, Benjamin Tissoires wrote:
> > On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> >>> On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
> >>>> Hello,
> >>>> 
> >>>> chiming in here as I'm experiencing what looks like the exact same
> >>>> issue, also on a Lenovo Z13 notebook, also on Arch:
> >>>> Oops during startup in task udev-worker followed by udev-worker
> >>>> blocking all attempts to suspend or cleanly shutdown/reboot the
> >>>> machine

> > I have a suspicion on commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors
> > are enabled, clean up") because the stack trace says that there is a bad
> > list_add, which could happen if the object is not correctly initialized.
> > 
> > However, that commit was present in v6.2, so it might not be that one.
> > 
> If I'm not mistaken the Z13 doesn't actually have any
> sensors connected to SFH.  So I think the suspicion on
> 7bcfdab3f0c6 and theory this is triggered by HID init makes
> a lot of sense.
> 
> Can you try this patch?
> 
> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> index d9b7b01900b5..fa693a5224c6 100644
> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> @@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev
> *privdata)
>                          devm_kfree(dev, cl_data->report_descr[i]);
>                  }
>                  dev_warn(dev, "Failed to discover, sensors not enabled
> is %d\n", cl_data->is_any_sensor_enabled);
> +               cl_data->num_hid_devices = 0;
>                  return -EOPNOTSUPP;
>          }
>          schedule_delayed_work(&cl_data->work_buffer,
> msecs_to_jiffies(AMD_SFH_IDLE_LOOP));

I applied this to 9e87b63ed37e202c77aa17d4112da6ae0c7c097c now, which was the 
origin when I started the whole bisection. Clean rebuild, issue still 
persists.

Out of 50 boots, I got:

25 clean
22 Oops as posted by the OP
1 same Oops, followed by a panic
1 lockup [1]
1 hanging with just a blank screen

Not sure whether the lockups are related, but [1] mentions modprobe and 
udev-worker as well, and all problems including the blank screen one appear 
roughly at the same time during boot. As this is before a graphics mode switch, 
I suspect the last mentioned case may be like [1] while the screen was blanked.
To support the timing correlation: the UVC error for the IR cam shown in the 
photo (normal boot noise) also appears right before the BUG in the non-lockup 
bad case.

I do see the dev_warn in dmesg, so the code path modified in your patch is 
indeed hit:
[   10.897521] pcie_mp2_amd 0000:63:00.7: Failed to discover, sensors not enabled is 1
[   10.897533] pcie_mp2_amd: probe of 0000:63:00.7 failed with error -95
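
The "-95" in the probe message lines up with the `return -EOPNOTSUPP;` in the patched path, since kernel functions report failures as negative errno values. A small userspace sketch, assuming Linux's generic errno numbering (where `EOPNOTSUPP` is 95; other platforms number it differently), with hypothetical helper names:

```c
#include <errno.h>
#include <string.h>

/* True if a negative-errno style result means "operation not supported". */
int is_eopnotsupp(int probe_result)
{
    return probe_result == -EOPNOTSUPP;
}

/* Human-readable text for a negative-errno style result. */
const char *describe_probe_error(int probe_result)
{
    return strerror(-probe_result);
}
```

On such a system `is_eopnotsupp(-95)` is true, so the log line confirms that the probe exited through the branch the patch modifies.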

BR Malte

[1] https://photos.app.goo.gl/2FAvQ7DqBsHEF6Bd8




* Re: amd_sfh driver causes kernel oops during boot
  2023-06-06 22:57                 ` Malte Starostik
@ 2023-06-20 13:20                   ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-20 18:50                     ` Limonciello, Mario
                                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-06-20 13:20 UTC (permalink / raw)
  To: Malte Starostik, Benjamin Tissoires,
	Linux regressions mailing list, Limonciello, Mario
  Cc: Bagas Sanjaya, basavaraj.natikar, linux-input, linux, stable

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

What's the status of this? From here it looks like there has been no
progress on resolving the regression in the past two weeks, but maybe I
just missed something.

On 07.06.23 00:57, Malte Starostik wrote:
> On Tuesday, 6 June 2023, 17:25:13 CEST, Limonciello, Mario wrote:
>> On 6/6/2023 3:08 AM, Benjamin Tissoires wrote:
>>> On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>> On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
>>>>>> Hello,
>>>>>>
>>>>>> chiming in here as I'm experiencing what looks like the exact same
>>>>>> issue, also on a Lenovo Z13 notebook, also on Arch:
>>>>>> Oops during startup in task udev-worker followed by udev-worker
>>>>>> blocking all attempts to suspend or cleanly shutdown/reboot the
>>>>>> machine
> 
>>> I have a suspicion on commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors
>>> are enabled, clean up") because the stack trace says that there is a bad
>>> list_add, which could happen if the object is not correctly initialized.
>>>
>>> However, that commit was present in v6.2, so it might not be that one.
>>>
>> If I'm not mistaken the Z13 doesn't actually have any
>> sensors connected to SFH.  So I think the suspicion on
>> 7bcfdab3f0c6 and theory this is triggered by HID init makes
>> a lot of sense.
>>
>> Can you try this patch?
>>
>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>> b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>> index d9b7b01900b5..fa693a5224c6 100644
>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>> @@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev
>> *privdata)
>>                          devm_kfree(dev, cl_data->report_descr[i]);
>>                  }
>>                  dev_warn(dev, "Failed to discover, sensors not enabled
>> is %d\n", cl_data->is_any_sensor_enabled);
>> +               cl_data->num_hid_devices = 0;
>>                  return -EOPNOTSUPP;
>>          }
>>          schedule_delayed_work(&cl_data->work_buffer,
>> msecs_to_jiffies(AMD_SFH_IDLE_LOOP));
> 
> I applied this to 9e87b63ed37e202c77aa17d4112da6ae0c7c097c now, which was the 
> origin when I started the whole bisection. Clean rebuild, issue still 
> persists.
> 
> Out of 50 boots, I got:
> 
> 25 clean
> 22 Oops as posted by the OP
> 1 same Oops, followed by a panic
> 1 lockup [1]
> 1 hanging with just a blank screen
> 
> Not sure whether the lockups are related, but [1] mentions modprobe and udev-
> worker as well and all problems including the blank screen one appear roughly 
> at the same time during boot. As this is before a graphics mode switch, I 
> suspect the last mentioned case may be like [1] while the screen was blanked.
> To support the timing correlation: the UVC error for the IR cam shown in the 
> photo (normal boot noise) also appears right before the BUG in the non-lockup 
> bad case.
> 
> I do see the dev_warn in dmesg, so the code path modified in your patch is 
> indeed hit:
> [   10.897521] pcie_mp2_amd 0000:63:00.7: Failed to discover, sensors not 
> enabled is 1
> [   10.897533] pcie_mp2_amd: probe of 0000:63:00.7 failed with error -95
> 
> BR Malte
> 
> [1] https://photos.app.goo.gl/2FAvQ7DqBsHEF6Bd8

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


* Re: amd_sfh driver causes kernel oops during boot
  2023-06-20 13:20                   ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-06-20 18:50                     ` Limonciello, Mario
  2023-06-20 20:03                       ` Limonciello, Mario
  2023-06-21  2:46                     ` Haochen Tong
  2023-07-10 12:16                     ` Linux regression tracking #update (Thorsten Leemhuis)
  2 siblings, 1 reply; 21+ messages in thread
From: Limonciello, Mario @ 2023-06-20 18:50 UTC (permalink / raw)
  To: Linux regressions mailing list, Malte Starostik, Benjamin Tissoires
  Cc: Bagas Sanjaya, basavaraj.natikar, linux-input, linux, stable


>>>> I have a suspicion on commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors
>>>> are enabled, clean up") because the stack trace says that there is a bad
>>>> list_add, which could happen if the object is not correctly initialized.
>>>>
>>>> However, that commit was present in v6.2, so it might not be that one.
>>>>
>>> If I'm not mistaken the Z13 doesn't actually have any
>>> sensors connected to SFH.  So I think the suspicion on
>>> 7bcfdab3f0c6 and theory this is triggered by HID init makes
>>> a lot of sense.
>>>
>>> Can you try this patch?
>>>
>>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>> b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>> index d9b7b01900b5..fa693a5224c6 100644
>>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>> @@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev
>>> *privdata)
>>>                           devm_kfree(dev, cl_data->report_descr[i]);
>>>                   }
>>>                   dev_warn(dev, "Failed to discover, sensors not enabled
>>> is %d\n", cl_data->is_any_sensor_enabled);
>>> +               cl_data->num_hid_devices = 0;
>>>                   return -EOPNOTSUPP;
>>>           }
>>>           schedule_delayed_work(&cl_data->work_buffer,
>>> msecs_to_jiffies(AMD_SFH_IDLE_LOOP));
>> I applied this to 9e87b63ed37e202c77aa17d4112da6ae0c7c097c now, which was the
>> origin when I started the whole bisection. Clean rebuild, issue still
>> persists.
>>
>> Out of 50 boots, I got:
>>
>> 25 clean
>> 22 Oops as posted by the OP
>> 1 same Oops, followed by a panic
>> 1 lockup [1]
>> 1 hanging with just a blank screen
>>
>> Not sure whether the lockups are related, but [1] mentions modprobe and udev-
>> worker as well and all problems including the blank screen one appear roughly
>> at the same time during boot. As this is before a graphics mode switch, I
>> suspect the last mentioned case may be like [1] while the screen was blanked.
>> To support the timing correlation: the UVC error for the IR cam shown in the
>> photo (normal boot noise) also appears right before the BUG in the non-lockup
>> bad case.
>>
>> I do see the dev_warn in dmesg, so the code path modified in your patch is
>> indeed hit:
>> [   10.897521] pcie_mp2_amd 0000:63:00.7: Failed to discover, sensors not
>> enabled is 1
>> [   10.897533] pcie_mp2_amd: probe of 0000:63:00.7 failed with error -95
>>
>> BR Malte
>>
>> [1] https://photos.app.goo.gl/2FAvQ7DqBsHEF6Bd8

Apologies; for some reason I never got the above reply in my inbox.
Some server along the way might have deemed it spam.

Anyway, I just double-checked the Z13 I have on hand. The PCI device
for SFH (1022:164a) is not present on the system.

Can you please double check you are on the latest BIOS?

I'm on the latest release from LVFS, 0.1.57 according to fwupdmgr.
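
For anyone who wants to check for the SFH device programmatically, here is a minimal sketch that scans a sysfs-style PCI device directory for a vendor:device pair. The helper names are hypothetical; the assumption is the usual Linux layout in which each entry under `/sys/bus/pci/devices` has `vendor` and `device` files containing text such as `0x1022`.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the contents of a sysfs ID file such as "0x1022\n". */
unsigned long parse_sysfs_hex(const char *text)
{
    return strtoul(text, NULL, 16);
}

/* Read <dir>/<name>/<leaf> as a hex ID; returns 0 on success, -1 on error. */
static int read_id(const char *dir, const char *name, const char *leaf,
                   unsigned long *out)
{
    char path[4096];
    char buf[32];
    FILE *f;

    snprintf(path, sizeof(path), "%s/%s/%s", dir, name, leaf);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    *out = parse_sysfs_hex(buf);
    return 0;
}

/*
 * Returns 1 if a device with the given IDs is listed under devices_dir,
 * 0 if it is not, and -1 if the directory cannot be opened.
 */
int pci_device_present(const char *devices_dir,
                       unsigned long vendor, unsigned long device)
{
    DIR *d = opendir(devices_dir);
    struct dirent *e;
    int found = 0;

    if (d == NULL)
        return -1;

    while (!found && (e = readdir(d)) != NULL) {
        unsigned long v, dv;

        if (e->d_name[0] == '.')
            continue; /* skip ".", ".." and hidden entries */
        if (read_id(devices_dir, e->d_name, "vendor", &v) == 0 &&
            read_id(devices_dir, e->d_name, "device", &dv) == 0 &&
            v == vendor && dv == device)
            found = 1;
    }
    closedir(d);
    return found;
}
```

Calling `pci_device_present("/sys/bus/pci/devices", 0x1022, 0x164a)` on the affected machine should report whether the device is exposed; `lspci -d 1022:164a` gives the same answer from the command line.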



* Re: amd_sfh driver causes kernel oops during boot
  2023-06-20 18:50                     ` Limonciello, Mario
@ 2023-06-20 20:03                       ` Limonciello, Mario
  2023-06-21 23:41                         ` Malte Starostik
  0 siblings, 1 reply; 21+ messages in thread
From: Limonciello, Mario @ 2023-06-20 20:03 UTC (permalink / raw)
  To: Linux regressions mailing list, Malte Starostik, Benjamin Tissoires
  Cc: Bagas Sanjaya, basavaraj.natikar, linux-input, linux, stable


On 6/20/2023 1:50 PM, Limonciello, Mario wrote:
>
>>>>> I have a suspicion on commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors
>>>>> are enabled, clean up") because the stack trace says that there is a bad
>>>>> list_add, which could happen if the object is not correctly initialized.
>>>>>
>>>>> However, that commit was present in v6.2, so it might not be that one.
>>>>>
>>>> If I'm not mistaken the Z13 doesn't actually have any
>>>> sensors connected to SFH.  So I think the suspicion on
>>>> 7bcfdab3f0c6 and theory this is triggered by HID init makes
>>>> a lot of sense.
>>>>
>>>> Can you try this patch?
>>>>
>>>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>>> b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>>> index d9b7b01900b5..fa693a5224c6 100644
>>>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>>> @@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev
>>>> *privdata)
>>>>                           devm_kfree(dev, cl_data->report_descr[i]);
>>>>                   }
>>>>                   dev_warn(dev, "Failed to discover, sensors not enabled
>>>> is %d\n", cl_data->is_any_sensor_enabled);
>>>> +               cl_data->num_hid_devices = 0;
>>>>                   return -EOPNOTSUPP;
>>>>           }
>>>>           schedule_delayed_work(&cl_data->work_buffer,
>>>> msecs_to_jiffies(AMD_SFH_IDLE_LOOP));
>>> I applied this to 9e87b63ed37e202c77aa17d4112da6ae0c7c097c now, which was the
>>> origin when I started the whole bisection. Clean rebuild, issue still
>>> persists.
>>>
>>> Out of 50 boots, I got:
>>>
>>> 25 clean
>>> 22 Oops as posted by the OP
>>> 1 same Oops, followed by a panic
>>> 1 lockup [1]
>>> 1 hanging with just a blank screen
>>>
>>> Not sure whether the lockups are related, but [1] mentions modprobe and
>>> udev-worker as well and all problems including the blank screen one appear
>>> roughly at the same time during boot. As this is before a graphics mode
>>> switch, I suspect the last mentioned case may be like [1] while the screen
>>> was blanked.
>>> To support the timing correlation: the UVC error for the IR cam shown in the
>>> photo (normal boot noise) also appears right before the BUG in the non-lockup
>>> bad case.
>>>
>>> I do see the dev_warn in dmesg, so the code path modified in your patch is
>>> indeed hit:
>>> [   10.897521] pcie_mp2_amd 0000:63:00.7: Failed to discover, sensors not enabled is 1
>>> [   10.897533] pcie_mp2_amd: probe of 0000:63:00.7 failed with error -95
>>>
>>> BR Malte
>>>
>>> [1] https://photos.app.goo.gl/2FAvQ7DqBsHEF6Bd8
>
> Apologies; for some reason I never got that above reply in my inbox,
> some server along the way might have deemed it spam.
>
> Anyways; I just double checked the Z13 I have on my hand.  I don't
> have the PCI device for SFH (1022:164a) present on the system.
>
> Can you please double check you are on the latest BIOS?
>
> I'm on the latest release from LVFS, 0.1.57 according to fwupdmgr.
>
Hopefully the newer BIOS fixes it for you, but if it doesn't I did come
up with another patch I've sent out that I guess could be another
solution.

https://lore.kernel.org/linux-input/20230620200117.22261-1-mario.limonciello@amd.com/T/#u




* Re: amd_sfh driver causes kernel oops during boot
  2023-06-20 13:20                   ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-20 18:50                     ` Limonciello, Mario
@ 2023-06-21  2:46                     ` Haochen Tong
  2023-07-10 12:16                     ` Linux regression tracking #update (Thorsten Leemhuis)
  2 siblings, 0 replies; 21+ messages in thread
From: Haochen Tong @ 2023-06-21  2:46 UTC (permalink / raw)
  To: Linux regressions mailing list, Malte Starostik,
	Benjamin Tissoires, Limonciello, Mario
  Cc: Bagas Sanjaya, basavaraj.natikar, linux-input, linux, stable

On 6/20/23 21:20, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
> 
> What happens to this? From here it looks like there was no progress to
> resolve the regression in the past two weeks, but maybe I just missed
> something.

Hi,

I just looked at the journal again, and this problem seems to have gone away 
after upgrading from 6.3.3 to 6.3.5. At that time the BIOS version was 
still 1.27. Now, on 1.57, the device 1022:164a is indeed no longer 
present.


* Re: amd_sfh driver causes kernel oops during boot
  2023-06-20 20:03                       ` Limonciello, Mario
@ 2023-06-21 23:41                         ` Malte Starostik
  0 siblings, 0 replies; 21+ messages in thread
From: Malte Starostik @ 2023-06-21 23:41 UTC (permalink / raw)
  To: Linux regressions mailing list, Benjamin Tissoires, Limonciello, Mario
  Cc: Bagas Sanjaya, basavaraj.natikar, linux-input, linux, stable

On Tuesday, 20 June 2023, 22:03:00 CEST, Limonciello, Mario wrote:
> On 6/20/2023 1:50 PM, Limonciello, Mario wrote:
> > Anyways; I just double checked the Z13 I have on my hand.  I don't
> > have the PCI device for SFH (1022:164a) present on the system.
> > 
> > Can you please double check you are on the latest BIOS?
> > 
> > I'm on the latest release from LVFS, 0.1.57 according to fwupdmgr.

I was on 0.1.27 while running the tests.
At least when I first saw the errors, no update was offered. I hadn't 
re-checked until now.

> Hopefully the newer BIOS fixes it for you, but if it doesn't I did come
> up with another patch I've sent out that I guess could be another
> solution.

After updating to 0.1.57, it looks like I cannot reproduce the error anymore 
either.
 
> https://lore.kernel.org/linux-input/20230620200117.22261-1-mario.limonciello@amd.com/T/#u

I tested your patch before performing the firmware update. Still got the Oops 
just like before.

BR Malte




* Re: amd_sfh driver causes kernel oops during boot
  2023-05-23 17:27 amd_sfh driver causes kernel oops during boot Haochen Tong
  2023-05-24  3:58 ` Bagas Sanjaya
  2023-05-24 10:08 ` Bagas Sanjaya
@ 2023-07-07  9:37 ` Linux regression tracking #update (Thorsten Leemhuis)
  2 siblings, 0 replies; 21+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-07-07  9:37 UTC (permalink / raw)
  To: Haochen Tong, stable
  Cc: regressions, linux-input, Basavaraj Natikar, Mario Limonciello

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 23.05.23 19:27, Haochen Tong wrote:
> 
> Since kernel 6.3.0 (and also 6.4rc3), on a ThinkPad Z13 system with Arch
> Linux, I've noticed that the amd_sfh driver spews a lot of stack traces
> during boot. Sometimes it is an oops:

For the record:

#regzbot resolve: fixed in newer firmware and mainline post-6.4;
backport not planned, as bug unlikely to repeat, but possible when needed
#regzbot ignore-activity

For details see Mario's explanation here (thx for it, btw):
https://lore.kernel.org/all/89ea9fb7-9026-ccb6-ad88-50e1c28b4474@amd.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: amd_sfh driver causes kernel oops during boot
  2023-06-20 13:20                   ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-20 18:50                     ` Limonciello, Mario
  2023-06-21  2:46                     ` Haochen Tong
@ 2023-07-10 12:16                     ` Linux regression tracking #update (Thorsten Leemhuis)
  2 siblings, 0 replies; 21+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-07-10 12:16 UTC (permalink / raw)
  To: Linux regressions mailing list; +Cc: linux-input, linux, stable

On 20.06.23 15:20, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 07.06.23 00:57, Malte Starostik wrote:
>> On Tuesday, 6 June 2023, 17:25:13 CEST, Limonciello, Mario wrote:
>>> On 6/6/2023 3:08 AM, Benjamin Tissoires wrote:
>>>> On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>>> On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
>>>>>>>
>>>>>>> chiming in here as I'm experiencing what looks like the exact same
>>>>>>> issue, also on a Lenovo Z13 notebook, also on Arch:
>>>>>>> Oops during startup in task udev-worker followed by udev-worker
>>>>>>> blocking all attempts to suspend or cleanly shutdown/reboot the
>>>>>>> machine

For the record:

#regzbot resolve: fixed in newer firmware and mainline post-6.4;
backport possible when needed, but not planned
#regzbot ignore-activity

For details see Mario's explanation here (thx for it, btw):
https://lore.kernel.org/all/89ea9fb7-9026-ccb6-ad88-50e1c28b4474@amd.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-07-10 12:16 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2023-05-23 17:27 amd_sfh driver causes kernel oops during boot Haochen Tong
2023-05-24  3:58 ` Bagas Sanjaya
2023-05-24  6:10   ` Haochen Tong
2023-05-24 10:10     ` Bagas Sanjaya
2023-06-05 11:24       ` Malte Starostik
2023-06-06  2:36         ` Bagas Sanjaya
2023-06-06  6:56           ` Linux regression tracking (Thorsten Leemhuis)
2023-06-06  8:08             ` Benjamin Tissoires
2023-06-06 15:25               ` Limonciello, Mario
2023-06-06 22:57                 ` Malte Starostik
2023-06-20 13:20                   ` Linux regression tracking (Thorsten Leemhuis)
2023-06-20 18:50                     ` Limonciello, Mario
2023-06-20 20:03                       ` Limonciello, Mario
2023-06-21 23:41                         ` Malte Starostik
2023-06-21  2:46                     ` Haochen Tong
2023-07-10 12:16                     ` Linux regression tracking #update (Thorsten Leemhuis)
2023-06-06  9:53             ` Malte Starostik
2023-06-06  2:39       ` Bagas Sanjaya
2023-06-06  3:41         ` Haochen Tong
2023-05-24 10:08 ` Bagas Sanjaya
2023-07-07  9:37 ` Linux regression tracking #update (Thorsten Leemhuis)
