Nouveau Archive on lore.kernel.org
* [Nouveau] Optimus HDMI hotplug fails with "DRM: Dropped ACPI reprobe event due to RPM error: -22"
@ 2021-02-18  3:06 Craig Ringer
  2021-02-18  3:40 ` Craig Ringer
  0 siblings, 1 reply; 2+ messages in thread
From: Craig Ringer @ 2021-02-18  3:06 UTC
  To: nouveau

Hi all

I'm trying to get HDMI hotplug working on my Lenovo T15g laptop with
Optimus graphics. HDMI works when plugged in at boot, but does not work
when hotplugged after boot, or when hot-unplugged then re-plugged. The
external display is not detected, its status remains 'disconnected' in
sysfs, and the display stays in what looks like DPMS-off state.

NOTE: This is a PRELIMINARY problem report and request for advice or
comment. I'm on a recent Fedora kernel but still need to try latest
mainline + nouveau. I still need to capture detailed debug logs from
nouveau, drm, kms, etc. And while writing the report I found an i915 config
issue I need to retry without. So this is mostly google-help for others
right now.

VERSIONS AND DEVICES
============

Kernel and nouveau version: 5.10.15-200.fc33.x86_64 with the bundled
nouveau driver. (I'll try latest mainline soon).

Video hardware:
  * GeForce RTX 2070 SUPER Mobile (PCI ID 10de:1e91)
  * Intel CometLake-H GT2 (PCI ID 8086:9bc4)

Laptop: Lenovo T15g. DMI identifies it as: LENOVO 20URCTO1WW/20URCTO1WW,
BIOS N30ET33W (1.16 ) 12/17/2020

I believe this is a muxless design with the external outputs under control
of the NVIDIA card, as the Intel card only has one output (card0-eDP-1) in
/sys/class/drm/ and the external display doesn't work (even when
attached at boot) if I blacklist the nouveau module.

BEHAVIOUR
============

An external HDMI display is only detected and used if it's attached before
boot. If it is hotplugged later, it isn't detected and

    DRM: Dropped ACPI reprobe event due to RPM error: -22

is printed to dmesg.

The -22 in "RPM error: -22" is -EINVAL. AFAICS it probably comes from the
rpm_resume() function [1], reached via the call chain
nouveau_display_acpi_ntfy() [2] -> pm_runtime_get() ->
__pm_runtime_resume() -> rpm_resume(). I haven't tracked it down further
yet. I'll do some perf probing and report back in a followup post.
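
A minimal sketch for checking the runtime PM side of this from userspace.
It relies on two things that should be verified against the running kernel:
that rpm_resume() returns -EINVAL when the device's runtime PM error flag is
set, and that sysfs reports that state as "error" in power/runtime_status.
The PCI address 0000:01:00.0 is taken from the logs in this report; the
function name and the optional sysfs-root argument are illustrative only.

```shell
#!/bin/sh
# Print the runtime PM attributes of one PCI device.
# $1: PCI address (default 0000:01:00.0, the NVIDIA GPU in this report)
# $2: sysfs root (default /sys; overridable so the function can be tried
#     against a fake directory tree)
rpm_state() {
    dev="${1:-0000:01:00.0}"
    root="${2:-/sys}"
    pm="$root/bus/pci/devices/$dev/power"
    for attr in control runtime_status; do
        if [ -r "$pm/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$pm/$attr")"
        else
            printf '%s: (not readable)\n' "$attr"
        fi
    done
}
```

Running rpm_state with no arguments right after a "Dropped ACPI reprobe
event" message would show whether runtime_status reads "error" rather than
"suspended" or "active".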

IIRC (need to repeat and verify) once hot-unplugged, the display won't
re-detect, even if it was connected at boot. Connecting it while the
machine is in S3 sleep doesn't help; it still doesn't get (re)detected on
resume.

    echo 'detect' > card1-HDMI-A-1/status

has no apparent effect - no message is printed to dmesg (default log level)
and the monitor isn't detected.
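
To rule out a problem with a single connector, the same 'detect' request
can be sent to every connector on the card. This is a sketch only, assuming
the connectors live under /sys/class/drm as in the diagnostics further down
in this report. The function name and the directory argument are
illustrative, and writing to the status files needs root.

```shell
#!/bin/sh
# Ask DRM to re-probe every connector of one card, then print each status.
# $1: card name (default card1, the nouveau device in this report)
# $2: DRM sysfs directory (default /sys/class/drm; overridable so the
#     function can be tried against a fake directory tree)
reprobe_connectors() {
    card="${1:-card1}"
    drm="${2:-/sys/class/drm}"
    for st in "$drm/$card"-*/status; do
        [ -e "$st" ] || continue      # the glob matched nothing
        echo detect > "$st" || echo "cannot write $st" >&2
        conn="${st%/status}"          # strip the trailing /status
        printf '%s: %s\n' "${conn##*/}" "$(cat "$st")"
    done
}
```

On a real system each line would end in "connected" or "disconnected", the
same values shown by the status loop in the basic diagnostics.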

TAINTED KERNEL
============

While collecting info for this report, I noticed that I am still running
with some non-default i915 options from my old (non-hybrid-graphics)
laptop. I'll have to reboot without those to verify these i915 options
aren't the cause:

[    3.403694] Setting dangerous option enable_guc - tainting kernel
[    3.404506] Setting dangerous option enable_fbc - tainting kernel
[    3.405306] Setting dangerous option enable_dc - tainting kernel

I'll be sure to update once I disable these, but I'll post now. If nothing
else, it might help someone else.

NOUVEAU TIMEOUTS IN DMESG
============

I also noticed some nouveau related output in the kernel logs - I think
from the first suspend, or possibly the first HDMI unplug. I'll need to
verify this later. There are also some xhci_hcd messages that may or may
not be relevant. I'll include longer excerpts at the end of the post but
the basics are:

[25877.621114] nouveau 0000:01:00.0: timeout
[25877.621289] WARNING: CPU: 14 PID: 73556 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25877.621631] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25877.621680] Call Trace:
[25877.621754]  gm200_acr_hsfw_boot+0xc3/0x160 [nouveau]
[25877.621782]  ? mutex_lock+0xe/0x30
[25877.621849]  nvkm_acr_hsf_boot+0x85/0xe0 [nouveau]
[25877.621916]  nvkm_acr_fini+0x25/0x30 [nouveau]
[25877.621984]  nvkm_subdev_fini+0x59/0xb0 [nouveau]
[25877.622100]  nvkm_device_fini+0x79/0x110 [nouveau]
[25877.622215]  nvkm_udevice_fini+0x47/0x60 [nouveau]
[25877.622277]  nvkm_object_fini+0xbc/0x150 [nouveau]
[25877.622343]  nvkm_object_fini+0x73/0x150 [nouveau]
[25877.622464]  nouveau_do_suspend+0x107/0x180 [nouveau]
[25877.622583]  nouveau_pmops_runtime_suspend+0x3b/0xb0 [nouveau]
[25877.622597]  pci_pm_runtime_suspend+0x5e/0x170
...

then

[25877.622741] nouveau 0000:01:00.0: acr: unload binary failed
[25877.946511] nouveau 0000:01:00.0: fifo: fault 00 [VIRT_READ] at 00000000000bd000 engine c0 [BAR2] client 07 [HUB/HOST_CPU] reason 0d [REGION_VIOLATION] on channel -1 [01ffedf000 unknown]
[25913.829849] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 00000000004df000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [01ffedf000 unknown]

then

[25913.930365] nouveau 0000:01:00.0: timeout
[25913.930426] WARNING: CPU: 5 PID: 2395 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25913.930511] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25913.930523] Call Trace:
[25913.930540]  gm200_acr_hsfw_boot+0xc3/0x160 [nouveau]
[25913.930543]  ? mutex_lock+0xe/0x30
[25913.930558]  nvkm_acr_hsf_boot+0x85/0xe0 [nouveau]
[25913.930573]  tu102_acr_init+0x15/0x30 [nouveau]
[25913.930587]  nvkm_acr_load+0x2b/0xd0 [nouveau]
[25913.930589]  ? ktime_get+0x38/0xa0
[25913.930603]  nvkm_subdev_init+0x92/0xd0 [nouveau]
[25913.930604]  ? ktime_get+0x38/0xa0
[25913.930629]  nvkm_device_init+0x10b/0x190 [nouveau]
[25913.930656]  nvkm_udevice_init+0x41/0x60 [nouveau]
[25913.930676]  nvkm_object_init+0x3e/0x100 [nouveau]
[25913.930690]  nvkm_object_init+0x6f/0x100 [nouveau]
[25913.930703]  nvkm_object_init+0x6f/0x100 [nouveau]
[25913.930729]  nouveau_do_resume+0x2b/0xc0 [nouveau]
[25913.930755]  nouveau_pmops_runtime_resume+0x7a/0x150 [nouveau]
[25913.930760]  pci_pm_runtime_resume+0xaa/0xc0
[...]
[25913.930806]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[...]
[25913.930820] nouveau 0000:01:00.0: acr: AHESASC binary failed
[25913.930821] nouveau 0000:01:00.0: acr: init failed, -110
[25913.930958] nouveau 0000:01:00.0: init failed with -110
[25913.930959] nouveau: systemd-logind[1510]:00000000:00000080: init failed with -110
[25913.930960] nouveau: DRM-master:00000000:00000000: init failed with -110
[25913.930961] nouveau: DRM-master:00000000:00000000: init failed with -110
[25913.930963] nouveau 0000:01:00.0: DRM: Client resume failed with error: -110
[25913.930963] nouveau 0000:01:00.0: DRM: resume failed with: -110
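
The negative numbers in these messages are kernel errno values. Here is a
small helper for decoding the two that appear in this report (-22 and
-110), using the standard Linux values from errno.h. The function name is
illustrative and only these two codes are covered.

```shell
#!/bin/sh
# Translate an errno value as printed in kernel logs into its symbolic name.
# Only the two codes seen in this report are handled.
decode_errno() {
    n="${1#-}"    # accept both "-22" and "22"
    case "$n" in
        22)  echo "EINVAL (Invalid argument)" ;;
        110) echo "ETIMEDOUT (Connection timed out)" ;;
        *)   echo "unknown errno $n" ;;
    esac
}
```

So "acr: init failed, -110" is a timeout, which matches the "timeout" line
logged just before each falcon warning above.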

I'll do some poking around with perf, capture some ACPI state and verbose
nouveau + drm kernel logs for both attached-at-boot and detached-at-boot
cases, etc, then post a big diagnostics bundle in a bit. But I thought I'd
keep this initial report short-ish. I'll include some basic diag info below
though.

URL REFERENCES
============

URLs referenced:

[1]
https://github.com/torvalds/linux/blob/521b619acdc8f1f5acdac15b84f81fd9515b2aff/drivers/base/power/runtime.c#L702

[2]
https://github.com/torvalds/linux/blob/93b694d096cc10994c817730d4d50288f9ae3d66/drivers/gpu/drm/nouveau/nouveau_display.c#L530

BASIC DIAGNOSTICS
============

Basic diagnostics, when display physically connected (DVI-D -> HDMI) but
not detected by nouveau:

$ ls /sys/class/drm
card0  card0-eDP-1  card1  card1-DP-1  card1-DP-2  card1-DP-3  card1-eDP-2  card1-HDMI-A-1  renderD128  renderD129  ttm  version

$ for f in */status; do printf "%s: %s\n" "$f" "$(cat $f)"; done
card0-eDP-1/status: connected
card1-DP-1/status: disconnected
card1-DP-2/status: disconnected
card1-DP-3/status: disconnected
card1-eDP-2/status: disconnected
card1-HDMI-A-1/status: disconnected

$ dmesg | tail -n 2
[42147.075025] nouveau 0000:01:00.0: DRM: Dropped ACPI reprobe event due to RPM error: -22
[42151.153559] nouveau 0000:01:00.0: DRM: Dropped ACPI reprobe event due to RPM error: -22

# for p in /sys/module/nouveau/parameters/*; do printf "%s: %s\n" "$(basename "$p")" "$(cat "$p")"; done
[sudo] password for craig:
atomic: 0
config: (null)
debug: (null)
duallink: 1
fbcon_bpp: 0
hdmimhz: 0
ignorelid: 0
modeset: -1
mst: 1
noaccel: 0
nofbaccel: 0
runpm: -1
tv_disable: 0
tv_norm: (null)
vram_pushbuf: 0

$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.10.15-200.fc33.x86_64 [SNIP root dev args]
libata.allow_tpm=on systemd.unified_cgroup_hierarchy=0 rhgb

$ sudo  lspci -vvnnqPP -d 10de:1e91
00:01.0/01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104M [GeForce RTX 2070 SUPER Mobile / Max-Q] [10de:1e91] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device [17aa:22c3]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
....
Kernel driver in use: nouveau
Kernel modules: nouveau

# dmidecode
...
Processor Information
    ...
    Version: Intel(R) Core(TM) i9-10980HK CPU @ 2.40GHz
...
BIOS Information
    Vendor: LENOVO
    Version: N30ET33W (1.16 )
    Release Date: 12/17/2020
    ...
    BIOS Revision: 1.16
    Firmware Revision: 1.12
...
Port Connector Information
    Internal Reference Designator: Not Available
    Internal Connector Type: None
    External Reference Designator: Hdmi1
    External Connector Type: Other
    Port Type: Video Port

System Information
    Manufacturer: LENOVO
    Product Name: 20URCTO1WW
    Version: ThinkPad T15g Gen 1
    [snip serial number and uuid]
    SKU Number: LENOVO_MT_20UR_BU_Think_FM_ThinkPad T15g Gen 1
    Family: ThinkPad T15g Gen 1

I'll attach a detailed lspci, bigger excerpts from dmesg, etc. in a
followup to make sure I don't upset any mail filter.


-- 
Craig Ringer

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

* Re: [Nouveau] Optimus HDMI hotplug fails with "DRM: Dropped ACPI reprobe event due to RPM error: -22"
  2021-02-18  3:06 [Nouveau] Optimus HDMI hotplug fails with "DRM: Dropped ACPI reprobe event due to RPM error: -22" Craig Ringer
@ 2021-02-18  3:40 ` Craig Ringer
  0 siblings, 0 replies; 2+ messages in thread
From: Craig Ringer @ 2021-02-18  3:40 UTC
  To: nouveau

On Thu, 18 Feb 2021 at 11:06, Craig Ringer <ringerc@ringerc.id.au> wrote:

> Hi all
>
> I'm trying to get HDMI hotplug working on my Lenovo T15g laptop with
> Optimus graphics. HDMI works when plugged in at boot, but does not work
> when hotplugged after boot, or when hot-unplugged then re-plugged. The
> external display is not detected, its status remains 'disconnected' in
> sysfs, and the display stays in what looks like DPMS-off state.
> [snip]
> I'll attach a detailed lspci, bigger excerpts from dmesg, etc. in a
> followup to make sure I don't upset any mail filter.
>

Detailed PCI info and trimmed dmesg uploaded to a GDrive since I don't
really want to send all that to the whole list. Files here:

https://drive.google.com/drive/folders/1oE3ow7d8N6npDNbL8vqHYjbAvvJCUcPN?usp=sharing

Contains:

$ sudo lspci -vvvvnnnnPPq | sed '/Device Serial Number/d' > pci.list

$ sudo dmesg | egrep -v \
'e1000e|nvme|BTRFS|audit:|SELinux:|systemd\[1\]|thunderbolt|cfg80211|WiFi|Bluetooth|battery|iwlwifi|uvcvideo|usbcore|zram|iTCO_wdt|snd_hda_intel|squashfs|EXT4-fs|iSCSI|wlp0s20f3|IPv6|\<bridge\>|\<tun\>|virbr0|rfkill|nf_conntrack|psmouse' \
| sed 's/SerialNumber.*$/SerialNumber REDACTED/' > dmesg.out

$ sudo dmidecode | sed '/Serial Number:/d;/Asset Tag/d' > dmi.dump

$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:DIS: :DynOff:0000:01:00.0
1:IGD:+:Pwr:0000:00:02.0
2:DIS-Audio: :DynOff:0000:01:00.1

I can provide raw or decompiled ACPI DSDT and SSDTs on request, as well as
kernel logs with higher log levels, a nouveau module debug string, info
from /sys/kernel/debug, 'perf' runs, etc.

There's also some nouveau info in /sys/kernel/debug/dri/1.

Also, I note that

    echo 'on' > /sys/kernel/debug/dri/1/HDMI-A-1/force

does not appear to have any effect when the display is plugged in and
turned on but not detected by nouveau. The same is true for the other
ports: DP-1, DP-2, DP-3 and eDP-2.

-- 
Craig Ringer

