regressions.lists.linux.dev archive mirror
* [regression] Bug 216475 - fbcon crashes during single gpu passthough reattachment to host
@ 2022-09-19  9:10 Thorsten Leemhuis
  2022-11-15  9:40 ` [regression] Bug 216475 - fbcon crashes during single gpu passthough reattachment to host #forregzbot Thorsten Leemhuis
  0 siblings, 1 reply; 2+ messages in thread
From: Thorsten Leemhuis @ 2022-09-19  9:10 UTC
  To: Daniel Vetter; +Cc: Sergey V., DRI Development, LKML, regressions

Hi, this is your Linux kernel regression tracker speaking.

I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216475 :

> Created attachment 301792 [details]
> My dmesg right after VM shutdown
> 
> Hello, since kernel 5.19 many VFIO users have problems reattaching the GPU from the guest to the host. It worked well previously (5.18.16 for me).
> 
> More complaints about the issue:
> https://www.reddit.com/r/VFIO/comments/wp85ve/linux_519_kernel_single_gpu_passthough_black/
> 
> My PC Spec:
>   CPU: Ryzen 5950X
>   RAM: 128GB
>   GPU: NVIDIA RTX 3080
>   OS: Arch Linux
> 
> How to reproduce:
>   1. You need a properly configured VM with working GPU passthrough (too complicated to explain here)
>   2. When the VM starts, it detaches the GPU from the host via 'start.sh' (see below)
>   3. The VM starts properly, Windows loads properly
>   4. Shut down the VM normally; the GPU should be reattached by 'revert.sh' (see below)
> Actual results (5.19.*):
>   5. Windows shuts down, but the GPU is not reattached to the host; there is only a black screen and the monitors switch off (no signal)
>   5.1 dmesg contains error message - dmesg.txt in attachments
>     WARNING: CPU: 30 PID: 12528 at drivers/video/fbdev/core/fbcon.c:999 fbcon_init+0x5ce/0x670
>     ...
>     BUG: kernel NULL pointer dereference, address: 0000000000000330
> Expected Result (5.18.* and previous):
>   5. Windows shuts down and the GPU is successfully reattached to the host
> 
> I tried to bisect git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git with v5.18.16 as good and v5.19.2 as bad
> (this was my first bisect, so I may have done something wrong).
> 
> At some point during the bisect my Linux stopped booting, and I tried marking those commits as bad.
> The commit below might therefore not be the real cause of the problem.
> 
> Commit which I found by bisect:
> 
> commit 3647d6d3dbdafc55f8c4ca8225966963252abe7b (refs/bisect/bad)
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Tue Apr 5 23:03:33 2022 +0200
> 
>     fbcon: Move more code into fbcon_release
> 
>     con2fb_release_oldinfo() has a bunch more kfree() calls than
>     fbcon_exit(), but since kfree() on NULL is harmless doing that in both
>     places should be ok. This is also a bit more symmetric now again with
>     fbcon_open also allocating the fbcon_ops structure.
> 
>     Acked-by: Sam Ravnborg <sam@ravnborg.org>
>     Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>     Cc: Daniel Vetter <daniel@ffwll.ch>
>     Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>     Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>     Cc: Du Cheng <ducheng2@gmail.com>
>     Cc: Claudio Suarez <cssk@net-c.es>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20220405210335.3434130-16-daniel.vetter@ffwll.ch
> 
> 
> start.sh
> ========
> #!/bin/bash
> set -x
> 
> systemctl stop display-manager.service
> while systemctl is-active --quiet "display-manager.service" ; do
>     sleep 1
> done
> 
> killall gdm-x-session
> killall -u bormor
> 
> echo 0 > /sys/class/vtconsole/vtcon0/bind
> echo 0 > /sys/class/vtconsole/vtcon1/bind
> 
> # Unbind EFI-Framebuffer
> echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
> 
> # Avoid a Race condition by waiting 2 seconds. This can be calibrated to be shorter or longer if required for your system
> sleep 2
> 
> # Unload all Nvidia drivers
> modprobe -r nvidia_drm
> modprobe -r nvidia_modeset
> modprobe -r nvidia_uvm
> modprobe -r nvidia
> modprobe -r nouveau
> 
> # Unbind the GPU from display driver
> virsh nodedev-detach pci_0000_09_00_0
> virsh nodedev-detach pci_0000_09_00_1
> 
> # Load VFIO Kernel Module  
> modprobe vfio-pci
> 
> 
> revert.sh
> ========
> #!/bin/bash
> set -x
> 
> # Unload VFIO-PCI Kernel Driver
> modprobe -r vfio-pci
> modprobe -r vfio_iommu_type1
> modprobe -r vfio
> 
> virsh nodedev-reattach pci_0000_09_00_1
> virsh nodedev-reattach pci_0000_09_00_0
> 
> echo 1 > /sys/class/vtconsole/vtcon0/bind
> echo 1 > /sys/class/vtconsole/vtcon1/bind
> 
> nvidia-xconfig --query-gpu-info > /dev/null 2>&1
> echo "efi-framebuffer.0" > /sys/bus/platform/drivers/efi-framebuffer/bind
> 
> modprobe nvidia_drm
> modprobe nvidia_modeset
> modprobe nvidia_uvm
> modprobe nvidia
> modprobe nouveau
> 
> 
> systemctl start display-manager.service

See the ticket for more details.
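
A note on the bisection described in the report: commits that cannot be
tested at all (for example because the kernel does not boot) are normally
skipped rather than marked bad, since marking them bad can steer the
bisection toward an unrelated commit. The sketch below is only an
illustration and is not taken from the report: `bisect_status` and its
outcome labels are hypothetical names, while the exit codes follow the
convention documented for `git bisect run` (0 = good, 125 = skip, other
values from 1 to 127 = bad).

```shell
#!/bin/sh
# Hypothetical helper: map a manually observed test outcome to the exit
# status that `git bisect run` interprets. The labels "works" and
# "regression" are assumptions made for this sketch.
bisect_status() {
    case "$1" in
        works)      return 0 ;;    # GPU reattached fine -> commit is good
        regression) return 1 ;;    # black screen / fbcon oops -> commit is bad
        *)          return 125 ;;  # anything else (e.g. no boot) -> skip
    esac
}

# Example: an unbootable build is reported as "skip", not "bad".
# bisect_status no-boot; echo "$?"
```

When bisecting by hand, as the reporter did, the equivalent is to run
`git bisect skip` instead of `git bisect bad` for a commit that fails to
boot.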

BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:

#regzbot introduced: 3647d6d3dbdafc55f8c4ca8225966963252abe7b
https://bugzilla.kernel.org/show_bug.cgi?id=216475
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply; it's in everyone's interest to set the public record straight.


* Re: [regression] Bug 216475 - fbcon crashes during single gpu passthough reattachment to host #forregzbot
  2022-09-19  9:10 [regression] Bug 216475 - fbcon crashes during single gpu passthough reattachment to host Thorsten Leemhuis
@ 2022-11-15  9:40 ` Thorsten Leemhuis
  0 siblings, 0 replies; 2+ messages in thread
From: Thorsten Leemhuis @ 2022-11-15  9:40 UTC
  To: regressions; +Cc: DRI Development, LKML

[Note: this mail is primarily sent for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

On 19.09.22 11:10, Thorsten Leemhuis wrote:
>
> I noticed a regression report in bugzilla.kernel.org. As many (most?)
> kernel developers don't keep an eye on it, I decided to forward it by
> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216475 :
> [...]
> #regzbot introduced: 3647d6d3dbdafc55f8c4ca8225966963252abe7b
> https://bugzilla.kernel.org/show_bug.cgi?id=216475
> #regzbot ignore-activity

#regzbot invalid: reporter found workaround, quite special use case anyway
