qemu-devel.nongnu.org archive mirror
From: James Harvey <jamespharvey20@gmail.com>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [Bug 1844053] Re: task blocked for more than X seconds - events drm_fb_helper_dirty_work
Date: Sun, 15 Sep 2019 23:26:35 -0000	[thread overview]
Message-ID: <156858999601.32435.3530038468855163581.launchpad@wampee.canonical.com> (raw)
In-Reply-To: 156855280668.581.6516749257263798071.malonedeb@wampee.canonical.com

** Description changed:

  I've had bunches of these errors on 9 different boots, between
  2019-08-21 and now, with Arch host and guest, from Linux 5.1.16 to
  5.2.14 on host and guest, with QEMU 4.0.0 and 4.1.0, and with spice
  0.14.2, spice-gtk 0.37, spice-protocol 0.14.0, and virt-viewer 8.0.
  
  I've been fighting with some other issues related to a 5.2 btrfs
  regression, a QEMU qxl regression (see bug 1843151) which I ran into
  when trying to temporarily abandon virtio-vga, and I haven't paid enough
  attention to what impact it has on the system when these occur.  In
  journalctl, I can see I often rebooted minutes after they occurred, but
  sometimes much later.  That must mean whenever I saw it happen that I
  rebooted the VM, or potentially it impacted functionality of the system.
  
  Please let me know if and how I can get more information for you if
  needed.
  
  I've replicated this on both a system with integrated ASPEED video, and
  on an AMD Vega 64 running amdgpu.
  
  As an example, I have one boot which reported at 122 seconds, 245, 368,
  491, 614, 737, 860, 983, 1105, 1228, then I rebooted.
  
  I have another that reported 122/245/368/491/614/737, went away for 10
  minutes, then started reporting again 122/245/368/491, and went away.
  Then, I rebooted about 20 hours later.
  
  Host system has no graphical impact when this happens, and logs nothing
  in its journalctl.
  
+ Guest is tty mode only, with kernel argument "video=1280x1024".  No X
+ server.
+ 
  ==========
  
  INFO: task kworker/0:1:15 blocked for more than 122 seconds.
        Not tainted 5.2.14-1 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/0:1     D    0    15      2 0x800004000
  Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
  Call Trace:
   ? __schedule+0x27f/0x6d0
   schedule+0x3d/0xc0
   virtio_gpu_queue_fenced_ctrl_buffer+0xa1/0x130 [virtio_gpu]
   ? wait_woken+0x80/0x80
   virtio_gpu_surface_dirty+0x2a5/0x300 [virtio_gpu]
   drm_fb_helper_dirty_work+0x156/0x160 [drm_kms_helper]
   process_one_work+0x19a/0x3b0
   worker_thread+0x50/0x3a0
   kthread+0xfd/0x130
   ? process_one_work+0x3b0/0x3b0
   ? kthread_park+0x80/0x80
   ret_from_fork+0x35/0x40
  
  ==========
  
  /usr/bin/qemu-system-x86_64 \
     -name vm,process=qemu:vm \
     -no-user-config \
     -nodefaults \
     -nographic \
     -uuid <uuid> \
     -pidfile <pidfile> \
     -machine q35,accel=kvm,vmport=off,dump-guest-core=off \
     -cpu SandyBridge-IBRS \
     -smp cpus=4,cores=2,threads=1,sockets=2 \
     -m 4G \
     -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd \
     -drive if=pflash,format=raw,file=/var/qemu/efivars/vm.fd \
     -monitor telnet:localhost:8000,server,nowait,nodelay \
     -spice unix,addr=/tmp/spice.vm.sock,disable-ticketing \
     -device ioh3420,id=pcie.1,bus=pcie.0,slot=0 \
     -device virtio-vga,bus=pcie.1,addr=0 \
     -usbdevice tablet \
     -netdev bridge,id=network0,br=br0 \
     -device virtio-net-pci,netdev=network0,mac=F4:F6:34:F6:34:2d,bus=pcie.0,addr=3 \
     -device virtio-scsi-pci,id=scsi1 \
     -drive driver=raw,node-name=hd0,file=/dev/lvm/vm,if=none,discard=unmap,cache=none,aio=threads

** Description changed:

  I've had bunches of these errors on 9 different boots, between
  2019-08-21 and now, with Arch host and guest, from Linux 5.1.16 to
  5.2.14 on host and guest, with QEMU 4.0.0 and 4.1.0, and with spice
  0.14.2, spice-gtk 0.37, spice-protocol 0.14.0, and virt-viewer 8.0.
  
  I've been fighting with some other issues related to a 5.2 btrfs
  regression, a QEMU qxl regression (see bug 1843151) which I ran into
  when trying to temporarily abandon virtio-vga, and I haven't paid enough
- attention to what impact it has on the system when these occur.  In
- journalctl, I can see I often rebooted minutes after they occurred, but
- sometimes much later.  That must mean whenever I saw it happen that I
- rebooted the VM, or potentially it impacted functionality of the system.
+ attention to what impact specifically this virtio_gpu issue has on the
+ system.  In journalctl, I can see I often rebooted minutes after they
+ occurred, but sometimes much later.  That must mean that whenever I saw
+ it happen I rebooted the VM, or that it potentially impacted the
+ functionality of the system.
  
  Please let me know if and how I can get more information for you if
  needed.
  
  I've replicated this on both a system with integrated ASPEED video, and
  on an AMD Vega 64 running amdgpu.
  
  As an example, I have one boot which reported at 122 seconds, 245, 368,
  491, 614, 737, 860, 983, 1105, 1228, then I rebooted.
  
  I have another that reported 122/245/368/491/614/737, went away for 10
  minutes, then started reporting again 122/245/368/491, and went away.
  Then, I rebooted about 20 hours later.
  
  Host system has no graphical impact when this happens, and logs nothing
  in its journalctl.
  
  Guest is tty mode only, with kernel argument "video=1280x1024".  No X
  server.
  
  ==========
  
  INFO: task kworker/0:1:15 blocked for more than 122 seconds.
        Not tainted 5.2.14-1 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/0:1     D    0    15      2 0x800004000
  Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
  Call Trace:
   ? __schedule+0x27f/0x6d0
   schedule+0x3d/0xc0
   virtio_gpu_queue_fenced_ctrl_buffer+0xa1/0x130 [virtio_gpu]
   ? wait_woken+0x80/0x80
   virtio_gpu_surface_dirty+0x2a5/0x300 [virtio_gpu]
   drm_fb_helper_dirty_work+0x156/0x160 [drm_kms_helper]
   process_one_work+0x19a/0x3b0
   worker_thread+0x50/0x3a0
   kthread+0xfd/0x130
   ? process_one_work+0x3b0/0x3b0
   ? kthread_park+0x80/0x80
   ret_from_fork+0x35/0x40
  
  ==========
  
  /usr/bin/qemu-system-x86_64 \
     -name vm,process=qemu:vm \
     -no-user-config \
     -nodefaults \
     -nographic \
     -uuid <uuid> \
     -pidfile <pidfile> \
     -machine q35,accel=kvm,vmport=off,dump-guest-core=off \
     -cpu SandyBridge-IBRS \
     -smp cpus=4,cores=2,threads=1,sockets=2 \
     -m 4G \
     -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd \
     -drive if=pflash,format=raw,file=/var/qemu/efivars/vm.fd \
     -monitor telnet:localhost:8000,server,nowait,nodelay \
     -spice unix,addr=/tmp/spice.vm.sock,disable-ticketing \
     -device ioh3420,id=pcie.1,bus=pcie.0,slot=0 \
     -device virtio-vga,bus=pcie.1,addr=0 \
     -usbdevice tablet \
     -netdev bridge,id=network0,br=br0 \
     -device virtio-net-pci,netdev=network0,mac=F4:F6:34:F6:34:2d,bus=pcie.0,addr=3 \
     -device virtio-scsi-pci,id=scsi1 \
     -drive driver=raw,node-name=hd0,file=/dev/lvm/vm,if=none,discard=unmap,cache=none,aio=threads

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1844053

Title:
  task blocked for more than X seconds - events drm_fb_helper_dirty_work

Status in QEMU:
  New

Bug description:
  I've had bunches of these errors on 9 different boots, between
  2019-08-21 and now, with Arch host and guest, from Linux 5.1.16 to
  5.2.14 on host and guest, with QEMU 4.0.0 and 4.1.0, and with spice
  0.14.2, spice-gtk 0.37, spice-protocol 0.14.0, and virt-viewer 8.0.

  I've been fighting with some other issues related to a 5.2 btrfs
  regression, a QEMU qxl regression (see bug 1843151) which I ran into
  when trying to temporarily abandon virtio-vga, and I haven't paid
  enough attention to what impact specifically this virtio_gpu issue has
  on the system.  In journalctl, I can see I often rebooted minutes
  after they occurred, but sometimes much later.  That must mean that
  whenever I saw it happen I rebooted the VM, or that it potentially
  impacted the functionality of the system.

  Please let me know if and how I can get more information for you if
  needed.

  I've replicated this on both a system with integrated ASPEED video,
  and on an AMD Vega 64 running amdgpu.

  As an example, I have one boot which reported at 122 seconds, 245,
  368, 491, 614, 737, 860, 983, 1105, 1228, then I rebooted.

  I have another that reported 122/245/368/491/614/737, went away for 10
  minutes, then started reporting again 122/245/368/491, and went away.
  Then, I rebooted about 20 hours later.

  Host system has no graphical impact when this happens, and logs
  nothing in its journalctl.

  Guest is tty mode only, with kernel argument "video=1280x1024".  No X
  server.
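
  In case it helps, the guest-side reports can be pulled straight out of
  the guest's journal with something like the following (just a sketch;
  it assumes systemd/journald in the guest, and any dmesg/grep
  equivalent works the same):

    # In the guest: collect every hung-task report from the current
    # boot's kernel log (the pattern is the prefix of the message below)
    journalctl -k -b | grep -B1 -A15 'blocked for more than'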

  ==========

  INFO: task kworker/0:1:15 blocked for more than 122 seconds.
        Not tainted 5.2.14-1 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kworker/0:1     D    0    15      2 0x800004000
  Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
  Call Trace:
   ? __schedule+0x27f/0x6d0
   schedule+0x3d/0xc0
   virtio_gpu_queue_fenced_ctrl_buffer+0xa1/0x130 [virtio_gpu]
   ? wait_woken+0x80/0x80
   virtio_gpu_surface_dirty+0x2a5/0x300 [virtio_gpu]
   drm_fb_helper_dirty_work+0x156/0x160 [drm_kms_helper]
   process_one_work+0x19a/0x3b0
   worker_thread+0x50/0x3a0
   kthread+0xfd/0x130
   ? process_one_work+0x3b0/0x3b0
   ? kthread_park+0x80/0x80
   ret_from_fork+0x35/0x40

  ==========
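
  The reports above come from the kernel's hung-task detector, and the
  roughly 123-second spacing matches its default 120-second timeout and
  check interval.  A minimal sketch of the standard knobs for tuning it
  and for dumping every blocked task when a report fires (nothing here
  is specific to this setup):

    # Hung-task detector tunables (standard sysctls)
    sysctl kernel.hung_task_timeout_secs    # report threshold; 0 disables
    sysctl kernel.hung_task_panic           # 1 = panic when a hang is detected

    # Dump all uninterruptible (D-state) tasks for a fuller picture;
    # requires kernel.sysrq to permit the 'w' command
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100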

  /usr/bin/qemu-system-x86_64 \
     -name vm,process=qemu:vm \
     -no-user-config \
     -nodefaults \
     -nographic \
     -uuid <uuid> \
     -pidfile <pidfile> \
     -machine q35,accel=kvm,vmport=off,dump-guest-core=off \
     -cpu SandyBridge-IBRS \
     -smp cpus=4,cores=2,threads=1,sockets=2 \
     -m 4G \
     -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd \
     -drive if=pflash,format=raw,file=/var/qemu/efivars/vm.fd \
     -monitor telnet:localhost:8000,server,nowait,nodelay \
     -spice unix,addr=/tmp/spice.vm.sock,disable-ticketing \
     -device ioh3420,id=pcie.1,bus=pcie.0,slot=0 \
     -device virtio-vga,bus=pcie.1,addr=0 \
     -usbdevice tablet \
     -netdev bridge,id=network0,br=br0 \
     -device virtio-net-pci,netdev=network0,mac=F4:F6:34:F6:34:2d,bus=pcie.0,addr=3 \
     -device virtio-scsi-pci,id=scsi1 \
     -drive driver=raw,node-name=hd0,file=/dev/lvm/vm,if=none,discard=unmap,cache=none,aio=threads
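
  While the guest is wedged, the monitor configured above with
  "-monitor telnet:localhost:8000" can be used to check the device and
  VM state from the host side; a minimal sketch using standard monitor
  commands:

    # Connect to the QEMU monitor on the host
    telnet localhost 8000

    # At the (qemu) prompt:
    info status     # whether the VM is running or paused
    info qtree      # confirms virtio-vga behind the ioh3420 root port
    info spice      # whether a SPICE client is currently attached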

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1844053/+subscriptions


Thread overview: 5+ messages
2019-09-15 13:06 [Qemu-devel] [Bug 1844053] [NEW] task blocked for more than X seconds - events drm_fb_helper_dirty_work James Harvey
2019-09-15 23:26 ` James Harvey [this message]
2020-06-01 12:33 ` [Bug 1844053] " sles
2021-04-22  7:35 ` Thomas Huth
2021-06-22  4:18 ` Launchpad Bug Tracker
