From: David Wang <00107082@163.com>
To: dreaming.about.electric.sheep@gmail.com, airlied@redhat.com,
	kraxel@redhat.com, maarten.lankhorst@linux.intel.com,
	mripard@kernel.org, tzimmermann@suse.de, airlied@gmail.com,
	daniel@ffwll.ch
Cc: virtualization@lists.linux.dev,
	spice-devel@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	regressions@lists.linux.dev, David Wang <00107082@163.com>
Subject: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
Date: Tue, 30 Apr 2024 14:13:37 +0800
Message-ID: <20240430061337.764633-1-00107082@163.com>

Hi,
I got the following kernel WARNING when my 2-core KVM guest (6.9.0-rc6) was under high CPU load.

	[Mon Apr 29 21:36:04 2024] ------------[ cut here ]------------
	[Mon Apr 29 21:36:04 2024] workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
	[Mon Apr 29 21:36:04 2024] WARNING: CPU: 1 PID: 792 at kernel/workqueue.c:3728 check_flush_dependency+0xfd/0x120
	[Mon Apr 29 21:36:04 2024] Modules linked in: xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) nft_compat(E) nf_tables(E) br_netfilter(E) bridge(E) stp(E) llc(E) ip_set(E) nfnetlink(E) ip_vs_sh(E) ip_vs_wrr(E) ip_vs_rr(E) ip_vs(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) intel_rapl_msr(E) intel_rapl_common(E) crct10dif_pclmul(E) ghash_clmulni_intel(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_intel_dspcfg(E) sha512_ssse3(E) snd_hda_codec(E) sha512_generic(E) sha256_ssse3(E) overlay(E) sha1_ssse3(E) snd_hda_core(E) snd_hwdep(E) aesni_intel(E) snd_pcm(E) crypto_simd(E) pcspkr(E) cryptd(E) joydev(E) qxl(E) snd_timer(E) drm_ttm_helper(E) ttm(E) evdev(E) snd(E) iTCO_wdt(E) serio_raw(E) sg(E) virtio_balloon(E) virtio_console(E) iTCO_vendor_support(E) soundcore(E) qemu_fw_cfg(E) drm_kms_helper(E) button(E) binfmt_misc(E) fuse(E) drm(E) configfs(E) virtio_rng(E) rng_core(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E)
	[Mon Apr 29 21:36:04 2024]  hid_generic(E) usbhid(E) hid(E) sr_mod(E) cdrom(E) ahci(E) libahci(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E) libata(E) xhci_pci(E) crc32_pclmul(E) crc32c_intel(E) scsi_mod(E) scsi_common(E) lpc_ich(E) i2c_i801(E) xhci_hcd(E) psmouse(E) i2c_smbus(E) virtio_pci(E) usbcore(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E) usb_common(E) virtio(E) mfd_core(E) virtio_ring(E)
	[Mon Apr 29 21:36:04 2024] CPU: 1 PID: 792 Comm: kworker/u13:4 Tainted: G            E      6.9.0-rc6-linan-5 #197
	[Mon Apr 29 21:36:04 2024] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
	[Mon Apr 29 21:36:04 2024] Workqueue: ttm ttm_bo_delayed_delete [ttm]
	[Mon Apr 29 21:36:04 2024] RIP: 0010:check_flush_dependency+0xfd/0x120
	[Mon Apr 29 21:36:04 2024] Code: 8b 45 18 48 8d b2 c0 00 00 00 49 89 e8 48 8d 8b c0 00 00 00 48 c7 c7 68 30 a4 a7 c6 05 9b 12 6e 01 01 48 89 c2 e8 53 b9 fd ff <0f> 0b e9 1e ff ff ff 80 3d 86 12 6e 01 00 75 93 e9 4a ff ff ff 66
	[Mon Apr 29 21:36:04 2024] RSP: 0018:ffff9d31805abce8 EFLAGS: 00010086
	[Mon Apr 29 21:36:04 2024] RAX: 0000000000000000 RBX: ffff8c8c4004ee00 RCX: 0000000000000000
	[Mon Apr 29 21:36:04 2024] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
	[Mon Apr 29 21:36:04 2024] RBP: ffffffffc0b53570 R08: 0000000000000000 R09: 0000000000000003
	[Mon Apr 29 21:36:04 2024] R10: ffff9d31805abb80 R11: ffffffffa7cc1108 R12: ffff8c8c42eb8000
	[Mon Apr 29 21:36:04 2024] R13: ffff8c8c48077900 R14: ffff8c8cbbd30b80 R15: 0000000000000001
	[Mon Apr 29 21:36:04 2024] FS:  0000000000000000(0000) GS:ffff8c8cbbd00000(0000) knlGS:0000000000000000
	[Mon Apr 29 21:36:04 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[Mon Apr 29 21:36:04 2024] CR2: 00007ffd38bb3ff8 CR3: 000000010217a000 CR4: 0000000000350ef0
	[Mon Apr 29 21:36:04 2024] Call Trace:
	[Mon Apr 29 21:36:04 2024]  <TASK>
	[Mon Apr 29 21:36:04 2024]  ? __warn+0x7c/0x120
	[Mon Apr 29 21:36:04 2024]  ? check_flush_dependency+0xfd/0x120
	[Mon Apr 29 21:36:04 2024]  ? report_bug+0x18d/0x1c0
	[Mon Apr 29 21:36:04 2024]  ? srso_return_thunk+0x5/0x5f
	[Mon Apr 29 21:36:04 2024]  ? handle_bug+0x3c/0x80
	[Mon Apr 29 21:36:04 2024]  ? exc_invalid_op+0x13/0x60
	[Mon Apr 29 21:36:04 2024]  ? asm_exc_invalid_op+0x16/0x20
	[Mon Apr 29 21:36:04 2024]  ? __pfx_qxl_gc_work+0x10/0x10 [qxl]
	[Mon Apr 29 21:36:04 2024]  ? check_flush_dependency+0xfd/0x120
	[Mon Apr 29 21:36:04 2024]  ? check_flush_dependency+0xfd/0x120
	[Mon Apr 29 21:36:04 2024]  __flush_work.isra.0+0xc0/0x270
	[Mon Apr 29 21:36:04 2024]  ? srso_return_thunk+0x5/0x5f
	[Mon Apr 29 21:36:04 2024]  ? srso_return_thunk+0x5/0x5f
	[Mon Apr 29 21:36:04 2024]  ? __queue_work.part.0+0x18b/0x3d0
	[Mon Apr 29 21:36:04 2024]  ? srso_return_thunk+0x5/0x5f
	[Mon Apr 29 21:36:04 2024]  qxl_queue_garbage_collect+0x7f/0x90 [qxl]
	[Mon Apr 29 21:36:04 2024]  qxl_fence_wait+0x9c/0x180 [qxl]
	[Mon Apr 29 21:36:04 2024]  dma_fence_wait_timeout+0x61/0x130
	[Mon Apr 29 21:36:04 2024]  dma_resv_wait_timeout+0x6d/0xd0
	[Mon Apr 29 21:36:04 2024]  ttm_bo_delayed_delete+0x26/0x80 [ttm]
	[Mon Apr 29 21:36:04 2024]  process_one_work+0x18c/0x3b0
	[Mon Apr 29 21:36:04 2024]  worker_thread+0x273/0x390
	[Mon Apr 29 21:36:04 2024]  ? __pfx_worker_thread+0x10/0x10
	[Mon Apr 29 21:36:04 2024]  kthread+0xdd/0x110
	[Mon Apr 29 21:36:04 2024]  ? __pfx_kthread+0x10/0x10
	[Mon Apr 29 21:36:04 2024]  ret_from_fork+0x30/0x50
	[Mon Apr 29 21:36:04 2024]  ? __pfx_kthread+0x10/0x10
	[Mon Apr 29 21:36:04 2024]  ret_from_fork_asm+0x1a/0x30
	[Mon Apr 29 21:36:04 2024]  </TASK>
	[Mon Apr 29 21:36:04 2024] ---[ end trace 0000000000000000 ]---

I found the exact same warning message mentioned in
 https://lore.kernel.org/lkml/20240404181448.1643-1-dreaming.about.electric.sheep@gmail.com/T/#m8c2ecc83ebba8717b1290ec28d4dc15f2fa595d5
and confirmed that the warning is caused by commit 07ed11afb68d94eadd4ffc082b97c2331307c5ea; reverting it makes the warning go away.


It seems that under heavy load, qxl_queue_garbage_collect() is called from a
WQ_MEM_RECLAIM worker (ttm_bo_delayed_delete on the ttm workqueue) and flushes
qxl_gc_work, which runs on the !WQ_MEM_RECLAIM events workqueue. This triggers
the kernel WARNING in check_flush_dependency().
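
For readers unfamiliar with the rule: a WQ_MEM_RECLAIM workqueue is expected to
make forward progress under memory pressure, so a worker on such a queue should
not wait on work from a queue without that guarantee. The following is a rough
user-space model of that check, not the kernel code. The names model_wq,
MODEL_WQ_MEM_RECLAIM and model_flush_would_warn are made up for illustration,
the flag value is not the kernel's, and the real check_flush_dependency() has
further conditions that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; the value is NOT the kernel's. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Hypothetical stand-in for a workqueue: just a name and its flags. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when flushing work on 'target' from a worker that belongs to
 * 'current_wq' would be reported by this simplified rule. A NULL 'current_wq'
 * models a caller that is not a workqueue worker at all.
 */
static bool model_flush_would_warn(const struct model_wq *current_wq,
				   const struct model_wq *target)
{
	assert(target != NULL);

	/* Waiting on a reclaim-safe queue is always acceptable. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* Not running in a worker: this rule does not apply. */
	if (current_wq == NULL)
		return false;

	/* A reclaim worker waiting on a non-reclaim queue is the bad case. */
	return (current_wq->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}

/* The two queues named in the warning, with the flags the message implies. */
static const struct model_wq model_ttm = { "ttm", MODEL_WQ_MEM_RECLAIM };
static const struct model_wq model_events = { "events", 0 };
```

In this model, model_flush_would_warn(&model_ttm, &model_events) is true, which
corresponds to the trace above: a ttm worker flushing work on events. The
reverse direction, and a flush between two workers on events, are not flagged.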

I tried the following change, which sets the flush flag to false.
The warning is gone, but I am not sure whether there are any other side effects,
especially regarding the issue mentioned in
https://lore.kernel.org/lkml/20240404181448.1643-2-dreaming.about.electric.sheep@gmail.com/T/#m988ffad2000c794dcfdab7e60b03db93d8726391
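
To make the effect of the flag concrete, here is a small user-space sketch of
how I understand qxl_queue_garbage_collect(qdev, flush) to behave, judging by
the call trace above, which shows both __queue_work and __flush_work under it.
This is an assumption, not a copy of the driver: struct model_qdev, its fields
and model_queue_garbage_collect are hypothetical names, and the counters only
record what would have been queued or waited for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; not the driver's struct. */
struct model_qdev {
	bool ring_idle; /* true when there is nothing to garbage collect */
	int queued;     /* how many times gc work was queued */
	int flushed;    /* how many times the caller waited for gc work */
};

/*
 * Queue the gc work when there is something to collect, and wait for it
 * only when 'flush' is set. Returns true if gc work was queued.
 */
static bool model_queue_garbage_collect(struct model_qdev *qdev, bool flush)
{
	assert(qdev != NULL);

	if (qdev->ring_idle)
		return false;

	qdev->queued++;
	if (flush)
		qdev->flushed++; /* the wait that the workqueue check objects to */
	return true;
}
```

With flush set to false the work is still queued, but the caller no longer
waits for it to finish before checking the fence again. That missing wait is
the part whose side effects I cannot judge.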

Signed-off-by: David Wang <00107082@163.com>
---
 drivers/gpu/drm/qxl/qxl_release.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 9febc8b73f09..f372085c5aad 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -76,7 +76,7 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
 	qxl_io_notify_oom(qdev);
 
 	for (count = 0; count < 11; count++) {
-		if (!qxl_queue_garbage_collect(qdev, true))
+		if (!qxl_queue_garbage_collect(qdev, false))
 			break;
 
 		if (dma_fence_is_signaled(fence))
-- 
2.39.2



David

