linux-kernel.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Sandeep Dhavale <dhavale@google.com>,
	jiangshanlai@gmail.com, torvalds@linux-foundation.org,
	peterz@infradead.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, joshdon@google.com, brho@google.com,
	briannorris@chromium.org, nhuck@google.com, agk@redhat.com,
	snitzer@kernel.org, void@manifault.com, kernel-team@android.com
Subject: Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods
Date: Thu, 8 Jun 2023 12:50:25 -1000
Message-ID: <ZIJbMQOu_k07jkFf@slm.duckdns.org>
In-Reply-To: <fb3461cd-3fc2-189a-a86b-c638816a2440@amd.com>

Hello,

On Thu, Jun 08, 2023 at 08:31:34AM +0530, K Prateek Nayak wrote:
...
> Thank you for sharing the debug branch. I've managed to hit one of
> the WARN_ON_ONCE() consistently but I still haven't seen a kernel panic
> yet. Sharing the traces below:

Yeah, that's good. It does a dirty fix-up. Shouldn't crash.

> o Early Boot
> 
>     [    4.182411] ------------[ cut here ]------------
>     [    4.186313] WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:1130 kick_pool+0xdb/0xe0
>     [    4.186313] Modules linked in:
>     [    4.186313] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-rc1-tj-wq-valid-cpu+ #481
>     [    4.186313] Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
>     [    4.186313] RIP: 0010:kick_pool+0xdb/0xe0
>     [    4.186313] Code: 6b c0 d0 01 73 24 41 89 45 64 49 8b 54 24 f8 48 89 d0 30 c0 83 e2 04 ba 00 00 00 00 48 0f 44 c2 48 83 80 c0 00 00 00 01 eb 82 <0f> 0b eb dc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f
>     [    4.186313] RSP: 0018:ffffbc1b800e7dd8 EFLAGS: 00010046
>     [    4.186313] RAX: 0000000000000100 RBX: ffff97c73d2321c0 RCX: 0000000000000000
>     [    4.186313] RDX: 0000000000000040 RSI: 0000000000000001 RDI: ffff9788c0159728
>     [    4.186313] RBP: ffffbc1b800e7df0 R08: 0000000000000100 R09: ffff9788c01593e0
>     [    4.186313] R10: ffff9788c01593c0 R11: 0000000000000001 R12: ffffffff8c582430
>     [    4.186313] R13: ffff9788c03fcd40 R14: 0000000000000000 R15: ffff97c73d2324b0
>     [    4.186313] FS:  0000000000000000(0000) GS:ffff97c73d200000(0000) knlGS:0000000000000000
>     [    4.186313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     [    4.186313] CR2: ffff97cecee01000 CR3: 000000470d43a001 CR4: 0000000000770ef0
>     [    4.186313] PKRU: 55555554
>     [    4.186313] Call Trace:
>     [    4.186313]  <TASK>
>     [    4.186313]  create_worker+0x14e/0x280
>     [    4.186313]  ? wake_up_process+0x15/0x20
>     [    4.186313]  workqueue_init+0x22a/0x3d0
>     [    4.186313]  kernel_init_freeable+0x1fe/0x4f0
>     [    4.186313]  ? __pfx_kernel_init+0x10/0x10
>     [    4.186313]  kernel_init+0x1b/0x1f0
>     [    4.186313]  ? __pfx_kernel_init+0x10/0x10
>     [    4.186313]  ret_from_fork+0x2c/0x50
>     [    4.186313]  </TASK>
>     [    4.186313] ---[ end trace 0000000000000000 ]---
> 
> o I consistently see a WARN_ON_ONCE() in kick_pool() being hit when I
>   run "sudo ./stress-ng --iomix 96 --timeout 1m". I've seen a few
>   different stack traces so far. Including all below just in case:
...
> This is the same WARN_ON_ONCE() you had added in the HEAD commit:
> 
>     $ scripts/faddr2line vmlinux kick_pool+0xdb
>     kick_pool+0xdb/0xe0:
>     kick_pool at kernel/workqueue.c:1130 (discriminator 1)
> 
>     $ sed -n 1130,1132p kernel/workqueue.c
>     if (!WARN_ON_ONCE(wake_cpu >= nr_cpu_ids))
>         p->wake_cpu = wake_cpu;
>     get_work_pwq(work)->stats[PWQ_STAT_REPATRIATED]++;
> 
> Let me know if you need any more data from my test setup.
> P.S. The kernel is still up and running (~30min) despite hitting this
> WARN_ON_ONCE() in my case :)

Okay, that was me being stupid and not initializing the new fields for
per-cpu workqueues. Can you please test the following branch? It should have
both bugs fixed properly.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git affinity-scopes-v2

If that doesn't crash, I'd love to hear how it affects the perf regressions
reported over the past few months.

Thanks.

-- 
tejun
