Re: [PATCH] workqueue: missing NOT while checking if Workqueue is offline

From: Geraldo Nascimento <geraldogabriel@gmail.com>
To: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] workqueue: missing NOT while checking if Workqueue is offline
Date: Sun, 29 May 2022 02:53:39 -0300
Message-ID: <YpMKY88/2tTK319E@geday>
In-Reply-To: <YpMDmZZ7IpEhjywp@slm.duckdns.org>

On Sat, May 28, 2022 at 07:24:41PM -1000, Tejun Heo wrote:
> On Sun, May 29, 2022 at 01:29:32AM -0300, Geraldo Nascimento wrote:
> > I would like very much to hear the opinion of the maintainers!
> 
> I have a hard time understanding what you're trying to do. Can you please
> slow down and start from describing the problem itself?

Hi Tejun,

Sorry for the hurry.

The problem is best described in https://gitlab.freedesktop.org/drm/amd/-/issues/1898

As I understand it, __cancel_work_timer() should not normally call
__flush_work(), but I may be wrong. In the case described on AMD's
GitLab, __cancel_work_timer() is called by cancel_delayed_work_sync()
inside kfd_process_notifier_release(), at
drivers/gpu/drm/amd/amdkfd/kfd_process.c:1157 (Linux 5.18).

We should only call __flush_work() from __cancel_work_timer() if
workqueue_init() has not run yet, which is possible during early boot
though not very likely. That is before any kthreads are spawned, so we
can be sure that the work in question isn't executing, which is why it's
safe to call __flush_work() in this particular case.
The comment at kernel/workqueue.c:3157 (Linux 5.18) says it best:

	/*
	 * This allows canceling during early boot.  We know that @work
	 * isn't executing.
	 */
	if (wq_online)
		__flush_work(work, true);

If __flush_work() is ever called during early boot, it triggers a
WARN_ON because the workqueue is not online. I have no idea whether
that's OK, though it hasn't harmed my machine. Of course I don't want to
introduce bugs, I want to fix them, and I appreciate your cautious
approach. Thank you for the work.
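
To make the two variants easier to compare, here is a small userspace
sketch of the check as it stands in 5.18 next to the inverted check. It
is only a model, not kernel code: every model_* name is a hypothetical
stand-in I made up for illustration, and the only behaviour it assumes
is what is described above, namely that __flush_work() warns and returns
early while the workqueue is offline.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
static bool model_wq_online;   /* models the wq_online flag */
static int model_warn_count;   /* counts WARN_ON-style hits */
static int model_flush_count;  /* counts flushes that actually ran */

/* Models __flush_work(): warn and bail out if workqueues are offline. */
static bool model_flush_work(void)
{
	if (!model_wq_online) {
		model_warn_count++;
		fprintf(stderr, "WARN: flush while workqueue is offline\n");
		return false;
	}
	model_flush_count++;
	return true;
}

/* Models the check as quoted from 5.18: if (wq_online). */
static void model_cancel_guarded(void)
{
	if (model_wq_online)
		model_flush_work();
}

/* Models the inverted check: if (!wq_online). */
static void model_cancel_inverted(void)
{
	if (!model_wq_online)
		model_flush_work();
}
```

In this model, with model_wq_online false, model_cancel_guarded() does
nothing while model_cancel_inverted() only hits the warning path. With
model_wq_online true, the guarded variant flushes and the inverted one
never reaches model_flush_work() at all.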

What is not OK apparently is trying to use amdkfd without HSA_AMD_SVM configured! :)

Thank you,
Geraldo Nascimento

> 
> Thanks.
> 
> -- 
> tejun