* [PATCH] kthread: add kthread_mod_pending_delayed_work api
@ 2021-02-14  0:06 Yiwei Zhang
  2021-02-15 13:28 ` Petr Mladek
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Yiwei Zhang @ 2021-02-14  0:06 UTC (permalink / raw)
  To: Andrew Morton, Felix Kuehling, Jens Axboe, Petr Mladek,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen
  Cc: linux-kernel, kernel-team, Yiwei Zhang

The existing kthread_mod_delayed_work api will queue a new work if it
fails to cancel the current work because the work is no longer pending.
However, there is a case where the same work can be enqueued from both
an async request and a delayed work, and a race can happen if the
async request comes right after the timeout delayed work gets scheduled,
because the cleanup work may not be safe to run twice.
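
For illustration, a minimal sketch of the intended usage; the names
(drv, req, CLEANUP_TIMEOUT) are hypothetical and not part of this patch:

	/* pre-work: arm a safety cleanup in case userspace never asks for it */
	kthread_queue_delayed_work(&drv->worker, &req->cleanup_dwork,
				   CLEANUP_TIMEOUT);

	/* async request path, possibly racing with the timeout firing */
	if (!kthread_mod_pending_delayed_work(&drv->worker,
					      &req->cleanup_dwork, 0)) {
		/*
		 * The timeout already moved the cleanup to the queue (or it
		 * has run); kthread_mod_delayed_work() would queue it again
		 * here, which is the double run this api avoids.
		 */
	}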

Signed-off-by: Yiwei Zhang <zzyiwei@android.com>
---
 include/linux/kthread.h |  3 +++
 kernel/kthread.c        | 48 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 65b81e0c494d..250cdc5ff2a5 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -192,6 +192,9 @@ bool kthread_queue_delayed_work(struct kthread_worker *worker,
 bool kthread_mod_delayed_work(struct kthread_worker *worker,
 			      struct kthread_delayed_work *dwork,
 			      unsigned long delay);
+bool kthread_mod_pending_delayed_work(struct kthread_worker *worker,
+				      struct kthread_delayed_work *dwork,
+				      unsigned long delay);
 
 void kthread_flush_work(struct kthread_work *work);
 void kthread_flush_worker(struct kthread_worker *worker);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index a5eceecd4513..13881076afdd 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1142,6 +1142,54 @@ bool kthread_mod_delayed_work(struct kthread_worker *worker,
 }
 EXPORT_SYMBOL_GPL(kthread_mod_delayed_work);
 
+/**
+ * kthread_mod_pending_delayed_work - modify delay of a pending delayed work
+ * @worker: kthread worker to use
+ * @dwork: kthread delayed work to queue
+ * @delay: number of jiffies to wait before queuing
+ *
+ * If @dwork is still pending modify @dwork's timer so that it expires after
+ * @delay. If @dwork is still pending and @delay is zero, @work is guaranteed to
+ * be queued immediately.
+ *
+ * Return: %true if @dwork was pending and its timer was modified,
+ * %false otherwise.
+ *
+ * A special case is when the work is being canceled in parallel.
+ * It might be caused either by the real kthread_cancel_delayed_work_sync()
+ * or yet another kthread_mod_delayed_work() call. We let the other command
+ * win and return %false here. The caller is supposed to synchronize these
+ * operations a reasonable way.
+ *
+ * This function is safe to call from any context including IRQ handler.
+ * See __kthread_cancel_work() and kthread_delayed_work_timer_fn()
+ * for details.
+ */
+bool kthread_mod_pending_delayed_work(struct kthread_worker *worker,
+				      struct kthread_delayed_work *dwork,
+				      unsigned long delay)
+{
+	struct kthread_work *work = &dwork->work;
+	unsigned long flags;
+	int ret = true;
+
+	raw_spin_lock_irqsave(&worker->lock, flags);
+	if (!work->worker || work->canceling ||
+	    !__kthread_cancel_work(work, true, &flags)) {
+		ret = false;
+		goto out;
+	}
+
+	/* Work must not be used with >1 worker, see kthread_queue_work() */
+	WARN_ON_ONCE(work->worker != worker);
+
+	__kthread_queue_delayed_work(worker, dwork, delay);
+out:
+	raw_spin_unlock_irqrestore(&worker->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kthread_mod_pending_delayed_work);
+
 static bool __kthread_cancel_work_sync(struct kthread_work *work, bool is_dwork)
 {
 	struct kthread_worker *worker = work->worker;
-- 
2.30.0.478.g8a0d178c01-goog



* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-14  0:06 [PATCH] kthread: add kthread_mod_pending_delayed_work api Yiwei Zhang
@ 2021-02-15 13:28 ` Petr Mladek
  2021-02-16  9:11 ` Christoph Hellwig
  2021-02-19 10:56 ` Petr Mladek
  2 siblings, 0 replies; 15+ messages in thread
From: Petr Mladek @ 2021-02-15 13:28 UTC (permalink / raw)
  To: Yiwei Zhang
  Cc: Andrew Morton, Felix Kuehling, Jens Axboe, J. Bruce Fields,
	Peter Zijlstra, Frederic Weisbecker, Marcelo Tosatti,
	Ilias Stamatis, Rob Clark, Mathieu Desnoyers, Liang Chen,
	linux-kernel, kernel-team

On Sun 2021-02-14 00:06:11, Yiwei Zhang wrote:
> The existing kthread_mod_delayed_work api will queue a new work if
> failing to cancel the current work due to no longer being pending.
> However, there's a case that the same work can be enqueued from both
> an async request and a delayed work, and a racing could happen if the
> async request comes right after the timeout delayed work gets
> scheduled,

In other words, you want to modify the delayed work only when
it is still waiting in the queue. You do not want to queue a new
work when it is not already queued. Do I get it correctly?

Could you please provide a patch where the new API is used?

> because the clean up work may not be safe to run twice.

This looks like a bad design of the code. There is likely
another race that might break it. You should ask the following
questions:

Why does anyone try to modify the clean up work when it has already been
queued? There should be only one location/caller that triggers the clean up.

Could anyone queue any work to the workqueue after the clean up
work was queued? The cleanup work should be the last queued one.
The workqueue user must inform all other users that the queue
is being destroyed and nobody is allowed to queue any work
any longer.

Best Regards,
Petr


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-14  0:06 [PATCH] kthread: add kthread_mod_pending_delayed_work api Yiwei Zhang
  2021-02-15 13:28 ` Petr Mladek
@ 2021-02-16  9:11 ` Christoph Hellwig
  2021-02-16 18:58   ` Yiwei Zhang‎
  2021-02-19 10:56 ` Petr Mladek
  2 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2021-02-16  9:11 UTC (permalink / raw)
  To: Yiwei Zhang
  Cc: Andrew Morton, Felix Kuehling, Jens Axboe, Petr Mladek,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, linux-kernel, kernel-team

On Sun, Feb 14, 2021 at 12:06:11AM +0000, Yiwei Zhang wrote:
> The existing kthread_mod_delayed_work api will queue a new work if
> failing to cancel the current work due to no longer being pending.
> However, there's a case that the same work can be enqueued from both
> an async request and a delayed work, and a racing could happen if the
> async request comes right after the timeout delayed work gets scheduled,
> because the clean up work may not be safe to run twice.

Who is going to use this?  Please submit it together with the actual
user.


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-16  9:11 ` Christoph Hellwig
@ 2021-02-16 18:58   ` Yiwei Zhang‎
  2021-02-17 11:14     ` Petr Mladek
  0 siblings, 1 reply; 15+ messages in thread
From: Yiwei Zhang‎ @ 2021-02-16 18:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Felix Kuehling, Jens Axboe, Petr Mladek,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

On Mon, Feb 15, 2021 at 5:28 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Sun 2021-02-14 00:06:11, Yiwei Zhang wrote:
> > The existing kthread_mod_delayed_work api will queue a new work if
> > failing to cancel the current work due to no longer being pending.
> > However, there's a case that the same work can be enqueued from both
> > an async request and a delayed work, and a racing could happen if the
> > async request comes right after the timeout delayed work gets
> > scheduled,
>
> By other words, you want to modify the delayed work only when
> it is still waiting in the queue. You do not want to queue new
> work when it has not been already queued. Do I get it correctly?
>
Yes, you are correct.

> Could you please provide a patch where the new API is used?
>
Currently it will only get used in a downstream gpu driver.

> > because the clean up work may not be safe to run twice.
>
> This looks like a bad design of the code. There is likely
> another race that might break it. You should ask the following
> questions:
>
> Why anyone tries to modify the clean up work when it has been already
> queued? There should be only one location/caller that triggers the clean up.
>
The clean up work was initially queued as a safety timeout work, just in
case the userspace doesn't queue the cleanup work (e.g. the process
crashes), which would leave the global driver in an incorrect driver
state (e.g. a power state due to some hinting). In addition, in the
racing scenario the cleanup work also has to free the cache-allocated
work struct.

> Could anyone queue any work to the workqueue after the clean up
> work was queued? The cleanup work should be the last queued one.
> The workqueue user must inform all other users that the queue
> is being destroyed and nobody is allowed to queue any work
> any longer.
>
The user can queue the initial work (internally it queues a cleanup work
with a large timeout in case the user doesn't queue it later). Then, after
the user has done the work within that scope, the user queues the cleanup
work again to cancel out the effect, which is when it may race with
the underlying timed-out cleanup work.
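
A rough sketch of that lifecycle; the gpu_* names are made up for
illustration and are not code from the downstream driver:

	static void gpu_hint_begin(struct gpu_drv *drv, struct gpu_req *req)
	{
		apply_power_hint(drv);
		/* safety net: undo the hint even if userspace never asks */
		kthread_queue_delayed_work(&drv->worker, &req->cleanup_dwork,
					   GPU_HINT_TIMEOUT);
	}

	static void gpu_hint_end(struct gpu_drv *drv, struct gpu_req *req)
	{
		/* pull the cleanup forward only if the timeout has not fired */
		if (kthread_mod_pending_delayed_work(&drv->worker,
						     &req->cleanup_dwork, 0))
			return;
		/* the timed-out cleanup already owns the work and frees req */
	}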

> Best Regards,
> Petr

On Tue, Feb 16, 2021 at 1:12 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Sun, Feb 14, 2021 at 12:06:11AM +0000, Yiwei Zhang wrote:
> > The existing kthread_mod_delayed_work api will queue a new work if
> > failing to cancel the current work due to no longer being pending.
> > However, there's a case that the same work can be enqueued from both
> > an async request and a delayed work, and a racing could happen if the
> > async request comes right after the timeout delayed work gets scheduled,
> > because the clean up work may not be safe to run twice.
>
> Who is going to use this?  Please submit it together with the actual
> user.

I'm not sure what I should do if there are no users in the main kernel
tree. Currently it's only potentially used in a downstream gpu driver.
I was proposing it here to see if this behavior is reasonable for this
mod function. (My gut feeling is that when folks call this mod function,
they want to modify "the" exact pending work instead of requeuing
another one if the pending one has already been scheduled...)

Best regards,
Yiwei


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-16 18:58   ` Yiwei Zhang‎
@ 2021-02-17 11:14     ` Petr Mladek
  2021-02-19  6:29       ` Yiwei Zhang‎
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2021-02-17 11:14 UTC (permalink / raw)
  To: Yiwei Zhang
  Cc: Christoph Hellwig, Andrew Morton, Felix Kuehling, Jens Axboe,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

On Tue 2021-02-16 10:58:36, Yiwei Zhang wrote:
> On Mon, Feb 15, 2021 at 5:28 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Sun 2021-02-14 00:06:11, Yiwei Zhang wrote:
> > > The existing kthread_mod_delayed_work api will queue a new work if
> > > failing to cancel the current work due to no longer being pending.
> > > However, there's a case that the same work can be enqueued from both
> > > an async request and a delayed work, and a racing could happen if the
> > > async request comes right after the timeout delayed work gets
> > > scheduled,
> >
> > By other words, you want to modify the delayed work only when
> > it is still waiting in the queue. You do not want to queue new
> > work when it has not been already queued. Do I get it correctly?
> >
> Yes, you are correct.
> 
> > Could you please provide a patch where the new API is used?
> >
> Currently it will only get used in a downstream gpu driver.
> 
> > > because the clean up work may not be safe to run twice.
> >
> > This looks like a bad design of the code. There is likely
> > another race that might break it. You should ask the following
> > questions:
> >
> > Why anyone tries to modify the clean up work when it has been already
> > queued? There should be only one location/caller that triggers the clean up.
> >
> The clean up work was initially queued as a safe timeout work just in
> case the userspace doesn't queue the cleanup work(e.g. process
> crashing), which leaves the global driver in an incorrect driver
> state(e.g. power state due to some hinting). In addition, the cleanup
> work will also have to clean the cache allocated work struct as well
> in the racing scenario.
> 
> > Could anyone queue any work to the workqueue after the clean up
> > work was queued? The cleanup work should be the last queued one.
> > The workqueue user must inform all other users that the queue
> > is being destroyed and nobody is allowed to queue any work
> > any longer.
> >
> User can queue the initial work(internally it queues a cleanup work
> with a big timeout in case user doesn't queue it later). Then after
> user has done stuff within the scope, the user will queue the cleanup
> work again to cancel out the effect, which is when it may race with
> the underlying timeout'ed cleanup work.

And this is exactly the design problem. If the race is possible,
then there are three possible scenarios:

1. User does the clean up before the timeout. This is the scenario
   where things work as expected.

2. User triggered clean up races with the clean up triggered by
   timeout. You try to handle this scenario by this patch.

3. User does clean up after the clean up has already been done
   by the timeout. It means that the user used the driver after
   it has already been cleaned up. This should not happen.
   I guess that the user commands will fail when the device has
   been cleaned up in the meantime.

In other words, you are focusing on a small race window. But there
is a much bigger problem when the user can still use the cleaned-up
driver.

There must be a better solution. You should avoid the timer because
it is not reliable. The following comes to my mind:

1. The userspace application might do the clean up from a SIGKILL
   handler. It will do the clean up even when it crashes. But you
   would still rely on userspace to do the correct thing.

2. I do not see a clean solution in the kernel.

    One possibility might be to register something called from
   __put_task_struct(). It seems profile_handoff_task() calls
   some notifiers that can be registered from anywhere.

   Another possibility might be to register a notifier called by
   profile_task_exit(tsk) that is called from do_exit().

   It is not a clean solution because profile_task has another
   purpose. It might make sense to introduce a new generic notifier
   that is called during the task exit for this purpose.
   I am sure that it might have even more users.

   Anyway, look for put_task_struct(). It seems to be called in some
   drivers when they are destroyed. I wonder if there is something
   there that you might need.
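
For instance, a rough sketch of the task-exit-notifier idea, assuming the
profile_event_register()/PROFILE_TASK_EXIT interface available at the time
and a hypothetical gpu_cleanup_for_task() helper:

	#include <linux/profile.h>
	#include <linux/notifier.h>
	#include <linux/sched.h>

	static int gpu_task_exit_notify(struct notifier_block *nb,
					unsigned long action, void *data)
	{
		struct task_struct *tsk = data;

		/* hypothetical: release hint state still owned by this task */
		gpu_cleanup_for_task(tsk);
		return NOTIFY_OK;
	}

	static struct notifier_block gpu_task_exit_nb = {
		.notifier_call = gpu_task_exit_notify,
	};

	static int gpu_register_exit_notifier(void)
	{
		/* the chain is called from do_exit() via profile_task_exit() */
		return profile_event_register(PROFILE_TASK_EXIT,
					      &gpu_task_exit_nb);
	}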

Best Regards,
Petr


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-17 11:14     ` Petr Mladek
@ 2021-02-19  6:29       ` Yiwei Zhang‎
  2021-02-19 10:27         ` Petr Mladek
  0 siblings, 1 reply; 15+ messages in thread
From: Yiwei Zhang‎ @ 2021-02-19  6:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Christoph Hellwig, Andrew Morton, Felix Kuehling, Jens Axboe,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

> 2. User triggered clean up races with the clean up triggered by
>   timeout. You try to handle this scenario by this patch.
Yes, exactly.

> 3. User does clean up after the clean up has already been done
>   by the timeout.
This case is well handled. So only (2) has a potential race.

Let me clarify a bit more here. The "clean up" is not the clean up
when a process tears down, but it's actually a "post-work" to cancel
out an early "pre-work". The "pre-work" enqueues the delayed "post
work" for the timeout purpose. That pair of operations can repeatedly
happen.

The racing is currently worked around by refcounting the delayed_work
container, and the later "post-work" will take care of the work
deallocation.

I mainly want to reach out to see if we agree that this is a valid API
to be supported by kthread. Or we extend the existing
kthread_mod_delayed_work api to take another option to not re-queue if
the cancel failed.

Best,
Yiwei


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-19  6:29       ` Yiwei Zhang‎
@ 2021-02-19 10:27         ` Petr Mladek
  2021-02-19 10:30           ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2021-02-19 10:27 UTC (permalink / raw)
  To: Yiwei Zhang
  Cc: Christoph Hellwig, Andrew Morton, Felix Kuehling, Jens Axboe,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

On Thu 2021-02-18 22:29:24, Yiwei Zhang wrote:
> > 2. User triggered clean up races with the clean up triggered by
> >   timeout. You try to handle this scenario by this patch.
> Yes, exactly.
> 
> > 3. User does clean up after the clean up has already been done
> >   by the timeout.
> This case is well handled. So only (2) has a potential race.

Just to be sure: does the user code work correctly when the clean up work
is done by the timeout before the user wanted to do the clean up?

> Let me clarify a bit more here. The "clean up" is not the clean up
> when a process tears down, but it's actually a "post-work" to cancel
> out an early "pre-work". The "pre-work" enqueues the delayed "post
> work" for the timeout purpose. That pair of operations can repeatedly
> happen.
> 
> The racing is currently worked around by refcounting the delayed_work
> container, and the later "post-work" will take care of the work
> deallocation.
> 
> I mainly want to reach out to see if we agree that this is a valid API
> to be supported by kthread. Or we extend the existing
> kthread_mod_delayed_work api to take another option to not re-queue if
> the cancel failed.

OK, I could imagine a situation where you want to speed up the delayed
work and avoid this race.

The kthread_worker API has more or less the same semantic as
the workqueue API. It makes it easier to switch between them.

The workqueue API has flush_delayed_work(). It does basically
the same as your code. We should call the function
kthread_worker_flush_delayed_work().
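
For comparison, the workqueue-side pattern being mirrored (a sketch with
a hypothetical struct delayed_work member):

	/* speed up a pending delayed cleanup and wait for it to finish */
	if (flush_delayed_work(&req->cleanup_dwork))
		pr_debug("cleanup was pending or running; finished now\n");
	else
		pr_debug("cleanup was already idle\n");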

I am personally fine with adding this API. I am going to
comment on the original code. Well, there might be push-back
from other people because there will be no in-tree user.

Best Regards,
Petr


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-19 10:27         ` Petr Mladek
@ 2021-02-19 10:30           ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2021-02-19 10:30 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Yiwei Zhang, Christoph Hellwig, Andrew Morton, Felix Kuehling,
	Jens Axboe, J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

On Fri, Feb 19, 2021 at 11:27:13AM +0100, Petr Mladek wrote:
> I am personally fine with adding this API. I am going to
> comment the original code. Well, there might be a push-back
> from other people because there will be no in-tree user.

Adding a new API without an intree user is a complete no-go.


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-14  0:06 [PATCH] kthread: add kthread_mod_pending_delayed_work api Yiwei Zhang
  2021-02-15 13:28 ` Petr Mladek
  2021-02-16  9:11 ` Christoph Hellwig
@ 2021-02-19 10:56 ` Petr Mladek
  2021-02-23  0:39   ` Yiwei Zhang‎
  2 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2021-02-19 10:56 UTC (permalink / raw)
  To: Yiwei Zhang
  Cc: Andrew Morton, Felix Kuehling, Jens Axboe, J. Bruce Fields,
	Peter Zijlstra, Frederic Weisbecker, Marcelo Tosatti,
	Ilias Stamatis, Rob Clark, Mathieu Desnoyers, Liang Chen,
	linux-kernel, kernel-team

On Sun 2021-02-14 00:06:11, Yiwei Zhang wrote:
> The existing kthread_mod_delayed_work api will queue a new work if
> failing to cancel the current work due to no longer being pending.
> However, there's a case that the same work can be enqueued from both
> an async request and a delayed work, and a racing could happen if the
> async request comes right after the timeout delayed work gets scheduled,
> because the clean up work may not be safe to run twice.

Please provide more details about the use case: why the work is
originally scheduled with a delay, and why it suddenly can/should
be processed immediately.

> 
> Signed-off-by: Yiwei Zhang <zzyiwei@android.com>
> ---
>  include/linux/kthread.h |  3 +++
>  kernel/kthread.c        | 48 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 51 insertions(+)
> 
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1142,6 +1142,54 @@ bool kthread_mod_delayed_work(struct kthread_worker *worker,
>  }
>  EXPORT_SYMBOL_GPL(kthread_mod_delayed_work);
>  
> +/**
> + * kthread_mod_pending_delayed_work - modify delay of a pending delayed work
> + * @worker: kthread worker to use
> + * @dwork: kthread delayed work to queue
> + * @delay: number of jiffies to wait before queuing
> + *
> + * If @dwork is still pending modify @dwork's timer so that it expires after
> + * @delay. If @dwork is still pending and @delay is zero, @work is guaranteed to
> + * be queued immediately.
> + *
> + * Return: %true if @dwork was pending and its timer was modified,
> + * %false otherwise.
> + *
> + * A special case is when the work is being canceled in parallel.
> + * It might be caused either by the real kthread_cancel_delayed_work_sync()
> + * or yet another kthread_mod_delayed_work() call. We let the other command
> + * win and return %false here. The caller is supposed to synchronize these
> + * operations a reasonable way.
> + *
> + * This function is safe to call from any context including IRQ handler.
> + * See __kthread_cancel_work() and kthread_delayed_work_timer_fn()
> + * for details.
> + */
> +bool kthread_mod_pending_delayed_work(struct kthread_worker *worker,
> +				      struct kthread_delayed_work *dwork,
> +				      unsigned long delay)
> +{

kthread_worker API tries to follow the workqueue API. It helps to use and
switch between them easily.

workqueue API does not provide this possibility. Instead it has
flush_delayed_work(). It queues the work when it was pending and
waits until the work is processed. So, we might do:

bool kthread_flush_delayed_work(struct kthread_delayed_work *dwork)


> +	struct kthread_work *work = &dwork->work;
> +	unsigned long flags;
> +	int ret = true;
> +
> +	raw_spin_lock_irqsave(&worker->lock, flags);
> +	if (!work->worker || work->canceling ||
> +	    !__kthread_cancel_work(work, true, &flags)) {
> +		ret = false;
> +		goto out;
> +	}

Please, use separate checks with comments as it is done, for example,
in kthread_mod_delayed_work()

	struct kthread_work *work = &dwork->work;
	unsigned long flags;
	int ret;

	raw_spin_lock_irqsave(&worker->lock, flags);

	/* Do not bother with canceling when never queued. */
	if (!work->worker)
		goto nope;

	/* Do not fight with another command that is canceling this work. */
	if (work->canceling)
		goto nope;

	/* Nope when the work was not pending. */
	ret = __kthread_cancel_work(work, true, &flags);
	if (!ret)
		goto nope;

	/* Queue the work immediately. */
	kthread_insert_work(worker, work, &worker->work_list);
	raw_spin_unlock_irqrestore(&worker->lock, flags);

	kthread_flush_work(work);
	return true;
nope:
	raw_spin_unlock_irqrestore(&worker->lock, flags);
	return false;
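
A hypothetical caller of that variant could then do:

	/* run the pending cleanup now, or learn that it already ran */
	if (!kthread_flush_delayed_work(&req->cleanup_dwork))
		pr_debug("cleanup was no longer pending\n");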


Will this work for you?

Best Regards,
Petr


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-19 10:56 ` Petr Mladek
@ 2021-02-23  0:39   ` Yiwei Zhang‎
  2021-02-23  0:58     ` Yiwei Zhang‎
  0 siblings, 1 reply; 15+ messages in thread
From: Yiwei Zhang‎ @ 2021-02-23  0:39 UTC (permalink / raw)
  To: Petr Mladek, Christoph Hellwig
  Cc: Andrew Morton, Felix Kuehling, Jens Axboe, J. Bruce Fields,
	Peter Zijlstra, Frederic Weisbecker, Marcelo Tosatti,
	Ilias Stamatis, Rob Clark, Mathieu Desnoyers, Liang Chen,
	Linux Kernel Mailing List, kernel-team

On Fri, Feb 19, 2021 at 2:56 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Sun 2021-02-14 00:06:11, Yiwei Zhang wrote:
> > The existing kthread_mod_delayed_work api will queue a new work if
> > failing to cancel the current work due to no longer being pending.
> > However, there's a case that the same work can be enqueued from both
> > an async request and a delayed work, and a racing could happen if the
> > async request comes right after the timeout delayed work gets scheduled,
> > because the clean up work may not be safe to run twice.
>
> Please, provide more details about the use case. Why the work is
> originally sheduled with a delay. And and why it suddenly can/should
> be proceed immediately.
>
> >
> > Signed-off-by: Yiwei Zhang <zzyiwei@android.com>
> > ---
> >  include/linux/kthread.h |  3 +++
> >  kernel/kthread.c        | 48 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 51 insertions(+)
> >
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -1142,6 +1142,54 @@ bool kthread_mod_delayed_work(struct kthread_worker *worker,
> >  }
> >  EXPORT_SYMBOL_GPL(kthread_mod_delayed_work);
> >
> > +/**
> > + * kthread_mod_pending_delayed_work - modify delay of a pending delayed work
> > + * @worker: kthread worker to use
> > + * @dwork: kthread delayed work to queue
> > + * @delay: number of jiffies to wait before queuing
> > + *
> > + * If @dwork is still pending modify @dwork's timer so that it expires after
> > + * @delay. If @dwork is still pending and @delay is zero, @work is guaranteed to
> > + * be queued immediately.
> > + *
> > + * Return: %true if @dwork was pending and its timer was modified,
> > + * %false otherwise.
> > + *
> > + * A special case is when the work is being canceled in parallel.
> > + * It might be caused either by the real kthread_cancel_delayed_work_sync()
> > + * or yet another kthread_mod_delayed_work() call. We let the other command
> > + * win and return %false here. The caller is supposed to synchronize these
> > + * operations a reasonable way.
> > + *
> > + * This function is safe to call from any context including IRQ handler.
> > + * See __kthread_cancel_work() and kthread_delayed_work_timer_fn()
> > + * for details.
> > + */
> > +bool kthread_mod_pending_delayed_work(struct kthread_worker *worker,
> > +                                   struct kthread_delayed_work *dwork,
> > +                                   unsigned long delay)
> > +{
>
> kthread_worker API tries to follow the workqueue API. It helps to use and
> switch between them easily.
>
> workqueue API does not provide this possibility. Instead it has
> flush_delayed_work(). It queues the work when it was pending and
> waits until the work is procced. So, we might do:
>
> bool kthread_flush_delayed_work(struct kthread_delayed_work *dwork)
>
>
> > +     struct kthread_work *work = &dwork->work;
> > +     unsigned long flags;
> > +     int ret = true;
> > +
> > +     raw_spin_lock_irqsave(&worker->lock, flags);
> > +     if (!work->worker || work->canceling ||
> > +         !__kthread_cancel_work(work, true, &flags)) {
> > +             ret = false;
> > +             goto out;
> > +     }
>
> Please, use separate checks with comments as it is done, for example,
> in kthread_mod_delayed_work()
>
>         struct kthread_work *work = &dwork->work;
>         unsigned long flags;
>         int ret;
>
>         raw_spin_lock_irqsave(&worker->lock, flags);
>
>         /* Do not bother with canceling when never queued. */
>         if (!work->worker)
>                 goto nope;
>
>         /* Do not fight with another command that is canceling this work. */
>         if (work->canceling)
>                 goto nope;
>
>         /* Nope when the work was not pending. */
>         ret = __kthread_cancel_work(work, true, &flags);
>         if (!ret)
>                 nope;
>
>         /* Queue the work immediately. */
>         kthread_insert_work(worker, work, &worker->work_list);
>         raw_spin_unlock_irqrestore(&worker->lock, flags);
>
>         return kthread_flush_work(work);
> nope:
>         raw_spin_unlock_irqrestore(&worker->lock, flags);
>         return false;
>
>
> Will this work for you?
>
> Best Regards,
> Petr

Thanks for your comments and reviews, Petr! I completely understand
Christoph's pushback regarding the lack of an upstream use case here. I
just wanted to see if this is a missing use case in kthread. I'll propose
it again if I later find a use case in an upstream driver.

Best,
Yiwei


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-23  0:39   ` Yiwei Zhang‎
@ 2021-02-23  0:58     ` Yiwei Zhang‎
  2021-02-23 15:52       ` Petr Mladek
  0 siblings, 1 reply; 15+ messages in thread
From: Yiwei Zhang‎ @ 2021-02-23  0:58 UTC (permalink / raw)
  To: Petr Mladek, Christoph Hellwig
  Cc: Andrew Morton, Felix Kuehling, Jens Axboe, J. Bruce Fields,
	Peter Zijlstra, Frederic Weisbecker, Marcelo Tosatti,
	Ilias Stamatis, Rob Clark, Mathieu Desnoyers, Liang Chen,
	Linux Kernel Mailing List, kernel-team

Since you awesome guys are here, I do have another kthread-related
question and hope to get some suggestions:

Below are the conditions:
1. The caller threads queuing the work are normal threads (non-RT).
2. The worker thread is a realtime kernel thread with a relatively high prio.
3. We are not allowed to pin caller threads to fixed cpu clusters.

Sometimes when the CPU is busy, the worker thread starts preempting
the caller thread, which is not cool because it will make the
asynchronous effort a no-op. Is there a way we can include caller
thread metadata (a task_struct pointer?) when it enqueues the work so
that the RT worker thread won't preempt the caller thread when that
queued work gets scheduled? It would probably require the CPU scheduler
to peek at the next work... Any other ideas would be very much
appreciated, thanks!

Best regards,
Yiwei

On Mon, Feb 22, 2021 at 4:39 PM Yiwei Zhang‎ <zzyiwei@android.com> wrote:
>
> On Fri, Feb 19, 2021 at 2:56 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Sun 2021-02-14 00:06:11, Yiwei Zhang wrote:
> > > The existing kthread_mod_delayed_work api will queue a new work if
> > > failing to cancel the current work due to no longer being pending.
> > > However, there's a case that the same work can be enqueued from both
> > > an async request and a delayed work, and a racing could happen if the
> > > async request comes right after the timeout delayed work gets scheduled,
> > > because the clean up work may not be safe to run twice.
> >
> > Please, provide more details about the use case. Why the work is
> > originally sheduled with a delay. And and why it suddenly can/should
> > be proceed immediately.
> >
> > >
> > > Signed-off-by: Yiwei Zhang <zzyiwei@android.com>
> > > ---
> > >  include/linux/kthread.h |  3 +++
> > >  kernel/kthread.c        | 48 +++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 51 insertions(+)
> > >
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -1142,6 +1142,54 @@ bool kthread_mod_delayed_work(struct kthread_worker *worker,
> > >  }
> > >  EXPORT_SYMBOL_GPL(kthread_mod_delayed_work);
> > >
> > > +/**
> > > + * kthread_mod_pending_delayed_work - modify delay of a pending delayed work
> > > + * @worker: kthread worker to use
> > > + * @dwork: kthread delayed work to queue
> > > + * @delay: number of jiffies to wait before queuing
> > > + *
> > > + * If @dwork is still pending modify @dwork's timer so that it expires after
> > > + * @delay. If @dwork is still pending and @delay is zero, @work is guaranteed to
> > > + * be queued immediately.
> > > + *
> > > + * Return: %true if @dwork was pending and its timer was modified,
> > > + * %false otherwise.
> > > + *
> > > + * A special case is when the work is being canceled in parallel.
> > > + * It might be caused either by the real kthread_cancel_delayed_work_sync()
> > > + * or yet another kthread_mod_delayed_work() call. We let the other command
> > > + * win and return %false here. The caller is supposed to synchronize these
> > > + * operations a reasonable way.
> > > + *
> > > + * This function is safe to call from any context including IRQ handler.
> > > + * See __kthread_cancel_work() and kthread_delayed_work_timer_fn()
> > > + * for details.
> > > + */
> > > +bool kthread_mod_pending_delayed_work(struct kthread_worker *worker,
> > > +                                   struct kthread_delayed_work *dwork,
> > > +                                   unsigned long delay)
> > > +{
> >
> > kthread_worker API tries to follow the workqueue API. It helps to use and
> > switch between them easily.
> >
> > workqueue API does not provide this possibility. Instead it has
> > flush_delayed_work(). It queues the work when it was pending and
> > waits until the work is procced. So, we might do:
> >
> > bool kthread_flush_delayed_work(struct kthread_delayed_work *dwork)
> >
> >
> > > +     struct kthread_work *work = &dwork->work;
> > > +     unsigned long flags;
> > > +     int ret = true;
> > > +
> > > +     raw_spin_lock_irqsave(&worker->lock, flags);
> > > +     if (!work->worker || work->canceling ||
> > > +         !__kthread_cancel_work(work, true, &flags)) {
> > > +             ret = false;
> > > +             goto out;
> > > +     }
> >
> > Please, use separate checks with comments as it is done, for example,
> > in kthread_mod_delayed_work()
> >
> >         struct kthread_work *work = &dwork->work;
> >         unsigned long flags;
> >         int ret;
> >
> >         raw_spin_lock_irqsave(&worker->lock, flags);
> >
> >         /* Do not bother with canceling when never queued. */
> >         if (!work->worker)
> >                 goto nope;
> >
> >         /* Do not fight with another command that is canceling this work. */
> >         if (work->canceling)
> >                 goto nope;
> >
> >         /* Nope when the work was not pending. */
> >         ret = __kthread_cancel_work(work, true, &flags);
> >         if (!ret)
> >                 nope;
> >
> >         /* Queue the work immediately. */
> >         kthread_insert_work(worker, work, &worker->work_list);
> >         raw_spin_unlock_irqrestore(&worker->lock, flags);
> >
> >         return kthread_flush_work(work);
> > nope:
> >         raw_spin_unlock_irqrestore(&worker->lock, flags);
> >         return false;
> >
> >
> > Will this work for you?
> >
> > Best Regards,
> > Petr
>
> Thanks for your comments and reviews, Petr! I completely understand
> Christoph's pushback regarding no upstream use case here. Just want to
> see if this is a missing use case in kthread. I'll propose again if
> later I find a use case in any upstream drivers.
>
> Best,
> Yiwei


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-23  0:58     ` Yiwei Zhang‎
@ 2021-02-23 15:52       ` Petr Mladek
  2021-02-23 22:29         ` Yiwei Zhang‎
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2021-02-23 15:52 UTC (permalink / raw)
  To: Yiwei Zhang
  Cc: Christoph Hellwig, Andrew Morton, Felix Kuehling, Jens Axboe,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

On Mon 2021-02-22 16:58:46, Yiwei Zhang wrote:
> Since you awesome guys are here, I do have another kthread related
> question, and hopefully to get some suggestions:
> 
> Below are the conditions:
> 1. The caller threads queuing the work are normal threads(non-RT).
> 2. The worker thread is a realtime kernel thread with relatively high prio.
> 3. We are not allowed to pin caller threads to fixed cpu clusters.
> 
> Sometimes when the CPU is busy, the worker thread starts preempting
> the caller thread,

This works as expected. RT tasks have higher priority than normal tasks.

> which is not cool because it will make the
> asynchronous effort a no-op. Is there a way we can include caller
> thread metadata(task_struct pointer?) when it enqueues the work so
> that the RT worker thread won't preempt the caller thread when that
> queued work gets scheduled? Probably require the CPU scheduler to poke
> at the next work...or any other ideas will be very appreciated,
> thanks!

This sounds like a very strange use case.
Why is the worker kthread RT when the work can be delayed?

If the kthread has to be RT because of another work then
your proposal will not work. The delayed processing of
low priority work might block and delay any pending
high priority work.

You should consider handling the less important work in a separate
kthread worker with a lower priority or by the system workqueue.
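
A minimal sketch of that split, assuming kthread_create_worker() and
sched_set_normal() and a made-up gpu_drv context:

	static int gpu_start_low_prio_worker(struct gpu_drv *drv)
	{
		drv->low_worker = kthread_create_worker(0, "gpu_low_prio");
		if (IS_ERR(drv->low_worker))
			return PTR_ERR(drv->low_worker);

		/* keep this worker at normal (non-RT) priority */
		sched_set_normal(drv->low_worker->task, 0);

		/* queue the less urgent work here instead of the RT worker */
		kthread_queue_work(drv->low_worker, &drv->low_prio_work);
		return 0;
	}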

Best Regards,
Petr


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-23 15:52       ` Petr Mladek
@ 2021-02-23 22:29         ` Yiwei Zhang‎
  2021-02-24  9:34           ` Petr Mladek
  0 siblings, 1 reply; 15+ messages in thread
From: Yiwei Zhang‎ @ 2021-02-23 22:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Christoph Hellwig, Andrew Morton, Felix Kuehling, Jens Axboe,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

> > which is not cool because it will make the
> > asynchronous effort a no-op. Is there a way we can include caller
> > thread metadata(task_struct pointer?) when it enqueues the work so
> > that the RT worker thread won't preempt the caller thread when that
> > queued work gets scheduled? Probably require the CPU scheduler to poke
> > at the next work...or any other ideas will be very appreciated,
> > thanks!
>
> This sounds like a very strange use case.
> Why is the worker kthread RT when the work can be delayed?
>
> If the kthread has to be RT because of another work then
> your proposal will not work. The delayed processing of
> low priority work might block and delay any pending
> high priority work.
>
> You should consider handling the less important work in a separate
> kthread worker with a lower priority or by the system workqueue.

Just want to clarify that it's not about delayed_work any more. In my
latest question, it's an RT thread with a normal work queued and
scheduled to run immediately. However, I simply don't want the
worker to preempt the thread that queues the work.

It's high-priority work that we don't want other random tasks to
preempt. Meanwhile, we don't want it to preempt the caller thread. In
addition, assume we can't raise the priority of those caller
threads (otherwise I'd be fine with using a workqueue).


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-23 22:29         ` Yiwei Zhang‎
@ 2021-02-24  9:34           ` Petr Mladek
  2021-02-25 22:17             ` Yiwei Zhang‎
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2021-02-24  9:34 UTC (permalink / raw)
  To: Yiwei Zhang
  Cc: Christoph Hellwig, Andrew Morton, Felix Kuehling, Jens Axboe,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

On Tue 2021-02-23 14:29:37, Yiwei Zhang wrote:
> > > which is not cool because it will make the
> > > asynchronous effort a no-op. Is there a way we can include caller
> > > thread metadata(task_struct pointer?) when it enqueues the work so
> > > that the RT worker thread won't preempt the caller thread when that
> > > queued work gets scheduled? Probably require the CPU scheduler to poke
> > > at the next work...or any other ideas will be very appreciated,
> > > thanks!
> >
> > This sounds like a very strange use case.
> > Why is the worker kthread RT when the work can be delayed?
> >
> > If the kthread has to be RT because of another work then
> > your proposal will not work. The delayed processing of
> > low priority work might block and delay any pending
> > high priority work.
> >
> > You should consider handling the less important work in a separate
> > kthread worker with a lower priority or by the system workqueue.
> 
> Just want to clarify that it's not about delayed_work any more. In my
> latest question, it's a RT thread with normal work queued and
> scheduled to be run immediately. However, I simply don't want the
> worker to preempt the thread that queues the work.
> 
> It's a high prio work that we don't want other random tasks to preempt
> it. Meanwhile, we don't want it to preempt the called thread. In
> addition, assume we can't raise the priority of those caller
> threads(otherwise I'd be fine with using a workqueue).

Honestly, it sounds weird to me. Either the caller or the
worker has higher priority.

Well, I think that behavior could be achieved by
CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY.

Anyway, this is rather a question for scheduler experts.
It is possible that it has some solution. But it is also
possible that the behavior is so specific that it would
complicate the scheduler too much.

Best Regards,
Petr


* Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api
  2021-02-24  9:34           ` Petr Mladek
@ 2021-02-25 22:17             ` Yiwei Zhang‎
  0 siblings, 0 replies; 15+ messages in thread
From: Yiwei Zhang‎ @ 2021-02-25 22:17 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Christoph Hellwig, Andrew Morton, Felix Kuehling, Jens Axboe,
	J. Bruce Fields, Peter Zijlstra, Frederic Weisbecker,
	Marcelo Tosatti, Ilias Stamatis, Rob Clark, Mathieu Desnoyers,
	Liang Chen, Linux Kernel Mailing List, kernel-team

On Wed, Feb 24, 2021 at 1:34 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Tue 2021-02-23 14:29:37, Yiwei Zhang wrote:
> > > > which is not cool because it will make the
> > > > asynchronous effort a no-op. Is there a way we can include caller
> > > > thread metadata(task_struct pointer?) when it enqueues the work so
> > > > that the RT worker thread won't preempt the caller thread when that
> > > > queued work gets scheduled? Probably require the CPU scheduler to poke
> > > > at the next work...or any other ideas will be very appreciated,
> > > > thanks!
> > >
> > > This sounds like a very strange use case.
> > > Why is the worker kthread RT when the work can be delayed?
> > >
> > > If the kthread has to be RT because of another work then
> > > your proposal will not work. The delayed processing of
> > > low priority work might block and delay any pending
> > > high priority work.
> > >
> > > You should consider handling the less important work in a separate
> > > kthread worker with a lower priority or by the system workqueue.
> >
> > Just want to clarify that it's not about delayed_work any more. In my
> > latest question, it's a RT thread with normal work queued and
> > scheduled to be run immediately. However, I simply don't want the
> > worker to preempt the thread that queues the work.
> >
> > It's a high prio work that we don't want other random tasks to preempt
> > it. Meanwhile, we don't want it to preempt the called thread. In
> > addition, assume we can't raise the priority of those caller
> > threads(otherwise I'd be fine with using a workqueue).
>
> Honestly, it sounds weird to me. Either the caller or the
> worker has higher priority.
>
> Well, I think that behavior could be achieved by
> CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY.
>
> Anyway, this is rather a question for scheduler experts.
> It is possible that it has some solution. But it is also
> possible that it is so specific behavior and it would
> complicate the scheduler too much.
>
> Best Regards,
> Petr

Thanks for the pointers! I'll explore further with scheduling folks.

