linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] task_work: return -EBUSY when adding same work
@ 2021-07-09 12:27 yaozhenguo
  2021-07-09 14:18 ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: yaozhenguo @ 2021-07-09 12:27 UTC (permalink / raw)
  To: axboe, oleg; +Cc: linux-kernel, yaozhenguo, yaozhenguo

when same work is added to task->task_works list one by one,
the list becomes endless loop. So return -EBUSY when this
situation happen.

Signed-off-by: yaozhenguo <yaozhenguo1@gmail.com>
---
 kernel/task_work.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 1698fbe..5061ebf 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -27,7 +27,7 @@
  * list is LIFO.
  *
  * RETURNS:
- * 0 if succeeds or -ESRCH.
+ * 0 if succeeds or -ESRCH, -EBUSY.
  */
 int task_work_add(struct task_struct *task, struct callback_head *work,
 		  enum task_work_notify_mode notify)
@@ -41,6 +41,8 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		head = READ_ONCE(task->task_works);
 		if (unlikely(head == &work_exited))
 			return -ESRCH;
+		if (unlikely(head == work))
+			return -EBUSY;
 		work->next = head;
 	} while (cmpxchg(&task->task_works, head, work) != head);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] task_work: return -EBUSY when adding same work
  2021-07-09 12:27 [PATCH] task_work: return -EBUSY when adding same work yaozhenguo
@ 2021-07-09 14:18 ` Jens Axboe
       [not found]   ` <CA+WzARnFgohHZ=BhL4OaCagB_c1uB6a=Bv7vM_zRUJeANHksEg@mail.gmail.com>
  2021-07-13 10:41   ` David Laight
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2021-07-09 14:18 UTC (permalink / raw)
  To: yaozhenguo, oleg; +Cc: linux-kernel, yaozhenguo

On 7/9/21 6:27 AM, yaozhenguo wrote:
> when same work is added to task->task_works list one by one,
> the list becomes endless loop. So return -EBUSY when this
> situation happen.
> 
> Signed-off-by: yaozhenguo <yaozhenguo1@gmail.com>
> ---
>  kernel/task_work.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 1698fbe..5061ebf 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -27,7 +27,7 @@
>   * list is LIFO.
>   *
>   * RETURNS:
> - * 0 if succeeds or -ESRCH.
> + * 0 if succeeds or -ESRCH, -EBUSY.
>   */
>  int task_work_add(struct task_struct *task, struct callback_head *work,
>  		  enum task_work_notify_mode notify)
> @@ -41,6 +41,8 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>  		head = READ_ONCE(task->task_works);
>  		if (unlikely(head == &work_exited))
>  			return -ESRCH;
> +		if (unlikely(head == work))
> +			return -EBUSY;
>  		work->next = head;
>  	} while (cmpxchg(&task->task_works, head, work) != head);

I don't think there's anything conceptually wrong with this patch, but
it makes me think that you hit this condition. It's really a bug in the
caller, of course, is a WARN_ON_ONCE() warranted here? And who was the
caller?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] task_work: return -EBUSY when adding same work
       [not found]   ` <CA+WzARnFgohHZ=BhL4OaCagB_c1uB6a=Bv7vM_zRUJeANHksEg@mail.gmail.com>
@ 2021-07-12  2:44     ` Jens Axboe
  2021-07-12  3:45       ` zhenguo yao
  2021-07-12  9:08       ` zhenguo yao
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2021-07-12  2:44 UTC (permalink / raw)
  To: zhenguo yao; +Cc: oleg, linux-kernel, yaozhenguo

On 7/11/21 8:13 PM, zhenguo yao wrote:
> Yes I hit this condition.  The caller is queue_task_work in
> arch/x86/kernel/cpu/mce/core.c.
> It is really a BUG. I have submitted another patch to fix it:
> https://lkml.org/lkml/2021/7/9/186.

That patch seems broken, what happens if mce_kill_me is added already,
but it isn't the first work item in the list?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] task_work: return -EBUSY when adding same work
  2021-07-12  2:44     ` Jens Axboe
@ 2021-07-12  3:45       ` zhenguo yao
  2021-07-12  9:08       ` zhenguo yao
  1 sibling, 0 replies; 7+ messages in thread
From: zhenguo yao @ 2021-07-12  3:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: oleg, linux-kernel, yaozhenguo

This issue happens in a stress test of memory UE injection. It has
more than once UEs reported to the OS at the same moment in the test.
So  do_machine_check-->queue_task_work is called many times.
mce_kill_me work is added to list many times.  When mce_kill_me is add
to the list,  it becomes the list header and then another mce_kill_me
is added to the list before task_work_run is called.  The list becomes
a dead loop: task->task_works = mce_kill_me, mce_kill_me->next =
mce_kill_me.  When the task want to  return to user mode and run
task_work_run.  It becomes a dead loop and never return to user mode
and process signal SIGBUS that mce_kill_me sent to him. I fix this by
following patch
--
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aa..9333696 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1299,7 +1299,9 @@ static void queue_task_work(struct mce *m, int
kill_current_task)
        else
                current->mce_kill_me.func = kill_me_maybe;

-       task_work_add(current, &current->mce_kill_me, TWA_RESUME);
+       /* Avoid endless loops when task_work_run is running */
+       if (READ_ONCE(current->task_works) != &current->mce_kill_me)
+               task_work_add(current, &current->mce_kill_me, TWA_RESUME);
 }
--
But I think it is better return an error in task_work_add when same
work is added to the list. Similar problem may happen in other scenes.
It is hard to debug when it is a seldom issue.

Jens Axboe <axboe@kernel.dk> 于2021年7月12日周一 上午10:44写道:
>
> On 7/11/21 8:13 PM, zhenguo yao wrote:
> > Yes I hit this condition.  The caller is queue_task_work in
> > arch/x86/kernel/cpu/mce/core.c.
> > It is really a BUG. I have submitted another patch to fix it:
> > https://lkml.org/lkml/2021/7/9/186.
>
> That patch seems broken, what happens if mce_kill_me is added already,
> but it isn't the first work item in the list?
>
> --
> Jens Axboe
>

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] task_work: return -EBUSY when adding same work
  2021-07-12  2:44     ` Jens Axboe
  2021-07-12  3:45       ` zhenguo yao
@ 2021-07-12  9:08       ` zhenguo yao
  1 sibling, 0 replies; 7+ messages in thread
From: zhenguo yao @ 2021-07-12  9:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: oleg, linux-kernel, yaozhenguo

OK,Got it.  My patch does not cover this situation.  It seems that
there is always dead-loop when same works are added to the list.

Jens Axboe <axboe@kernel.dk> 于2021年7月12日周一 上午10:44写道:
>
> On 7/11/21 8:13 PM, zhenguo yao wrote:
> > Yes I hit this condition.  The caller is queue_task_work in
> > arch/x86/kernel/cpu/mce/core.c.
> > It is really a BUG. I have submitted another patch to fix it:
> > https://lkml.org/lkml/2021/7/9/186.
>
> That patch seems broken, what happens if mce_kill_me is added already,
> but it isn't the first work item in the list?
>
> --
> Jens Axboe
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] task_work: return -EBUSY when adding same work
  2021-07-09 14:18 ` Jens Axboe
       [not found]   ` <CA+WzARnFgohHZ=BhL4OaCagB_c1uB6a=Bv7vM_zRUJeANHksEg@mail.gmail.com>
@ 2021-07-13 10:41   ` David Laight
  2021-07-13 20:22     ` Jens Axboe
  1 sibling, 1 reply; 7+ messages in thread
From: David Laight @ 2021-07-13 10:41 UTC (permalink / raw)
  To: 'Jens Axboe', yaozhenguo, oleg; +Cc: linux-kernel, yaozhenguo

From: Jens Axboe
> Sent: 09 July 2021 15:18
...
> >   */
> >  int task_work_add(struct task_struct *task, struct callback_head *work,
> >  		  enum task_work_notify_mode notify)
> > @@ -41,6 +41,8 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
> >  		head = READ_ONCE(task->task_works);
> >  		if (unlikely(head == &work_exited))
> >  			return -ESRCH;
> > +		if (unlikely(head == work))
> > +			return -EBUSY;
> >  		work->next = head;
> >  	} while (cmpxchg(&task->task_works, head, work) != head);
> 
> I don't think there's anything conceptually wrong with this patch, but
> it makes me think that you hit this condition. It's really a bug in the
> caller, of course, is a WARN_ON_ONCE() warranted here? And who was the
> caller?

How can the caller know that the task is on the queue?

There will be a race condition just before the work function
is called and/or just after it returns that the caller
can't detect.
The check needs to be done atomically with the code that
removes the work item from the list.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] task_work: return -EBUSY when adding same work
  2021-07-13 10:41   ` David Laight
@ 2021-07-13 20:22     ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2021-07-13 20:22 UTC (permalink / raw)
  To: David Laight, yaozhenguo, oleg; +Cc: linux-kernel, yaozhenguo

On 7/13/21 4:41 AM, David Laight wrote:
> From: Jens Axboe
>> Sent: 09 July 2021 15:18
> ...
>>>   */
>>>  int task_work_add(struct task_struct *task, struct callback_head *work,
>>>  		  enum task_work_notify_mode notify)
>>> @@ -41,6 +41,8 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>>>  		head = READ_ONCE(task->task_works);
>>>  		if (unlikely(head == &work_exited))
>>>  			return -ESRCH;
>>> +		if (unlikely(head == work))
>>> +			return -EBUSY;
>>>  		work->next = head;
>>>  	} while (cmpxchg(&task->task_works, head, work) != head);
>>
>> I don't think there's anything conceptually wrong with this patch, but
>> it makes me think that you hit this condition. It's really a bug in the
>> caller, of course, is a WARN_ON_ONCE() warranted here? And who was the
>> caller?
> 
> How can the caller know that the task is on the queue?

It's similar to double adding a list entry to a list. It's the callers
responsibility to make sure that doesn't happen. The item happens to be
per-task here, but that doesn't really change the mechanics of it.

> There will be a race condition just before the work function
> is called and/or just after it returns that the caller
> can't detect.
> The check needs to be done atomically with the code that
> removes the work item from the list.

We're not polluting task_work with caller specific code like that, as
far as I'm concerned. Fix it in the caller. By the time ->func is run,
the item is off the original list and it can get re-added just fine. If
the event triggers after removing from the list but before ->func is
called, I don't see a problem with that. The caller just wants to ensure
that func is run, which it will be.

If the caller needs stronger guarantees than that, then it should
implement them separately on top of task_work.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-07-13 20:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-09 12:27 [PATCH] task_work: return -EBUSY when adding same work yaozhenguo
2021-07-09 14:18 ` Jens Axboe
     [not found]   ` <CA+WzARnFgohHZ=BhL4OaCagB_c1uB6a=Bv7vM_zRUJeANHksEg@mail.gmail.com>
2021-07-12  2:44     ` Jens Axboe
2021-07-12  3:45       ` zhenguo yao
2021-07-12  9:08       ` zhenguo yao
2021-07-13 10:41   ` David Laight
2021-07-13 20:22     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).