* [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping,  since it can "WARN_ON(worker->task)".
@ 2013-06-19  4:03 Chen Gang
  2013-06-19  8:41 ` Tejun Heo
  2013-06-19  8:43 ` Thomas Gleixner
  0 siblings, 2 replies; 10+ messages in thread
From: Chen Gang @ 2013-06-19  4:03 UTC (permalink / raw)
  To: Tejun Heo, Thomas Gleixner, Oleg Nesterov, laijs
  Cc: Andrew Morton, linux-kernel

Because of the "WARN_ON(worker->task)" check, we cannot assume that
'worker->task' is NULL before 'current' is assigned to it.

So this assignment needs to be protected by the 'worker' lock too, just
as 'worker' is already lock-protected throughout the main loop.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
---
 kernel/kthread.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 760e86d..8d572b8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -511,8 +511,10 @@ int kthread_worker_fn(void *worker_ptr)
 	struct kthread_worker *worker = worker_ptr;
 	struct kthread_work *work;

+	spin_lock_irq(&worker->lock);
 	WARN_ON(worker->task);
 	worker->task = current;
+	spin_unlock_irq(&worker->lock);
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */

-- 
1.7.7.6


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping,  since it can "WARN_ON(worker->task)".
  2013-06-19  4:03 [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)" Chen Gang
@ 2013-06-19  8:41 ` Tejun Heo
  2013-06-19 10:17   ` Chen Gang
  2013-06-19  8:43 ` Thomas Gleixner
  1 sibling, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2013-06-19  8:41 UTC (permalink / raw)
  To: Chen Gang
  Cc: Thomas Gleixner, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Wed, Jun 19, 2013 at 12:03:38PM +0800, Chen Gang wrote:
> 
> Since "WARN_ON(worker->task)", we can not assume that 'worker->task'
> will be NULL before set 'current' to it.
> 
> So need let 'worker' lock protected too, just like it already lock
> protected all time in main looping.

That synchronization is the kthread_worker user's responsibility.  The
locking around worker->task = NULL is to prevent the worker task being
destroyed while insert_kthread_work() is trying to wake it up.  It has
nothing to do with the user trying to attach multiple tasks to the
same kthread_worker.  Plus, putting locking around WARN_ON() is
pointless.  It doesn't really fix anything.  It just makes WARN_ON()
trigger *slightly* more reliably.
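
A rough userspace model of the locking described above may help.  It is
only a sketch under stated assumptions: every model_* name is
hypothetical, the lock is a C11 atomic_flag spinlock rather than the
kernel's spin_lock_irq(), and "waking" a task is reduced to bumping a
counter.  The queueing side checks the task pointer and wakes it under
the lock; the exiting worker clears the pointer under the same lock, so
the queueing side never wakes a task that has already gone away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; "waking" it just bumps a counter. */
struct model_task {
	int wakeups;
};

/* Hypothetical stand-in for struct kthread_worker. */
struct model_worker {
	atomic_flag lock;		/* models worker->lock */
	struct model_task *task;	/* models worker->task */
	int queued;			/* number of queued work items */
};

static void model_worker_init(struct model_worker *w, struct model_task *t)
{
	assert(w != NULL);
	atomic_flag_clear(&w->lock);
	w->task = t;
	w->queued = 0;
}

static void model_lock(struct model_worker *w)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire))
		;
}

static void model_unlock(struct model_worker *w)
{
	atomic_flag_clear_explicit(&w->lock, memory_order_release);
}

/*
 * Queueing side: add work and wake the task only if one is attached.
 * Returns 1 if a task was woken, 0 otherwise.
 */
static int model_queue_work(struct model_worker *w)
{
	int woke = 0;

	model_lock(w);
	w->queued++;
	if (w->task) {
		w->task->wakeups++;
		woke = 1;
	}
	model_unlock(w);
	return woke;
}

/* Exiting worker: detach under the same lock before going away. */
static void model_worker_exit(struct model_worker *w)
{
	model_lock(w);
	w->task = NULL;
	model_unlock(w);
}
```

Note that nothing in this model guards against a second task being
attached to the same worker; that is a separate concern and stays with
the caller.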

Thanks.

-- 
tejun


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping,  since it can "WARN_ON(worker->task)".
  2013-06-19  8:41 ` Tejun Heo
@ 2013-06-19 10:17   ` Chen Gang
  2013-06-19 15:52     ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Chen Gang @ 2013-06-19 10:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On 06/19/2013 04:41 PM, Tejun Heo wrote:
> On Wed, Jun 19, 2013 at 12:03:38PM +0800, Chen Gang wrote:
>>
>> Since "WARN_ON(worker->task)", we can not assume that 'worker->task'
>> will be NULL before set 'current' to it.
>>
>> So need let 'worker' lock protected too, just like it already lock
>> protected all time in main looping.
> That synchronization is the kthread_worker user's responsibility.  The
> locking around worker->task = NULL is to prevent the worker task being
> destroyed while insert_kthread_work() is trying to wake it up.  It has
> nothing to do with the user trying to attach multiple tasks to the
> same kthread_worker.  Plus, putting locking around WARN_ON() is
> pointless.  It doesn't really fix anything.  It just makes WARN_ON()
> trigger *slightly* more reliably.

Hmm... is there any chance that 'worker->task' is not NULL before
'current' is assigned to it?

Why do we use WARN_ON(worker->task)?

I guess there is still a chance that "worker->task != NULL"; otherwise
it should be BUG_ON(worker->task) instead.

Thanks.
-- 
Chen Gang

Asianux Corporation


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping,  since it can "WARN_ON(worker->task)".
  2013-06-19 10:17   ` Chen Gang
@ 2013-06-19 15:52     ` Tejun Heo
  2013-06-20  1:53       ` Chen Gang
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2013-06-19 15:52 UTC (permalink / raw)
  To: Chen Gang
  Cc: Thomas Gleixner, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
> to it ?

Yes, if the caller screws up and tries to attach more than one worker
to the kthread_worker, which has some possibility of happening as
kthread_worker allows both attaching and detaching a worker.

> why do we use WARN_ON(worker->task) ?

To detect bugs on the caller side.
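
As a small illustration of that kind of caller bug, here is a userspace
sketch.  The model_* names are hypothetical and this is not the kernel
code: model_attach() mimics the check-then-assign at the top of
kthread_worker_fn() by reporting an already attached task and then
attaching the new one anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for a task and for struct kthread_worker. */
struct model_task {
	const char *name;
};

struct model_worker {
	struct model_task *task;
};

/* Counts how many times the warning fired. */
static int model_warnings;

/*
 * Warn if a task is already attached, then attach the new one anyway,
 * as the WARN_ON() followed by the assignment does.
 * Returns 1 if the warning fired, 0 otherwise.
 */
static int model_attach(struct model_worker *w, struct model_task *t)
{
	int already_attached = (w->task != NULL);

	assert(t != NULL);
	if (already_attached) {
		model_warnings++;
		fprintf(stderr, "WARNING: worker already has task '%s'\n",
			w->task->name);
	}
	w->task = t;
	return already_attached;
}

/* Detaching clears the pointer, so a later attach is legitimate again. */
static void model_detach(struct model_worker *w)
{
	w->task = NULL;
}
```

Attaching a second task without detaching the first one makes
model_attach() return 1; an attach after model_detach() returns 0.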

> I guess it still has chance to let "worker->task != NULL", or it should
> be BUG_ON(worker->task) instead of.

What difference does that make?

-- 
tejun


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping,  since it can "WARN_ON(worker->task)".
  2013-06-19 15:52     ` Tejun Heo
@ 2013-06-20  1:53       ` Chen Gang
  2013-06-20  7:02         ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Chen Gang @ 2013-06-20  1:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On 06/19/2013 11:52 PM, Tejun Heo wrote:
> On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
>> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
>> to it ?
> Yes, if the caller screws up and try to attach more than one workers
> to the kthread_worker, which has some possibility of happening as
> kthread_worker allows both attaching and detaching a worker.
> 

If we detect a bug and still want to use WARN_ON() to report a warning
and continue running, we need to be sure that the related state is left
untouched (or at least that continuing does not make things worse).

If we cannot be sure of that:
  if it is a kernel bug, it is better to use BUG_ON() instead;
  if it is a user mode bug, it is better to return failure with an
error code and print the related information.

>> why do we use WARN_ON(worker->task) ?
> To detect bugs on the caller side.
> 

OK, thanks.

>> I guess it still has chance to let "worker->task != NULL", or it should
>> be BUG_ON(worker->task) instead of.
> What difference does that make?

BUG_ON() will stop the current flow of execution and report the kernel
bug in detail.

Thanks.
-- 
Chen Gang

Asianux Corporation


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-20  1:53       ` Chen Gang
@ 2013-06-20  7:02         ` Thomas Gleixner
  2013-06-20  7:37           ` Chen Gang
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2013-06-20  7:02 UTC (permalink / raw)
  To: Chen Gang; +Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Thu, 20 Jun 2013, Chen Gang wrote:

> On 06/19/2013 11:52 PM, Tejun Heo wrote:
> > On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
> >> > Hmm... can 'worker->task' has chance to be not NULL before set 'current'
> >> > to it ?
> > Yes, if the caller screws up and try to attach more than one workers
> > to the kthread_worker, which has some possibility of happening as
> > kthread_worker allows both attaching and detaching a worker.
> > 
> 
> If we detect the bugs, and still want to use WARN_ON() to report warning
> and continue running, we need be sure of keeping the related things no
> touch (at least not lead to worse).
> 
> If we can not be sure of keeping the related things no touch:
>   if it is a kernel bug, better use BUG_ON() instead of,
>   if it is a user mode bug, better to return failure with error code and
> print related information.

Wrong. BUG_ON() is only for cases where the kernel CANNOT continue at
all. WARN_ON() prints the very same information, but allows the kernel
to continue.

> BUG_ON() will stop current working flow and report kernel bug in details.

There is no reason to crash the machine completely. The kernel can
continue and the WARN_ON reports the bug with the same details.
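
To make the distinction concrete, here is a minimal userspace sketch.
MODEL_WARN_ON() and MODEL_BUG_ON() are hypothetical stand-ins, not the
kernel macros: both print the failed condition and its location, but
only the first one returns to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts how many warnings were reported. */
static int model_warn_count;

/* Report a triggered condition and hand the result back to the caller. */
static int model_warn(int triggered, const char *expr,
		      const char *file, int line)
{
	assert(expr != NULL);
	if (triggered) {
		model_warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return triggered;
}

/* Report and continue: evaluates to 1 if the condition was true. */
#define MODEL_WARN_ON(cond) \
	model_warn((cond) != 0, #cond, __FILE__, __LINE__)

/* Report and stop: the program does not continue past a true condition. */
#define MODEL_BUG_ON(cond)						\
	do {								\
		if (cond) {						\
			fprintf(stderr, "BUG: %s at %s:%d\n",		\
				#cond, __FILE__, __LINE__);		\
			abort();					\
		}							\
	} while (0)
```

A caller can test the result of MODEL_WARN_ON() and take a recovery
path, which is not possible after MODEL_BUG_ON() has fired.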

Thanks,

	tglx


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-20  7:02         ` Thomas Gleixner
@ 2013-06-20  7:37           ` Chen Gang
  2013-06-20  8:28             ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Chen Gang @ 2013-06-20  7:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On 06/20/2013 03:02 PM, Thomas Gleixner wrote:
> On Thu, 20 Jun 2013, Chen Gang wrote:
> 
>> On 06/19/2013 11:52 PM, Tejun Heo wrote:
>>> On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
>>>> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
>>>> to it ?
>>> Yes, if the caller screws up and try to attach more than one workers
>>> to the kthread_worker, which has some possibility of happening as
>>> kthread_worker allows both attaching and detaching a worker.
>>>
>>
>> If we detect the bugs, and still want to use WARN_ON() to report warning
>> and continue running, we need be sure of keeping the related things no
>> touch (at least not lead to worse).
>>
>> If we can not be sure of keeping the related things no touch:
>>   if it is a kernel bug, better use BUG_ON() instead of,
>>   if it is a user mode bug, better to return failure with error code and
>> print related information.
> Wrong. BUG_ON() is only for cases where the kernel CANNOT continue at
> all. WARN_ON() prints the very same information, but allows to
> continue.
> 

In fact, BUG_ON() and WARN_ON() have various implementations on
different architectures, and can also be configured by the user.

Some 'crazy users' (e.g. randconfig) can even make BUG_ON() and
WARN_ON() 'empty' (include/asm-generic/bug.h).

In my experience (mainly with servers), when a kernel bug is found, the
kernel stops and reports it, which makes coredump analysis (or a KDB
trap) much easier.

>> BUG_ON() will stop current working flow and report kernel bug in details.
> There is no reason to crash the machine completely. The kernel can
> continue and the WARN_ON reports the bug with the same details.

If so (we still prefer to use WARN_ON), we had better do it under lock
protection.

At least when we have to continue, we should try not to make things
worse.

That will help a lot with coredump analysis (or a KDB trap).

In fact, for every real-world coredump, coredump analysers have to
assume that the system had already continued blindly before it died.

Thanks.
-- 
Chen Gang

Asianux Corporation


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-20  7:37           ` Chen Gang
@ 2013-06-20  8:28             ` Thomas Gleixner
  2013-06-20  9:36               ` Chen Gang
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2013-06-20  8:28 UTC (permalink / raw)
  To: Chen Gang; +Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Thu, 20 Jun 2013, Chen Gang wrote:

> On 06/20/2013 03:02 PM, Thomas Gleixner wrote:
> > On Thu, 20 Jun 2013, Chen Gang wrote:
> > 
> > > On 06/19/2013 11:52 PM, Tejun Heo wrote:
> > > > On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
> > > > > Hmm... can 'worker->task' has chance to be not NULL before set 'current'
> > > > > to it ?
> > > > Yes, if the caller screws up and try to attach more than one workers
> > > > to the kthread_worker, which has some possibility of happening as
> > > > kthread_worker allows both attaching and detaching a worker.
> > > >
> > >
> > > If we detect the bugs, and still want to use WARN_ON() to report warning
> > > and continue running, we need be sure of keeping the related things no
> > > touch (at least not lead to worse).
> > >
> > > If we can not be sure of keeping the related things no touch:
> > >   if it is a kernel bug, better use BUG_ON() instead of,
> > >   if it is a user mode bug, better to return failure with error code and
> > > print related information.
> > Wrong. BUG_ON() is only for cases where the kernel CANNOT continue at
> > all. WARN_ON() prints the very same information, but allows to
> > continue.
> > 
> 
> In fact, BUG_ON() and WARN_ON() has various implementations in different
> architectures, and also can be configured by user.

And how is that relevant? 

> Even some of 'crazy users' (e.g. randconfig), can make BUG_ON() and
> WARN_ON() 'empty' (include/asm-generic/bug.h).

That does not matter at all.

> In my experience (mainly for servers), when find a kernel bug, it will
> stop and report bug, that will let coredump analysing (or KDB trap) much
> easier.

And your core dump will help you in what way? The code which
misbehaved is no longer executing. The problem is detected after the
fact and therefore your coredump will just tell you that worker->task
is not NULL.

> > > BUG_ON() will stop current working flow and report kernel bug in details.
> > There is no reason to crash the machine completely. The kernel can
> > continue and the WARN_ON reports the bug with the same details.

Linus said about BUG_ON():

  Adding BUG_ON()'s just makes things much much much worse. There is
  *never* a reason to add a BUG_ON().

  BUG_ON() makes it almost impossible to debug something, because you
  just killed the machine. So using BUG_ON() for "please notice this"
  is stupid as hell, because the most common end result is: "Oh, the
  machine just hung with no messages".

And he is right about that. 

> If so (we still prefer to use WARN_ON), we'd better to let it in lock
> protected.

No, because the lock is not protecting anything in that case. If some
other code misbehaves and sets worker->task, then the lock does not
prevent this and taking the lock is not making the WARN_ON any more
reliable. So why the heck should we take it?

> At least when we still have to continue, try not to lead things worse.

And what's going to be better if we take the lock? Nothing, because
the lock CANNOT protect the check.
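
The argument can be pictured with a small userspace sketch.  All names
are hypothetical, and the lock is a C11 atomic_flag rather than the
kernel spinlock.  A lock only serializes code that actually takes it:
the cooperating writer below is held off while the checker owns the
lock, while the misbehaving writer is not affected at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kthread_worker. */
struct model_worker {
	atomic_flag lock;	/* models worker->lock */
	void *task;		/* models worker->task */
};

static void model_worker_init(struct model_worker *w)
{
	assert(w != NULL);
	atomic_flag_clear(&w->lock);
	w->task = NULL;
}

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct model_worker *w)
{
	return !atomic_flag_test_and_set_explicit(&w->lock,
						  memory_order_acquire);
}

static void model_unlock(struct model_worker *w)
{
	atomic_flag_clear_explicit(&w->lock, memory_order_release);
}

/* Cooperating writer: only touches ->task while holding the lock. */
static bool polite_set_task(struct model_worker *w, void *task)
{
	if (!model_trylock(w))
		return false;
	w->task = task;
	model_unlock(w);
	return true;
}

/*
 * Misbehaving writer: never looks at the lock.  In a truly concurrent
 * program this unsynchronized write would also be a data race.
 */
static void rogue_set_task(struct model_worker *w, void *task)
{
	w->task = task;
}
```

If the checker holds the lock, polite_set_task() fails and leaves the
pointer alone, but rogue_set_task() still changes it, so a check made
under the lock sees the same damage as a check made without it.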

> It will provide much help for coredump analysing (or KDB trap).
> 
> In fact, for coredump analysers, for every real world coredump, they
> have to assume the system has already continued blindly, and then die.

Core dump analysers cannot analyse dynamic race conditions and neither
can KDB. 

So what do you gain from crashing the kernel? Exactly NOTHING.

Thanks,

	tglx


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-20  8:28             ` Thomas Gleixner
@ 2013-06-20  9:36               ` Chen Gang
  0 siblings, 0 replies; 10+ messages in thread
From: Chen Gang @ 2013-06-20  9:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On 06/20/2013 04:28 PM, Thomas Gleixner wrote:
> On Thu, 20 Jun 2013, Chen Gang wrote:
> 
>> On 06/20/2013 03:02 PM, Thomas Gleixner wrote:
>>> On Thu, 20 Jun 2013, Chen Gang wrote:
>>>
>>>> On 06/19/2013 11:52 PM, Tejun Heo wrote:
>>>>> On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
>>>>>> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
>>>>>> to it ?
>>>>> Yes, if the caller screws up and try to attach more than one workers
>>>>> to the kthread_worker, which has some possibility of happening as
>>>>> kthread_worker allows both attaching and detaching a worker.
>>>>>
>>>>
>>>> If we detect the bugs, and still want to use WARN_ON() to report warning
>>>> and continue running, we need be sure of keeping the related things no
>>>> touch (at least not lead to worse).
>>>>
>>>> If we can not be sure of keeping the related things no touch:
>>>>   if it is a kernel bug, better use BUG_ON() instead of,
>>>>   if it is a user mode bug, better to return failure with error code and
>>>> print related information.
>>> Wrong. BUG_ON() is only for cases where the kernel CANNOT continue at
>>> all. WARN_ON() prints the very same information, but allows to
>>> continue.
>>>
>>
>> In fact, BUG_ON() and WARN_ON() has various implementations in different
>> architectures, and also can be configured by user.
> And how is that relevant? 
>  

I only want to say that 'Wrong. BUG_ON() is only for cases where ...'
is not quite precise.

>> Even some of 'crazy users' (e.g. randconfig), can make BUG_ON() and
>> WARN_ON() 'empty' (include/asm-generic/bug.h).
> That does not matter at all.
>  
>> In my experience (mainly for servers), when find a kernel bug, it will
>> stop and report bug, that will let coredump analysing (or KDB trap) much
>> easier.
> And your core dump will help you in what way? The code which
> misbehaved is not longer executing. The problem is detected after the
> fact and therefor your coredump will just tell you that worker->task
> is not NULL.
>  

In this case, generating a coredump would help a lot in analysing the
issue.

Normally, such a coredump is not a complex one.

I once met a complex KDB trap (complex for me at least, maybe easy for
others):
  When a driver was quitting, it released the DMA buffers first, and
only then told the hardware to stop using DMA.
  After the hardware wrote 'a little waste data' to the released
buffer, the driver quit successfully.
  Then the driver started again and worked normally.
  After 'a long period', the system hit random issues (sometimes in
ext3, sometimes in mm, block, or anywhere...).
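
The ordering problem in that driver can be sketched in userspace as
follows.  This is not the driver in question: the model_* names are
hypothetical, the "hardware" is just a function that writes into the
buffer, and the sketch shows the safe order (stop DMA first, release the
buffer afterwards) rather than the buggy one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device model: one DMA buffer and an "active" flag. */
struct model_dev {
	bool dma_active;
	unsigned char *buf;
	size_t len;
};

/*
 * Allocate the buffer first, then let the "hardware" use it.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int model_dev_start(struct model_dev *d, size_t len)
{
	assert(len > 0);
	d->buf = calloc(len, 1);
	if (!d->buf)
		return -1;
	d->len = len;
	d->dma_active = true;
	return 0;
}

/* Models the hardware writing one byte; refused once DMA is stopped. */
static bool model_hw_write(struct model_dev *d, size_t off, unsigned char v)
{
	if (!d->dma_active || off >= d->len)
		return false;
	d->buf[off] = v;
	return true;
}

/* Teardown in the safe order. */
static void model_dev_stop(struct model_dev *d)
{
	d->dma_active = false;	/* 1. stop the hardware first */
	/* A real driver would also wait here for in-flight transfers. */
	free(d->buf);		/* 2. only then release the buffer */
	d->buf = NULL;
	d->len = 0;
}
```

With the two steps in model_dev_stop() swapped, a late hardware write
could land in memory that has already been handed to someone else, which
matches the delayed, seemingly random corruption described above.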

I spent almost a month finding the root cause (from 2008-12 to
2009-01).

So if the kernel generates a coredump (or KDB trap) when it finds a
bug, in most cases it is not a very complex coredump.

>>>> BUG_ON() will stop current working flow and report kernel bug in details.
>>> There is no reason to crash the machine completely. The kernel can
>>> continue and the WARN_ON reports the bug with the same details.
> Linus said about BUG_ON():
> 
>   Adding BUG_ON()'s just makes things much much much worse. There is
>   *never* a reason to add a BUG_ON().
>   
>   BUG_ON() makes it almost impossible to debug something, because you
>   just killed the machine. So using BUG_ON() for "please notice this"
>   is stupid as hell, because the most common end result is: "Oh, the
>   machine just hung with no messages".
> 
> And he is right about that. 
>  

Why do we provide BUG_ON(), and why do many sub-systems use it, after
all?

And why did we integrate KDB and KGDB, after all?

>> If so (we still prefer to use WARN_ON), we'd better to let it in lock
>> protected.
> No, because the lock is not protecting anything in that case. If some
> other code misbehaves and sets worker->task, then the lock does not
> prevent this and taking the lock is not making the WARN_ON any more
> reliable. So why the heck should we take it?
>  

When writing code, if 'worker->task' can be non-NULL here, we can
assume that it needs lock protection.

>> At least when we still have to continue, try not to lead things worse.
> And what's going to be better if we take the lock? Nothing, because
> the lock CANNOT protect the check.
>  
>> It will provide much help for coredump analysing (or KDB trap).
>>
>> In fact, for coredump analysers, for every real world coredump, they
>> have to assume the system has already continued blindly, and then die.
> Core dump analysers cannot analyse dynamic race conditions and neither
> can KDB. 
> 

Yes, they can.

At least for me: I have had at least 20 successful cases (2009 - 2010)
analysing coredumps, deadlocks, memory leaks, and busy loops in user
mode system services (the related code is about 400K).

For the kernel, I have also had some success with coredumps.

> So what do you gain from crashing the kernel? Exactly NOTHING.

So the coredump (or KDB trap) is really useful for some people (at
least for me).

Thanks.
-- 
Chen Gang

Asianux Corporation


* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-19  4:03 [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)" Chen Gang
  2013-06-19  8:41 ` Tejun Heo
@ 2013-06-19  8:43 ` Thomas Gleixner
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2013-06-19  8:43 UTC (permalink / raw)
  To: Chen Gang; +Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Wed, 19 Jun 2013, Chen Gang wrote:

> 
> Since "WARN_ON(worker->task)", we can not assume that 'worker->task'
> will be NULL before set 'current' to it.

It had better be NULL, and all that WARN_ON does is verify that.

> So need let 'worker' lock protected too, just like it already lock
> protected all time in main looping.

No. That's pointless. This happens when the new worker starts up and
there is nothing which can modify worker->task at this point.

Thanks,

	tglx
