* [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
@ 2013-06-19  4:03 Chen Gang
  2013-06-19  8:41 ` Tejun Heo
  2013-06-19  8:43 ` Thomas Gleixner
  0 siblings, 2 replies; 10+ messages in thread
From: Chen Gang @ 2013-06-19  4:03 UTC
  To: Tejun Heo, Thomas Gleixner, Oleg Nesterov, laijs
  Cc: Andrew Morton, linux-kernel

Since "WARN_ON(worker->task)", we can not assume that 'worker->task'
will be NULL before set 'current' to it.

So need let 'worker' lock protected too, just like it already lock
protected all time in main looping.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
---
 kernel/kthread.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 760e86d..8d572b8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -511,8 +511,10 @@ int kthread_worker_fn(void *worker_ptr)
 	struct kthread_worker *worker = worker_ptr;
 	struct kthread_work *work;
 
+	spin_lock_irq(&worker->lock);
 	WARN_ON(worker->task);
 	worker->task = current;
+	spin_unlock_irq(&worker->lock);
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
 
--
1.7.7.6

* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-19  4:03 [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)" Chen Gang
@ 2013-06-19  8:41 ` Tejun Heo
  2013-06-19 10:17   ` Chen Gang
  2013-06-19  8:43 ` Thomas Gleixner
  1 sibling, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2013-06-19  8:41 UTC
  To: Chen Gang
  Cc: Thomas Gleixner, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Wed, Jun 19, 2013 at 12:03:38PM +0800, Chen Gang wrote:
>
> Since "WARN_ON(worker->task)", we can not assume that 'worker->task'
> will be NULL before set 'current' to it.
>
> So need let 'worker' lock protected too, just like it already lock
> protected all time in main looping.

That synchronization is the kthread_worker user's responsibility. The
locking around worker->task = NULL is to prevent the worker task being
destroyed while insert_kthread_work() is trying to wake it up. It has
nothing to do with the user trying to attach multiple tasks to the
same kthread_worker. Plus, putting locking around WARN_ON() is
pointless. It doesn't really fix anything. It just makes WARN_ON()
trigger *slightly* more reliably.

Thanks.

--
tejun

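The purpose Tejun gives for the existing lock can be modelled in user space. The following is a minimal sketch, not kernel code: `toy_task`, `toy_worker`, and every `toy_*` function are hypothetical names, a C11 `atomic_flag` stands in for the spinlock, and a counter increment stands in for `wake_up_process()`. It only illustrates that the queue side reads the task pointer, and the exit side clears it, under the same lock.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the task and the kthread_worker. */
struct toy_task {
	int wakeups;		/* how many times the task was woken */
};

struct toy_worker {
	atomic_flag lock;	/* toy spinlock, plays the role of worker->lock */
	struct toy_task *task;	/* NULL once the worker task has detached */
};

static void toy_lock(struct toy_worker *w)
{
	while (atomic_flag_test_and_set(&w->lock))
		;		/* spin until the holder releases the lock */
}

static void toy_unlock(struct toy_worker *w)
{
	atomic_flag_clear(&w->lock);
}

static void toy_worker_init(struct toy_worker *w, struct toy_task *task)
{
	atomic_flag_clear(&w->lock);
	w->task = task;
}

/* Queue side: wake the task only if it is still attached. Returns 1 if woken. */
static int toy_queue_work(struct toy_worker *w)
{
	int woke = 0;

	toy_lock(w);
	if (w->task) {
		w->task->wakeups++;	/* stands in for wake_up_process() */
		woke = 1;
	}
	toy_unlock(w);
	return woke;
}

/* Exit side: clear the pointer under the same lock before the task goes away. */
static void toy_worker_exit(struct toy_worker *w)
{
	toy_lock(w);
	w->task = NULL;
	toy_unlock(w);
}
```

Because both paths take the lock, the queue side either sees a task that is still attached or sees NULL; it never wakes a task after the exit side has cleared the pointer. Nothing in this arrangement addresses a caller that attaches two tasks to one worker.
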
* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-19  8:41 ` Tejun Heo
@ 2013-06-19 10:17   ` Chen Gang
  2013-06-19 15:52     ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Chen Gang @ 2013-06-19 10:17 UTC
  To: Tejun Heo
  Cc: Thomas Gleixner, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On 06/19/2013 04:41 PM, Tejun Heo wrote:
> On Wed, Jun 19, 2013 at 12:03:38PM +0800, Chen Gang wrote:
>>
>> Since "WARN_ON(worker->task)", we can not assume that 'worker->task'
>> will be NULL before set 'current' to it.
>>
>> So need let 'worker' lock protected too, just like it already lock
>> protected all time in main looping.
> That synchronization is the kthread_worker user's responsibility. The
> locking around worker->task = NULL is to prevent the worker task being
> destroyed while insert_kthread_work() is trying to wake it up. It has
> nothing to do with the user trying to attach multiple tasks to the
> same kthread_worker. Plus, putting locking around WARN_ON() is
> pointless. It doesn't really fix anything. It just makes WARN_ON()
> trigger *slightly* more reliably.

Hmm... can 'worker->task' has chance to be not NULL before set 'current'
to it ?

why do we use WARN_ON(worker->task) ?

I guess it still has chance to let "worker->task != NULL", or it should
be BUG_ON(worker->task) instead of.

Thanks.
--
Chen Gang

Asianux Corporation

* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-19 10:17   ` Chen Gang
@ 2013-06-19 15:52     ` Tejun Heo
  2013-06-20  1:53       ` Chen Gang
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2013-06-19 15:52 UTC
  To: Chen Gang
  Cc: Thomas Gleixner, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
> to it ?

Yes, if the caller screws up and try to attach more than one workers
to the kthread_worker, which has some possibility of happening as
kthread_worker allows both attaching and detaching a worker.

> why do we use WARN_ON(worker->task) ?

To detect bugs on the caller side.

> I guess it still has chance to let "worker->task != NULL", or it should
> be BUG_ON(worker->task) instead of.

What difference does that make?

--
tejun

* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-19 15:52     ` Tejun Heo
@ 2013-06-20  1:53       ` Chen Gang
  2013-06-20  7:02         ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Chen Gang @ 2013-06-20  1:53 UTC
  To: Tejun Heo
  Cc: Thomas Gleixner, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On 06/19/2013 11:52 PM, Tejun Heo wrote:
> On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
>> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
>> to it ?
> Yes, if the caller screws up and try to attach more than one workers
> to the kthread_worker, which has some possibility of happening as
> kthread_worker allows both attaching and detaching a worker.
>

If we detect the bugs, and still want to use WARN_ON() to report warning
and continue running, we need be sure of keeping the related things no
touch (at least not lead to worse).

If we can not be sure of keeping the related things no touch:
  if it is a kernel bug, better use BUG_ON() instead of,
  if it is a user mode bug, better to return failure with error code and
  print related information.

>> why do we use WARN_ON(worker->task) ?
> To detect bugs on the caller side.
>

OK, thanks.

>> I guess it still has chance to let "worker->task != NULL", or it should
>> be BUG_ON(worker->task) instead of.
> What difference does that make?

BUG_ON() will stop current working flow and report kernel bug in details.

Thanks.
--
Chen Gang

Asianux Corporation

* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-20  1:53       ` Chen Gang
@ 2013-06-20  7:02         ` Thomas Gleixner
  2013-06-20  7:37           ` Chen Gang
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2013-06-20  7:02 UTC
  To: Chen Gang
  Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Thu, 20 Jun 2013, Chen Gang wrote:

> On 06/19/2013 11:52 PM, Tejun Heo wrote:
>> On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
>>> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
>>> to it ?
>> Yes, if the caller screws up and try to attach more than one workers
>> to the kthread_worker, which has some possibility of happening as
>> kthread_worker allows both attaching and detaching a worker.
>>
>
> If we detect the bugs, and still want to use WARN_ON() to report warning
> and continue running, we need be sure of keeping the related things no
> touch (at least not lead to worse).
>
> If we can not be sure of keeping the related things no touch:
>   if it is a kernel bug, better use BUG_ON() instead of,
>   if it is a user mode bug, better to return failure with error code and
>   print related information.

Wrong. BUG_ON() is only for cases where the kernel CANNOT continue at
all. WARN_ON() prints the very same information, but allows to
continue.

> BUG_ON() will stop current working flow and report kernel bug in details.

There is no reason to crash the machine completely. The kernel can
continue and the WARN_ON reports the bug with the same details.

Thanks,

	tglx

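The difference Thomas describes can be approximated in user space. This is a hedged sketch, not the kernel's definitions from include/asm-generic/bug.h: `TOY_WARN_ON`, `TOY_BUG_ON`, `toy_warn`, and `toy_set_task` are hypothetical names, and `abort()` stands in for stopping the kernel. Both macros report the same expression and location; only the warning variant lets the caller carry on.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * User-space approximations, not the kernel macros. The warning helper
 * reports where the condition fired and returns the condition, so the
 * caller keeps running and can react to it.
 */
static int toy_warn(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}

#define TOY_WARN_ON(cond) toy_warn(!!(cond), #cond, __FILE__, __LINE__)

/* Reports the same expression and location, then terminates the process. */
#define TOY_BUG_ON(cond)						\
	do {								\
		if (cond) {						\
			fprintf(stderr, "BUG: %s at %s:%d\n",		\
				#cond, __FILE__, __LINE__);		\
			abort();					\
		}							\
	} while (0)

/* Returns 1 if the slot was already taken, 0 otherwise; stores either way. */
static int toy_set_task(const void **slot, const void *task)
{
	int was_set = TOY_WARN_ON(*slot != NULL);

	*slot = task;
	return was_set;
}
```

Calling `toy_set_task()` twice on the same slot prints one warning on the second call and still returns, which mirrors the "report and continue" behaviour argued for above. Swapping in `TOY_BUG_ON` would print the same location and end the process at that point.
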
* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-20  7:02         ` Thomas Gleixner
@ 2013-06-20  7:37           ` Chen Gang
  2013-06-20  8:28             ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Chen Gang @ 2013-06-20  7:37 UTC
  To: Thomas Gleixner
  Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On 06/20/2013 03:02 PM, Thomas Gleixner wrote:
> On Thu, 20 Jun 2013, Chen Gang wrote:
>
>> On 06/19/2013 11:52 PM, Tejun Heo wrote:
>>> On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
>>>> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
>>>> to it ?
>>> Yes, if the caller screws up and try to attach more than one workers
>>> to the kthread_worker, which has some possibility of happening as
>>> kthread_worker allows both attaching and detaching a worker.
>>>
>>
>> If we detect the bugs, and still want to use WARN_ON() to report warning
>> and continue running, we need be sure of keeping the related things no
>> touch (at least not lead to worse).
>>
>> If we can not be sure of keeping the related things no touch:
>>   if it is a kernel bug, better use BUG_ON() instead of,
>>   if it is a user mode bug, better to return failure with error code and
>>   print related information.
> Wrong. BUG_ON() is only for cases where the kernel CANNOT continue at
> all. WARN_ON() prints the very same information, but allows to
> continue.
>

In fact, BUG_ON() and WARN_ON() has various implementations in different
architectures, and also can be configured by user.

Even some of 'crazy users' (e.g. randconfig), can make BUG_ON() and
WARN_ON() 'empty' (include/asm-generic/bug.h).

In my experience (mainly for servers), when find a kernel bug, it will
stop and report bug, that will let coredump analysing (or KDB trap) much
easier.

>> BUG_ON() will stop current working flow and report kernel bug in details.
> There is no reason to crash the machine completely. The kernel can
> continue and the WARN_ON reports the bug with the same details.

If so (we still prefer to use WARN_ON), we'd better to let it in lock
protected.

At least when we still have to continue, try not to lead things worse.

It will provide much help for coredump analysing (or KDB trap).

In fact, for coredump analysers, for every real world coredump, they
have to assume the system has already continued blindly, and then die.

Thanks.
--
Chen Gang

Asianux Corporation

* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-20  7:37           ` Chen Gang
@ 2013-06-20  8:28             ` Thomas Gleixner
  2013-06-20  9:36               ` Chen Gang
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2013-06-20  8:28 UTC
  To: Chen Gang
  Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Thu, 20 Jun 2013, Chen Gang wrote:

> On 06/20/2013 03:02 PM, Thomas Gleixner wrote:
>> On Thu, 20 Jun 2013, Chen Gang wrote:
>>
>>> On 06/19/2013 11:52 PM, Tejun Heo wrote:
>>>> On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
>>>>> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
>>>>> to it ?
>>>> Yes, if the caller screws up and try to attach more than one workers
>>>> to the kthread_worker, which has some possibility of happening as
>>>> kthread_worker allows both attaching and detaching a worker.
>>>>
>>>
>>> If we detect the bugs, and still want to use WARN_ON() to report warning
>>> and continue running, we need be sure of keeping the related things no
>>> touch (at least not lead to worse).
>>>
>>> If we can not be sure of keeping the related things no touch:
>>>   if it is a kernel bug, better use BUG_ON() instead of,
>>>   if it is a user mode bug, better to return failure with error code and
>>>   print related information.
>> Wrong. BUG_ON() is only for cases where the kernel CANNOT continue at
>> all. WARN_ON() prints the very same information, but allows to
>> continue.
>>
>
> In fact, BUG_ON() and WARN_ON() has various implementations in different
> architectures, and also can be configured by user.

And how is that relevant?

> Even some of 'crazy users' (e.g. randconfig), can make BUG_ON() and
> WARN_ON() 'empty' (include/asm-generic/bug.h).

That does not matter at all.

> In my experience (mainly for servers), when find a kernel bug, it will
> stop and report bug, that will let coredump analysing (or KDB trap) much
> easier.

And your core dump will help you in what way? The code which
misbehaved is no longer executing. The problem is detected after the
fact and therefore your coredump will just tell you that worker->task
is not NULL.

>>> BUG_ON() will stop current working flow and report kernel bug in details.
>> There is no reason to crash the machine completely. The kernel can
>> continue and the WARN_ON reports the bug with the same details.

Linus said about BUG_ON():

    Adding BUG_ON()'s just makes things much much much worse. There is
    *never* a reason to add a BUG_ON().

    BUG_ON() makes it almost impossible to debug something, because you
    just killed the machine. So using BUG_ON() for "please notice this"
    is stupid as hell, because the most common end result is: "Oh, the
    machine just hung with no messages".

And he is right about that.

> If so (we still prefer to use WARN_ON), we'd better to let it in lock
> protected.

No, because the lock is not protecting anything in that case. If some
other code misbehaves and sets worker->task, then the lock does not
prevent this and taking the lock is not making the WARN_ON any more
reliable. So why the heck should we take it?

> At least when we still have to continue, try not to lead things worse.

And what's going to be better if we take the lock? Nothing, because
the lock CANNOT protect the check.

> It will provide much help for coredump analysing (or KDB trap).
>
> In fact, for coredump analysers, for every real world coredump, they
> have to assume the system has already continued blindly, and then die.

Core dump analysers cannot analyse dynamic race conditions and neither
can KDB.

So what do you gain from crashing the kernel? Exactly NOTHING.

Thanks,

	tglx

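Thomas's argument that the lock cannot protect the check can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_worker`, `toy_attach_locked`, and `toy_rogue_set` are hypothetical names, a C11 `atomic_flag` plays the spinlock, and the run is single-threaded, so it shows the logic of the argument and not an actual race.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kthread_worker. */
struct toy_worker {
	atomic_flag lock;	/* toy spinlock, plays the role of worker->lock */
	const void *task;	/* NULL while no task is attached */
	int warnings;		/* how often the start-up check fired */
};

static void toy_lock(struct toy_worker *w)
{
	while (atomic_flag_test_and_set(&w->lock))
		;		/* spin until the holder releases the lock */
}

static void toy_unlock(struct toy_worker *w)
{
	atomic_flag_clear(&w->lock);
}

static void toy_worker_init(struct toy_worker *w)
{
	atomic_flag_clear(&w->lock);
	w->task = NULL;
	w->warnings = 0;
}

/* Start-up path as in the patch: check and store under the lock. */
static int toy_attach_locked(struct toy_worker *w, const void *task)
{
	int warned;

	toy_lock(w);
	warned = (w->task != NULL);
	if (warned) {
		w->warnings++;
		fprintf(stderr, "warning: worker->task was already set\n");
	}
	w->task = task;		/* the earlier value is overwritten regardless */
	toy_unlock(w);
	return warned;
}

/* Misbehaving code that writes the field without taking the lock. */
static void toy_rogue_set(struct toy_worker *w, const void *task)
{
	w->task = task;
}
```

If `toy_rogue_set()` runs first, the later `toy_attach_locked()` call reports the problem after the fact and then overwrites the pointer, exactly as an unlocked check would. The lock only orders code that also takes it, so it adds nothing against a writer that ignores it.
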
* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-20  8:28             ` Thomas Gleixner
@ 2013-06-20  9:36               ` Chen Gang
  0 siblings, 0 replies; 10+ messages in thread
From: Chen Gang @ 2013-06-20  9:36 UTC
  To: Thomas Gleixner
  Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On 06/20/2013 04:28 PM, Thomas Gleixner wrote:
> On Thu, 20 Jun 2013, Chen Gang wrote:
>
>> On 06/20/2013 03:02 PM, Thomas Gleixner wrote:
>>> On Thu, 20 Jun 2013, Chen Gang wrote:
>>>
>>>> On 06/19/2013 11:52 PM, Tejun Heo wrote:
>>>>> On Wed, Jun 19, 2013 at 06:17:36PM +0800, Chen Gang wrote:
>>>>>> Hmm... can 'worker->task' has chance to be not NULL before set 'current'
>>>>>> to it ?
>>>>> Yes, if the caller screws up and try to attach more than one workers
>>>>> to the kthread_worker, which has some possibility of happening as
>>>>> kthread_worker allows both attaching and detaching a worker.
>>>>>
>>>>
>>>> If we detect the bugs, and still want to use WARN_ON() to report warning
>>>> and continue running, we need be sure of keeping the related things no
>>>> touch (at least not lead to worse).
>>>>
>>>> If we can not be sure of keeping the related things no touch:
>>>>   if it is a kernel bug, better use BUG_ON() instead of,
>>>>   if it is a user mode bug, better to return failure with error code and
>>>>   print related information.
>>> Wrong. BUG_ON() is only for cases where the kernel CANNOT continue at
>>> all. WARN_ON() prints the very same information, but allows to
>>> continue.
>>>
>>
>> In fact, BUG_ON() and WARN_ON() has various implementations in different
>> architectures, and also can be configured by user.
> And how is that relevant?
>

I only want to say 'Wrong. BUG_ON() is only for cases where ...' is not
quite precise.

>> Even some of 'crazy users' (e.g. randconfig), can make BUG_ON() and
>> WARN_ON() 'empty' (include/asm-generic/bug.h).
> That does not matter at all.
>
>> In my experience (mainly for servers), when find a kernel bug, it will
>> stop and report bug, that will let coredump analysing (or KDB trap) much
>> easier.
> And your core dump will help you in what way? The code which
> misbehaved is no longer executing. The problem is detected after the
> fact and therefore your coredump will just tell you that worker->task
> is not NULL.
>

In this case, if generate a coredump, it will provide much help to
analyze the issues.

Normally, this coredump is not belongs to complex coredump.

I met a complex KDB trap (at least for me, it is complex, maybe easy for
others):

  When a driver is quitting, it releases the dma buffers firstly, then
  immediately tell the hardware to stop dma usage.
  After the hardware writes 'a little waste data' to the released
  buffer, the driver quits successfully.
  Then the driver restart again, and work normally.
  After 'a long period', the system finds random issues (sometimes for
  ext3, sometimes for mm, block, or anywhere...).
  I spend almost 1 month to find the root cause (from 2008-12 to
  2009-01).

So if it generates a coredump (or KDB trap) when find bug, in most
cases, it is not a quite complex coredump.

>>>> BUG_ON() will stop current working flow and report kernel bug in details.
>>> There is no reason to crash the machine completely. The kernel can
>>> continue and the WARN_ON reports the bug with the same details.
> Linus said about BUG_ON():
>
>     Adding BUG_ON()'s just makes things much much much worse. There is
>     *never* a reason to add a BUG_ON().
>
>     BUG_ON() makes it almost impossible to debug something, because you
>     just killed the machine. So using BUG_ON() for "please notice this"
>     is stupid as hell, because the most common end result is: "Oh, the
>     machine just hung with no messages".
>
> And he is right about that.
>

Why we provide BUG_ON(), and many sub-systems also use it, at last ?

and Why we integrated KDB and KGDB at last ?

>> If so (we still prefer to use WARN_ON), we'd better to let it in lock
>> protected.
> No, because the lock is not protecting anything in that case. If some
> other code misbehaves and sets worker->task, then the lock does not
> prevent this and taking the lock is not making the WARN_ON any more
> reliable. So why the heck should we take it?
>

For writing code, if 'worker->task' is not NULL, we can assume, it need
lock protected.

>> At least when we still have to continue, try not to lead things worse.
> And what's going to be better if we take the lock? Nothing, because
> the lock CANNOT protect the check.
>
>> It will provide much help for coredump analysing (or KDB trap).
>>
>> In fact, for coredump analysers, for every real world coredump, they
>> have to assume the system has already continued blindly, and then die.
> Core dump analysers cannot analyse dynamic race conditions and neither
> can KDB.
>

Yes, they can.

At least for me, I have at least 20 successful experiences (2009 - 2010)
for coredump, dead lock, memory leak, and busy looping under user mode
system services (the related code about 400K).

For kernel, I also have some successful experience for coredump.

> So what do you gain from crashing the kernel? Exactly NOTHING.

So, the coredump (or KDB trap) is really useful for some guys (at least
for me).

Thanks.
--
Chen Gang

Asianux Corporation

* Re: [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)".
  2013-06-19  4:03 [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)" Chen Gang
  2013-06-19  8:41 ` Tejun Heo
@ 2013-06-19  8:43 ` Thomas Gleixner
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2013-06-19  8:43 UTC
  To: Chen Gang
  Cc: Tejun Heo, Oleg Nesterov, laijs, Andrew Morton, linux-kernel

On Wed, 19 Jun 2013, Chen Gang wrote:

>
> Since "WARN_ON(worker->task)", we can not assume that 'worker->task'
> will be NULL before set 'current' to it.

It better is NULL and all that WARN_ON does is to verify that.

> So need let 'worker' lock protected too, just like it already lock
> protected all time in main looping.

No. That's pointless. This happens when the new worker starts up and
there is nothing which can modify worker->task at this point.

Thanks,

	tglx

end of thread, other threads:[~2013-06-20  9:37 UTC | newest]

Thread overview: 10+ messages
2013-06-19  4:03 [PATCH] kernel/kthread.c: need spin_lock_irq() for 'worker' before main looping, since it can "WARN_ON(worker->task)" Chen Gang
2013-06-19  8:41 ` Tejun Heo
2013-06-19 10:17   ` Chen Gang
2013-06-19 15:52     ` Tejun Heo
2013-06-20  1:53       ` Chen Gang
2013-06-20  7:02         ` Thomas Gleixner
2013-06-20  7:37           ` Chen Gang
2013-06-20  8:28             ` Thomas Gleixner
2013-06-20  9:36               ` Chen Gang
2013-06-19  8:43 ` Thomas Gleixner