* RE: [PATCH] workqueue: fix a workqueue kernel panic issue.
[not found] <1410941884-26034-1-git-send-email-zhangyf@marvell.com>
@ 2014-09-22 3:30 ` Yifan Zhang
2014-09-22 3:39 ` Tejun Heo
0 siblings, 1 reply; 4+ messages in thread
From: Yifan Zhang @ 2014-09-22 3:30 UTC (permalink / raw)
To: Yifan Zhang, Tejun Heo, Jing Xiang, linux-kernel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 1494 bytes --]
Hi Tejun,
What's do you think of this patch ? Any concern ?
BR,
Yifan
-----Original Message-----
From: Yifan Zhang [mailto:zhangyf@marvell.com]
Sent: 2014Äê9ÔÂ17ÈÕ 16:18
To: Tejun Heo; Jing Xiang; linux-kernel@vger.kernel.org
Cc: Yifan Zhang
Subject: [PATCH] workqueue: fix a workqueue kernel panic issue.
if created workqueue in multi-thread unsynchronized,
get_work_pwq() may return NULL, which cause kernel panic. Judge get_work_pwq() return value before use
pwq->wq->flags.
Signed-off-by: Yifan Zhang <zhangyf@marvell.com>
---
kernel/workqueue.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 5dbe22a..d3ac87f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1947,9 +1947,19 @@ __acquires(&pool->lock) {
struct pool_workqueue *pwq = get_work_pwq(work);
struct worker_pool *pool = worker->pool;
- bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
+ bool cpu_intensive;
int work_color;
struct worker *collision;
+
+ if (pwq == NULL) {
+ pr_err("BUG: invalid struct work_struct.data %lu\n",
+ atomic_long_read(&work->data));
+ dump_stack();
+ return;
+ }
+
+ cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
+
#ifdef CONFIG_LOCKDEP
/*
* It is permissible to free the struct work_struct from
--
1.7.9.5
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] workqueue: fix a workqueue kernel panic issue.
2014-09-22 3:30 ` [PATCH] workqueue: fix a workqueue kernel panic issue Yifan Zhang
@ 2014-09-22 3:39 ` Tejun Heo
2014-09-22 10:40 ` Yifan Zhang
0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2014-09-22 3:39 UTC (permalink / raw)
To: Yifan Zhang; +Cc: Jing Xiang, linux-kernel
On Sun, Sep 21, 2014 at 08:30:32PM -0700, Yifan Zhang wrote:
> Hi Tejun,
>
> What's do you think of this patch ? Any concern ?
Hmmm? Haven't I already responded to this patch?
> -----Original Message-----
> From: Yifan Zhang [mailto:zhangyf@marvell.com]
> Sent: 2014年9月17日 16:18
> To: Tejun Heo; Jing Xiang; linux-kernel@vger.kernel.org
> Cc: Yifan Zhang
> Subject: [PATCH] workqueue: fix a workqueue kernel panic issue.
>
> if created workqueue in multi-thread unsynchronized,
Can you please elaborate?
> get_work_pwq() may return NULL, which cause kernel panic. Judge get_work_pwq() return value before use
> pwq->wq->flags.
>
> Signed-off-by: Yifan Zhang <zhangyf@marvell.com>
> ---
> kernel/workqueue.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 5dbe22a..d3ac87f 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1947,9 +1947,19 @@ __acquires(&pool->lock) {
> struct pool_workqueue *pwq = get_work_pwq(work);
> struct worker_pool *pool = worker->pool;
> - bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
> + bool cpu_intensive;
> int work_color;
> struct worker *collision;
> +
> + if (pwq == NULL) {
> + pr_err("BUG: invalid struct work_struct.data %lu\n",
> + atomic_long_read(&work->data));
> + dump_stack();
> + return;
I have difficult time seeing how the above piece of code would be
acceptable but maybe the situation you're trying to explain is weird
enough to justify it.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: [PATCH] workqueue: fix a workqueue kernel panic issue.
2014-09-22 3:39 ` Tejun Heo
@ 2014-09-22 10:40 ` Yifan Zhang
2014-09-23 14:33 ` Tejun Heo
0 siblings, 1 reply; 4+ messages in thread
From: Yifan Zhang @ 2014-09-22 10:40 UTC (permalink / raw)
To: Tejun Heo; +Cc: Jing Xiang, linux-kernel, yifan.zhangm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 5135 bytes --]
Hi Tejun,
This issue is found when we got a kernel panic:
[ 943.673177] c3 7843 (Thread-162) pwm-vibrator pwm-vibrator: Vibrator: Set duration: 10ms
[ 943.717242] c0 7843 (Thread-162) pwm-vibrator pwm-vibrator: Vibrator: Set duration: 30ms
[ 949.571415] c1 779 (Binder_3) B52ISP version: HW 3.36, SWM 1.4, revision 871
[ 949.579291] c1 779 (Binder_3) b52_sensor_set_fmt: mbus code not match
[ 949.749224] c0 779 (Binder_3) b52_sensor_set_fmt: mbus code not match
[ 949.803474] c0 7844 (kworker/0:2) Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 949.813433] c0 7844 (kworker/0:2) pgd = ffffffc00007d000
[ 949.818707] c0 7844 (kworker/0:2) [00000008] *pgd=0000000001214003, *pmd=0000000001215003, *pte=00e00000d4200407
[ 949.829002] c0 7844 (kworker/0:2) Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 949.836425] Modules linked in: tzdd galcore ssipcmisck(P) audiostub cidatattydev gs_modem ccinetdev cci_datastub citty iml_module seh cploaddev msocketk
[ 949.850077] c0 7844 (kworker/0:2) CPU: 0 PID: 7844 Comm: kworker/0:2 Tainted: P 3.10.33 #1
[ 949.859318] c0 7844 (kworker/0:2) task: ffffffc015517500 ti: ffffffc027c48000 task.ti: ffffffc027c48000
[ 949.868641] c0 7844 (kworker/0:2) PC is at process_one_work+0x38/0x410
[ 949.875110] c0 7844 (kworker/0:2) LR is at worker_thread+0x13c/0x3c0
[ 949.881407] c0 7844 (kworker/0:2) pc : [<ffffffc0000bbbd4>] lr : [<ffffffc0000bcab8>] pstate: 600001c5
[ 949.890634] c0 7844 (kworker/0:2) sp : ffffffc027c4bd60
[ 949.895810] R29: ffffffc027c4bd60 R28: 0000000000000009
[ 949.901091] R27: ffffffc0008da108 R26: 0000000000000001
[ 949.906372] R25: ffffffc000aa2902 R24: 0000000000000000
[ 949.911653] R23: ffffffc027c48000 R22: ffffffc02b3b11b0
[ 949.916934] R21: ffffffc034da09c0 R20: ffffffc02b3b1180
[ 949.922215] R19: ffffffc000c05380 R18: 0000000000000000
[ 949.927497] R17: 0000000000000000 R16: ffffffc0001909a4
[ 949.932777] R15: 0000000000000000 R14: 000000000000ca81
[ 949.938059] R13: 00000000f6f99a80 R12: 00000000000003d1
[ 949.943341] R11: 0000000000000004 R10: ffffffc000bef3a8
[ 949.948622] R9 : ffffffc027c4bbe0 R8 : ffffffc015517920
[ 949.953903] R7 : ffffffc000724d98 R6 : ffffffc000724d98
[ 949.959183] R5 : 0000000000000000 R4 : 0000000000000000
[ 949.964465] R3 : ffffffc034da0d40 R2 : ffffffc034da09c0
[ 949.969746] R1 : ffffffc000c05380 R0 : 0000000000000000
[ 949.975019] c0 7844 (kworker/0:2)
[ 949.978390] c0 7844 (kworker/0:2)
[ 949.978390] PC: ffffffc0000bbb54:
You can tell it is a bug when pwq = get_work_pwq() return NULL, and cpu_intensive = pwq->wq->flags use it w/o check.
Normally get_work_pwq doesn't return NULL, but we had a bug in code which makes INIT_WORK(&work, do_work) is called in multi-thread. In some cases, work_struct is re-init just before get_work_pwq is called, it makes work_struct->data is invalid and thus causes the problem. It is indeed a bug of ourselves, and after fix it there is no such issue. But I wonder we still a NULL check before dereference pwq here anyway, since get_work_pwq may return NULL in some cases.
What do you think :-) ?
BR,
YIfan
-----Original Message-----
From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of Tejun Heo
Sent: 2014å¹´9æ22æ¥ 11:40
To: Yifan Zhang
Cc: Jing Xiang; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] workqueue: fix a workqueue kernel panic issue.
On Sun, Sep 21, 2014 at 08:30:32PM -0700, Yifan Zhang wrote:
> Hi Tejun,
>
> What's do you think of this patch ? Any concern ?
Hmmm? Haven't I already responded to this patch?
> -----Original Message-----
> From: Yifan Zhang [mailto:zhangyf@marvell.com]
> Sent: 2014å¹´9æ17æ¥ 16:18
> To: Tejun Heo; Jing Xiang; linux-kernel@vger.kernel.org
> Cc: Yifan Zhang
> Subject: [PATCH] workqueue: fix a workqueue kernel panic issue.
>
> if created workqueue in multi-thread unsynchronized,
Can you please elaborate?
> get_work_pwq() may return NULL, which cause kernel panic. Judge
> get_work_pwq() return value before use
> pwq->wq->flags.
>
> Signed-off-by: Yifan Zhang <zhangyf@marvell.com>
> ---
> kernel/workqueue.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c index
> 5dbe22a..d3ac87f 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1947,9 +1947,19 @@ __acquires(&pool->lock) {
> struct pool_workqueue *pwq = get_work_pwq(work);
> struct worker_pool *pool = worker->pool;
> - bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
> + bool cpu_intensive;
> int work_color;
> struct worker *collision;
> +
> + if (pwq == NULL) {
> + pr_err("BUG: invalid struct work_struct.data %lu\n",
> + atomic_long_read(&work->data));
> + dump_stack();
> + return;
I have difficult time seeing how the above piece of code would be acceptable but maybe the situation you're trying to explain is weird enough to justify it.
Thanks.
--
tejun
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] workqueue: fix a workqueue kernel panic issue.
2014-09-22 10:40 ` Yifan Zhang
@ 2014-09-23 14:33 ` Tejun Heo
0 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2014-09-23 14:33 UTC (permalink / raw)
To: Yifan Zhang; +Cc: Jing Xiang, linux-kernel, yifan.zhangm
On Mon, Sep 22, 2014 at 03:40:55AM -0700, Yifan Zhang wrote:
> You can tell it is a bug when pwq = get_work_pwq() return NULL, and
> cpu_intensive = pwq->wq->flags use it w/o check.
A bug somewhere else.
> Normally get_work_pwq doesn't return NULL, but we had a bug in code
> which makes INIT_WORK(&work, do_work) is called in multi-thread. In
> some cases, work_struct is re-init just before get_work_pwq is
> called, it makes work_struct->data is invalid and thus causes the
> problem. It is indeed a bug of ourselves, and after fix it there is
> no such issue. But I wonder we still a NULL check before dereference
> pwq here anyway, since get_work_pwq may return NULL in some cases.
Do you realize how timing dependent that particular pattern of
breakage is? If you're doing INIT_WORK() in racy way, there are many
places which can break in workqueue. It's not that different from
random memory corruption. It doesn't make any sense at all to add a
special case code for that in one particular place where this specific
incidence happens to trigger. In general, don't do things like this
anywhere in the kernel.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-09-23 14:33 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <1410941884-26034-1-git-send-email-zhangyf@marvell.com>
2014-09-22 3:30 ` [PATCH] workqueue: fix a workqueue kernel panic issue Yifan Zhang
2014-09-22 3:39 ` Tejun Heo
2014-09-22 10:40 ` Yifan Zhang
2014-09-23 14:33 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).