* Subject: Warning in workqueue.c
@ 2014-02-07 14:39 Jason J. Herne
2014-02-07 16:51 ` Tejun Heo
0 siblings, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-07 14:39 UTC (permalink / raw)
To: tj, linux-kernel
I've been able to reproduce the following warning using several kernel
versions on the S390 platform, including the latest master: 3.14-rc1
(38dbfb59d1175ef458d006556061adeaa8751b72).
[28718.212810] ------------[ cut here ]------------
[28718.212819] WARNING: at kernel/workqueue.c:2156
[28718.212822] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi tape_3590 qeth_l2 tape tape_class vhost_net tun
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[28718.212857] CPU: 2 PID: 20 Comm: kworker/3:0 Not tainted 3.14.0-rc1 #1
[28718.212862] task: 00000000f7b23260 ti: 00000000f7b2c000 task.ti:
00000000f7b2c000
[28718.212874] Krnl PSW : 0404c00180000000 000000000015b0be
(process_one_work+0x2e6/0x4c0)
[28718.212881] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0
PM:0 EA:3
Krnl GPRS: 0000000001727790 0000000000bc2a52 00000000f7f21900
0000000000b92500
[28718.212883] 0000000000b92500 0000000000105b24
0000000000000000 0000000000bc2a4e
[28718.212887] 0000000000000000 0000000084a2b500
0000000084a27000 0000000084a27018
[28718.212888] 00000000f7f21900 0000000000b92500
00000000f7b2fdd0 00000000f7b2fd70
[28718.212907] Krnl Code: 000000000015b0b2: 95001000 cli 0(%r1),0
000000000015b0b6: a774fece brc 7,15ae52
#000000000015b0ba: a7f40001 brc 15,15b0bc
>000000000015b0be: 92011000 mvi 0(%r1),1
000000000015b0c2: a7f4fec8 brc 15,15ae52
000000000015b0c6: e31003180004 lg %r1,792
000000000015b0cc: 58301024 l %r3,36(%r1)
000000000015b0d0: a73a0001 ahi %r3,1
[28718.212937] Call Trace:
[28718.212940] ([<000000000015b08c>] process_one_work+0x2b4/0x4c0)
[28718.212944] [<000000000015b858>] worker_thread+0x178/0x39c
[28718.212949] [<0000000000164652>] kthread+0x10e/0x128
[28718.212956] [<0000000000728c66>] kernel_thread_starter+0x6/0xc
[28718.212960] [<0000000000728c60>] kernel_thread_starter+0x0/0xc
[28718.212962] Last Breaking-Event-Address:
[28718.212965] [<000000000015b0ba>] process_one_work+0x2e2/0x4c0
[28718.212968] ---[ end trace 6d115577307998c2 ]---
The workload is:
2 processes onlining random cpus in a tight loop by using 'echo 1 >
/sys/bus/cpu.../online'
2 processes offlining random cpus in a tight loop by using 'echo 0 >
/sys/bus/cpu.../online'
Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
The machine has 10 processors.
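The four stress processes above can be sketched with a small script. This is a hedged sketch, not the reporter's actual tooling: the sysfs path is abbreviated in the message, so /sys/devices/system/cpu/cpuN/online is assumed from the standard layout, and the demo invocation at the bottom targets a nonexistent directory so it only prints what it would do (point it at the real base, as root, to actually stress hotplug):

```shell
#!/bin/sh
# Sketch of one of the four stress processes: write a random CPU's
# sysfs online file in a loop.  Assumed path layout; the original
# message abbreviates it.
toggle_loop() {           # $1 = action (0|1), $2 = iterations, $3 = sysfs base
    action=$1; iters=$2; base=$3
    i=0
    while [ "$i" -lt "$iters" ]; do
        # pick a random CPU 0..9 (the reported machine has 10 CPUs)
        n=$(( $(od -An -N2 -tu2 /dev/urandom) % 10 ))
        f="$base/cpu$n/online"
        if [ -w "$f" ]; then
            echo "$action" > "$f"
        else
            echo "would write $action to $f"   # dry run when not writable
        fi
        i=$((i + 1))
    done
}

# Dry-run demo; use /sys/devices/system/cpu as the base (as root) for real.
OUT=$(toggle_loop 1 5 /nonexistent)
printf '%s\n' "$OUT"
```

Per the report, two instances with action 1 and two with action 0, run concurrently against the real base, reproduce the described load.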
The warning sometimes hits within a few minutes of starting the
workload; other times it takes several hours.
The particular spot in the code is:
/*
* Ensure we're on the correct CPU. DISASSOCIATED test is
* necessary to avoid spurious warnings from rescuers servicing the
* unbound or a disassociated pool.
*/
WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
!(pool->flags & POOL_DISASSOCIATED) &&
raw_smp_processor_id() != pool->cpu);
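The invariant being checked, a bound worker must run on pool->cpu, has a rough userspace analogue: any task's last-run CPU and its allowed mask can be compared via /proc. A sketch (not from the original thread; field 39 of /proc/PID/stat is the CPU the task last ran on, and the awk field index assumes the comm field contains no spaces):

```shell
#!/bin/sh
# Rough userspace analogue of the warned-about check: which CPU did
# this task last run on, and which CPUs is it allowed on?
pid=$$
cpu=$(awk '{print $39}' "/proc/$pid/stat")              # last CPU ran on
allowed=$(sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "/proc/$pid/status")
echo "pid $pid last ran on CPU $cpu, allowed: $allowed"
```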
I'm not familiar with scheduling or workqueue internals, so I'm not
sure how to debug this further.
I would be happy to run tests and/or collect debugging data.
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
* Re: Subject: Warning in workqueue.c
2014-02-07 14:39 Subject: Warning in workqueue.c Jason J. Herne
@ 2014-02-07 16:51 ` Tejun Heo
2014-02-07 17:55 ` Jason J. Herne
0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-02-07 16:51 UTC (permalink / raw)
To: Jason J. Herne; +Cc: linux-kernel, Lai Jiangshan
Hello,
(cc'ing Lai as he knows a lot of workqueue code and quoting the whole
body for him)
Hmmm.... my memory is a bit rusty and nothing rings a bell
immediately. Can you please try the patch at the end of this message
and report the debug message? Let's first find out what's going on.
Thanks
On Fri, Feb 07, 2014 at 09:39:24AM -0500, Jason J. Herne wrote:
> I've been able to reproduce the following warning using several
> kernel versions on the S390 platform, including the latest master:
> 3.14-rc1 (38dbfb59d1175ef458d006556061adeaa8751b72).
>
> [28718.212810] ------------[ cut here ]------------
> [28718.212819] WARNING: at kernel/workqueue.c:2156
> [28718.212822] Modules linked in: ipt_MASQUERADE iptable_nat
> nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc
> ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi tape_3590 qeth_l2 tape
> tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod
> dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio
> dm_multipath [last unloaded: kvm]
> [28718.212857] CPU: 2 PID: 20 Comm: kworker/3:0 Not tainted 3.14.0-rc1 #1
> [28718.212862] task: 00000000f7b23260 ti: 00000000f7b2c000 task.ti:
> 00000000f7b2c000
> [28718.212874] Krnl PSW : 0404c00180000000 000000000015b0be
> (process_one_work+0x2e6/0x4c0)
> [28718.212881] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3
> CC:0 PM:0 EA:3
> Krnl GPRS: 0000000001727790 0000000000bc2a52 00000000f7f21900
> 0000000000b92500
> [28718.212883] 0000000000b92500 0000000000105b24
> 0000000000000000 0000000000bc2a4e
> [28718.212887] 0000000000000000 0000000084a2b500
> 0000000084a27000 0000000084a27018
> [28718.212888] 00000000f7f21900 0000000000b92500
> 00000000f7b2fdd0 00000000f7b2fd70
> [28718.212907] Krnl Code: 000000000015b0b2: 95001000 cli 0(%r1),0
> 000000000015b0b6: a774fece brc 7,15ae52
> #000000000015b0ba: a7f40001 brc 15,15b0bc
> >000000000015b0be: 92011000 mvi 0(%r1),1
> 000000000015b0c2: a7f4fec8 brc 15,15ae52
> 000000000015b0c6: e31003180004 lg %r1,792
> 000000000015b0cc: 58301024 l %r3,36(%r1)
> 000000000015b0d0: a73a0001 ahi %r3,1
> [28718.212937] Call Trace:
> [28718.212940] ([<000000000015b08c>] process_one_work+0x2b4/0x4c0)
> [28718.212944] [<000000000015b858>] worker_thread+0x178/0x39c
> [28718.212949] [<0000000000164652>] kthread+0x10e/0x128
> [28718.212956] [<0000000000728c66>] kernel_thread_starter+0x6/0xc
> [28718.212960] [<0000000000728c60>] kernel_thread_starter+0x0/0xc
> [28718.212962] Last Breaking-Event-Address:
> [28718.212965] [<000000000015b0ba>] process_one_work+0x2e2/0x4c0
> [28718.212968] ---[ end trace 6d115577307998c2 ]---
>
> The workload is:
> 2 processes onlining random cpus in a tight loop by using 'echo 1 >
> /sys/bus/cpu.../online'
> 2 processes offlining random cpus in a tight loop by using 'echo 0 >
> /sys/bus/cpu.../online'
> Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
>
> The machine has 10 processors.
> The warning sometimes hits within a few minutes of starting the
> workload. Other times it takes several hours.
>
> The particular spot in the code is:
> /*
> * Ensure we're on the correct CPU. DISASSOCIATED test is
> * necessary to avoid spurious warnings from rescuers servicing the
> * unbound or a disassociated pool.
> */
> WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
> !(pool->flags & POOL_DISASSOCIATED) &&
> raw_smp_processor_id() != pool->cpu);
>
> I'm not familiar with scheduling or workqueue internals, so I'm not
> sure how to debug this further.
> I would be happy to run tests and/or collect debugging data.
>
> --
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
>
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 82ef9f3..1cc6d05 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2151,9 +2151,12 @@ __acquires(&pool->lock)
* necessary to avoid spurious warnings from rescuers servicing the
* unbound or a disassociated pool.
*/
- WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
- !(pool->flags & POOL_DISASSOCIATED) &&
- raw_smp_processor_id() != pool->cpu);
+ if (WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
+ !(pool->flags & POOL_DISASSOCIATED) &&
+ raw_smp_processor_id() != pool->cpu))
+ pr_warning("XXX: worker->flags=0x%x pool->flags=0x%x cpu=%d pool->cpu=%d\n",
+ worker->flags, pool->flags, raw_smp_processor_id(),
+ pool->cpu);
/*
* A single work shouldn't be executed concurrently by
* Re: Subject: Warning in workqueue.c
2014-02-07 16:51 ` Tejun Heo
@ 2014-02-07 17:55 ` Jason J. Herne
2014-02-07 19:36 ` Tejun Heo
0 siblings, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-07 17:55 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-kernel, Lai Jiangshan
On 02/07/2014 11:51 AM, Tejun Heo wrote:
> Hello,
>
> (cc'ing Lai as he knows a lot of workqueue code and quoting the whole
> body for him)
>
> Hmmm.... my memory is a bit rusty and nothing rings a bell
> immediately. Can you please try the patch at the end of this message
> and report the debug message? Let's first find out what's going on.
I hit it quickly this time :) Here you go:
[ 644.517558] ------------[ cut here ]------------
[ 644.517567] WARNING: at kernel/workqueue.c:2156
[ 644.517569] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[ 644.517603] CPU: 6 PID: 24 Comm: kworker/4:0 Not tainted 3.14.0-rc1 #2
[ 644.517607] task: 00000000f7b49930 ti: 00000000f7b58000 task.ti:
00000000f7b58000
[ 644.517616] Krnl PSW : 0404c00180000000 000000000015b2a2
(process_one_work+0x4ca/0x4ec)
[ 644.517620] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0
PM:0 EA:3
Krnl GPRS: 000000000020f16f 00000000f7f0ce00 0000000000bc2a52
0000000000000001
[ 644.517625] 0000000000000000 0000000000000006
0000000000000004 0000000000bc2a4e
[ 644.517629] 0000000000000000 0000000084a37500
0000000084a33000 0000000084a33018
[ 644.517632] 00000000f7f21c00 0000000000b92500
00000000f7b5bdd0 00000000f7b5bd70
[ 644.517651] Krnl Code: 000000000015b294: ddf29640c05b trt
1600(243,%r9),91(%r12)
000000000015b29a: a7f4fe56 brc 15,15af46
#000000000015b29e: a7f40001 brc 15,15b2a0
>000000000015b2a2: 92012000 mvi 0(%r2),1
000000000015b2a6: a7f4ff10 brc 15,15b0c6
000000000015b2aa: c03000533bd1 larl %r3,bc2a4c
000000000015b2b0: 95003000 cli 0(%r3),0
000000000015b2b4: a774ff4e brc 7,15b150
[ 644.517679] Call Trace:
[ 644.517683] ([<000000000015b094>] process_one_work+0x2bc/0x4ec)
[ 644.517686] [<000000000015b884>] worker_thread+0x178/0x39c
[ 644.517690] [<000000000016467e>] kthread+0x10e/0x128
[ 644.517696] [<0000000000728c96>] kernel_thread_starter+0x6/0xc
[ 644.517699] [<0000000000728c90>] kernel_thread_starter+0x0/0xc
[ 644.517701] Last Breaking-Event-Address:
[ 644.517705] [<000000000015b29e>] process_one_work+0x4c6/0x4ec
[ 644.517707] ---[ end trace 681ce8162e7857a0 ]---
[ 644.517710] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6 pool->cpu=4
[ 731.367023] XXX: worker->flags=0x1 pool->flags=0x0 cpu=1 pool->cpu=7
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
* Re: Subject: Warning in workqueue.c
2014-02-07 17:55 ` Jason J. Herne
@ 2014-02-07 19:36 ` Tejun Heo
2014-02-10 15:32 ` Jason J. Herne
0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-02-07 19:36 UTC (permalink / raw)
To: Jason J. Herne; +Cc: linux-kernel, Lai Jiangshan
Hello,
On Fri, Feb 07, 2014 at 12:55:28PM -0500, Jason J. Herne wrote:
> [ 644.517710] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6 pool->cpu=4
> [ 731.367023] XXX: worker->flags=0x1 pool->flags=0x0 cpu=1 pool->cpu=7
Sorry, still no idea how this can happen. Can you please try the
following?
Thanks!
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 82ef9f3..1b11ec0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2151,9 +2151,27 @@ __acquires(&pool->lock)
* necessary to avoid spurious warnings from rescuers servicing the
* unbound or a disassociated pool.
*/
- WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
- !(pool->flags & POOL_DISASSOCIATED) &&
- raw_smp_processor_id() != pool->cpu);
+ if (WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
+ !(pool->flags & POOL_DISASSOCIATED) &&
+ raw_smp_processor_id() != pool->cpu)) {
+ static char buf[PAGE_SIZE];
+ unsigned long now = jiffies;
+
+ pr_warning("XXX: worker->flags=0x%x pool->flags=0x%x cpu=%d pool->cpu=%d rescue_wq=%p\n",
+ worker->flags, pool->flags, raw_smp_processor_id(),
+ pool->cpu, worker->rescue_wq);
+ pr_warning("XXX: last_unbind=%ld last_rebind=%ld last_rebound_clear=%ld nr_exected_after_rebound_clear=%d\n",
+ worker->last_unbind ? worker->last_unbind - now : 999,
+ worker->last_rebind ? worker->last_rebind - now : 999,
+ worker->last_rebound_clear ? worker->last_rebound_clear - now : 999,
+ worker->nr_executed_after_rebound_clear);
+
+ cpulist_scnprintf(buf, sizeof(buf), &current->cpus_allowed);
+ pr_warning("XXX: cpus_allowed=%s\n", buf);
+
+ cpulist_scnprintf(buf, sizeof(buf), &worker->cpus_allowed_after_rebinding);
+ pr_warning("XXX: cpus_allowed_after_rebinding=%s\n", buf);
+ }
/*
* A single work shouldn't be executed concurrently by
@@ -2199,6 +2217,8 @@ __acquires(&pool->lock)
*/
set_work_pool_and_clear_pending(work, pool->id);
+ worker->nr_executed_after_rebound_clear++;
+
spin_unlock_irq(&pool->lock);
lock_map_acquire_read(&pwq->wq->lockdep_map);
@@ -2321,6 +2341,10 @@ recheck:
* management if applicable and concurrency management is restored
* after being rebound. See rebind_workers() for details.
*/
+ if (worker->flags & WORKER_REBOUND) {
+ worker->last_rebound_clear = jiffies;
+ worker->nr_executed_after_rebound_clear = 0;
+ }
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
do {
@@ -4576,8 +4600,10 @@ static void wq_unbind_fn(struct work_struct *work)
* before the last CPU down must be on the cpu. After
* this, they may become diasporas.
*/
- for_each_pool_worker(worker, wi, pool)
+ for_each_pool_worker(worker, wi, pool) {
worker->flags |= WORKER_UNBOUND;
+ worker->last_unbind = jiffies;
+ }
pool->flags |= POOL_DISASSOCIATED;
@@ -4633,9 +4659,13 @@ static void rebind_workers(struct worker_pool *pool)
* of all workers first and then clear UNBOUND. As we're called
* from CPU_ONLINE, the following shouldn't fail.
*/
- for_each_pool_worker(worker, wi, pool)
+ for_each_pool_worker(worker, wi, pool) {
WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
pool->attrs->cpumask) < 0);
+ worker->last_rebind = jiffies;
+ cpumask_copy(&worker->cpus_allowed_after_rebinding,
+ &worker->task->cpus_allowed);
+ }
spin_lock_irq(&pool->lock);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..95d68c4 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -50,6 +50,11 @@ struct worker {
/* used only by rescuers to point to the target workqueue */
struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */
+ unsigned long last_unbind;
+ unsigned long last_rebind;
+ unsigned long last_rebound_clear;
+ int nr_executed_after_rebound_clear;
+ cpumask_t cpus_allowed_after_rebinding;
};
/**
* Re: Subject: Warning in workqueue.c
2014-02-07 19:36 ` Tejun Heo
@ 2014-02-10 15:32 ` Jason J. Herne
2014-02-10 23:17 ` Tejun Heo
0 siblings, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-10 15:32 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-kernel, Lai Jiangshan
On 02/07/2014 02:36 PM, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 07, 2014 at 12:55:28PM -0500, Jason J. Herne wrote:
>> [ 644.517710] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6 pool->cpu=4
>> [ 731.367023] XXX: worker->flags=0x1 pool->flags=0x0 cpu=1 pool->cpu=7
>
> Sorry, still no idea how this can happen. Can you please try the
> following?
>
Here are the results:
[ 950.778326] ------------[ cut here ]------------
[ 950.778334] WARNING: at kernel/workqueue.c:2156
[ 950.778336] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi tape_3590 tape tape_class qeth_l2 vhost_net tun
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[ 950.778370] CPU: 6 PID: 16 Comm: kworker/2:0 Not tainted 3.14.0-rc1 #1
[ 950.778375] task: 00000000f7cc4b90 ti: 00000000f7ce8000 task.ti:
00000000f7ce8000
[ 950.778385] Krnl PSW : 0404c00180000000 000000000015b386
(process_one_work+0x59a/0x5c0)
[ 950.778389] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0
PM:0 EA:3
Krnl GPRS: 00000000001ee417 00000000f7f0ce00 0000000000bc2a52
0000000000000001
[ 950.778396] 0000000000000000 0000000000000006
0000000000000002 0000000000bc2a4e
[ 950.778400] 0000000000000000 0000000084a1f500
0000000084a1b000 0000000084a1b018
[ 950.778404] 00000000f7f52900 0000000000b92500
00000000f7cebdd0 00000000f7cebd68
[ 950.778423] Krnl Code: 000000000015b37a: 92013000 mvi 0(%r3),1
000000000015b37e: a7f4ff31 brc 15,15b1e0
#000000000015b382: a7f40001 brc 15,15b384
>000000000015b386: 92012000 mvi 0(%r2),1
000000000015b38a: a7f4feae brc 15,15b0e6
000000000015b38e: 41102018 la %r1,24(%r2)
000000000015b392: e31020180020 cg %r1,24(%r2)
000000000015b398: a784ff59 brc 8,15b24a
[ 950.778451] Call Trace:
[ 950.778455] ([<000000000015b0b4>] process_one_work+0x2c8/0x5c0)
[ 950.778459] [<000000000015b98e>] worker_thread+0x19a/0x3c0
[ 950.778463] [<00000000001647c2>] kthread+0x10e/0x128
[ 950.778469] [<0000000000728dd6>] kernel_thread_starter+0x6/0xc
[ 950.778472] [<0000000000728dd0>] kernel_thread_starter+0x0/0xc
[ 950.778475] Last Breaking-Event-Address:
[ 950.778478] [<000000000015b382>] process_one_work+0x596/0x5c0
[ 950.778481] ---[ end trace 643b2850ca4b7ed8 ]---
[ 950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6 pool->cpu=2
rescue_wq= (null)
[ 950.778488] XXX: last_unbind=-7 last_rebind=0 last_rebound_clear=0
nr_exected_after_rebound_clear=0
[ 950.778492] XXX: cpus_allowed=2
[ 950.778495] XXX: cpus_allowed_after_rebinding=2
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
* Re: Subject: Warning in workqueue.c
2014-02-10 15:32 ` Jason J. Herne
@ 2014-02-10 23:17 ` Tejun Heo
2014-02-12 15:18 ` Jason J. Herne
0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-02-10 23:17 UTC (permalink / raw)
To: Jason J. Herne; +Cc: linux-kernel, Lai Jiangshan
Hello,
On Mon, Feb 10, 2014 at 10:32:11AM -0500, Jason J. Herne wrote:
> [ 950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6
> pool->cpu=2 rescue_wq= (null)
> [ 950.778488] XXX: last_unbind=-7 last_rebind=0
> last_rebound_clear=0 nr_exected_after_rebound_clear=0
> [ 950.778492] XXX: cpus_allowed=2
> [ 950.778495] XXX: cpus_allowed_after_rebinding=2
So, everything looks kosher from workqueue side. Weird. cpus_allowed
is properly set and everything. The worker just isn't running on the
cpu it's supposed to be on. Can you please try the following?
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 82ef9f3..f71ee11 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2151,9 +2151,31 @@ __acquires(&pool->lock)
* necessary to avoid spurious warnings from rescuers servicing the
* unbound or a disassociated pool.
*/
- WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
- !(pool->flags & POOL_DISASSOCIATED) &&
- raw_smp_processor_id() != pool->cpu);
+ if (WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
+ !(pool->flags & POOL_DISASSOCIATED) &&
+ raw_smp_processor_id() != pool->cpu)) {
+ static char buf[PAGE_SIZE];
+ unsigned long now = jiffies;
+
+ pr_warning("XXX: worker->flags=0x%x pool->flags=0x%x cpu=%d pool->cpu=%d(%d) rescue_wq=%p\n",
+ worker->flags, pool->flags, raw_smp_processor_id(),
+ pool->cpu, cpu_online(pool->cpu), worker->rescue_wq);
+ pr_warning("XXX: last_unbind=%ld last_rebind=%ld last_rebound_clear=%ld nr_exected_after_rebound_clear=%d\n",
+ worker->last_unbind ? worker->last_unbind - now : 999,
+ worker->last_rebind ? worker->last_rebind - now : 999,
+ worker->last_rebound_clear ? worker->last_rebound_clear - now : 999,
+ worker->nr_executed_after_rebound_clear);
+
+ cpulist_scnprintf(buf, sizeof(buf), &current->cpus_allowed);
+ pr_warning("XXX: cpus_allowed=%s\n", buf);
+
+ cpulist_scnprintf(buf, sizeof(buf), &worker->cpus_allowed_after_rebinding);
+ pr_warning("XXX: cpus_allowed_after_rebinding=%s\n", buf);
+
+ schedule();
+
+ pr_warning("XXX: after schedule(), cpu=%d\n", raw_smp_processor_id());
+ }
/*
* A single work shouldn't be executed concurrently by
@@ -2199,6 +2221,8 @@ __acquires(&pool->lock)
*/
set_work_pool_and_clear_pending(work, pool->id);
+ worker->nr_executed_after_rebound_clear++;
+
spin_unlock_irq(&pool->lock);
lock_map_acquire_read(&pwq->wq->lockdep_map);
@@ -2321,6 +2345,10 @@ recheck:
* management if applicable and concurrency management is restored
* after being rebound. See rebind_workers() for details.
*/
+ if (worker->flags & WORKER_REBOUND) {
+ worker->last_rebound_clear = jiffies;
+ worker->nr_executed_after_rebound_clear = 0;
+ }
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
do {
@@ -4576,8 +4604,10 @@ static void wq_unbind_fn(struct work_struct *work)
* before the last CPU down must be on the cpu. After
* this, they may become diasporas.
*/
- for_each_pool_worker(worker, wi, pool)
+ for_each_pool_worker(worker, wi, pool) {
worker->flags |= WORKER_UNBOUND;
+ worker->last_unbind = jiffies;
+ }
pool->flags |= POOL_DISASSOCIATED;
@@ -4633,9 +4663,13 @@ static void rebind_workers(struct worker_pool *pool)
* of all workers first and then clear UNBOUND. As we're called
* from CPU_ONLINE, the following shouldn't fail.
*/
- for_each_pool_worker(worker, wi, pool)
+ for_each_pool_worker(worker, wi, pool) {
WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
pool->attrs->cpumask) < 0);
+ worker->last_rebind = jiffies;
+ cpumask_copy(&worker->cpus_allowed_after_rebinding,
+ &worker->task->cpus_allowed);
+ }
spin_lock_irq(&pool->lock);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..95d68c4 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -50,6 +50,11 @@ struct worker {
/* used only by rescuers to point to the target workqueue */
struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */
+ unsigned long last_unbind;
+ unsigned long last_rebind;
+ unsigned long last_rebound_clear;
+ int nr_executed_after_rebound_clear;
+ cpumask_t cpus_allowed_after_rebinding;
};
/**
* Re: Subject: Warning in workqueue.c
2014-02-10 23:17 ` Tejun Heo
@ 2014-02-12 15:18 ` Jason J. Herne
2014-02-13 3:02 ` Lai Jiangshan
2014-02-13 3:31 ` Lai Jiangshan
0 siblings, 2 replies; 23+ messages in thread
From: Jason J. Herne @ 2014-02-12 15:18 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-kernel, Lai Jiangshan
On 02/10/2014 06:17 PM, Tejun Heo wrote:
> Hello,
>
> On Mon, Feb 10, 2014 at 10:32:11AM -0500, Jason J. Herne wrote:
>> [ 950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6
>> pool->cpu=2 rescue_wq= (null)
>> [ 950.778488] XXX: last_unbind=-7 last_rebind=0
>> last_rebound_clear=0 nr_exected_after_rebound_clear=0
>> [ 950.778492] XXX: cpus_allowed=2
>> [ 950.778495] XXX: cpus_allowed_after_rebinding=2
>
> So, everything looks kosher from workqueue side. Weird. cpus_allowed
> is properly set and everything. The worker just isn't running on the
> cpu it's supposed to be on. Can you please try the following?
Thanks for the fast responses. Here is the output from the last patch:
[34437.173991] WARNING: at kernel/workqueue.c:2156
[34437.173993] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[34437.174030] CPU: 2 PID: 12 Comm: kworker/1:0 Not tainted 3.14.0-rc1 #1
[34437.174034] task: 00000000f7c9e4c0 ti: 00000000f7cbc000 task.ti:
00000000f7cbc000
[34437.174046] Krnl PSW : 0404c00180000000 000000000015b406
(process_one_work+0x61a/0x640)
[34437.174051] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0
PM:0 EA:3
Krnl GPRS: 000000007c325b00 0000000000bc2a52 00000000f7f52780
0000000000000001
[34437.174057] 0000000000000000 0000000000000002
0000000000000001 0000000000bc2a4e
[34437.174061] 0000000000000000 0000000084a13500
0000000084a0f000 0000000084a0f018
[34437.174065] 00000000f7f52780 0000000000735d18
00000000f7cbfdd0 00000000f7cbfd58
[34437.174085] Krnl Code: 000000000015b3fa: 92013000 mvi 0(%r3),1
000000000015b3fe: a7f4ff31 brc 15,15b260
#000000000015b402: a7f40001 brc 15,15b404
>000000000015b406: 92011000 mvi 0(%r1),1
000000000015b40a: a7f4fe84 brc 15,15b112
000000000015b40e: 41102018 la %r1,24(%r2)
000000000015b412: e31020180020 cg %r1,24(%r2)
000000000015b418: a784ff59 brc 8,15b2ca
[34437.174115] Call Trace:
[34437.174118] ([<000000000015b0e0>] process_one_work+0x2f4/0x640)
[34437.174122] [<000000000015ba0e>] worker_thread+0x19a/0x3c0
[34437.174126] [<0000000000164842>] kthread+0x10e/0x128
[34437.174132] [<0000000000728e56>] kernel_thread_starter+0x6/0xc
[34437.174136] [<0000000000728e50>] kernel_thread_starter+0x0/0xc
[34437.174138] Last Breaking-Event-Address:
[34437.174141] [<000000000015b402>] process_one_work+0x616/0x640
[34437.174144] ---[ end trace 52d8ee96597a2415 ]---
[34437.174148] XXX: worker->flags=0x1 pool->flags=0x0 cpu=2
pool->cpu=1(1) rescue_wq= (null)
[34437.174152] XXX: last_unbind=-34 last_rebind=0 last_rebound_clear=0
nr_exected_after_rebound_clear=0
[34437.174156] XXX: cpus_allowed=1
[34437.174158] XXX: cpus_allowed_after_rebinding=1
[34437.174161] BUG: scheduling while atomic: kworker/1:0/12/0x00000002
[34437.174164] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[34437.174194] CPU: 2 PID: 12 Comm: kworker/1:0 Tainted: G W
3.14.0-rc1 #1
[34437.174197] 00000000f7cbfb80 00000000f7cbfb90 0000000000000002
0000000000000000
00000000f7cbfc20 00000000f7cbfb98 00000000f7cbfb98 0000000000115f6a
0000000000000003 0000000000c4da00 000000000000000a 000000000000000a
00000000f7cbfbe0 00000000f7cbfb80 0000000000000000 0000000000000000
0000000000000000 0000000000115f6a 00000000f7cbfb80 00000000f7cbfbd0
[34437.174273] Call Trace:
[34437.174279] ([<0000000000115e5c>] show_trace+0x100/0x148)
[34437.174282] [<0000000000115f18>] show_stack+0x74/0xf4
[34437.174288] [<0000000000720a9c>] dump_stack+0x88/0xb8
[34437.174291] [<000000000071aa40>] __schedule_bug+0x78/0x90
[34437.174295] [<00000000007237e4>] __schedule+0xb8c/0xbac
[34437.174299] [<000000000015b21e>] process_one_work+0x432/0x640
[34437.174302] [<000000000015ba0e>] worker_thread+0x19a/0x3c0
[34437.174306] [<0000000000164842>] kthread+0x10e/0x128
[34437.174309] [<0000000000728e56>] kernel_thread_starter+0x6/0xc
[34437.174313] [<0000000000728e50>] kernel_thread_starter+0x0/0xc
[34437.174317] XXX: after schedule(), cpu=2
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
* Re: Subject: Warning in workqueue.c
2014-02-12 15:18 ` Jason J. Herne
@ 2014-02-13 3:02 ` Lai Jiangshan
2014-02-13 3:31 ` Lai Jiangshan
1 sibling, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2014-02-13 3:02 UTC (permalink / raw)
To: jjherne; +Cc: Tejun Heo, linux-kernel
On 02/12/2014 11:18 PM, Jason J. Herne wrote:
> On 02/10/2014 06:17 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Mon, Feb 10, 2014 at 10:32:11AM -0500, Jason J. Herne wrote:
>>> [ 950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6
>>> pool->cpu=2 rescue_wq= (null)
>>> [ 950.778488] XXX: last_unbind=-7 last_rebind=0
>>> last_rebound_clear=0 nr_exected_after_rebound_clear=0
>>> [ 950.778492] XXX: cpus_allowed=2
>>> [ 950.778495] XXX: cpus_allowed_after_rebinding=2
>>
>> So, everything looks kosher from workqueue side. Weird. cpus_allowed
>> is properly set and everything. The worker just isn't running on the
>> cpu it's supposed to be on. Can you please try the following?
>
>
> Thanks for the fast responses. Here is the output from the last patch:
I can't hit the bug. Could you give me more information to help me
reproduce it?
I think we need to use "git-bisect" to identify the suspicious patch.
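The bisect mechanics can be shown end to end on a throwaway repository. This is only an illustration of `git bisect run`, not the kernel bisect itself; in the real case the endpoints would be a last-known-good kernel tag and v3.14-rc1, and the check script would build, boot, and run the hotplug stress at each step:

```shell
#!/bin/sh
# Demonstrate 'git bisect run' on a synthetic history where commit 5
# "introduces the bug" (modeled as a tracked marker file named 'bug').
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config user.email bisect@example.com    # throwaway identity
git config user.name bisect
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > state
    if [ "$i" -ge 5 ]; then touch bug; else rm -f bug; fi
    git add -A
    git commit -qm "commit $i"
done
# Any script whose exit status reports good (0) / bad (non-zero) can
# drive the search; here the "test" is just a file-existence check.
cat > check.sh <<'EOF'
#!/bin/sh
! test -e bug
EOF
chmod +x check.sh
git bisect start HEAD HEAD~7 >/dev/null     # bad = HEAD, good = commit 1
first_bad=$(git bisect run ./check.sh |
            sed -n 's/^\([0-9a-f]\{40\}\) is the first bad commit$/\1/p')
msg=$(git log -1 --format=%s "$first_bad")
echo "first bad: $msg"
```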
Thanks,
Lai
>
>
> [34437.173991] WARNING: at kernel/workqueue.c:2156
> [34437.173993] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
> [34437.174030] CPU: 2 PID: 12 Comm: kworker/1:0 Not tainted 3.14.0-rc1 #1
> [34437.174034] task: 00000000f7c9e4c0 ti: 00000000f7cbc000 task.ti: 00000000f7cbc000
> [34437.174046] Krnl PSW : 0404c00180000000 000000000015b406 (process_one_work+0x61a/0x640)
> [34437.174051] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
> Krnl GPRS: 000000007c325b00 0000000000bc2a52 00000000f7f52780 0000000000000001
> [34437.174057] 0000000000000000 0000000000000002 0000000000000001 0000000000bc2a4e
> [34437.174061] 0000000000000000 0000000084a13500 0000000084a0f000 0000000084a0f018
> [34437.174065] 00000000f7f52780 0000000000735d18 00000000f7cbfdd0 00000000f7cbfd58
> [34437.174085] Krnl Code: 000000000015b3fa: 92013000 mvi 0(%r3),1
> 000000000015b3fe: a7f4ff31 brc 15,15b260
> #000000000015b402: a7f40001 brc 15,15b404
> >000000000015b406: 92011000 mvi 0(%r1),1
> 000000000015b40a: a7f4fe84 brc 15,15b112
> 000000000015b40e: 41102018 la %r1,24(%r2)
> 000000000015b412: e31020180020 cg %r1,24(%r2)
> 000000000015b418: a784ff59 brc 8,15b2ca
> [34437.174115] Call Trace:
> [34437.174118] ([<000000000015b0e0>] process_one_work+0x2f4/0x640)
> [34437.174122] [<000000000015ba0e>] worker_thread+0x19a/0x3c0
> [34437.174126] [<0000000000164842>] kthread+0x10e/0x128
> [34437.174132] [<0000000000728e56>] kernel_thread_starter+0x6/0xc
> [34437.174136] [<0000000000728e50>] kernel_thread_starter+0x0/0xc
> [34437.174138] Last Breaking-Event-Address:
> [34437.174141] [<000000000015b402>] process_one_work+0x616/0x640
> [34437.174144] ---[ end trace 52d8ee96597a2415 ]---
> [34437.174148] XXX: worker->flags=0x1 pool->flags=0x0 cpu=2 pool->cpu=1(1) rescue_wq= (null)
> [34437.174152] XXX: last_unbind=-34 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
> [34437.174156] XXX: cpus_allowed=1
> [34437.174158] XXX: cpus_allowed_after_rebinding=1
> [34437.174161] BUG: scheduling while atomic: kworker/1:0/12/0x00000002
> [34437.174164] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
> [34437.174194] CPU: 2 PID: 12 Comm: kworker/1:0 Tainted: G W 3.14.0-rc1 #1
> [34437.174197] 00000000f7cbfb80 00000000f7cbfb90 0000000000000002 0000000000000000
> 00000000f7cbfc20 00000000f7cbfb98 00000000f7cbfb98 0000000000115f6a
> 0000000000000003 0000000000c4da00 000000000000000a 000000000000000a
> 00000000f7cbfbe0 00000000f7cbfb80 0000000000000000 0000000000000000
> 0000000000000000 0000000000115f6a 00000000f7cbfb80 00000000f7cbfbd0
> [34437.174273] Call Trace:
> [34437.174279] ([<0000000000115e5c>] show_trace+0x100/0x148)
> [34437.174282] [<0000000000115f18>] show_stack+0x74/0xf4
> [34437.174288] [<0000000000720a9c>] dump_stack+0x88/0xb8
> [34437.174291] [<000000000071aa40>] __schedule_bug+0x78/0x90
> [34437.174295] [<00000000007237e4>] __schedule+0xb8c/0xbac
> [34437.174299] [<000000000015b21e>] process_one_work+0x432/0x640
> [34437.174302] [<000000000015ba0e>] worker_thread+0x19a/0x3c0
> [34437.174306] [<0000000000164842>] kthread+0x10e/0x128
> [34437.174309] [<0000000000728e56>] kernel_thread_starter+0x6/0xc
> [34437.174313] [<0000000000728e50>] kernel_thread_starter+0x0/0xc
> [34437.174317] XXX: after schedule(), cpu=2
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-12 15:18 ` Jason J. Herne
2014-02-13 3:02 ` Lai Jiangshan
@ 2014-02-13 3:31 ` Lai Jiangshan
2014-02-13 17:58 ` Jason J. Herne
1 sibling, 1 reply; 23+ messages in thread
From: Lai Jiangshan @ 2014-02-13 3:31 UTC (permalink / raw)
To: jjherne; +Cc: Tejun Heo, linux-kernel
On 02/12/2014 11:18 PM, Jason J. Herne wrote:
> On 02/10/2014 06:17 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Mon, Feb 10, 2014 at 10:32:11AM -0500, Jason J. Herne wrote:
>>> [ 950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6
>>> pool->cpu=2 rescue_wq= (null)
>>> [ 950.778488] XXX: last_unbind=-7 last_rebind=0
>>> last_rebound_clear=0 nr_exected_after_rebound_clear=0
>>> [ 950.778492] XXX: cpus_allowed=2
>>> [ 950.778495] XXX: cpus_allowed_after_rebinding=2
>>
>> So, everything looks kosher from workqueue side. Weird. cpus_allowed
>> is properly set and everything. The worker just isn't running on the
>> cpu it's supposed to be on. Can you please try the following?
>
>
> Thanks for the fast responses. Here is the output from the last patch:
>
>
> [34437.173991] WARNING: at kernel/workqueue.c:2156
> [34437.173993] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
> [34437.174030] CPU: 2 PID: 12 Comm: kworker/1:0 Not tainted 3.14.0-rc1 #1
> [34437.174034] task: 00000000f7c9e4c0 ti: 00000000f7cbc000 task.ti: 00000000f7cbc000
> [34437.174046] Krnl PSW : 0404c00180000000 000000000015b406 (process_one_work+0x61a/0x640)
> [34437.174051] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
> Krnl GPRS: 000000007c325b00 0000000000bc2a52 00000000f7f52780 0000000000000001
> [34437.174057] 0000000000000000 0000000000000002 0000000000000001 0000000000bc2a4e
> [34437.174061] 0000000000000000 0000000084a13500 0000000084a0f000 0000000084a0f018
> [34437.174065] 00000000f7f52780 0000000000735d18 00000000f7cbfdd0 00000000f7cbfd58
> [34437.174085] Krnl Code: 000000000015b3fa: 92013000 mvi 0(%r3),1
> 000000000015b3fe: a7f4ff31 brc 15,15b260
> #000000000015b402: a7f40001 brc 15,15b404
> >000000000015b406: 92011000 mvi 0(%r1),1
> 000000000015b40a: a7f4fe84 brc 15,15b112
> 000000000015b40e: 41102018 la %r1,24(%r2)
> 000000000015b412: e31020180020 cg %r1,24(%r2)
> 000000000015b418: a784ff59 brc 8,15b2ca
> [34437.174115] Call Trace:
> [34437.174118] ([<000000000015b0e0>] process_one_work+0x2f4/0x640)
> [34437.174122] [<000000000015ba0e>] worker_thread+0x19a/0x3c0
> [34437.174126] [<0000000000164842>] kthread+0x10e/0x128
> [34437.174132] [<0000000000728e56>] kernel_thread_starter+0x6/0xc
> [34437.174136] [<0000000000728e50>] kernel_thread_starter+0x0/0xc
> [34437.174138] Last Breaking-Event-Address:
> [34437.174141] [<000000000015b402>] process_one_work+0x616/0x640
> [34437.174144] ---[ end trace 52d8ee96597a2415 ]---
> [34437.174148] XXX: worker->flags=0x1 pool->flags=0x0 cpu=2 pool->cpu=1(1) rescue_wq= (null)
> [34437.174152] XXX: last_unbind=-34 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
> [34437.174156] XXX: cpus_allowed=1
> [34437.174158] XXX: cpus_allowed_after_rebinding=1
> [34437.174161] BUG: scheduling while atomic: kworker/1:0/12/0x00000002
> [34437.174164] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
> [34437.174194] CPU: 2 PID: 12 Comm: kworker/1:0 Tainted: G W 3.14.0-rc1 #1
> [34437.174197] 00000000f7cbfb80 00000000f7cbfb90 0000000000000002 0000000000000000
> 00000000f7cbfc20 00000000f7cbfb98 00000000f7cbfb98 0000000000115f6a
> 0000000000000003 0000000000c4da00 000000000000000a 000000000000000a
> 00000000f7cbfbe0 00000000f7cbfb80 0000000000000000 0000000000000000
> 0000000000000000 0000000000115f6a 00000000f7cbfb80 00000000f7cbfbd0
> [34437.174273] Call Trace:
> [34437.174279] ([<0000000000115e5c>] show_trace+0x100/0x148)
> [34437.174282] [<0000000000115f18>] show_stack+0x74/0xf4
> [34437.174288] [<0000000000720a9c>] dump_stack+0x88/0xb8
> [34437.174291] [<000000000071aa40>] __schedule_bug+0x78/0x90
> [34437.174295] [<00000000007237e4>] __schedule+0xb8c/0xbac
> [34437.174299] [<000000000015b21e>] process_one_work+0x432/0x640
> [34437.174302] [<000000000015ba0e>] worker_thread+0x19a/0x3c0
> [34437.174306] [<0000000000164842>] kthread+0x10e/0x128
> [34437.174309] [<0000000000728e56>] kernel_thread_starter+0x6/0xc
> [34437.174313] [<0000000000728e50>] kernel_thread_starter+0x0/0xc
> [34437.174317] XXX: after schedule(), cpu=2
Could you test with the following patch, if Tejun doesn't give you a new one?
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b010eac..cdd7a10 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -825,6 +825,7 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
{
struct worker *worker = kthread_data(task);
+ worker->wake_up = jiffies;
if (!(worker->flags & WORKER_NOT_RUNNING)) {
WARN_ON_ONCE(worker->pool->cpu != cpu);
atomic_inc(&worker->pool->nr_running);
@@ -851,6 +852,7 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
struct worker *worker = kthread_data(task), *to_wakeup = NULL;
struct worker_pool *pool;
+ worker->sleep = jiffies;
/*
* Rescuers, which may not have all the fields set up like normal
* workers, also reach here, let's not access anything before
@@ -2134,6 +2136,7 @@ __acquires(&pool->lock)
bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
int work_color;
struct worker *collision;
+ int worker_bug = 0;
#ifdef CONFIG_LOCKDEP
/*
* It is permissible to free the struct work_struct from
@@ -2151,9 +2154,31 @@ __acquires(&pool->lock)
* necessary to avoid spurious warnings from rescuers servicing the
* unbound or a disassociated pool.
*/
- WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
- !(pool->flags & POOL_DISASSOCIATED) &&
- raw_smp_processor_id() != pool->cpu);
+ if (WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
+ !(pool->flags & POOL_DISASSOCIATED) &&
+ raw_smp_processor_id() != pool->cpu)) {
+ static char buf[PAGE_SIZE];
+ unsigned long now = jiffies;
+
+ worker_bug = 1;
+ pr_warning("XXX: worker->flags=0x%x pool->flags=0x%x cpu=%d pool->cpu=%d(%d) rescue_wq=%p\n",
+ worker->flags, pool->flags, raw_smp_processor_id(),
+ pool->cpu, cpu_online(pool->cpu), worker->rescue_wq);
+ pr_warning("XXX: last_unbind=%ld last_rebind=%ld last_rebound_clear=%ld nr_exected_after_rebound_clear=%d\n",
+ worker->last_unbind ? worker->last_unbind - now : 999,
+ worker->last_rebind ? worker->last_rebind - now : 999,
+ worker->last_rebound_clear ? worker->last_rebound_clear - now : 999,
+ worker->nr_executed_after_rebound_clear);
+ pr_warning("XXX: sleep=%ld wakeup=%ld\n",
+ worker->sleep ? worker->sleep - now : 999,
+ worker->wake_up ? worker->wake_up - now : 999);
+
+ cpulist_scnprintf(buf, sizeof(buf), &current->cpus_allowed);
+ pr_warning("XXX: cpus_allowed=%s\n", buf);
+
+ cpulist_scnprintf(buf, sizeof(buf), &worker->cpus_allowed_after_rebinding);
+ pr_warning("XXX: cpus_allowed_after_rebinding=%s\n", buf);
+ }
/*
* A single work shouldn't be executed concurrently by
@@ -2199,8 +2224,15 @@ __acquires(&pool->lock)
*/
set_work_pool_and_clear_pending(work, pool->id);
+ worker->nr_executed_after_rebound_clear++;
+
spin_unlock_irq(&pool->lock);
+ if (worker_bug) {
+ schedule();
+
+ pr_warning("XXX: after schedule(), cpu=%d\n", raw_smp_processor_id());
+ }
lock_map_acquire_read(&pwq->wq->lockdep_map);
lock_map_acquire(&lockdep_map);
trace_workqueue_execute_start(work);
@@ -2298,6 +2330,7 @@ woke_up:
}
worker_leave_idle(worker);
+ worker->wake_up = jiffies;
recheck:
/* no more worker necessary? */
if (!need_more_worker(pool))
@@ -2321,6 +2354,10 @@ recheck:
* management if applicable and concurrency management is restored
* after being rebound. See rebind_workers() for details.
*/
+ if (worker->flags & WORKER_REBOUND) {
+ worker->last_rebound_clear = jiffies;
+ worker->nr_executed_after_rebound_clear = 0;
+ }
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
do {
@@ -2341,6 +2378,7 @@ recheck:
worker_set_flags(worker, WORKER_PREP, false);
sleep:
+ worker->sleep = jiffies;
if (unlikely(need_to_manage_workers(pool)) && manage_workers(worker))
goto recheck;
@@ -4576,8 +4614,10 @@ static void wq_unbind_fn(struct work_struct *work)
* before the last CPU down must be on the cpu. After
* this, they may become diasporas.
*/
- for_each_pool_worker(worker, wi, pool)
+ for_each_pool_worker(worker, wi, pool) {
worker->flags |= WORKER_UNBOUND;
+ worker->last_unbind = jiffies;
+ }
pool->flags |= POOL_DISASSOCIATED;
@@ -4633,9 +4673,13 @@ static void rebind_workers(struct worker_pool *pool)
* of all workers first and then clear UNBOUND. As we're called
* from CPU_ONLINE, the following shouldn't fail.
*/
- for_each_pool_worker(worker, wi, pool)
+ for_each_pool_worker(worker, wi, pool) {
WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
pool->attrs->cpumask) < 0);
+ worker->last_rebind = jiffies;
+ cpumask_copy(&worker->cpus_allowed_after_rebinding,
+ &worker->task->cpus_allowed);
+ }
spin_lock_irq(&pool->lock);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..db17e2c 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -50,6 +50,13 @@ struct worker {
/* used only by rescuers to point to the target workqueue */
struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */
+ unsigned long last_unbind;
+ unsigned long last_rebind;
+ unsigned long sleep;
+ unsigned long wake_up;
+ unsigned long last_rebound_clear;
+ int nr_executed_after_rebound_clear;
+ cpumask_t cpus_allowed_after_rebinding;
};
/**
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-13 3:31 ` Lai Jiangshan
@ 2014-02-13 17:58 ` Jason J. Herne
2014-02-13 20:41 ` Tejun Heo
0 siblings, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-13 17:58 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: Tejun Heo, linux-kernel
On 02/12/2014 10:31 PM, Lai Jiangshan wrote:
> On 02/12/2014 11:18 PM, Jason J. Herne wrote:
> Could you use the following patch for test if Tejun doesn't give you a new one.
Lai,
Here is the output from a run with the patch you provided.
[ 5779.795687] ------------[ cut here ]------------
[ 5779.795695] WARNING: at kernel/workqueue.c:2159
[ 5779.795698] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi tape_3590 qeth_l2 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[ 5779.795733] CPU: 4 PID: 270 Comm: kworker/5:1 Not tainted 3.14.0-rc1 #1
[ 5779.795738] task: 0000000001938000 ti: 00000000f4d9c000 task.ti: 00000000f4d9c000
[ 5779.795750] Krnl PSW : 0404c00180000000 000000000015b452 (process_one_work+0x666/0x688)
[ 5779.795756] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 000003210f9db000 0000000000bc2a52 0000000001b640c0 0000000000000001
[ 5779.795757] 0000000000000000 0000000000000004 0000000000000005 00000000ffffffff
[ 5779.795759] 0000000000000000 0000000084a43500 0000000084a3f000 0000000084a3f018
[ 5779.795763] 0000000001b640c0 0000000000735d18 00000000f4d9fdc8 00000000f4d9fd50
[ 5779.795781] Krnl Code: 000000000015b444: dd1a9640c05b trt 1600(27,%r9),91(%r12)
000000000015b44a: a7f4fd9e brc 15,15af86
#000000000015b44e: a7f40001 brc 15,15b450
>000000000015b452: 92011000 mvi 0(%r1),1
000000000015b456: a7f4fe63 brc 15,15b11c
000000000015b45a: c03000533af9 larl %r3,bc2a4c
000000000015b460: 95003000 cli 0(%r3),0
000000000015b464: a774ff3e brc 7,15b2e0
[ 5779.795810] Call Trace:
[ 5779.795814] ([<000000000015b0ea>] process_one_work+0x2fe/0x688)
[ 5779.795817] [<000000000015ba62>] worker_thread+0x1a6/0x3d4
[ 5779.795822] [<00000000001648c2>] kthread+0x10e/0x128
[ 5779.795828] [<0000000000728ed6>] kernel_thread_starter+0x6/0xc
[ 5779.795832] [<0000000000728ed0>] kernel_thread_starter+0x0/0xc
[ 5779.795834] Last Breaking-Event-Address:
[ 5779.795837] [<000000000015b44e>] process_one_work+0x662/0x688
[ 5779.795840] ---[ end trace 8b6353b0f2821ec9 ]---
[ 5779.795844] XXX: worker->flags=0x1 pool->flags=0x0 cpu=4 pool->cpu=5(1) rescue_wq= (null)
[ 5779.795848] XXX: last_unbind=-44 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
[ 5779.795852] XXX: sleep=-39 wakeup=0
[ 5779.795855] XXX: cpus_allowed=5
[ 5779.795857] XXX: cpus_allowed_after_rebinding=5
[ 5779.795861] XXX: after schedule(), cpu=4
You had asked about reproducing this. This is on the S390 platform; I'm not sure if that makes any difference.
The workload is:
2 processes onlining random cpus in a tight loop by using 'echo 1 > /sys/bus/cpu.../online'
2 processes offlining random cpus in a tight loop by using 'echo 0 > /sys/bus/cpu.../online'
Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
The machine has 10 processors.
The warning message sometimes hits within a few minutes of starting the workload. Other times it takes several hours.
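For anyone else trying to reproduce this, the workload above can be sketched as a small script. Everything in it (the sysfs layout, the CPU count, the `RUN_STRESS` guard) is an assumption based on the description, not the exact scripts used:

```shell
#!/bin/sh
# Sketch of the CPU online/offline stress described above. The sysfs
# path, CPU count, and process counts are assumptions from the report;
# doing real work needs root and CONFIG_HOTPLUG_CPU, and may well take
# the machine down -- that is the point of the test.

NR_CPUS=${NR_CPUS:-10}    # the reported machine has 10 processors

# Print the sysfs "online" attribute for a given cpu number.
cpu_online_path() {
    printf '/sys/devices/system/cpu/cpu%s/online' "$1"
}

# Pick a random cpu in 1..NR_CPUS-1 (cpu0 is frequently not hot-pluggable).
random_cpu() {
    echo $(( $(od -An -tu2 -N2 /dev/urandom) % (NR_CPUS - 1) + 1 ))
}

# Drive random cpus to the state in $1 (0 = offline, 1 = online), forever.
toggle_loop() {
    while :; do
        path=$(cpu_online_path "$(random_cpu)")
        [ -w "$path" ] && echo "$1" > "$path"
    done
}

if [ "${RUN_STRESS:-0}" = 1 ]; then
    # two random onliners and two random offliners, as in the report
    toggle_loop 1 & toggle_loop 1 &
    toggle_loop 0 & toggle_loop 0 &
    wait
fi
```

With `RUN_STRESS` left unset the script only defines its helpers; set `RUN_STRESS=1` (as root, on a disposable machine) to start the four loops.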
Please let me know if you have further questions.
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-13 17:58 ` Jason J. Herne
@ 2014-02-13 20:41 ` Tejun Heo
2014-02-14 14:56 ` Jason J. Herne
2014-02-14 16:09 ` Peter Zijlstra
0 siblings, 2 replies; 23+ messages in thread
From: Tejun Heo @ 2014-02-13 20:41 UTC (permalink / raw)
To: Jason J. Herne; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar, Peter Zijlstra
Hello,
(cc'ing Ingo and Peter)
On Thu, Feb 13, 2014 at 12:58:10PM -0500, Jason J. Herne wrote:
> [ 5779.795687] ------------[ cut here ]------------
> [ 5779.795695] WARNING: at kernel/workqueue.c:2159
....
> [ 5779.795844] XXX: worker->flags=0x1 pool->flags=0x0 cpu=4 pool->cpu=5(1) rescue_wq= (null)
> [ 5779.795848] XXX: last_unbind=-44 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
> [ 5779.795852] XXX: sleep=-39 wakeup=0
> [ 5779.795855] XXX: cpus_allowed=5
> [ 5779.795857] XXX: cpus_allowed_after_rebinding=5
> [ 5779.795861] XXX: after schedule(), cpu=4
>
> You had asked about reproducing this. This is on the S390 platform,
> I'm not sure if that makes any difference.
>
> The workload is:
> 2 processes onlining random cpus in a tight loop by using 'echo 1 > /sys/bus/cpu.../online'
> 2 processes offlining random cpus in a tight loop by using 'echo 0 > /sys/bus/cpu.../online'
> Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
>
> The machine has 10 processors.
> The warning message some times hits within a few minutes on starting
> the workload. Other times it takes several hours.
Ingo, Peter: Jason is reporting that workqueue triggers a warning because a
worker is running on the wrong CPU, which is relatively reliably
reproducible with the above workload on s390. The weird thing is that
everything looks correct from workqueue side. The worker has proper
cpus_allowed set and the CPU it's supposed to run on is online and yet
the worker is on the wrong CPU and even doing explicit schedule()
after detecting the condition doesn't change the situation. Any
ideas?
Jason, I don't have much idea from workqueue side. Have you been
running this test with older kernels too? Can you confirm whether
this failure is something recent? Bisection would be awesome but just
confirming, say, 3.12 doesn't have this issue would be very helpful.
Thanks!
--
tejun
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-13 20:41 ` Tejun Heo
@ 2014-02-14 14:56 ` Jason J. Herne
2014-02-14 14:58 ` Tejun Heo
2014-02-14 16:09 ` Peter Zijlstra
1 sibling, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-14 14:56 UTC (permalink / raw)
To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar, Peter Zijlstra
On 02/13/2014 03:41 PM, Tejun Heo wrote:
> Jason, I don't have much idea from workqueue side. Have you been
> running this test with older kernels too? Can you confirm whether
> this failure is something recent? Bisection would be awesome but just
> confirming, say, 3.12 doesn't have this issue would be very helpful.
I will try to find a kernel where this is not reproducible and attempt
bisect from there. I will let you know what I find.
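The bisection being proposed follows the standard `git bisect` workflow. A toy session sketching only the mechanics -- a scratch repository stands in for the kernel tree, and reading a file stands in for booting each candidate kernel and running the stress test until the warning does or does not appear:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com   # placeholder identity
git config user.name bisect

# Fabricate a five-commit history where "the bug" appears at commit 3.
for i in 1 2 3 4 5; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done

# With the real tree this would be e.g.:  git bisect start v3.14-rc1 v3.12
git bisect start HEAD HEAD~4

# At each step, "test the kernel": here the bug is present once state >= 3.
while :; do
    if [ "$(cat state)" -ge 3 ]; then
        out=$(git bisect bad)
    else
        out=$(git bisect good)
    fi
    case "$out" in *"first bad commit"*) break ;; esac
done
echo "$out" | sed -n '1p'   # names the first bad commit
```

With the real tree the good/bad judgement comes from running the online/offline stress long enough to trust a "good" verdict, which is what makes a several-hour reproduction window so painful to bisect.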
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-14 14:56 ` Jason J. Herne
@ 2014-02-14 14:58 ` Tejun Heo
0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2014-02-14 14:58 UTC (permalink / raw)
To: Jason J. Herne; +Cc: Lai Jiangshan, lkml, Ingo Molnar, Peter Zijlstra
Hello, Jason.
On Fri, Feb 14, 2014 at 9:56 AM, Jason J. Herne
<jjherne@linux.vnet.ibm.com> wrote:
> I will try to find a kernel where this is not reproducible and attempt
> bisect from there. I will let you know what I find.
Yes, even just finding that there's an earlier version where this
doesn't happen would be great.
Thank you very much.
--
tejun
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-13 20:41 ` Tejun Heo
2014-02-14 14:56 ` Jason J. Herne
@ 2014-02-14 16:09 ` Peter Zijlstra
2014-02-14 16:25 ` Tejun Heo
1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-02-14 16:09 UTC (permalink / raw)
To: Tejun Heo; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar
On Thu, Feb 13, 2014 at 03:41:02PM -0500, Tejun Heo wrote:
> Hello,
>
> (cc'ing Ingo and Peter)
>
> On Thu, Feb 13, 2014 at 12:58:10PM -0500, Jason J. Herne wrote:
> > [ 5779.795687] ------------[ cut here ]------------
> > [ 5779.795695] WARNING: at kernel/workqueue.c:2159
> ....
> > [ 5779.795844] XXX: worker->flags=0x1 pool->flags=0x0 cpu=4 pool->cpu=5(1) rescue_wq= (null)
> > [ 5779.795848] XXX: last_unbind=-44 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
> > [ 5779.795852] XXX: sleep=-39 wakeup=0
> > [ 5779.795855] XXX: cpus_allowed=5
> > [ 5779.795857] XXX: cpus_allowed_after_rebinding=5
> > [ 5779.795861] XXX: after schedule(), cpu=4
> >
> > You had asked about reproducing this. This is on the S390 platform,
> > I'm not sure if that makes any difference.
> >
> > The workload is:
> > 2 processes onlining random cpus in a tight loop by using 'echo 1 > /sys/bus/cpu.../online'
> > 2 processes offlining random cpus in a tight loop by using 'echo 0 > /sys/bus/cpu.../online'
> > Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
> >
> > The machine has 10 processors.
> > The warning message some times hits within a few minutes on starting
> > the workload. Other times it takes several hours.
>
> Ingo, Peter, Jason is reporting workqueue triggering warning because a
> worker is running on the wrong CPU, which is relatively reliably
> reproducible with the above workload on s390.
Wasn't that a feature of workqueues? You know we've had arguments about
that behaviour -- I'm strongly in favour of flushing and killing workers
on unplug, but you let them run on the wrong cpu.
So strongly in fact, I'd call the current behaviour quite insane and
broken :-)
> The weird thing is that
> everything looks correct from workqueue side. The worker has proper
> cpus_allowed set and the CPU it's supposed to run on is online and yet
> the worker is on the wrong CPU and even doing explicit schedule()
> after detecting the condition doesn't change the situation.
Yeah, just calling schedule() won't fix placement, you need to actually
block and wake-up. But given you've called things like
set_cpus_allowed_ptr() and such to set the mask back to 5..
You can try something like the below which makes it slightly more
aggressive about moving tasks about.
> Any ideas?
Not really; s390 doesn't have NUMA, so all those changes are out.
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fb9764fbc537..20bd4de44bb3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4504,7 +4504,8 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
rq = task_rq_lock(p, &flags);
- if (cpumask_equal(&p->cpus_allowed, new_mask))
+ if (cpumask_equal(&p->cpus_allowed, new_mask) &&
+ cpumask_test_cpu(rq->cpu, &p->cpus_allowed))
goto out;
if (!cpumask_intersects(new_mask, cpu_active_mask)) {
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-14 16:09 ` Peter Zijlstra
@ 2014-02-14 16:25 ` Tejun Heo
2014-02-14 16:28 ` Peter Zijlstra
2014-02-24 15:01 ` Jason J. Herne
0 siblings, 2 replies; 23+ messages in thread
From: Tejun Heo @ 2014-02-14 16:25 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar
Hey, Peter.
On Fri, Feb 14, 2014 at 05:09:23PM +0100, Peter Zijlstra wrote:
> > Ingo, Peter, Jason is reporting workqueue triggering warning because a
> > worker is running on the wrong CPU, which is relatively reliably
> > reproducible with the above workload on s390.
>
> Wasn't that a feature of workqueues? You know we've had arguments about
> that behaviour -- I'm strongly in favour of flushing and killing workers
> on unplug, but you let them run on the wrong cpu.
>
> So strongly in fact, I'd call the current behaviour quite insane and
> broken :-)
Hey, we now even keep normal kthreads across cpu down/ups. :)
> Yeah, just calling schedule() won't fix placement, you need to actually
> block and wake-up. But given you've called things like
> set_cpus_allowed_ptr() and such to set the mask back to 5..
Hmmm... I see. It's kinda weird that the code has been there for so
long and this is the first time it's getting reported.
> You can try something like the below which makes it slightly more
> aggressive about moving tasks about.
>
> > Any ideas?
>
> Not really; s390 doesn't have NUMA, so all those changes are out.
>
> ---
> kernel/sched/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fb9764fbc537..20bd4de44bb3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4504,7 +4504,8 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>
> rq = task_rq_lock(p, &flags);
>
> - if (cpumask_equal(&p->cpus_allowed, new_mask))
> + if (cpumask_equal(&p->cpus_allowed, new_mask) &&
> + cpumask_test_cpu(rq->cpu, &p->cpus_allowed))
> goto out;
>
> if (!cpumask_intersects(new_mask, cpu_active_mask)) {
Hmmm... weird, p's rq shouldn't have changed without its cpus_allowed
busted. Anyways, let's wait for Jason's test results and see whether
this is a regression at all.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-14 16:25 ` Tejun Heo
@ 2014-02-14 16:28 ` Peter Zijlstra
2014-02-14 16:38 ` Tejun Heo
2014-02-24 15:01 ` Jason J. Herne
1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-02-14 16:28 UTC (permalink / raw)
To: Tejun Heo; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar
On Fri, Feb 14, 2014 at 11:25:56AM -0500, Tejun Heo wrote:
> Hey, we now even keep normal kthreads across cpu down/ups. :)
Well, we keep them, but parked, they're not allowed to run.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-14 16:28 ` Peter Zijlstra
@ 2014-02-14 16:38 ` Tejun Heo
0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2014-02-14 16:38 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar
On Fri, Feb 14, 2014 at 05:28:54PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 14, 2014 at 11:25:56AM -0500, Tejun Heo wrote:
> > Hey, we now even keep normal kthreads across cpu down/ups. :)
>
> Well, we keep them, but parked, they're not allowed to run.
Yeah, that's true. In the long term, I think we'll need to have
better distinction between the work items which actually need affinity
for correctness and which is just doing it as an optimization. It's
already happening now in a rather ad-hoc way with WQ_POWER_EFFICIENT.
Anyways, once we actually make that distinction clearly, it should
probably be possible to flush all per-cpu ones and keep them parked
across offline. Right now, we have mixture of the two and the
knowledge is only present in cpu_down callbacks of each user making it
difficult to do that.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-14 16:25 ` Tejun Heo
2014-02-14 16:28 ` Peter Zijlstra
@ 2014-02-24 15:01 ` Jason J. Herne
2014-02-24 18:35 ` Tejun Heo
1 sibling, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-24 15:01 UTC (permalink / raw)
To: Tejun Heo, Peter Zijlstra; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar
On 02/14/2014 11:25 AM, Tejun Heo wrote:
> ...
> Hmmm... weird, p's rq shouldn't have changed without its cpus_allowed
> busted. Anyways, let's wait for Jason's test results and see whether
> this is a regression at all.
I was unable to determine exactly when this behavior was introduced,
because I keep hitting other bugs that prevent the test case from
running, the most notable of which causes an immediate system hang. I
also hit other warnings and bug messages, and I'm not sure whether they
are related or whether they will influence the probability of hitting
the problem we are trying to solve here. What I did find is the following:
We hit this problem as far back as v3.10.
The warning was introduced after v3.5 but before v3.6.
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-24 15:01 ` Jason J. Herne
@ 2014-02-24 18:35 ` Tejun Heo
2014-02-25 10:37 ` Peter Zijlstra
0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-02-24 18:35 UTC (permalink / raw)
To: Jason J. Herne; +Cc: Peter Zijlstra, Lai Jiangshan, linux-kernel, Ingo Molnar
On Mon, Feb 24, 2014 at 10:01:55AM -0500, Jason J. Herne wrote:
> On 02/14/2014 11:25 AM, Tejun Heo wrote:
> >...
> >Hmmm... weird, p's rq shouldn't have changed without its cpus_allowed
> >busted. Anyways, let's wait for Jason's test results and see whether
> >this is a regression at all.
>
> I was unable to determine exactly when this behavior was introduced.
> The reason for this is because I keep hitting other bugs that
> prevent the test case from running, the most notable of which causes
> an immediate system hang. I also hit other warnings and bug messages
> and I'm not sure if they are related or if they will influence the
> probability of hitting the problem we are trying to solve here.
> What I did find is the following:
>
> We hit this problem as far back as v3.10.
> The warning was introduced after v3.5 but before v3.6.
That's a bummer but it at least isn't a very new regression. Peter,
any ideas on debugging this? I can make workqueue to play block /
unblock dance to try to work around the issue but that'd be very
yucky. It'd be great to root cause where the cpu selection anomaly is
coming from.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Subject: Warning in workqueue.c
2014-02-24 18:35 ` Tejun Heo
@ 2014-02-25 10:37 ` Peter Zijlstra
2014-03-10 14:37 ` Jason J. Herne
0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-02-25 10:37 UTC (permalink / raw)
To: Tejun Heo; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar
On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
> That's a bummer but it at least isn't a very new regression. Peter,
> any ideas on debugging this? I can make workqueue to play block /
> unblock dance to try to work around the issue but that'd be very
> yucky. It'd be great to root cause where the cpu selection anomaly is
> coming from.
I'm assuming you're using set_cpus_allowed_ptr() to flip them between
CPUs; the below adds some error paths to that code. In particular we
propagate the __migrate_task() fail (returns the number of tasks
migrated) through the stop_one_cpu() into set_cpus_allowed_ptr().
This way we can see if there was a problem with the migration.
You should be able to now reliably use the return value of
set_cpus_allowed_ptr() to tell if it is now running on a CPU in its
allowed mask.
I've also included an #if 0 retry loop for the fail case; but I suspect
that that might end up deadlocking your machine if you hit that just
wrong, something like the waking CPU endlessly trying to migrate the
task over while the wakee CPU is waiting for completion of something
from the waking CPU.
But it's worth a prod, I suppose.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84b23cec0aeb..4c384efac8b3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
do_set_cpus_allowed(p, new_mask);
+again: __maybe_unused
+
/* Can the task run on the task's current CPU? If so, we're done */
- if (cpumask_test_cpu(task_cpu(p), new_mask))
+ if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
goto out;
- dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
+ dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
if (p->on_rq) {
struct migration_arg arg = { p, dest_cpu };
+
/* Need help from migration thread: drop lock and wait. */
task_rq_unlock(rq, p, &flags);
- stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+ ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+#if 0
+ if (ret) {
+ rq = task_rq_lock(p, &flags);
+ goto again;
+ }
+#endif
tlb_migrate_finish(p->mm);
- return 0;
+
+ return ret;
}
out:
task_rq_unlock(rq, p, &flags);
@@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int nid)
static int migration_cpu_stop(void *data)
{
struct migration_arg *arg = data;
+ int ret = 0;
/*
* The original target cpu might have gone down and we might
* be on another cpu but it doesn't matter.
*/
local_irq_disable();
- __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
+ if (!__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu))
+ ret = -EAGAIN;
local_irq_enable();
- return 0;
+
+ return ret;
}
#ifdef CONFIG_HOTPLUG_CPU
* Re: Subject: Warning in workqueue.c
2014-02-25 10:37 ` Peter Zijlstra
@ 2014-03-10 14:37 ` Jason J. Herne
2014-03-17 14:51 ` Jason J. Herne
2014-03-17 15:16 ` Peter Zijlstra
0 siblings, 2 replies; 23+ messages in thread
From: Jason J. Herne @ 2014-03-10 14:37 UTC (permalink / raw)
To: Peter Zijlstra, Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar
On 02/25/2014 05:37 AM, Peter Zijlstra wrote:
> On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
>
>> That's a bummer but it at least isn't a very new regression. Peter,
>> any ideas on debugging this? I can make workqueue to play block /
>> unblock dance to try to work around the issue but that'd be very
>> yucky. It'd be great to root cause where the cpu selection anomaly is
>> coming from.
>
> I'm assuming you're using set_cpus_allowed_ptr() to flip them between
> CPUs; the patch below adds some error paths to that code. In
> particular, we propagate the __migrate_task() failure (it returns the
> number of tasks migrated) through stop_one_cpu() into
> set_cpus_allowed_ptr().
>
> This way we can see if there was a problem with the migration.
>
> You should now be able to reliably use the return value of
> set_cpus_allowed_ptr() to tell whether the task is running on a CPU in
> its allowed mask.
>
> I've also included an #if 0 retry loop for the failure case, but I
> suspect it might deadlock your machine if you hit it just wrong:
> something like the waking CPU endlessly trying to migrate the task
> over while the wakee CPU is waiting for the completion of something
> from the waking CPU.
>
> But it's worth a prod, I suppose.
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 84b23cec0aeb..4c384efac8b3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>
> do_set_cpus_allowed(p, new_mask);
>
> +again: __maybe_unused
> +
> /* Can the task run on the task's current CPU? If so, we're done */
> - if (cpumask_test_cpu(task_cpu(p), new_mask))
> + if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
> goto out;
>
> - dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> + dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
> if (p->on_rq) {
> struct migration_arg arg = { p, dest_cpu };
> +
> /* Need help from migration thread: drop lock and wait. */
> task_rq_unlock(rq, p, &flags);
> - stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> + ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> +#if 0
> + if (ret) {
> + rq = task_rq_lock(p, &flags);
> + goto again;
> + }
> +#endif
> tlb_migrate_finish(p->mm);
> - return 0;
> +
> + return ret;
> }
> out:
> task_rq_unlock(rq, p, &flags);
> @@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int nid)
> static int migration_cpu_stop(void *data)
> {
> struct migration_arg *arg = data;
> + int ret = 0;
>
> /*
> * The original target cpu might have gone down and we might
> * be on another cpu but it doesn't matter.
> */
> local_irq_disable();
> - __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
> + if (!__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu))
> + ret = -EAGAIN;
> local_irq_enable();
> - return 0;
> +
> + return ret;
> }
>
> #ifdef CONFIG_HOTPLUG_CPU
>
Peter,
Did you intend for me to run with this patch or was it posted for
discussion only? If you want it run, please tell me what to look for.
Also, if I should run this, should I include any other patches, either
the last one you posted in this thread or any of Tejun's?
Thanks.
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
* Re: Subject: Warning in workqueue.c
2014-03-10 14:37 ` Jason J. Herne
@ 2014-03-17 14:51 ` Jason J. Herne
2014-03-17 15:16 ` Peter Zijlstra
1 sibling, 0 replies; 23+ messages in thread
From: Jason J. Herne @ 2014-03-17 14:51 UTC (permalink / raw)
To: Peter Zijlstra, Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar
On 03/10/2014 10:37 AM, Jason J. Herne wrote:
> On 02/25/2014 05:37 AM, Peter Zijlstra wrote:
>> On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
>>
>>> That's a bummer but it at least isn't a very new regression. Peter,
>>> any ideas on debugging this? I can make workqueue to play block /
>>> unblock dance to try to work around the issue but that'd be very
>>> yucky. It'd be great to root cause where the cpu selection anomaly is
>>> coming from.
>>
>> I'm assuming you're using set_cpus_allowed_ptr() to flip them between
>> CPUs; the patch below adds some error paths to that code. In
>> particular, we propagate the __migrate_task() failure (it returns the
>> number of tasks migrated) through stop_one_cpu() into
>> set_cpus_allowed_ptr().
>>
>> This way we can see if there was a problem with the migration.
>>
>> You should now be able to reliably use the return value of
>> set_cpus_allowed_ptr() to tell whether the task is running on a CPU in
>> its allowed mask.
>>
>> I've also included an #if 0 retry loop for the failure case, but I
>> suspect it might deadlock your machine if you hit it just wrong:
>> something like the waking CPU endlessly trying to migrate the task
>> over while the wakee CPU is waiting for the completion of something
>> from the waking CPU.
>>
>> But it's worth a prod, I suppose.
>>
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 84b23cec0aeb..4c384efac8b3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct
>> *p, const struct cpumask *new_mask)
>>
>> do_set_cpus_allowed(p, new_mask);
>>
>> +again: __maybe_unused
>> +
>> /* Can the task run on the task's current CPU? If so, we're done */
>> - if (cpumask_test_cpu(task_cpu(p), new_mask))
>> + if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
>> goto out;
>>
>> - dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> + dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
>> if (p->on_rq) {
>> struct migration_arg arg = { p, dest_cpu };
>> +
>> /* Need help from migration thread: drop lock and wait. */
>> task_rq_unlock(rq, p, &flags);
>> - stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>> + ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>> +#if 0
>> + if (ret) {
>> + rq = task_rq_lock(p, &flags);
>> + goto again;
>> + }
>> +#endif
>> tlb_migrate_finish(p->mm);
>> - return 0;
>> +
>> + return ret;
>> }
>> out:
>> task_rq_unlock(rq, p, &flags);
>> @@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int
>> nid)
>> static int migration_cpu_stop(void *data)
>> {
>> struct migration_arg *arg = data;
>> + int ret = 0;
>>
>> /*
>> * The original target cpu might have gone down and we might
>> * be on another cpu but it doesn't matter.
>> */
>> local_irq_disable();
>> - __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
>> + if (!__migrate_task(arg->task, raw_smp_processor_id(),
>> arg->dest_cpu))
>> + ret = -EAGAIN;
>> local_irq_enable();
>> - return 0;
>> +
>> + return ret;
>> }
>>
>> #ifdef CONFIG_HOTPLUG_CPU
>>
>
> Peter,
>
> Did you intend for me to run with this patch or was it posted for
> discussion only? If you want it run, please tell me what to look for.
> Also, if I should run this, should I include any other patches, either
> the last one you posted in this thread or any of Tejun's?
>
> Thanks.
>
Ping?
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
* Re: Subject: Warning in workqueue.c
2014-03-10 14:37 ` Jason J. Herne
2014-03-17 14:51 ` Jason J. Herne
@ 2014-03-17 15:16 ` Peter Zijlstra
1 sibling, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2014-03-17 15:16 UTC (permalink / raw)
To: Jason J. Herne; +Cc: Tejun Heo, Lai Jiangshan, linux-kernel, Ingo Molnar
On Mon, Mar 10, 2014 at 10:37:42AM -0400, Jason J. Herne wrote:
>
> Did you intend for me to run with this patch or was it posted for discussion
> only? If you want it run, please tell me what to look for.
> Also, if I should run this, should I include any other patches, either the
> last one you posted in this thread or any of Tejun's?
It needs some workqueue-side debugging too; I'm not entirely sure where
the move back to the online CPU happens, but you want to check the
set_cpus_allowed_ptr() return value there.
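In the 3.14 workqueue code, a likely candidate for that move is
rebind_workers() in kernel/workqueue.c, which (if memory serves) already
wraps the call in a WARN_ON_ONCE(). A debug hunk along these lines would
make a failure visible; this is a hypothetical sketch, not compile-tested,
and assumes an `int ret;` declaration is added to the function:

```diff
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ rebind_workers() -- hypothetical debug hunk
-		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
+		ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		if (ret)
+			pr_warn("workqueue: rebind of %s failed: %d\n",
+				worker->task->comm, ret);
```

If the surrounding code differs, the point is just to capture the return
value that Peter's sched patch now propagates and log it instead of
discarding it.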
end of thread, other threads:[~2014-03-17 15:16 UTC | newest]
Thread overview: 23+ messages
2014-02-07 14:39 Subject: Warning in workqueue.c Jason J. Herne
2014-02-07 16:51 ` Tejun Heo
2014-02-07 17:55 ` Jason J. Herne
2014-02-07 19:36 ` Tejun Heo
2014-02-10 15:32 ` Jason J. Herne
2014-02-10 23:17 ` Tejun Heo
2014-02-12 15:18 ` Jason J. Herne
2014-02-13 3:02 ` Lai Jiangshan
2014-02-13 3:31 ` Lai Jiangshan
2014-02-13 17:58 ` Jason J. Herne
2014-02-13 20:41 ` Tejun Heo
2014-02-14 14:56 ` Jason J. Herne
2014-02-14 14:58 ` Tejun Heo
2014-02-14 16:09 ` Peter Zijlstra
2014-02-14 16:25 ` Tejun Heo
2014-02-14 16:28 ` Peter Zijlstra
2014-02-14 16:38 ` Tejun Heo
2014-02-24 15:01 ` Jason J. Herne
2014-02-24 18:35 ` Tejun Heo
2014-02-25 10:37 ` Peter Zijlstra
2014-03-10 14:37 ` Jason J. Herne
2014-03-17 14:51 ` Jason J. Herne
2014-03-17 15:16 ` Peter Zijlstra