* Subject: Warning in workqueue.c
@ 2014-02-07 14:39 Jason J. Herne
  2014-02-07 16:51 ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-07 14:39 UTC (permalink / raw)
  To: tj, linux-kernel

I've been able to reproduce the following warning using several kernel 
versions on the S390 platform, including the latest master: 3.14-rc1 
(38dbfb59d1175ef458d006556061adeaa8751b72).

[28718.212810] ------------[ cut here ]------------
[28718.212819] WARNING: at kernel/workqueue.c:2156
[28718.212822] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables 
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi tape_3590 qeth_l2 tape tape_class vhost_net tun 
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp 
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[28718.212857] CPU: 2 PID: 20 Comm: kworker/3:0 Not tainted 3.14.0-rc1 #1
[28718.212862] task: 00000000f7b23260 ti: 00000000f7b2c000 task.ti: 
00000000f7b2c000
[28718.212874] Krnl PSW : 0404c00180000000 000000000015b0be 
(process_one_work+0x2e6/0x4c0)
[28718.212881]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 EA:3
Krnl GPRS: 0000000001727790 0000000000bc2a52 00000000f7f21900 
0000000000b92500
[28718.212883]            0000000000b92500 0000000000105b24 
0000000000000000 0000000000bc2a4e
[28718.212887]            0000000000000000 0000000084a2b500 
0000000084a27000 0000000084a27018
[28718.212888]            00000000f7f21900 0000000000b92500 
00000000f7b2fdd0 00000000f7b2fd70
[28718.212907] Krnl Code: 000000000015b0b2: 95001000		cli	0(%r1),0
            000000000015b0b6: a774fece		brc	7,15ae52
           #000000000015b0ba: a7f40001		brc	15,15b0bc
           >000000000015b0be: 92011000		mvi	0(%r1),1
            000000000015b0c2: a7f4fec8		brc	15,15ae52
            000000000015b0c6: e31003180004	lg	%r1,792
            000000000015b0cc: 58301024		l	%r3,36(%r1)
            000000000015b0d0: a73a0001		ahi	%r3,1
[28718.212937] Call Trace:
[28718.212940] ([<000000000015b08c>] process_one_work+0x2b4/0x4c0)
[28718.212944]  [<000000000015b858>] worker_thread+0x178/0x39c
[28718.212949]  [<0000000000164652>] kthread+0x10e/0x128
[28718.212956]  [<0000000000728c66>] kernel_thread_starter+0x6/0xc
[28718.212960]  [<0000000000728c60>] kernel_thread_starter+0x0/0xc
[28718.212962] Last Breaking-Event-Address:
[28718.212965]  [<000000000015b0ba>] process_one_work+0x2e2/0x4c0
[28718.212968] ---[ end trace 6d115577307998c2 ]---

The workload is:
2 processes onlining random cpus in a tight loop by using 'echo 1 > 
/sys/bus/cpu.../online'
2 processes offlining random cpus in a tight loop by using 'echo 0 > 
/sys/bus/cpu.../online'
Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27

The machine has 10 processors.
The warning sometimes hits within a few minutes of starting the 
workload; other times it takes several hours.
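
A sketch of the workload (a reconstruction, not the exact scripts used: the 
NCPUS/ITERS/DRY_RUN knobs are added here purely for safe illustration, and 
the real runs loop unbounded as root and pick CPUs at random):

```shell
#!/bin/sh
# Reconstruction of the stress workload: two processes onlining CPUs
# and two offlining them in tight loops.  NCPUS/ITERS/DRY_RUN are
# illustrative knobs; the real runs were unbounded, ran as root, and
# chose CPUs at random rather than cycling through them.
NCPUS=${NCPUS:-10}      # the machine in the report has 10 processors
ITERS=${ITERS:-20}      # raise this (a lot) for a real reproduction run
DRY_RUN=${DRY_RUN:-1}   # 1: just print the writes; 0: really toggle CPUs

toggle_loop() {         # $1: 1 = online, 0 = offline
    i=0
    while [ "$i" -lt "$ITERS" ]; do
        cpu=$(( i % NCPUS ))
        if [ "$DRY_RUN" -eq 1 ]; then
            echo "echo $1 > /sys/devices/system/cpu/cpu$cpu/online"
        else
            echo "$1" > "/sys/devices/system/cpu/cpu$cpu/online" 2>/dev/null || true
        fi
        i=$(( i + 1 ))
    done
}

toggle_loop 1 & toggle_loop 1 &   # two onlining processes
toggle_loop 0 & toggle_loop 0 &   # two offlining processes
wait
```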

The particular spot in the code is:
	/*
	 * Ensure we're on the correct CPU.  DISASSOCIATED test is
	 * necessary to avoid spurious warnings from rescuers servicing the
	 * unbound or a disassociated pool.
	 */
	WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
		     !(pool->flags & POOL_DISASSOCIATED) &&
		     raw_smp_processor_id() != pool->cpu);

I'm not familiar with scheduling or workqueue internals, so I'm not 
sure how to debug this further.
I would be happy to run tests and/or collect debugging data.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Subject: Warning in workqueue.c
  2014-02-07 14:39 Subject: Warning in workqueue.c Jason J. Herne
@ 2014-02-07 16:51 ` Tejun Heo
  2014-02-07 17:55   ` Jason J. Herne
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-02-07 16:51 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: linux-kernel, Lai Jiangshan

Hello,

(cc'ing Lai as he knows a lot of workqueue code and quoting the whole
body for him)

Hmmm.... my memory is a bit rusty and nothing rings a bell
immediately.  Can you please try the patch at the end of this message
and report the debug message?  Let's first find out what's going on.

Thanks

On Fri, Feb 07, 2014 at 09:39:24AM -0500, Jason J. Herne wrote:
> I've been able to reproduce the following warning using several
> kernel versions on the S390 platform, including the latest master:
> 3.14-rc1 (38dbfb59d1175ef458d006556061adeaa8751b72).
> 
> [28718.212810] ------------[ cut here ]------------
> [28718.212819] WARNING: at kernel/workqueue.c:2156
> [28718.212822] Modules linked in: ipt_MASQUERADE iptable_nat
> nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc
> ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi tape_3590 qeth_l2 tape
> tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod
> dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio
> dm_multipath [last unloaded: kvm]
> [28718.212857] CPU: 2 PID: 20 Comm: kworker/3:0 Not tainted 3.14.0-rc1 #1
> [28718.212862] task: 00000000f7b23260 ti: 00000000f7b2c000 task.ti:
> 00000000f7b2c000
> [28718.212874] Krnl PSW : 0404c00180000000 000000000015b0be
> (process_one_work+0x2e6/0x4c0)
> [28718.212881]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3
> CC:0 PM:0 EA:3
> Krnl GPRS: 0000000001727790 0000000000bc2a52 00000000f7f21900
> 0000000000b92500
> [28718.212883]            0000000000b92500 0000000000105b24
> 0000000000000000 0000000000bc2a4e
> [28718.212887]            0000000000000000 0000000084a2b500
> 0000000084a27000 0000000084a27018
> [28718.212888]            00000000f7f21900 0000000000b92500
> 00000000f7b2fdd0 00000000f7b2fd70
> [28718.212907] Krnl Code: 000000000015b0b2: 95001000		cli	0(%r1),0
>            000000000015b0b6: a774fece		brc	7,15ae52
>           #000000000015b0ba: a7f40001		brc	15,15b0bc
>           >000000000015b0be: 92011000		mvi	0(%r1),1
>            000000000015b0c2: a7f4fec8		brc	15,15ae52
>            000000000015b0c6: e31003180004	lg	%r1,792
>            000000000015b0cc: 58301024		l	%r3,36(%r1)
>            000000000015b0d0: a73a0001		ahi	%r3,1
> [28718.212937] Call Trace:
> [28718.212940] ([<000000000015b08c>] process_one_work+0x2b4/0x4c0)
> [28718.212944]  [<000000000015b858>] worker_thread+0x178/0x39c
> [28718.212949]  [<0000000000164652>] kthread+0x10e/0x128
> [28718.212956]  [<0000000000728c66>] kernel_thread_starter+0x6/0xc
> [28718.212960]  [<0000000000728c60>] kernel_thread_starter+0x0/0xc
> [28718.212962] Last Breaking-Event-Address:
> [28718.212965]  [<000000000015b0ba>] process_one_work+0x2e2/0x4c0
> [28718.212968] ---[ end trace 6d115577307998c2 ]---
> 
> The workload is:
> 2 processes onlining random cpus in a tight loop by using 'echo 1 >
> /sys/bus/cpu.../online'
> 2 processes offlining random cpus in a tight loop by using 'echo 0 >
> /sys/bus/cpu.../online'
> Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
> 
> The machine has 10 processors.
> The warning sometimes hits within a few minutes of starting
> the workload; other times it takes several hours.
> 
> The particular spot in the code is:
> 	/*
> 	 * Ensure we're on the correct CPU.  DISASSOCIATED test is
> 	 * necessary to avoid spurious warnings from rescuers servicing the
> 	 * unbound or a disassociated pool.
> 	 */
> 	WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
> 		     !(pool->flags & POOL_DISASSOCIATED) &&
> 		     raw_smp_processor_id() != pool->cpu);
> 
> I'm not familiar with scheduling or workqueue internals, so I'm
> not sure how to debug this further.
> I would be happy to run tests and/or collect debugging data.
> 
> -- 
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> 

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 82ef9f3..1cc6d05 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2151,9 +2151,12 @@ __acquires(&pool->lock)
 	 * necessary to avoid spurious warnings from rescuers servicing the
 	 * unbound or a disassociated pool.
 	 */
-	WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
-		     !(pool->flags & POOL_DISASSOCIATED) &&
-		     raw_smp_processor_id() != pool->cpu);
+	if (WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
+			 !(pool->flags & POOL_DISASSOCIATED) &&
+			 raw_smp_processor_id() != pool->cpu))
+		pr_warning("XXX: worker->flags=0x%x pool->flags=0x%x cpu=%d pool->cpu=%d\n",
+			   worker->flags, pool->flags, raw_smp_processor_id(),
+			   pool->cpu);
 
 	/*
 	 * A single work shouldn't be executed concurrently by


* Re: Subject: Warning in workqueue.c
  2014-02-07 16:51 ` Tejun Heo
@ 2014-02-07 17:55   ` Jason J. Herne
  2014-02-07 19:36     ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-07 17:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Lai Jiangshan

On 02/07/2014 11:51 AM, Tejun Heo wrote:
> Hello,
>
> (cc'ing Lai as he knows a lot of workqueue code and quoting the whole
> body for him)
>
> Hmmm.... my memory is a bit rusty and nothing rings a bell
> immediately.  Can you please try the patch at the end of this message
> and report the debug message?  Let's first find out what's going on.

I hit it quickly this time :) Here you go:

[  644.517558] ------------[ cut here ]------------
[  644.517567] WARNING: at kernel/workqueue.c:2156
[  644.517569] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables 
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun 
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp 
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[  644.517603] CPU: 6 PID: 24 Comm: kworker/4:0 Not tainted 3.14.0-rc1 #2
[  644.517607] task: 00000000f7b49930 ti: 00000000f7b58000 task.ti: 
00000000f7b58000
[  644.517616] Krnl PSW : 0404c00180000000 000000000015b2a2 
(process_one_work+0x4ca/0x4ec)
[  644.517620]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 EA:3
Krnl GPRS: 000000000020f16f 00000000f7f0ce00 0000000000bc2a52 
0000000000000001
[  644.517625]            0000000000000000 0000000000000006 
0000000000000004 0000000000bc2a4e
[  644.517629]            0000000000000000 0000000084a37500 
0000000084a33000 0000000084a33018
[  644.517632]            00000000f7f21c00 0000000000b92500 
00000000f7b5bdd0 00000000f7b5bd70
[  644.517651] Krnl Code: 000000000015b294: ddf29640c05b	trt 
1600(243,%r9),91(%r12)
            000000000015b29a: a7f4fe56		brc	15,15af46
           #000000000015b29e: a7f40001		brc	15,15b2a0
           >000000000015b2a2: 92012000		mvi	0(%r2),1
            000000000015b2a6: a7f4ff10		brc	15,15b0c6
            000000000015b2aa: c03000533bd1	larl	%r3,bc2a4c
            000000000015b2b0: 95003000		cli	0(%r3),0
            000000000015b2b4: a774ff4e		brc	7,15b150
[  644.517679] Call Trace:
[  644.517683] ([<000000000015b094>] process_one_work+0x2bc/0x4ec)
[  644.517686]  [<000000000015b884>] worker_thread+0x178/0x39c
[  644.517690]  [<000000000016467e>] kthread+0x10e/0x128
[  644.517696]  [<0000000000728c96>] kernel_thread_starter+0x6/0xc
[  644.517699]  [<0000000000728c90>] kernel_thread_starter+0x0/0xc
[  644.517701] Last Breaking-Event-Address:
[  644.517705]  [<000000000015b29e>] process_one_work+0x4c6/0x4ec
[  644.517707] ---[ end trace 681ce8162e7857a0 ]---
[  644.517710] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6 pool->cpu=4
[  731.367023] XXX: worker->flags=0x1 pool->flags=0x0 cpu=1 pool->cpu=7

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)



* Re: Subject: Warning in workqueue.c
  2014-02-07 17:55   ` Jason J. Herne
@ 2014-02-07 19:36     ` Tejun Heo
  2014-02-10 15:32       ` Jason J. Herne
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-02-07 19:36 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: linux-kernel, Lai Jiangshan

Hello,

On Fri, Feb 07, 2014 at 12:55:28PM -0500, Jason J. Herne wrote:
> [  644.517710] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6 pool->cpu=4
> [  731.367023] XXX: worker->flags=0x1 pool->flags=0x0 cpu=1 pool->cpu=7

Sorry, still no idea how this can happen.  Can you please try the
following?

Thanks!

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 82ef9f3..1b11ec0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2151,9 +2151,27 @@ __acquires(&pool->lock)
 	 * necessary to avoid spurious warnings from rescuers servicing the
 	 * unbound or a disassociated pool.
 	 */
-	WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
-		     !(pool->flags & POOL_DISASSOCIATED) &&
-		     raw_smp_processor_id() != pool->cpu);
+	if (WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
+			 !(pool->flags & POOL_DISASSOCIATED) &&
+			 raw_smp_processor_id() != pool->cpu)) {
+		static char buf[PAGE_SIZE];
+		unsigned long now = jiffies;
+
+		pr_warning("XXX: worker->flags=0x%x pool->flags=0x%x cpu=%d pool->cpu=%d rescue_wq=%p\n",
+			   worker->flags, pool->flags, raw_smp_processor_id(),
+			   pool->cpu, worker->rescue_wq);
+		pr_warning("XXX: last_unbind=%ld last_rebind=%ld last_rebound_clear=%ld nr_exected_after_rebound_clear=%d\n",
+			   worker->last_unbind ? worker->last_unbind - now : 999,
+			   worker->last_rebind ? worker->last_rebind - now : 999,
+			   worker->last_rebound_clear ? worker->last_rebound_clear - now : 999,
+			   worker->nr_executed_after_rebound_clear);
+
+		cpulist_scnprintf(buf, sizeof(buf), &current->cpus_allowed);
+		pr_warning("XXX: cpus_allowed=%s\n", buf);
+
+		cpulist_scnprintf(buf, sizeof(buf), &worker->cpus_allowed_after_rebinding);
+		pr_warning("XXX: cpus_allowed_after_rebinding=%s\n", buf);
+	}
 
 	/*
 	 * A single work shouldn't be executed concurrently by
@@ -2199,6 +2217,8 @@ __acquires(&pool->lock)
 	 */
 	set_work_pool_and_clear_pending(work, pool->id);
 
+	worker->nr_executed_after_rebound_clear++;
+
 	spin_unlock_irq(&pool->lock);
 
 	lock_map_acquire_read(&pwq->wq->lockdep_map);
@@ -2321,6 +2341,10 @@ recheck:
 	 * management if applicable and concurrency management is restored
 	 * after being rebound.  See rebind_workers() for details.
 	 */
+	if (worker->flags & WORKER_REBOUND) {
+		worker->last_rebound_clear = jiffies;
+		worker->nr_executed_after_rebound_clear = 0;
+	}
 	worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
 
 	do {
@@ -4576,8 +4600,10 @@ static void wq_unbind_fn(struct work_struct *work)
 		 * before the last CPU down must be on the cpu.  After
 		 * this, they may become diasporas.
 		 */
-		for_each_pool_worker(worker, wi, pool)
+		for_each_pool_worker(worker, wi, pool) {
 			worker->flags |= WORKER_UNBOUND;
+			worker->last_unbind = jiffies;
+		}
 
 		pool->flags |= POOL_DISASSOCIATED;
 
@@ -4633,9 +4659,13 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, wi, pool)
+	for_each_pool_worker(worker, wi, pool) {
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
+		worker->last_rebind = jiffies;
+		cpumask_copy(&worker->cpus_allowed_after_rebinding,
+			     &worker->task->cpus_allowed);
+	}
 
 	spin_lock_irq(&pool->lock);
 
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..95d68c4 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -50,6 +50,11 @@ struct worker {
 
 	/* used only by rescuers to point to the target workqueue */
 	struct workqueue_struct	*rescue_wq;	/* I: the workqueue to rescue */
+	unsigned long		last_unbind;
+	unsigned long		last_rebind;
+	unsigned long		last_rebound_clear;
+	int			nr_executed_after_rebound_clear;
+	cpumask_t		cpus_allowed_after_rebinding;
 };
 
 /**


* Re: Subject: Warning in workqueue.c
  2014-02-07 19:36     ` Tejun Heo
@ 2014-02-10 15:32       ` Jason J. Herne
  2014-02-10 23:17         ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-10 15:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Lai Jiangshan

On 02/07/2014 02:36 PM, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 07, 2014 at 12:55:28PM -0500, Jason J. Herne wrote:
>> [  644.517710] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6 pool->cpu=4
>> [  731.367023] XXX: worker->flags=0x1 pool->flags=0x0 cpu=1 pool->cpu=7
>
> Sorry, still no idea how this can happen.  Can you please try the
> following?
>

Here are the results:

[  950.778326] ------------[ cut here ]------------
[  950.778334] WARNING: at kernel/workqueue.c:2156
[  950.778336] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables 
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi tape_3590 tape tape_class qeth_l2 vhost_net tun 
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp 
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[  950.778370] CPU: 6 PID: 16 Comm: kworker/2:0 Not tainted 3.14.0-rc1 #1
[  950.778375] task: 00000000f7cc4b90 ti: 00000000f7ce8000 task.ti: 
00000000f7ce8000
[  950.778385] Krnl PSW : 0404c00180000000 000000000015b386 
(process_one_work+0x59a/0x5c0)
[  950.778389]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 EA:3
Krnl GPRS: 00000000001ee417 00000000f7f0ce00 0000000000bc2a52 
0000000000000001
[  950.778396]            0000000000000000 0000000000000006 
0000000000000002 0000000000bc2a4e
[  950.778400]            0000000000000000 0000000084a1f500 
0000000084a1b000 0000000084a1b018
[  950.778404]            00000000f7f52900 0000000000b92500 
00000000f7cebdd0 00000000f7cebd68
[  950.778423] Krnl Code: 000000000015b37a: 92013000		mvi	0(%r3),1
            000000000015b37e: a7f4ff31		brc	15,15b1e0
           #000000000015b382: a7f40001		brc	15,15b384
           >000000000015b386: 92012000		mvi	0(%r2),1
            000000000015b38a: a7f4feae		brc	15,15b0e6
            000000000015b38e: 41102018		la	%r1,24(%r2)
            000000000015b392: e31020180020	cg	%r1,24(%r2)
            000000000015b398: a784ff59		brc	8,15b24a
[  950.778451] Call Trace:
[  950.778455] ([<000000000015b0b4>] process_one_work+0x2c8/0x5c0)
[  950.778459]  [<000000000015b98e>] worker_thread+0x19a/0x3c0
[  950.778463]  [<00000000001647c2>] kthread+0x10e/0x128
[  950.778469]  [<0000000000728dd6>] kernel_thread_starter+0x6/0xc
[  950.778472]  [<0000000000728dd0>] kernel_thread_starter+0x0/0xc
[  950.778475] Last Breaking-Event-Address:
[  950.778478]  [<000000000015b382>] process_one_work+0x596/0x5c0
[  950.778481] ---[ end trace 643b2850ca4b7ed8 ]---
[  950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6 pool->cpu=2 
rescue_wq=          (null)
[  950.778488] XXX: last_unbind=-7 last_rebind=0 last_rebound_clear=0 
nr_exected_after_rebound_clear=0
[  950.778492] XXX: cpus_allowed=2
[  950.778495] XXX: cpus_allowed_after_rebinding=2


-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)



* Re: Subject: Warning in workqueue.c
  2014-02-10 15:32       ` Jason J. Herne
@ 2014-02-10 23:17         ` Tejun Heo
  2014-02-12 15:18           ` Jason J. Herne
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-02-10 23:17 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: linux-kernel, Lai Jiangshan

Hello,

On Mon, Feb 10, 2014 at 10:32:11AM -0500, Jason J. Herne wrote:
> [  950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6
> pool->cpu=2 rescue_wq=          (null)
> [  950.778488] XXX: last_unbind=-7 last_rebind=0
> last_rebound_clear=0 nr_exected_after_rebound_clear=0
> [  950.778492] XXX: cpus_allowed=2
> [  950.778495] XXX: cpus_allowed_after_rebinding=2

So, everything looks kosher from workqueue side.  Weird.  cpus_allowed
is properly set and everything.  The worker just isn't running on the
cpu it's supposed to be on.  Can you please try the following?

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 82ef9f3..f71ee11 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2151,9 +2151,31 @@ __acquires(&pool->lock)
 	 * necessary to avoid spurious warnings from rescuers servicing the
 	 * unbound or a disassociated pool.
 	 */
-	WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
-		     !(pool->flags & POOL_DISASSOCIATED) &&
-		     raw_smp_processor_id() != pool->cpu);
+	if (WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
+			 !(pool->flags & POOL_DISASSOCIATED) &&
+			 raw_smp_processor_id() != pool->cpu)) {
+		static char buf[PAGE_SIZE];
+		unsigned long now = jiffies;
+
+		pr_warning("XXX: worker->flags=0x%x pool->flags=0x%x cpu=%d pool->cpu=%d(%d) rescue_wq=%p\n",
+			   worker->flags, pool->flags, raw_smp_processor_id(),
+			   pool->cpu, cpu_online(pool->cpu), worker->rescue_wq);
+		pr_warning("XXX: last_unbind=%ld last_rebind=%ld last_rebound_clear=%ld nr_exected_after_rebound_clear=%d\n",
+			   worker->last_unbind ? worker->last_unbind - now : 999,
+			   worker->last_rebind ? worker->last_rebind - now : 999,
+			   worker->last_rebound_clear ? worker->last_rebound_clear - now : 999,
+			   worker->nr_executed_after_rebound_clear);
+
+		cpulist_scnprintf(buf, sizeof(buf), &current->cpus_allowed);
+		pr_warning("XXX: cpus_allowed=%s\n", buf);
+
+		cpulist_scnprintf(buf, sizeof(buf), &worker->cpus_allowed_after_rebinding);
+		pr_warning("XXX: cpus_allowed_after_rebinding=%s\n", buf);
+
+		schedule();
+
+		pr_warning("XXX: after schedule(), cpu=%d\n", raw_smp_processor_id());
+	}
 
 	/*
 	 * A single work shouldn't be executed concurrently by
@@ -2199,6 +2221,8 @@ __acquires(&pool->lock)
 	 */
 	set_work_pool_and_clear_pending(work, pool->id);
 
+	worker->nr_executed_after_rebound_clear++;
+
 	spin_unlock_irq(&pool->lock);
 
 	lock_map_acquire_read(&pwq->wq->lockdep_map);
@@ -2321,6 +2345,10 @@ recheck:
 	 * management if applicable and concurrency management is restored
 	 * after being rebound.  See rebind_workers() for details.
 	 */
+	if (worker->flags & WORKER_REBOUND) {
+		worker->last_rebound_clear = jiffies;
+		worker->nr_executed_after_rebound_clear = 0;
+	}
 	worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
 
 	do {
@@ -4576,8 +4604,10 @@ static void wq_unbind_fn(struct work_struct *work)
 		 * before the last CPU down must be on the cpu.  After
 		 * this, they may become diasporas.
 		 */
-		for_each_pool_worker(worker, wi, pool)
+		for_each_pool_worker(worker, wi, pool) {
 			worker->flags |= WORKER_UNBOUND;
+			worker->last_unbind = jiffies;
+		}
 
 		pool->flags |= POOL_DISASSOCIATED;
 
@@ -4633,9 +4663,13 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, wi, pool)
+	for_each_pool_worker(worker, wi, pool) {
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
+		worker->last_rebind = jiffies;
+		cpumask_copy(&worker->cpus_allowed_after_rebinding,
+			     &worker->task->cpus_allowed);
+	}
 
 	spin_lock_irq(&pool->lock);
 
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..95d68c4 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -50,6 +50,11 @@ struct worker {
 
 	/* used only by rescuers to point to the target workqueue */
 	struct workqueue_struct	*rescue_wq;	/* I: the workqueue to rescue */
+	unsigned long		last_unbind;
+	unsigned long		last_rebind;
+	unsigned long		last_rebound_clear;
+	int			nr_executed_after_rebound_clear;
+	cpumask_t		cpus_allowed_after_rebinding;
 };
 
 /**



* Re: Subject: Warning in workqueue.c
  2014-02-10 23:17         ` Tejun Heo
@ 2014-02-12 15:18           ` Jason J. Herne
  2014-02-13  3:02             ` Lai Jiangshan
  2014-02-13  3:31             ` Lai Jiangshan
  0 siblings, 2 replies; 23+ messages in thread
From: Jason J. Herne @ 2014-02-12 15:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Lai Jiangshan

On 02/10/2014 06:17 PM, Tejun Heo wrote:
> Hello,
>
> On Mon, Feb 10, 2014 at 10:32:11AM -0500, Jason J. Herne wrote:
>> [  950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6
>> pool->cpu=2 rescue_wq=          (null)
>> [  950.778488] XXX: last_unbind=-7 last_rebind=0
>> last_rebound_clear=0 nr_exected_after_rebound_clear=0
>> [  950.778492] XXX: cpus_allowed=2
>> [  950.778495] XXX: cpus_allowed_after_rebinding=2
>
> So, everything looks kosher from workqueue side.  Weird.  cpus_allowed
> is properly set and everything.  The worker just isn't running on the
> cpu it's supposed to be on.  Can you please try the following?


Thanks for the fast responses. Here is the output from the last patch:


[34437.173991] WARNING: at kernel/workqueue.c:2156
[34437.173993] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables 
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun 
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp 
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[34437.174030] CPU: 2 PID: 12 Comm: kworker/1:0 Not tainted 3.14.0-rc1 #1
[34437.174034] task: 00000000f7c9e4c0 ti: 00000000f7cbc000 task.ti: 
00000000f7cbc000
[34437.174046] Krnl PSW : 0404c00180000000 000000000015b406 
(process_one_work+0x61a/0x640)
[34437.174051]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 EA:3
Krnl GPRS: 000000007c325b00 0000000000bc2a52 00000000f7f52780 
0000000000000001
[34437.174057]            0000000000000000 0000000000000002 
0000000000000001 0000000000bc2a4e
[34437.174061]            0000000000000000 0000000084a13500 
0000000084a0f000 0000000084a0f018
[34437.174065]            00000000f7f52780 0000000000735d18 
00000000f7cbfdd0 00000000f7cbfd58
[34437.174085] Krnl Code: 000000000015b3fa: 92013000		mvi	0(%r3),1
            000000000015b3fe: a7f4ff31		brc	15,15b260
           #000000000015b402: a7f40001		brc	15,15b404
           >000000000015b406: 92011000		mvi	0(%r1),1
            000000000015b40a: a7f4fe84		brc	15,15b112
            000000000015b40e: 41102018		la	%r1,24(%r2)
            000000000015b412: e31020180020	cg	%r1,24(%r2)
            000000000015b418: a784ff59		brc	8,15b2ca
[34437.174115] Call Trace:
[34437.174118] ([<000000000015b0e0>] process_one_work+0x2f4/0x640)
[34437.174122]  [<000000000015ba0e>] worker_thread+0x19a/0x3c0
[34437.174126]  [<0000000000164842>] kthread+0x10e/0x128
[34437.174132]  [<0000000000728e56>] kernel_thread_starter+0x6/0xc
[34437.174136]  [<0000000000728e50>] kernel_thread_starter+0x0/0xc
[34437.174138] Last Breaking-Event-Address:
[34437.174141]  [<000000000015b402>] process_one_work+0x616/0x640
[34437.174144] ---[ end trace 52d8ee96597a2415 ]---
[34437.174148] XXX: worker->flags=0x1 pool->flags=0x0 cpu=2 
pool->cpu=1(1) rescue_wq=          (null)
[34437.174152] XXX: last_unbind=-34 last_rebind=0 last_rebound_clear=0 
nr_exected_after_rebound_clear=0
[34437.174156] XXX: cpus_allowed=1
[34437.174158] XXX: cpus_allowed_after_rebinding=1
[34437.174161] BUG: scheduling while atomic: kworker/1:0/12/0x00000002
[34437.174164] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables 
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun 
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp 
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[34437.174194] CPU: 2 PID: 12 Comm: kworker/1:0 Tainted: G        W 
3.14.0-rc1 #1
[34437.174197]        00000000f7cbfb80 00000000f7cbfb90 0000000000000002 
0000000000000000
        00000000f7cbfc20 00000000f7cbfb98 00000000f7cbfb98 0000000000115f6a
        0000000000000003 0000000000c4da00 000000000000000a 000000000000000a
        00000000f7cbfbe0 00000000f7cbfb80 0000000000000000 0000000000000000
        0000000000000000 0000000000115f6a 00000000f7cbfb80 00000000f7cbfbd0
[34437.174273] Call Trace:
[34437.174279] ([<0000000000115e5c>] show_trace+0x100/0x148)
[34437.174282]  [<0000000000115f18>] show_stack+0x74/0xf4
[34437.174288]  [<0000000000720a9c>] dump_stack+0x88/0xb8
[34437.174291]  [<000000000071aa40>] __schedule_bug+0x78/0x90
[34437.174295]  [<00000000007237e4>] __schedule+0xb8c/0xbac
[34437.174299]  [<000000000015b21e>] process_one_work+0x432/0x640
[34437.174302]  [<000000000015ba0e>] worker_thread+0x19a/0x3c0
[34437.174306]  [<0000000000164842>] kthread+0x10e/0x128
[34437.174309]  [<0000000000728e56>] kernel_thread_starter+0x6/0xc
[34437.174313]  [<0000000000728e50>] kernel_thread_starter+0x0/0xc
[34437.174317] XXX: after schedule(), cpu=2

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)



* Re: Subject: Warning in workqueue.c
  2014-02-12 15:18           ` Jason J. Herne
@ 2014-02-13  3:02             ` Lai Jiangshan
  2014-02-13  3:31             ` Lai Jiangshan
  1 sibling, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2014-02-13  3:02 UTC (permalink / raw)
  To: jjherne; +Cc: Tejun Heo, linux-kernel

On 02/12/2014 11:18 PM, Jason J. Herne wrote:
> On 02/10/2014 06:17 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Mon, Feb 10, 2014 at 10:32:11AM -0500, Jason J. Herne wrote:
>>> [  950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6
>>> pool->cpu=2 rescue_wq=          (null)
>>> [  950.778488] XXX: last_unbind=-7 last_rebind=0
>>> last_rebound_clear=0 nr_exected_after_rebound_clear=0
>>> [  950.778492] XXX: cpus_allowed=2
>>> [  950.778495] XXX: cpus_allowed_after_rebinding=2
>>
>> So, everything looks kosher from workqueue side.  Weird.  cpus_allowed
>> is properly set and everything.  The worker just isn't running on the
>> cpu it's supposed to be on.  Can you please try the following?
> 
> 
> Thanks for the fast responses. Here is the output from the last patch:

I can't hit the bug. Could you give me more information to help me reproduce it?
I think we need to use git-bisect to track down the suspect patch.

Thanks,
Lai

> 
> 
> [34437.173991] WARNING: at kernel/workqueue.c:2156
> [34437.173993] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
> [34437.174030] CPU: 2 PID: 12 Comm: kworker/1:0 Not tainted 3.14.0-rc1 #1
> [34437.174034] task: 00000000f7c9e4c0 ti: 00000000f7cbc000 task.ti: 00000000f7cbc000
> [34437.174046] Krnl PSW : 0404c00180000000 000000000015b406 (process_one_work+0x61a/0x640)
> [34437.174051]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
> Krnl GPRS: 000000007c325b00 0000000000bc2a52 00000000f7f52780 0000000000000001
> [34437.174057]            0000000000000000 0000000000000002 0000000000000001 0000000000bc2a4e
> [34437.174061]            0000000000000000 0000000084a13500 0000000084a0f000 0000000084a0f018
> [34437.174065]            00000000f7f52780 0000000000735d18 00000000f7cbfdd0 00000000f7cbfd58
> [34437.174085] Krnl Code: 000000000015b3fa: 92013000        mvi    0(%r3),1
>            000000000015b3fe: a7f4ff31        brc    15,15b260
>           #000000000015b402: a7f40001        brc    15,15b404
>           >000000000015b406: 92011000        mvi    0(%r1),1
>            000000000015b40a: a7f4fe84        brc    15,15b112
>            000000000015b40e: 41102018        la    %r1,24(%r2)
>            000000000015b412: e31020180020    cg    %r1,24(%r2)
>            000000000015b418: a784ff59        brc    8,15b2ca
> [34437.174115] Call Trace:
> [34437.174118] ([<000000000015b0e0>] process_one_work+0x2f4/0x640)
> [34437.174122]  [<000000000015ba0e>] worker_thread+0x19a/0x3c0
> [34437.174126]  [<0000000000164842>] kthread+0x10e/0x128
> [34437.174132]  [<0000000000728e56>] kernel_thread_starter+0x6/0xc
> [34437.174136]  [<0000000000728e50>] kernel_thread_starter+0x0/0xc
> [34437.174138] Last Breaking-Event-Address:
> [34437.174141]  [<000000000015b402>] process_one_work+0x616/0x640
> [34437.174144] ---[ end trace 52d8ee96597a2415 ]---
> [34437.174148] XXX: worker->flags=0x1 pool->flags=0x0 cpu=2 pool->cpu=1(1) rescue_wq=          (null)
> [34437.174152] XXX: last_unbind=-34 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
> [34437.174156] XXX: cpus_allowed=1
> [34437.174158] XXX: cpus_allowed_after_rebinding=1
> [34437.174161] BUG: scheduling while atomic: kworker/1:0/12/0x00000002
> [34437.174164] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
> [34437.174194] CPU: 2 PID: 12 Comm: kworker/1:0 Tainted: G        W 3.14.0-rc1 #1
> [34437.174197]        00000000f7cbfb80 00000000f7cbfb90 0000000000000002 0000000000000000
>        00000000f7cbfc20 00000000f7cbfb98 00000000f7cbfb98 0000000000115f6a
>        0000000000000003 0000000000c4da00 000000000000000a 000000000000000a
>        00000000f7cbfbe0 00000000f7cbfb80 0000000000000000 0000000000000000
>        0000000000000000 0000000000115f6a 00000000f7cbfb80 00000000f7cbfbd0
> [34437.174273] Call Trace:
> [34437.174279] ([<0000000000115e5c>] show_trace+0x100/0x148)
> [34437.174282]  [<0000000000115f18>] show_stack+0x74/0xf4
> [34437.174288]  [<0000000000720a9c>] dump_stack+0x88/0xb8
> [34437.174291]  [<000000000071aa40>] __schedule_bug+0x78/0x90
> [34437.174295]  [<00000000007237e4>] __schedule+0xb8c/0xbac
> [34437.174299]  [<000000000015b21e>] process_one_work+0x432/0x640
> [34437.174302]  [<000000000015ba0e>] worker_thread+0x19a/0x3c0
> [34437.174306]  [<0000000000164842>] kthread+0x10e/0x128
> [34437.174309]  [<0000000000728e56>] kernel_thread_starter+0x6/0xc
> [34437.174313]  [<0000000000728e50>] kernel_thread_starter+0x0/0xc
> [34437.174317] XXX: after schedule(), cpu=2
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Subject: Warning in workqueue.c
  2014-02-12 15:18           ` Jason J. Herne
  2014-02-13  3:02             ` Lai Jiangshan
@ 2014-02-13  3:31             ` Lai Jiangshan
  2014-02-13 17:58               ` Jason J. Herne
  1 sibling, 1 reply; 23+ messages in thread
From: Lai Jiangshan @ 2014-02-13  3:31 UTC (permalink / raw)
  To: jjherne; +Cc: Tejun Heo, linux-kernel

On 02/12/2014 11:18 PM, Jason J. Herne wrote:
> On 02/10/2014 06:17 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Mon, Feb 10, 2014 at 10:32:11AM -0500, Jason J. Herne wrote:
>>> [  950.778485] XXX: worker->flags=0x1 pool->flags=0x0 cpu=6
>>> pool->cpu=2 rescue_wq=          (null)
>>> [  950.778488] XXX: last_unbind=-7 last_rebind=0
>>> last_rebound_clear=0 nr_exected_after_rebound_clear=0
>>> [  950.778492] XXX: cpus_allowed=2
>>> [  950.778495] XXX: cpus_allowed_after_rebinding=2
>>
>> So, everything looks kosher from workqueue side.  Weird.  cpus_allowed
>> is properly set and everything.  The worker just isn't running on the
>> cpu it's supposed to be on.  Can you please try the following?
> 
> 
> Thanks for the fast responses. Here is the output from the last patch:
> 
> 
> [34437.173991] WARNING: at kernel/workqueue.c:2156
> [34437.173993] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
> [34437.174030] CPU: 2 PID: 12 Comm: kworker/1:0 Not tainted 3.14.0-rc1 #1
> [34437.174034] task: 00000000f7c9e4c0 ti: 00000000f7cbc000 task.ti: 00000000f7cbc000
> [34437.174046] Krnl PSW : 0404c00180000000 000000000015b406 (process_one_work+0x61a/0x640)
> [34437.174051]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
> Krnl GPRS: 000000007c325b00 0000000000bc2a52 00000000f7f52780 0000000000000001
> [34437.174057]            0000000000000000 0000000000000002 0000000000000001 0000000000bc2a4e
> [34437.174061]            0000000000000000 0000000084a13500 0000000084a0f000 0000000084a0f018
> [34437.174065]            00000000f7f52780 0000000000735d18 00000000f7cbfdd0 00000000f7cbfd58
> [34437.174085] Krnl Code: 000000000015b3fa: 92013000        mvi    0(%r3),1
>            000000000015b3fe: a7f4ff31        brc    15,15b260
>           #000000000015b402: a7f40001        brc    15,15b404
>           >000000000015b406: 92011000        mvi    0(%r1),1
>            000000000015b40a: a7f4fe84        brc    15,15b112
>            000000000015b40e: 41102018        la    %r1,24(%r2)
>            000000000015b412: e31020180020    cg    %r1,24(%r2)
>            000000000015b418: a784ff59        brc    8,15b2ca
> [34437.174115] Call Trace:
> [34437.174118] ([<000000000015b0e0>] process_one_work+0x2f4/0x640)
> [34437.174122]  [<000000000015ba0e>] worker_thread+0x19a/0x3c0
> [34437.174126]  [<0000000000164842>] kthread+0x10e/0x128
> [34437.174132]  [<0000000000728e56>] kernel_thread_starter+0x6/0xc
> [34437.174136]  [<0000000000728e50>] kernel_thread_starter+0x0/0xc
> [34437.174138] Last Breaking-Event-Address:
> [34437.174141]  [<000000000015b402>] process_one_work+0x616/0x640
> [34437.174144] ---[ end trace 52d8ee96597a2415 ]---
> [34437.174148] XXX: worker->flags=0x1 pool->flags=0x0 cpu=2 pool->cpu=1(1) rescue_wq=          (null)
> [34437.174152] XXX: last_unbind=-34 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
> [34437.174156] XXX: cpus_allowed=1
> [34437.174158] XXX: cpus_allowed_after_rebinding=1
> [34437.174161] BUG: scheduling while atomic: kworker/1:0/12/0x00000002
> [34437.174164] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
> [34437.174194] CPU: 2 PID: 12 Comm: kworker/1:0 Tainted: G        W 3.14.0-rc1 #1
> [34437.174197]        00000000f7cbfb80 00000000f7cbfb90 0000000000000002 0000000000000000
>        00000000f7cbfc20 00000000f7cbfb98 00000000f7cbfb98 0000000000115f6a
>        0000000000000003 0000000000c4da00 000000000000000a 000000000000000a
>        00000000f7cbfbe0 00000000f7cbfb80 0000000000000000 0000000000000000
>        0000000000000000 0000000000115f6a 00000000f7cbfb80 00000000f7cbfbd0
> [34437.174273] Call Trace:
> [34437.174279] ([<0000000000115e5c>] show_trace+0x100/0x148)
> [34437.174282]  [<0000000000115f18>] show_stack+0x74/0xf4
> [34437.174288]  [<0000000000720a9c>] dump_stack+0x88/0xb8
> [34437.174291]  [<000000000071aa40>] __schedule_bug+0x78/0x90
> [34437.174295]  [<00000000007237e4>] __schedule+0xb8c/0xbac
> [34437.174299]  [<000000000015b21e>] process_one_work+0x432/0x640
> [34437.174302]  [<000000000015ba0e>] worker_thread+0x19a/0x3c0
> [34437.174306]  [<0000000000164842>] kthread+0x10e/0x128
> [34437.174309]  [<0000000000728e56>] kernel_thread_starter+0x6/0xc
> [34437.174313]  [<0000000000728e50>] kernel_thread_starter+0x0/0xc
> [34437.174317] XXX: after schedule(), cpu=2

Could you test with the following patch if Tejun doesn't give you a new one?

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b010eac..cdd7a10 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -825,6 +825,7 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
 {
 	struct worker *worker = kthread_data(task);
 
+	worker->wake_up = jiffies;
 	if (!(worker->flags & WORKER_NOT_RUNNING)) {
 		WARN_ON_ONCE(worker->pool->cpu != cpu);
 		atomic_inc(&worker->pool->nr_running);
@@ -851,6 +852,7 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
 	struct worker *worker = kthread_data(task), *to_wakeup = NULL;
 	struct worker_pool *pool;
 
+	worker->sleep = jiffies;
 	/*
 	 * Rescuers, which may not have all the fields set up like normal
 	 * workers, also reach here, let's not access anything before
@@ -2134,6 +2136,7 @@ __acquires(&pool->lock)
 	bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
 	int work_color;
 	struct worker *collision;
+	int worker_bug = 0;
 #ifdef CONFIG_LOCKDEP
 	/*
 	 * It is permissible to free the struct work_struct from
@@ -2151,9 +2154,31 @@ __acquires(&pool->lock)
 	 * necessary to avoid spurious warnings from rescuers servicing the
 	 * unbound or a disassociated pool.
 	 */
-	WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
-		     !(pool->flags & POOL_DISASSOCIATED) &&
-		     raw_smp_processor_id() != pool->cpu);
+	if (WARN_ON_ONCE(!(worker->flags & WORKER_UNBOUND) &&
+			 !(pool->flags & POOL_DISASSOCIATED) &&
+			 raw_smp_processor_id() != pool->cpu)) {
+		static char buf[PAGE_SIZE];
+		unsigned long now = jiffies;
+
+		worker_bug = 1;
+		pr_warning("XXX: worker->flags=0x%x pool->flags=0x%x cpu=%d pool->cpu=%d(%d) rescue_wq=%p\n",
+			   worker->flags, pool->flags, raw_smp_processor_id(),
+			   pool->cpu, cpu_online(pool->cpu), worker->rescue_wq);
+		pr_warning("XXX: last_unbind=%ld last_rebind=%ld last_rebound_clear=%ld nr_exected_after_rebound_clear=%d\n",
+			   worker->last_unbind ? worker->last_unbind - now : 999,
+			   worker->last_rebind ? worker->last_rebind - now : 999,
+			   worker->last_rebound_clear ? worker->last_rebound_clear - now : 999,
+			   worker->nr_executed_after_rebound_clear);
+		pr_warning("XXX: sleep=%ld wakeup=%ld\n",
+			   worker->sleep ? worker->sleep - now : 999,
+			   worker->wake_up ? worker->wake_up - now : 999);
+
+		cpulist_scnprintf(buf, sizeof(buf), &current->cpus_allowed);
+		pr_warning("XXX: cpus_allowed=%s\n", buf);
+
+		cpulist_scnprintf(buf, sizeof(buf), &worker->cpus_allowed_after_rebinding);
+		pr_warning("XXX: cpus_allowed_after_rebinding=%s\n", buf);
+	}
 
 	/*
 	 * A single work shouldn't be executed concurrently by
@@ -2199,8 +2224,15 @@ __acquires(&pool->lock)
 	 */
 	set_work_pool_and_clear_pending(work, pool->id);
 
+	worker->nr_executed_after_rebound_clear++;
+
 	spin_unlock_irq(&pool->lock);
 
+	if (worker_bug) {
+		schedule();
+
+		pr_warning("XXX: after schedule(), cpu=%d\n", raw_smp_processor_id());
+	}
 	lock_map_acquire_read(&pwq->wq->lockdep_map);
 	lock_map_acquire(&lockdep_map);
 	trace_workqueue_execute_start(work);
@@ -2298,6 +2330,7 @@ woke_up:
 	}
 
 	worker_leave_idle(worker);
+	worker->wake_up = jiffies;
 recheck:
 	/* no more worker necessary? */
 	if (!need_more_worker(pool))
@@ -2321,6 +2354,10 @@ recheck:
 	 * management if applicable and concurrency management is restored
 	 * after being rebound.  See rebind_workers() for details.
 	 */
+	if (worker->flags & WORKER_REBOUND) {
+		worker->last_rebound_clear = jiffies;
+		worker->nr_executed_after_rebound_clear = 0;
+	}
 	worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
 
 	do {
@@ -2341,6 +2378,7 @@ recheck:
 
 	worker_set_flags(worker, WORKER_PREP, false);
 sleep:
+	worker->sleep = jiffies;
 	if (unlikely(need_to_manage_workers(pool)) && manage_workers(worker))
 		goto recheck;
 
@@ -4576,8 +4614,10 @@ static void wq_unbind_fn(struct work_struct *work)
 		 * before the last CPU down must be on the cpu.  After
 		 * this, they may become diasporas.
 		 */
-		for_each_pool_worker(worker, wi, pool)
+		for_each_pool_worker(worker, wi, pool) {
 			worker->flags |= WORKER_UNBOUND;
+			worker->last_unbind = jiffies;
+		}
 
 		pool->flags |= POOL_DISASSOCIATED;
 
@@ -4633,9 +4673,13 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, wi, pool)
+	for_each_pool_worker(worker, wi, pool) {
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
+		worker->last_rebind = jiffies;
+		cpumask_copy(&worker->cpus_allowed_after_rebinding,
+			     &worker->task->cpus_allowed);
+	}
 
 	spin_lock_irq(&pool->lock);
 
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..db17e2c 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -50,6 +50,13 @@ struct worker {
 
 	/* used only by rescuers to point to the target workqueue */
 	struct workqueue_struct	*rescue_wq;	/* I: the workqueue to rescue */
+	unsigned long		last_unbind;
+	unsigned long		last_rebind;
+	unsigned long		sleep;
+	unsigned long		wake_up;
+	unsigned long		last_rebound_clear;
+	int			nr_executed_after_rebound_clear;
+	cpumask_t		cpus_allowed_after_rebinding;
 };
 
 /**



* Re: Subject: Warning in workqueue.c
  2014-02-13  3:31             ` Lai Jiangshan
@ 2014-02-13 17:58               ` Jason J. Herne
  2014-02-13 20:41                 ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-13 17:58 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Tejun Heo, linux-kernel

On 02/12/2014 10:31 PM, Lai Jiangshan wrote:
> On 02/12/2014 11:18 PM, Jason J. Herne wrote:

> Could you use the following patch for test if Tejun doesn't give you a new one.

Lai,

Here is the output using the patch you asked me to run with.

[ 5779.795687] ------------[ cut here ]------------
[ 5779.795695] WARNING: at kernel/workqueue.c:2159
[ 5779.795698] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables 
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi tape_3590 qeth_l2 tape tape_class vhost_net tun 
vhost macvtap macvlan lcs dasd_eckd_mod dasd_mod qeth ccwgroup zfcp 
scsi_transport_fc scsi_tgt qdio dm_multipath [last unloaded: kvm]
[ 5779.795733] CPU: 4 PID: 270 Comm: kworker/5:1 Not tainted 3.14.0-rc1 #1
[ 5779.795738] task: 0000000001938000 ti: 00000000f4d9c000 task.ti: 
00000000f4d9c000
[ 5779.795750] Krnl PSW : 0404c00180000000 000000000015b452 
(process_one_work+0x666/0x688)
[ 5779.795756]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 EA:3
Krnl GPRS: 000003210f9db000 0000000000bc2a52 0000000001b640c0 
0000000000000001
[ 5779.795757]            0000000000000000 0000000000000004 
0000000000000005 00000000ffffffff
[ 5779.795759]            0000000000000000 0000000084a43500 
0000000084a3f000 0000000084a3f018
[ 5779.795763]            0000000001b640c0 0000000000735d18 
00000000f4d9fdc8 00000000f4d9fd50
[ 5779.795781] Krnl Code: 000000000015b444: dd1a9640c05b	trt 
1600(27,%r9),91(%r12)
            000000000015b44a: a7f4fd9e		brc	15,15af86
           #000000000015b44e: a7f40001		brc	15,15b450
           >000000000015b452: 92011000		mvi	0(%r1),1
            000000000015b456: a7f4fe63		brc	15,15b11c
            000000000015b45a: c03000533af9	larl	%r3,bc2a4c
            000000000015b460: 95003000		cli	0(%r3),0
            000000000015b464: a774ff3e		brc	7,15b2e0
[ 5779.795810] Call Trace:
[ 5779.795814] ([<000000000015b0ea>] process_one_work+0x2fe/0x688)
[ 5779.795817]  [<000000000015ba62>] worker_thread+0x1a6/0x3d4
[ 5779.795822]  [<00000000001648c2>] kthread+0x10e/0x128
[ 5779.795828]  [<0000000000728ed6>] kernel_thread_starter+0x6/0xc
[ 5779.795832]  [<0000000000728ed0>] kernel_thread_starter+0x0/0xc
[ 5779.795834] Last Breaking-Event-Address:
[ 5779.795837]  [<000000000015b44e>] process_one_work+0x662/0x688
[ 5779.795840] ---[ end trace 8b6353b0f2821ec9 ]---
[ 5779.795844] XXX: worker->flags=0x1 pool->flags=0x0 cpu=4 
pool->cpu=5(1) rescue_wq=          (null)
[ 5779.795848] XXX: last_unbind=-44 last_rebind=0 last_rebound_clear=0 
nr_exected_after_rebound_clear=0
[ 5779.795852] XXX: sleep=-39 wakeup=0
[ 5779.795855] XXX: cpus_allowed=5
[ 5779.795857] XXX: cpus_allowed_after_rebinding=5
[ 5779.795861] XXX: after schedule(), cpu=4

You had asked about reproducing this. This is on the S390 platform; I'm 
not sure if that makes any difference.

The workload is:
2 processes onlining random cpus in a tight loop by using 'echo 1 > 
/sys/bus/cpu.../online'
2 processes offlining random cpus in a tight loop by using 'echo 0 > 
/sys/bus/cpu.../online'
Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27

The machine has 10 processors.
The warning message sometimes hits within a few minutes of starting the 
workload. Other times it takes several hours.
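The workload above can be scripted roughly as follows (a sketch, not the
exact scripts used; it assumes the standard sysfs hotplug interface under
/sys/devices/system/cpu, that CPU 0 stays online, and the CPU_ROOT
override exists only so the script can be exercised against a fake tree):

```shell
#!/bin/bash
# CPU hotplug stress: toggle random CPUs online/offline in a tight loop.
# Run two instances with VALUE=1 and two with VALUE=0 to mimic the 2+2
# onliner/offliner processes described above.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"   # overridable for testing
NR_CPUS="${NR_CPUS:-10}"
VALUE="${VALUE:-1}"                               # 1 = online, 0 = offline
ITERATIONS="${ITERATIONS:-100}"

toggle_random_cpu() {
    # Pick CPU 1..NR_CPUS-1; CPU 0 typically cannot be offlined.
    local cpu=$(( (RANDOM % (NR_CPUS - 1)) + 1 ))
    local f="$CPU_ROOT/cpu$cpu/online"
    [ -w "$f" ] && echo "$1" > "$f"
    return 0
}

for (( i = 0; i < ITERATIONS; i++ )); do
    toggle_random_cpu "$VALUE"
done
```

Writing to the online attributes requires root, and offlining CPUs on a
live system is disruptive, so this should only be run on a test machine.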

Please let me know if you have further questions.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)



* Re: Subject: Warning in workqueue.c
  2014-02-13 17:58               ` Jason J. Herne
@ 2014-02-13 20:41                 ` Tejun Heo
  2014-02-14 14:56                   ` Jason J. Herne
  2014-02-14 16:09                   ` Peter Zijlstra
  0 siblings, 2 replies; 23+ messages in thread
From: Tejun Heo @ 2014-02-13 20:41 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar, Peter Zijlstra

Hello,

(cc'ing Ingo and Peter)

On Thu, Feb 13, 2014 at 12:58:10PM -0500, Jason J. Herne wrote:
> [ 5779.795687] ------------[ cut here ]------------
> [ 5779.795695] WARNING: at kernel/workqueue.c:2159
....
> [ 5779.795844] XXX: worker->flags=0x1 pool->flags=0x0 cpu=4 pool->cpu=5(1) rescue_wq=          (null)
> [ 5779.795848] XXX: last_unbind=-44 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
> [ 5779.795852] XXX: sleep=-39 wakeup=0
> [ 5779.795855] XXX: cpus_allowed=5
> [ 5779.795857] XXX: cpus_allowed_after_rebinding=5
> [ 5779.795861] XXX: after schedule(), cpu=4
> 
> You had asked about reproducing this. This is on the S390 platform,
> I'm not sure if that makes any difference.
> 
> The workload is:
> 2 processes onlining random cpus in a tight loop by using 'echo 1 > /sys/bus/cpu.../online'
> 2 processes offlining random cpus in a tight loop by using 'echo 0 > /sys/bus/cpu.../online'
> Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
> 
> The machine has 10 processors.
> The warning message some times hits within a few minutes on starting
> the workload. Other times it takes several hours.

Ingo, Peter, Jason is reporting workqueue triggering a warning because
a worker is running on the wrong CPU, which is relatively reliably
reproducible with the above workload on s390.  The weird thing is that
everything looks correct from the workqueue side.  The worker has its
cpus_allowed set properly and the CPU it's supposed to run on is
online, yet the worker is on the wrong CPU, and even doing an explicit
schedule() after detecting the condition doesn't change the situation.
Any ideas?

Jason, I don't have much idea from workqueue side.  Have you been
running this test with older kernels too?  Can you confirm whether
this failure is something recent?  Bisection would be awesome but just
confirming, say, 3.12 doesn't have this issue would be very helpful.

Thanks!

-- 
tejun


* Re: Subject: Warning in workqueue.c
  2014-02-13 20:41                 ` Tejun Heo
@ 2014-02-14 14:56                   ` Jason J. Herne
  2014-02-14 14:58                     ` Tejun Heo
  2014-02-14 16:09                   ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-14 14:56 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar, Peter Zijlstra

On 02/13/2014 03:41 PM, Tejun Heo wrote:

> Jason, I don't have much idea from workqueue side.  Have you been
> running this test with older kernels too?  Can you confirm whether
> this failure is something recent?  Bisection would be awesome but just
> confirming, say, 3.12 doesn't have this issue would be very helpful.

I will try to find a kernel where this is not reproducible and attempt
bisect from there. I will let you know what I find.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)



* Re: Subject: Warning in workqueue.c
  2014-02-14 14:56                   ` Jason J. Herne
@ 2014-02-14 14:58                     ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2014-02-14 14:58 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: Lai Jiangshan, lkml, Ingo Molnar, Peter Zijlstra

Hello, Jason.

On Fri, Feb 14, 2014 at 9:56 AM, Jason J. Herne
<jjherne@linux.vnet.ibm.com> wrote:
> I will try to find a kernel where this is not reproducible and attempt
> bisect from there. I will let you know what I find.

Yes, even just finding that there's an earlier version where this
doesn't happen would be great.

Thank you very much.

-- 
tejun


* Re: Subject: Warning in workqueue.c
  2014-02-13 20:41                 ` Tejun Heo
  2014-02-14 14:56                   ` Jason J. Herne
@ 2014-02-14 16:09                   ` Peter Zijlstra
  2014-02-14 16:25                     ` Tejun Heo
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-02-14 16:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar

On Thu, Feb 13, 2014 at 03:41:02PM -0500, Tejun Heo wrote:
> Hello,
> 
> (cc'ing Ingo and Peter)
> 
> On Thu, Feb 13, 2014 at 12:58:10PM -0500, Jason J. Herne wrote:
> > [ 5779.795687] ------------[ cut here ]------------
> > [ 5779.795695] WARNING: at kernel/workqueue.c:2159
> ....
> > [ 5779.795844] XXX: worker->flags=0x1 pool->flags=0x0 cpu=4 pool->cpu=5(1) rescue_wq=          (null)
> > [ 5779.795848] XXX: last_unbind=-44 last_rebind=0 last_rebound_clear=0 nr_exected_after_rebound_clear=0
> > [ 5779.795852] XXX: sleep=-39 wakeup=0
> > [ 5779.795855] XXX: cpus_allowed=5
> > [ 5779.795857] XXX: cpus_allowed_after_rebinding=5
> > [ 5779.795861] XXX: after schedule(), cpu=4
> > 
> > You had asked about reproducing this. This is on the S390 platform,
> > I'm not sure if that makes any difference.
> > 
> > The workload is:
> > 2 processes onlining random cpus in a tight loop by using 'echo 1 > /sys/bus/cpu.../online'
> > 2 processes offlining random cpus in a tight loop by using 'echo 0 > /sys/bus/cpu.../online'
> > Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
> > 
> > The machine has 10 processors.
> > The warning message some times hits within a few minutes on starting
> > the workload. Other times it takes several hours.
> 
> Ingo, Peter, Jason is reporting workqueue triggering warning because a
> worker is running on the wrong CPU, which is relatively reliably
> reproducible with the above workload on s390. 

Wasn't that a feature of workqueues? You know we've had arguments about
that behaviour -- I'm strongly in favour of flushing and killing workers
on unplug, but you let them run on the wrong cpu.

So strongly in fact, I'd call the current behaviour quite insane and
broken :-)

> The weird thing is that
> everything looks correct from workqueue side.  The worker has proper
> cpus_allowed set and the CPU it's supposed to run on is online and yet
> the worker is on the wrong CPU and even doing explicit schedule()
> after detecting the condition doesn't change the situation.

Yeah, just calling schedule() won't fix placement; you need to actually
block and wake up. But given you've called things like
set_cpus_allowed_ptr() and such to set the mask back to 5..

You can try something like the below which makes it slightly more
aggressive about moving tasks about.

> Any ideas?

Not really; s390 doesn't have NUMA, so all those changes are out.

---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fb9764fbc537..20bd4de44bb3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4504,7 +4504,8 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 
 	rq = task_rq_lock(p, &flags);
 
-	if (cpumask_equal(&p->cpus_allowed, new_mask))
+	if (cpumask_equal(&p->cpus_allowed, new_mask) && 
+	    cpumask_test_cpu(rq->cpu, &p->cpus_allowed))
 		goto out;
 
 	if (!cpumask_intersects(new_mask, cpu_active_mask)) {



* Re: Subject: Warning in workqueue.c
  2014-02-14 16:09                   ` Peter Zijlstra
@ 2014-02-14 16:25                     ` Tejun Heo
  2014-02-14 16:28                       ` Peter Zijlstra
  2014-02-24 15:01                       ` Jason J. Herne
  0 siblings, 2 replies; 23+ messages in thread
From: Tejun Heo @ 2014-02-14 16:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar

Hey, Peter.

On Fri, Feb 14, 2014 at 05:09:23PM +0100, Peter Zijlstra wrote:
> > Ingo, Peter, Jason is reporting workqueue triggering warning because a
> > worker is running on the wrong CPU, which is relatively reliably
> > reproducible with the above workload on s390. 
> 
> Wasn't that a feature of workqueues? You know we've had arguments about
> that behaviour -- I'm strongly in favour of flushing and killing workers
> on unplug, but you let them run on the wrong cpu.
> 
> So strongly in fact, I'd call the current behaviour quite insane and
> broken :-)

Hey, we now even keep normal kthreads across cpu down/ups. :)

> Yeah, just calling schedule() won't fix placement, you need to actually
> block and wake-up. But given you've called things like
> set_cpus_allowed_ptr() and such to set the mask back to 5..

Hmmm... I see.  It's kinda weird that the code has been there for so
long and this is the first time it's getting reported.

> You can try something like the below which makes it slightly more
> aggressive about moving tasks about.
> 
> > Any ideas?
> 
> Not really; s390 doesn't have NUMA, so all those changes are out.
> 
> ---
>  kernel/sched/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fb9764fbc537..20bd4de44bb3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4504,7 +4504,8 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  
>  	rq = task_rq_lock(p, &flags);
>  
> -	if (cpumask_equal(&p->cpus_allowed, new_mask))
> +	if (cpumask_equal(&p->cpus_allowed, new_mask) && 
> +	    cpumask_test_cpu(rq->cpu, &p->cpus_allowed))
>  		goto out;
>  
>  	if (!cpumask_intersects(new_mask, cpu_active_mask)) {

Hmmm... weird, p's rq shouldn't have changed without its cpus_allowed
busted.  Anyways, let's wait for Jason's test results and see whether
this is a regression at all.

Thanks.

-- 
tejun


* Re: Subject: Warning in workqueue.c
  2014-02-14 16:25                     ` Tejun Heo
@ 2014-02-14 16:28                       ` Peter Zijlstra
  2014-02-14 16:38                         ` Tejun Heo
  2014-02-24 15:01                       ` Jason J. Herne
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-02-14 16:28 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar

On Fri, Feb 14, 2014 at 11:25:56AM -0500, Tejun Heo wrote:
> Hey, we now even keep normal kthreads across cpu down/ups. :)

Well, we keep them, but parked; they're not allowed to run.


* Re: Subject: Warning in workqueue.c
  2014-02-14 16:28                       ` Peter Zijlstra
@ 2014-02-14 16:38                         ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2014-02-14 16:38 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar

On Fri, Feb 14, 2014 at 05:28:54PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 14, 2014 at 11:25:56AM -0500, Tejun Heo wrote:
> > Hey, we now even keep normal kthreads across cpu down/ups. :)
> 
> Well, we keep them, but parked, they're not allowed to run.

Yeah, that's true.  In the long term, I think we'll need a better
distinction between the work items which actually need affinity for
correctness and those which just use it as an optimization.  It's
already happening now in a rather ad-hoc way with WQ_POWER_EFFICIENT.
Anyway, once we actually make that distinction clear, it should
probably be possible to flush all per-cpu ones and keep them parked
across offline.  Right now, we have a mixture of the two, and the
knowledge is only present in the cpu_down callbacks of each user,
making it difficult to do that.

Thanks.

-- 
tejun


* Re: Subject: Warning in workqueue.c
  2014-02-14 16:25                     ` Tejun Heo
  2014-02-14 16:28                       ` Peter Zijlstra
@ 2014-02-24 15:01                       ` Jason J. Herne
  2014-02-24 18:35                         ` Tejun Heo
  1 sibling, 1 reply; 23+ messages in thread
From: Jason J. Herne @ 2014-02-24 15:01 UTC (permalink / raw)
  To: Tejun Heo, Peter Zijlstra; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar

On 02/14/2014 11:25 AM, Tejun Heo wrote:
> ...
> Hmmm... weird, p's rq shouldn't have changed without its cpus_allowed
> busted.  Anyways, let's wait for Jason's test results and see whether
> this is a regression at all.

I was unable to determine exactly when this behavior was introduced, 
because I keep hitting other bugs that prevent the test case from 
running, the most notable of which causes an immediate system hang. I 
also hit other warnings and BUG messages, and I'm not sure whether they 
are related or whether they influence the probability of hitting the 
problem we are trying to solve here.  What I did find is the following:

We hit this problem as far back as v3.10.
The warning was introduced after v3.5 but before v3.6.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)



* Re: Subject: Warning in workqueue.c
  2014-02-24 15:01                       ` Jason J. Herne
@ 2014-02-24 18:35                         ` Tejun Heo
  2014-02-25 10:37                           ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2014-02-24 18:35 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: Peter Zijlstra, Lai Jiangshan, linux-kernel, Ingo Molnar

On Mon, Feb 24, 2014 at 10:01:55AM -0500, Jason J. Herne wrote:
> On 02/14/2014 11:25 AM, Tejun Heo wrote:
> >...
> >Hmmm... weird, p's rq shouldn't have changed without its cpus_allowed
> >busted.  Anyways, let's wait for Jason's test results and see whether
> >this is a regression at all.
> 
> I was unable to determine exactly when this behavior was introduced.
> The reason for this is because I keep hitting other bugs that
> prevent the test case from running, the most notable of which causes
> an immediate system hang. I also hit other warnings and bug messages
> and I'm not sure if they are related or if they will influence the
> probability of hitting the problem we are trying to solve here.
> What I did find is the following:
> 
> We hit this problem as far back as v3.10.
> The warning was introduced after v3.5 but before v3.6.

That's a bummer but it at least isn't a very new regression.  Peter,
any ideas on debugging this?  I can make the workqueue play a block /
unblock dance to try to work around the issue but that'd be very
yucky.  It'd be great to root-cause where the cpu selection anomaly is
coming from.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Subject: Warning in workqueue.c
  2014-02-24 18:35                         ` Tejun Heo
@ 2014-02-25 10:37                           ` Peter Zijlstra
  2014-03-10 14:37                             ` Jason J. Herne
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-02-25 10:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jason J. Herne, Lai Jiangshan, linux-kernel, Ingo Molnar

On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:

> That's a bummer but it at least isn't a very new regression.  Peter,
> any ideas on debugging this?  I can make workqueue to play block /
> unblock dance to try to work around the issue but that'd be very
> yucky.  It'd be great to root cause where the cpu selection anomaly is
> coming from.

I'm assuming you're using set_cpus_allowed_ptr() to flip them between
CPUs; the below adds some error paths to that code. In particular we
propagate the __migrate_task() fail (returns the number of tasks
migrated) through the stop_one_cpu() into set_cpus_allowed_ptr().

This way we can see if there was a problem with the migration.

You should be able to now reliably use the return value of
set_cpus_allowed_ptr() to tell if it is now running on a CPU in its
allowed mask.

I've also included an #if 0 retry loop for the fail case; but I suspect
it might end up deadlocking your machine if you hit it just wrong,
something like the waking CPU endlessly trying to migrate the task over
while the wakee CPU is waiting for completion of something from the
waking CPU.

But it's worth a prod, I suppose.


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84b23cec0aeb..4c384efac8b3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 
 	do_set_cpus_allowed(p, new_mask);
 
+again: __maybe_unused
+
 	/* Can the task run on the task's current CPU? If so, we're done */
-	if (cpumask_test_cpu(task_cpu(p), new_mask))
+	if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
 		goto out;
 
-	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
+	dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
 	if (p->on_rq) {
 		struct migration_arg arg = { p, dest_cpu };
+
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &flags);
-		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+		ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+#if 0
+		if (ret) {
+			rq = task_rq_lock(p, &flags);
+			goto again;
+		}
+#endif
 		tlb_migrate_finish(p->mm);
-		return 0;
+
+		return ret;
 	}
 out:
 	task_rq_unlock(rq, p, &flags);
@@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int nid)
 static int migration_cpu_stop(void *data)
 {
 	struct migration_arg *arg = data;
+	int ret = 0;
 
 	/*
 	 * The original target cpu might have gone down and we might
 	 * be on another cpu but it doesn't matter.
 	 */
 	local_irq_disable();
-	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
+	if (!__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu))
+		ret = -EAGAIN;
 	local_irq_enable();
-	return 0;
+
+	return ret;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Subject: Warning in workqueue.c
  2014-02-25 10:37                           ` Peter Zijlstra
@ 2014-03-10 14:37                             ` Jason J. Herne
  2014-03-17 14:51                               ` Jason J. Herne
  2014-03-17 15:16                               ` Peter Zijlstra
  0 siblings, 2 replies; 23+ messages in thread
From: Jason J. Herne @ 2014-03-10 14:37 UTC (permalink / raw)
  To: Peter Zijlstra, Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar

On 02/25/2014 05:37 AM, Peter Zijlstra wrote:
> On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
>
>> That's a bummer but it at least isn't a very new regression.  Peter,
>> any ideas on debugging this?  I can make workqueue to play block /
>> unblock dance to try to work around the issue but that'd be very
>> yucky.  It'd be great to root cause where the cpu selection anomaly is
>> coming from.
>
> I'm assuming you're using set_cpus_allowed_ptr() to flip them between
> CPUs; the below adds some error paths to that code. In particular we
> propagate the __migrate_task() fail (returns the number of tasks
> migrated) through the stop_one_cpu() into set_cpus_allowed_ptr().
>
> This way we can see if there was a problem with the migration.
>
> You should be able to now reliably use the return value of
> set_cpus_allowed_ptr() to tell if it is now running on a CPU in its
> allowed mask.
>
> I've also included an #if 0 retry loop for the fail case; but I suspect
> that that might end up deadlocking your machine if you hit that just
> wrong, something like the waking CPU endlessly trying to migrate the
> task over while the wakee CPU is waiting for completion of something
> from the waking CPU.
>
> But its worth a prod I suppose.
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 84b23cec0aeb..4c384efac8b3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>
>   	do_set_cpus_allowed(p, new_mask);
>
> +again: __maybe_unused
> +
>   	/* Can the task run on the task's current CPU? If so, we're done */
> -	if (cpumask_test_cpu(task_cpu(p), new_mask))
> +	if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
>   		goto out;
>
> -	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> +	dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
>   	if (p->on_rq) {
>   		struct migration_arg arg = { p, dest_cpu };
> +
>   		/* Need help from migration thread: drop lock and wait. */
>   		task_rq_unlock(rq, p, &flags);
> -		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> +		ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> +#if 0
> +		if (ret) {
> +			rq = task_rq_lock(p, &flags);
> +			goto again;
> +		}
> +#endif
>   		tlb_migrate_finish(p->mm);
> -		return 0;
> +
> +		return ret;
>   	}
>   out:
>   	task_rq_unlock(rq, p, &flags);
> @@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int nid)
>   static int migration_cpu_stop(void *data)
>   {
>   	struct migration_arg *arg = data;
> +	int ret = 0;
>
>   	/*
>   	 * The original target cpu might have gone down and we might
>   	 * be on another cpu but it doesn't matter.
>   	 */
>   	local_irq_disable();
> -	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
> +	if (!__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu))
> +		ret = -EAGAIN;
>   	local_irq_enable();
> -	return 0;
> +
> +	return ret;
>   }
>
>   #ifdef CONFIG_HOTPLUG_CPU
>

Peter,

Did you intend for me to run with this patch or was it posted for 
discussion only? If you want it run, please tell me what to look for.
Also, if I should run this, should I include any other patches, either 
the last one you posted in this thread or any of Tejun's?

Thanks.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Subject: Warning in workqueue.c
  2014-03-10 14:37                             ` Jason J. Herne
@ 2014-03-17 14:51                               ` Jason J. Herne
  2014-03-17 15:16                               ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Jason J. Herne @ 2014-03-17 14:51 UTC (permalink / raw)
  To: Peter Zijlstra, Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Ingo Molnar

On 03/10/2014 10:37 AM, Jason J. Herne wrote:
> On 02/25/2014 05:37 AM, Peter Zijlstra wrote:
>> On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
>>
>>> That's a bummer but it at least isn't a very new regression.  Peter,
>>> any ideas on debugging this?  I can make workqueue to play block /
>>> unblock dance to try to work around the issue but that'd be very
>>> yucky.  It'd be great to root cause where the cpu selection anomaly is
>>> coming from.
>>
>> I'm assuming you're using set_cpus_allowed_ptr() to flip them between
>> CPUs; the below adds some error paths to that code. In particular we
>> propagate the __migrate_task() fail (returns the number of tasks
>> migrated) through the stop_one_cpu() into set_cpus_allowed_ptr().
>>
>> This way we can see if there was a problem with the migration.
>>
>> You should be able to now reliably use the return value of
>> set_cpus_allowed_ptr() to tell if it is now running on a CPU in its
>> allowed mask.
>>
>> I've also included an #if 0 retry loop for the fail case; but I suspect
>> that that might end up deadlocking your machine if you hit that just
>> wrong, something like the waking CPU endlessly trying to migrate the
>> task over while the wakee CPU is waiting for completion of something
>> from the waking CPU.
>>
>> But its worth a prod I suppose.
>>
>>
>> [patch snipped; quoted in full upthread]
>
> Peter,
>
> Did you intend for me to run with this patch or was it posted for
> discussion only? If you want it run, please tell me what to look for.
> Also, if I should run this, should I include any other patches, either
> the last one you posted in this thread or any of Tejun's?
>
> Thanks.
>

Ping?

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Subject: Warning in workqueue.c
  2014-03-10 14:37                             ` Jason J. Herne
  2014-03-17 14:51                               ` Jason J. Herne
@ 2014-03-17 15:16                               ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2014-03-17 15:16 UTC (permalink / raw)
  To: Jason J. Herne; +Cc: Tejun Heo, Lai Jiangshan, linux-kernel, Ingo Molnar

On Mon, Mar 10, 2014 at 10:37:42AM -0400, Jason J. Herne wrote:
> 
> Did you intend for me to run with this patch or was it posted for discussion
> only? If you want it run, please tell me what to look for.
> Also, if I should run this, should I include any other patches, either the
> last one you posted in this thread or any of Tejun's?

It needs some workqueue-side debugging too; I'm not entirely sure where
the move back to the online cpu happens, but you want to check the
set_cpus_allowed_ptr() return value there.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-03-17 15:16 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-07 14:39 Subject: Warning in workqueue.c Jason J. Herne
2014-02-07 16:51 ` Tejun Heo
2014-02-07 17:55   ` Jason J. Herne
2014-02-07 19:36     ` Tejun Heo
2014-02-10 15:32       ` Jason J. Herne
2014-02-10 23:17         ` Tejun Heo
2014-02-12 15:18           ` Jason J. Herne
2014-02-13  3:02             ` Lai Jiangshan
2014-02-13  3:31             ` Lai Jiangshan
2014-02-13 17:58               ` Jason J. Herne
2014-02-13 20:41                 ` Tejun Heo
2014-02-14 14:56                   ` Jason J. Herne
2014-02-14 14:58                     ` Tejun Heo
2014-02-14 16:09                   ` Peter Zijlstra
2014-02-14 16:25                     ` Tejun Heo
2014-02-14 16:28                       ` Peter Zijlstra
2014-02-14 16:38                         ` Tejun Heo
2014-02-24 15:01                       ` Jason J. Herne
2014-02-24 18:35                         ` Tejun Heo
2014-02-25 10:37                           ` Peter Zijlstra
2014-03-10 14:37                             ` Jason J. Herne
2014-03-17 14:51                               ` Jason J. Herne
2014-03-17 15:16                               ` Peter Zijlstra
