[PATCH 0/2] workqueue: fix a bug when numa mapping is changed v4

* [PATCH 0/2] workqueue: fix a bug when numa mapping is changed v4
@ 2014-12-16 16:36 Kamezawa Hiroyuki
  2014-12-16 16:45 ` [PATCH 1/2] workqueue: update numa affinity info at node hotplug Kamezawa Hiroyuki
  2014-12-16 16:51 ` [PATCH 2/2] workqueue: update cpumask at CPU_ONLINE if necessary Kamezawa Hiroyuki
  0 siblings, 2 replies; 21+ messages in thread
From: Kamezawa Hiroyuki @ 2014-12-16 16:36 UTC (permalink / raw)
  To: Lai Jiangshan, Tejun Heo, linux-kernel
  Cc: "Ishimatsu, Yasuaki/石松 靖章",
	Tang Chen, guz.fnst, Kamezawa Hiroyuki

This is v4. Thank you for the hints and comments on the previous versions.

I think this version contains only the necessary changes and is not invasive.
I tested several node hotplug patterns and it seems to work well.

Changes since v3
 - removed the changes to get_unbound_pool()
 - removed the code in the cpu offline event.
 - added a node unregister callback.
   wq_numa_possible_cpumask is now cleared at node offline rather than cpu offline.
 - update the per-cpu pools' pool->node at node (un)register.
 - added more comments.
 - almost all of the code is under CONFIG_MEMORY_HOTPLUG

 include/linux/memory_hotplug.h |    3 +
 kernel/workqueue.c             |   81 ++++++++++++++++++++++++++++++++++++++++-
 mm/memory_hotplug.c            |    6 ++-
 3 files changed, 88 insertions(+), 2 deletions(-)

The original problem was a memory allocation failure caused by pool->node
pointing to a node that is not online. This happens when the cpu<->node
mapping changes.

Yasuaki Ishimatsu hit an allocation failure bug when the numa mapping
between CPU and node changed. This was the last output:
 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656
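
For readers who want to see the failure mode in isolation, here is a minimal userspace sketch. It is not kernel code: toy_node_online, toy_alloc_on_node() and TOY_NO_NODE are hypothetical stand-ins, and the "no preference picks the first online node" policy is an assumption made for illustration. It only shows why a stale node id makes a node-constrained request fail while a "no preference" request still succeeds.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 4
#define TOY_NO_NODE   (-1)	/* plays the role of NUMA_NO_NODE */

/* Hypothetical online map: nodes 0 and 1 are online, node 2 is gone. */
static bool toy_node_online[TOY_MAX_NODES] = { true, true, false, false };

/*
 * Toy node-constrained allocation. Returns the node that served the
 * request, or -1 if the request named a node that is not online.
 */
static int toy_alloc_on_node(int node)
{
	int n;

	if (node == TOY_NO_NODE) {
		/* No preference: take the first online node. */
		for (n = 0; n < TOY_MAX_NODES; n++)
			if (toy_node_online[n])
				return n;
		return -1;
	}
	assert(node >= 0 && node < TOY_MAX_NODES);
	return toy_node_online[node] ? node : -1;
}
```

In this model a pool whose recorded node is 2 fails every request, which matches the shape of the report above: only nodes 0 and 1 have slabs to offer.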

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2014-12-16 16:36 [PATCH 0/2] workqueue: fix a bug when numa mapping is changed v4 Kamezawa Hiroyuki
@ 2014-12-16 16:45 ` Kamezawa Hiroyuki
  2014-12-17  1:36   ` Lai Jiangshan
  2014-12-16 16:51 ` [PATCH 2/2] workqueue: update cpumask at CPU_ONLINE if necessary Kamezawa Hiroyuki
  1 sibling, 1 reply; 21+ messages in thread
From: Kamezawa Hiroyuki @ 2014-12-16 16:45 UTC (permalink / raw)
  To: Lai Jiangshan, Tejun Heo, linux-kernel
  Cc: "Ishimatsu, Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

The cpu<->node relationship is established when a node goes online or
offline. Workqueue uses information that was established at boot time,
but that information may be changed by node hotplug.

Once pool->node points to a stale node, the following allocation failure
happens.
  ==
     SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
      cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
      node 0: slabs: 6172, objs: 259224, free: 245741
      node 1: slabs: 3261, objs: 136962, free: 127656
  ==
This patch updates the node affinity of the per-cpu workqueue pools and
updates wq_numa_possible_cpumask at node online/offline events.
Updating the mask is very important because it affects cpumasks
and preferred node detection.

The per-node pools of unbound workqueues are updated by existing code:
wq_update_unbound_numa() runs at CPU_DOWN_PREPARE of the last cpu.
What is important here is to avoid wrong node detection when a cpu comes online,
and that is handled by the wq_numa_possible_cpumask update introduced by this patch.

Changelog v3->v4:
 - added workqueue_node_unregister
 - clear wq_numa_possible_cpumask at node offline.
 - merged a patch which handles per cpu pools.
 - clear per-cpu-pool's pool->node at node offlining.
 - set per-cpu-pool's pool->node at node onlining.
 - dropped modification to get_unbound_pool()
 - dropped per-cpu-pool handling at cpu online/offline.

Reported-by:  Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memory_hotplug.h |  3 +++
 kernel/workqueue.c             | 58 +++++++++++++++++++++++++++++++++++++++++-
 mm/memory_hotplug.c            |  6 ++++-
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 8f1a419..7b4a292 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -270,4 +270,7 @@ extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 
+/* update for workqueues */
+void workqueue_node_register(int node);
+void workqueue_node_unregister(int node);
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6202b08..f6ad05a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -266,7 +266,7 @@ struct workqueue_struct {
 static struct kmem_cache *pwq_cache;
 
 static cpumask_var_t *wq_numa_possible_cpumask;
-					/* possible CPUs of each node */
+					/* PL: possible CPUs of each node */
 
 static bool wq_disable_numa;
 module_param_named(disable_numa, wq_disable_numa, bool, 0444);
@@ -4563,6 +4563,62 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
 }
+#ifdef CONFIG_MEMORY_HOTPLUG
+
+static void workqueue_update_cpu_numa_affinity(int cpu, int node)
+{
+	struct worker_pool *pool;
+
+	if (node != cpu_to_node(cpu))
+		return;
+	cpumask_set_cpu(cpu, wq_numa_possible_cpumask[node]);
+	for_each_cpu_worker_pool(pool, cpu)
+		pool->node = node;
+}
+
+/*
+ * When a cpu is physically added, the cpu<->node relationship is established
+ * based on firmware info. We can catch the whole view when a new NODE_DATA()
+ * comes up (a node is added).
+ * If we don't update the info, pool->node will point to a not-online node
+ * and the kernel will hit an allocation failure.
+ *
+ * Update wq_numa_possible_cpumask at online and clear it at offline.
+ */
+void workqueue_node_register(int node)
+{
+	int cpu;
+
+	mutex_lock(&wq_pool_mutex);
+	for_each_possible_cpu(cpu)
+		workqueue_update_cpu_numa_affinity(cpu, node);
+	/* unbound workqueue will be updated when the 1st cpu comes up.*/
+	mutex_unlock(&wq_pool_mutex);
+}
+
+void workqueue_node_unregister(int node)
+{
+	struct worker_pool *pool;
+	int cpu;
+
+	mutex_lock(&wq_pool_mutex);
+	cpumask_clear(wq_numa_possible_cpumask[node]);
+	for_each_possible_cpu(cpu) {
+		if (node == cpu_to_node(cpu))
+			for_each_cpu_worker_pool(pool, cpu)
+				pool->node = NUMA_NO_NODE;
+	}
+	/*
+	 * unbound workqueue's per-node pwqs are already refreshed
+	 * by wq_update_unbound_numa() at CPU_DOWN_PREPARE of the last cpu
+	 * on this node, because all cpus of this node went down.
+	 * (see wq_calc_node_cpumask()). per-node unbound pwqs have already
+	 * been replaced with wq->dfl_pwq.
+	 */
+	mutex_unlock(&wq_pool_mutex);
+}
+
+#endif
 
 /*
  * Workqueues should be brought up before normal priority CPU notifiers.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9fab107..a0cb5c1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1122,6 +1122,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 	 */
 	reset_node_present_pages(pgdat);
 
+	/* Update workqueue's numa affinity info. */
+	workqueue_node_register(nid);
+
 	return pgdat;
 }
 
@@ -1958,7 +1961,8 @@ void try_offline_node(int nid)
 
 	if (check_and_unmap_cpu_on_node(pgdat))
 		return;
-
+	/* update workqueue's numa affinity info. */
+	workqueue_node_unregister(nid);
 	/*
 	 * all memory/cpu of this node are removed, we can offline this
 	 * node now.
-- 
1.8.3.1
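
The bookkeeping done by the two new hooks can be tried outside the kernel with a small model. The sketch below is an illustration under assumptions, not the patch itself: the arrays and toy_* names are hypothetical stand-ins for cpu_to_node(), wq_numa_possible_cpumask and the per-cpu pool->node, a 32-bit word replaces cpumask_var_t, and the wq_pool_mutex locking is left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS  8
#define TOY_NR_NODES 4
#define TOY_NO_NODE  (-1)	/* plays the role of NUMA_NO_NODE */

static int toy_cpu_node[TOY_NR_CPUS];		/* stand-in for cpu_to_node() */
static uint32_t toy_possible_mask[TOY_NR_NODES];	/* per-node possible cpus */
static int toy_pool_node[TOY_NR_CPUS];		/* stand-in for pool->node */

/* Node comes up: record its cpus and point their pools at it. */
static void toy_node_register(int node)
{
	int cpu;

	assert(node >= 0 && node < TOY_NR_NODES);
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_cpu_node[cpu] != node)
			continue;
		toy_possible_mask[node] |= UINT32_C(1) << cpu;
		toy_pool_node[cpu] = node;
	}
}

/* Node goes away: forget its cpus and drop the node hint of their pools. */
static void toy_node_unregister(int node)
{
	int cpu;

	assert(node >= 0 && node < TOY_NR_NODES);
	toy_possible_mask[node] = 0;
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_node[cpu] == node)
			toy_pool_node[cpu] = TOY_NO_NODE;
}
```

With cpus 4 and 5 mapped to node 2, registering node 2 sets bits 4 and 5 in its mask, and unregistering clears the mask and resets both pools to the "no node" value.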




^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/2] workqueue: update cpumask at CPU_ONLINE if necessary
  2014-12-16 16:36 [PATCH 0/2] workqueue: fix a bug when numa mapping is changed v4 Kamezawa Hiroyuki
  2014-12-16 16:45 ` [PATCH 1/2] workqueue: update numa affinity info at node hotplug Kamezawa Hiroyuki
@ 2014-12-16 16:51 ` Kamezawa Hiroyuki
  1 sibling, 0 replies; 21+ messages in thread
From: Kamezawa Hiroyuki @ 2014-12-16 16:51 UTC (permalink / raw)
  To: Lai Jiangshan, Tejun Heo, linux-kernel
  Cc: "Ishimatsu, Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

In some cases, a cpu's numa affinity is changed in cpu_up().
This happens after a new node is onlined.
(On x86, online cpus are tied to an onlined node at boot,
 so if memory is added later, the cpu mapping can change at cpu_up().)

Although wq_numa_possible_cpumask et al. are maintained against
node hotplug, this case should be handled as well.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 kernel/workqueue.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f6ad05a..59d8be5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4618,6 +4618,27 @@ void workqueue_node_unregister(int node)
 	mutex_unlock(&wq_pool_mutex);
 }
 
+static void workqueue_may_update_numa_affinity(int cpu)
+{
+	int curnode = cpu_to_node(cpu);
+	int node;
+
+	if (likely(cpumask_test_cpu(cpu, wq_numa_possible_cpumask[curnode])))
+		return;
+
+	/* cpu<->node relationship is changed in cpu_up() */
+	for_each_node_state(node, N_POSSIBLE)
+		cpumask_clear_cpu(cpu, wq_numa_possible_cpumask[node]);
+
+	workqueue_update_cpu_numa_affinity(cpu, curnode);
+}
+#else
+
+static void workqueue_may_update_numa_affinity(int cpu)
+{
+	return;
+}
+
 #endif
 
 /*
@@ -4647,6 +4668,8 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 	case CPU_ONLINE:
 		mutex_lock(&wq_pool_mutex);
 
+		workqueue_may_update_numa_affinity(cpu);
+
 		for_each_pool(pool, pi) {
 			mutex_lock(&pool->attach_mutex);
 
-- 
1.8.3.1
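
The check added at CPU_ONLINE amounts to "if the cpu is not in the mask of the node it now belongs to, remove it from every mask and add it to the right one". A minimal userspace sketch of that step follows. The toy_* names are hypothetical, a 32-bit word stands in for each per-node cpumask, and the per-cpu pool->node update done by the real helper is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS  8
#define TOY_NR_NODES 4

/* Hypothetical per-node "possible cpus" masks. */
static uint32_t toy_node_mask[TOY_NR_NODES];

/*
 * Called when a cpu comes online on node curnode.
 * Returns 1 if the masks had to be fixed up, 0 if they were already right.
 */
static int toy_may_update_affinity(int cpu, int curnode)
{
	uint32_t bit;
	int node;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(curnode >= 0 && curnode < TOY_NR_NODES);
	bit = UINT32_C(1) << cpu;

	if (toy_node_mask[curnode] & bit)
		return 0;	/* common case: mapping unchanged */

	/* The cpu moved: clear it from every node, then set it on curnode. */
	for (node = 0; node < TOY_NR_NODES; node++)
		toy_node_mask[node] &= ~bit;
	toy_node_mask[curnode] |= bit;
	return 1;
}
```

Clearing the bit on all nodes avoids having to remember which node the cpu was on before, at the cost of a loop over the nodes in the rare case where the mapping changed.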




^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2014-12-16 16:45 ` [PATCH 1/2] workqueue: update numa affinity info at node hotplug Kamezawa Hiroyuki
@ 2014-12-17  1:36   ` Lai Jiangshan
  2014-12-17  3:22     ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 21+ messages in thread
From: Lai Jiangshan @ 2014-12-17  1:36 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Tejun Heo, linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

On 12/17/2014 12:45 AM, Kamezawa Hiroyuki wrote:
> +static void workqueue_update_cpu_numa_affinity(int cpu, int node)
> +{
> +	struct worker_pool *pool;
> +
> +	if (node != cpu_to_node(cpu))
> +		return;
> +	cpumask_set_cpu(cpu, wq_numa_possible_cpumask[node]);
> +	for_each_cpu_worker_pool(pool, cpu)
> +		pool->node = node;

Again, you need to check and update all of wq->numa_pwq_tbl[oldnode],
but in this patchset the required information is lost and we can't find out oldnode.


cpus of oldnode: 16-31 (online), 48,56,64,72 (offline, randomly assigned to oldnode by numa_init_array())

Then cpu#48 is allocated to newnode and brought online.

Now the cpumask of wq->numa_pwq_tbl[oldnode] still has cpu#48, and it may still be scheduled to cpu#48.
See the description in my patch 4/5.
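
The stale-mask concern can be illustrated with a small userspace model. This is a sketch under assumptions rather than workqueue code: struct toy_pwq and the toy_* helpers are hypothetical, a 64-bit word stands in for a cpumask (so only cpus below 64 from the example fit), and the pwq is assumed to keep its own copy of the node mask taken when it was created.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-node pwq that cached a cpumask when it was created. */
struct toy_pwq {
	uint64_t cpumask;
};

static uint64_t toy_cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return UINT64_C(1) << cpu;
}

/* Mask with every cpu in [first, last] set. */
static uint64_t toy_cpu_range(int first, int last)
{
	uint64_t mask = 0;
	int cpu;

	for (cpu = first; cpu <= last; cpu++)
		mask |= toy_cpu_bit(cpu);
	return mask;
}

/* Cpus the pwq still believes in but the node no longer owns. */
static uint64_t toy_stale_cpus(const struct toy_pwq *pwq, uint64_t node_possible)
{
	return pwq->cpumask & ~node_possible;
}
```

If the old node starts with cpus 16-31, 48 and 56 and cpu 48 later moves away, updating only the node's mask leaves cpu 48 behind in the cached copy, which is the situation described above.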




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2014-12-17  1:36   ` Lai Jiangshan
@ 2014-12-17  3:22     ` Kamezawa Hiroyuki
  2014-12-17  4:56       ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 21+ messages in thread
From: Kamezawa Hiroyuki @ 2014-12-17  3:22 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Tejun Heo, linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

(2014/12/17 10:36), Lai Jiangshan wrote:
> On 12/17/2014 12:45 AM, Kamezawa Hiroyuki wrote:
>> +static void workqueue_update_cpu_numa_affinity(int cpu, int node)
>> +{
>> +	struct worker_pool *pool;
>> +
>> +	if (node != cpu_to_node(cpu))
>> +		return;
>> +	cpumask_set_cpu(cpu, wq_numa_possible_cpumask[node]);
>> +	for_each_cpu_worker_pool(pool, cpu)
>> +		pool->node = node;
>
> Again, You need to check and update all the wq->numa_pwq_tbl[oldnode],
> but in this patchset, the required information is lost and we can't find out oldnode.
>
>
> cpus of oldnode, 16-31(online),48,56,64,72(offline,randomly assigned to the oldnode by numa_init_array())
>
> and then cpu#48 is allocated for newnode and online
>
> Now, the wq->numa_pwq_tbl[oldnode]'s cpumask still have cpu#48, and it may be scheduled to cpu#48.
> See the information of my patch 4/5
>

That will not cause a page allocation failure, right? If so, it's out of the scope of this patch 1/2.

I think it's handled in patch 2/2, isn't it?

Thanks,
-Kame






^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2014-12-17  3:22     ` Kamezawa Hiroyuki
@ 2014-12-17  4:56       ` Kamezawa Hiroyuki
  2014-12-25 20:11         ` Tejun Heo
  0 siblings, 1 reply; 21+ messages in thread
From: Kamezawa Hiroyuki @ 2014-12-17  4:56 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Tejun Heo, linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

(2014/12/17 12:22), Kamezawa Hiroyuki wrote:
> (2014/12/17 10:36), Lai Jiangshan wrote:
>> On 12/17/2014 12:45 AM, Kamezawa Hiroyuki wrote:
>>> +static void workqueue_update_cpu_numa_affinity(int cpu, int node)
>>> +{
>>> +    struct worker_pool *pool;
>>> +
>>> +    if (node != cpu_to_node(cpu))
>>> +        return;
>>> +    cpumask_set_cpu(cpu, wq_numa_possible_cpumask[node]);
>>> +    for_each_cpu_worker_pool(pool, cpu)
>>> +        pool->node = node;
>>
>> Again, You need to check and update all the wq->numa_pwq_tbl[oldnode],
>> but in this patchset, the required information is lost and we can't find out oldnode.
>>
>>
>> cpus of oldnode, 16-31(online),48,56,64,72(offline,randomly assigned to the oldnode by numa_init_array())
>>
>> and then cpu#48 is allocated for newnode and online
>>
>> Now, the wq->numa_pwq_tbl[oldnode]'s cpumask still have cpu#48, and it may be scheduled to cpu#48.
>> See the information of my patch 4/5
>>
>
> That will not cause page allocation failure, right ? If so, it's out of scope of this patch 1/2.
>
> I think it's handled in patch 2/2, isn't it ?

Let me correct my words. The main purpose of this patch 1/2 is to handle the case where a node disappears after boot,
and to try to handle the physical node hotplug case.

Changes to the cpu<->node relationship at CPU_ONLINE are handled in patch 2/2.

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2014-12-17  4:56       ` Kamezawa Hiroyuki
@ 2014-12-25 20:11         ` Tejun Heo
  2015-01-13  7:19           ` Lai Jiangshan
  0 siblings, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2014-12-25 20:11 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Lai Jiangshan, linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

On Wed, Dec 17, 2014 at 01:56:29PM +0900, Kamezawa Hiroyuki wrote:
> Let me correct my words. Main purpose of this patch 1/2 is handling a case "node disappers" after boot.
> And try to handle physicall node hotplug caes.
> 
> Changes of cpu<->node relationship at CPU_ONLINE is handled in patch 2/2.

Can you please make the numa code itself maintain the cpu to nodemask
maps?  Let's make workqueue a simple consumer of that.  And don't we
have a proper notification mechanism for node up/down events?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2014-12-25 20:11         ` Tejun Heo
@ 2015-01-13  7:19           ` Lai Jiangshan
  2015-01-13 15:22             ` Tejun Heo
  0 siblings, 1 reply; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-13  7:19 UTC (permalink / raw)
  To: Tejun Heo, Kamezawa Hiroyuki
  Cc: linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

On 12/26/2014 04:11 AM, Tejun Heo wrote:
> On Wed, Dec 17, 2014 at 01:56:29PM +0900, Kamezawa Hiroyuki wrote:
>> Let me correct my words. Main purpose of this patch 1/2 is handling a case "node disappers" after boot.
>> And try to handle physicall node hotplug caes.
>>
>> Changes of cpu<->node relationship at CPU_ONLINE is handled in patch 2/2.
> 
> Can you please make numa code itself maintain the cpu to nodemask
> maps?  Let's make workqueue a simple consumer of that and we don't
> have proper notification mechanism for node up/down events?
> 
> Thanks.
> 

The mapping of the *online* cpus to nodes is already maintained by the numa code.

What the workqueue needs is a special mapping:
	the mapping of the *possible* cpus to nodes

But this mapping is troublesome if the numa code maintains it:
	"possible" implies that the mapping is stable/constant/immutable, and it is
	hard to ensure that in the numa code.

If mutability of this mapping is acceptable, we just move 20~40 LOC of code
from workqueue to the numa code; all the other complexity around it stays in workqueue.c.

Thanks
Lai

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2015-01-13  7:19           ` Lai Jiangshan
@ 2015-01-13 15:22             ` Tejun Heo
  2015-01-14  2:47               ` Lai Jiangshan
  0 siblings, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2015-01-13 15:22 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Kamezawa Hiroyuki, linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

Hello,

On Tue, Jan 13, 2015 at 03:19:09PM +0800, Lai Jiangshan wrote:
> The mapping of the *online* cpus to nodes is already maintained by the numa code.
> 
> What the workqueue needs is a special mapping:
> 	the mapping of the *possible* cpus to nodes
> 
> But having the numa code maintain this mapping is troublesome:
> 	"possible" implies that the mapping is stable/constant/immutable, and that
> 	is hard to ensure in the numa code.
> 
> If mutability of this mapping is acceptable, we just move 20~40 lines
> of code from workqueue to numa code; all the other complexity
> around it stays in workqueue.c.

Make numa code maintain the mapping to the best of its knowledge and
invoke notification callbacks when it changes.  Even if that involves
slightly more code, that's the right thing to do at this point.  This
puts the logic which is complicated by the fact that the mapping may
change where it's caused not some random unrelated place.  It'd be
awesome if somebody more familiar with the numa side can chime in and
explain why this mapping change can't be avoided.
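
As an illustration of the "owner maintains the mapping and notifies consumers" model, here is a small userspace sketch. It is not an existing kernel interface: struct cpu_node_map, map_register(), map_set_node() and count_changes() are hypothetical names, and a real implementation would need locking and would likely use the kernel's notifier chains.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS       8
#define MAX_LISTENERS 4

/* Called with the CPU and its old and new node when the mapping changes. */
typedef void (*map_change_fn)(int cpu, int old_node, int new_node, void *data);

struct cpu_node_map {
	int cpu_to_node[NR_CPUS];
	map_change_fn fn[MAX_LISTENERS];
	void *data[MAX_LISTENERS];
	size_t nr_listeners;
};

/* Register a consumer; returns 0 on success, -1 if the table is full. */
int map_register(struct cpu_node_map *m, map_change_fn fn, void *data)
{
	if (m->nr_listeners == MAX_LISTENERS)
		return -1;
	m->fn[m->nr_listeners] = fn;
	m->data[m->nr_listeners] = data;
	m->nr_listeners++;
	return 0;
}

/* Update the owner's view and notify consumers only on a real change. */
void map_set_node(struct cpu_node_map *m, int cpu, int node)
{
	int old;
	size_t i;

	assert(cpu >= 0 && cpu < NR_CPUS);
	old = m->cpu_to_node[cpu];
	if (old == node)
		return;

	m->cpu_to_node[cpu] = node;
	for (i = 0; i < m->nr_listeners; i++)
		m->fn[i](cpu, old, node, m->data[i]);
}

/* Example consumer: counts the changes it has been told about. */
void count_changes(int cpu, int old_node, int new_node, void *data)
{
	(void)cpu;
	(void)old_node;
	(void)new_node;
	(*(int *)data)++;
}
```

With this shape, the logic that knows why a mapping changed lives next to the mapping, and a consumer such as workqueue only reacts to the callback.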

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2015-01-13 15:22             ` Tejun Heo
@ 2015-01-14  2:47               ` Lai Jiangshan
  2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
  2015-01-14 13:57                 ` [PATCH 1/2] workqueue: update numa affinity info at node hotplug Tejun Heo
  0 siblings, 2 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-14  2:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Kamezawa Hiroyuki, linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

On 01/13/2015 11:22 PM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jan 13, 2015 at 03:19:09PM +0800, Lai Jiangshan wrote:
>> The mapping of the *online* cpus to nodes is already maintained by the numa code.
>>
>> What the workqueue needs is a special mapping:
>> 	the mapping of the *possible* cpus to nodes
>>
>> But having the numa code maintain this mapping is troublesome:
>> 	"possible" implies that the mapping is stable/constant/immutable, and that
>> 	is hard to ensure in the numa code.
>>
>> If mutability of this mapping is acceptable, we just move 20~40 lines
>> of code from workqueue to numa code; all the other complexity
>> around it stays in workqueue.c.
> 
> Make numa code maintain the mapping to the best of its knowledge and
> invoke notification callbacks when it changes.  

The best of its knowledge is the set of physically onlined nodes and CPUs.
cpu_present_mask can represent this knowledge, but there is no
per-node cpu_present_mask and there are no notification callbacks.

> Even if that involves slightly more code, that's the right thing to do at this point.

Right, but currently the workqueue would be the only user, and I don't know
whom to ask to do it, so I may keep it in workqueue.c.

> This puts the logic which is complicated by the fact that the mapping may
> change where it's caused not some random unrelated place.  


> It'd be
> awesome if somebody more familiar with the numa side can chime in and
> explain why this mapping change can't be avoided.

I'm also looking for someone to answer it.

> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug
  2015-01-14  2:47               ` Lai Jiangshan
@ 2015-01-14  8:54                 ` Lai Jiangshan
  2015-01-14  8:54                   ` [RFC PATCH 1/2 shit_A shit_B] workqueue: reset pool->node and unhash the pool when the node is offline Lai Jiangshan
                                     ` (5 more replies)
  2015-01-14 13:57                 ` [PATCH 1/2] workqueue: update numa affinity info at node hotplug Tejun Heo
  1 sibling, 6 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-14  8:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Tejun Heo, Yasuaki Ishimatsu, Gu, Zheng, tangchen,
	Hiroyuki KAMEZAWA

Hi, All

These patches are un-changelogged, un-compiled, un-booted and un-tested;
they are just shits, and I even wish they were un-sent or blocked.

The patches include two -solutions-:

Shit_A:
  workqueue: reset pool->node and unhash the pool when the node is
    offline
  update wq_numa when cpu_present_mask changed

 kernel/workqueue.c | 107 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 84 insertions(+), 23 deletions(-)

Shit_B:
  workqueue: reset pool->node and unhash the pool when the node is
    offline
  workqueue: remove wq_numa_possible_cpumask
  workqueue: directly update attrs of pools when cpu hot[un]plug

 kernel/workqueue.c | 135 +++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 101 insertions(+), 34 deletions(-)

Patch 1 is the same in both solutions: reset pool->node and unhash the pool.
It was suggested by TJ, and I found it a good first step toward fixing the bug.

The other patches handle wq_numa_possible_cpumask, which is where the solutions
diverge.

Solution_A uses present_mask rather than possible_cpumask. It adds
wq_numa_notify_cpu_present_set/cleared() for notifications of
changes to cpu_present_mask.  But such notifications do not exist
right now, so I fake one (wq_numa_check_present_cpumask_changes())
to imitate them.  I hope the memory people add a real one.

Solution_B uses online_mask rather than possible_cpumask.
This solution removes more coupling between numa_code and workqueue;
it depends only on cpumask_of_node(node).

Patch2_of_Solution_B removes wq_numa_possible_cpumask and adds
overhead on cpu hot[un]plug; Patch3 reduces this overhead.
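
The "faked" notification of Solution_A can be pictured with the following userspace sketch. It models the per-node present masks as plain bitmasks and reconciles them with the node a CPU currently reports. resync_cpu_node(), mask_t and NR_NODES are illustrative assumptions, not the code in the patch; the real function also takes wq_pool_mutex and updates the unbound workqueues.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES 4

/* One bit per CPU; a hypothetical stand-in for a kernel cpumask. */
typedef uint64_t mask_t;

/*
 * Reconcile the per-node masks with the node @cpu now reports.
 * Returns the node the CPU was moved away from, or -1 if nothing changed.
 */
int resync_cpu_node(mask_t present[NR_NODES], int cpu, int cur_node)
{
	mask_t bit;
	int node;

	assert(cpu >= 0 && cpu < 64);
	assert(cur_node >= 0 && cur_node < NR_NODES);
	bit = (mask_t)1 << cpu;

	if (present[cur_node] & bit)
		return -1;	/* the table already agrees */

	for (node = 0; node < NR_NODES; node++) {
		if (present[node] & bit) {
			present[node] &= ~bit;		/* "cleared" notification */
			present[cur_node] |= bit;	/* "set" notification */
			return node;
		}
	}
	return -1;	/* CPU recorded nowhere; this sketch leaves it alone */
}
```

The check is cheap when the mapping is unchanged, which is the common case on every CPU_UP_PREPARE.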

Thanks,
Lai

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
Cc: tangchen <tangchen@cn.fujitsu.com>
Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
-- 
2.1.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH 1/2 shit_A shit_B] workqueue: reset pool->node and unhash the pool when the node is offline
  2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
@ 2015-01-14  8:54                   ` Lai Jiangshan
  2015-01-14  8:54                   ` [RFC PATCH 2/2 shit_A] workqueue: update wq_numa when cpu_present_mask changed Lai Jiangshan
                                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-14  8:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Tejun Heo, Yasuaki Ishimatsu, Gu, Zheng, tangchen,
	Hiroyuki KAMEZAWA

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
Cc: tangchen <tangchen@cn.fujitsu.com>
Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6202b08..19bca3e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -36,6 +36,7 @@
 #include <linux/notifier.h>
 #include <linux/kthread.h>
 #include <linux/hardirq.h>
+#include <linux/memory.h>
 #include <linux/mempolicy.h>
 #include <linux/freezer.h>
 #include <linux/kallsyms.h>
@@ -4573,6 +4574,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 					       void *hcpu)
 {
 	int cpu = (unsigned long)hcpu;
+	int node = cpu_to_node(cpu);
 	struct worker_pool *pool;
 	struct workqueue_struct *wq;
 	int pi;
@@ -4580,6 +4582,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
 		for_each_cpu_worker_pool(pool, cpu) {
+			pool->node = node;
 			if (pool->nr_workers)
 				continue;
 			if (!create_worker(pool))
@@ -4796,6 +4799,33 @@ out_unlock:
 }
 #endif /* CONFIG_FREEZER */
 
+static int wq_numa_callback(struct notifier_block *self,
+			    unsigned long action, void *arg)
+{
+	struct memory_notify *marg = arg;
+	int node = marg->status_change_nid_normal;
+	struct worker_pool *pool;
+	int pi;
+
+	switch (action) {
+	case MEM_OFFLINE:
+		mutex_lock(&wq_pool_mutex);
+		for_each_pool(pool, pi) {
+			if (pool->node == node) {
+				pool->node = NUMA_NO_NODE;
+				if (pool->cpu < 0)
+					hash_del(&pool->hash_node);
+			}
+		}
+		mutex_unlock(&wq_pool_mutex);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
 static void __init wq_numa_init(void)
 {
 	cpumask_var_t *tbl;
@@ -4835,6 +4865,7 @@ static void __init wq_numa_init(void)
 	}
 
 	wq_numa_possible_cpumask = tbl;
+	hotplug_memory_notifier(wq_numa_callback, 0);
 	wq_numa_enabled = true;
 }
 
-- 
2.1.0
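
For readers who want the idea of this patch without the kernel context, here is a userspace sketch of "reset pool->node and unhash the pool". struct pool and detach_pools_from_node() are hypothetical simplifications: the hashed flag stands in for membership in unbound_pool_hash, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

struct pool {
	int cpu;	/* >= 0 for a per-CPU pool, -1 for an unbound pool */
	int node;	/* preferred NUMA node, or NUMA_NO_NODE */
	bool hashed;	/* stand-in for being findable in unbound_pool_hash */
};

/* Detach every pool from @node; returns how many pools were touched. */
size_t detach_pools_from_node(struct pool *pools, size_t n, int node)
{
	size_t i, touched = 0;

	assert(node != NUMA_NO_NODE);
	for (i = 0; i < n; i++) {
		if (pools[i].node != node)
			continue;
		/* later allocations for this pool no longer target @node */
		pools[i].node = NUMA_NO_NODE;
		/* unbound pools must not be reused by later attrs lookups */
		if (pools[i].cpu < 0)
			pools[i].hashed = false;
		touched++;
	}
	return touched;
}
```

Per-CPU pools keep existing and simply lose their node hint, while unbound pools tied to the node become invisible to new lookups and drain away as their users release them.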


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 2/2 shit_A] workqueue: update wq_numa when cpu_present_mask changed
  2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
  2015-01-14  8:54                   ` [RFC PATCH 1/2 shit_A shit_B] workqueue: reset pool->node and unhash the pool when the node is offline Lai Jiangshan
@ 2015-01-14  8:54                   ` Lai Jiangshan
  2015-01-14  8:54                   ` [RFC PATCH 2/3 shit_B] workqueue: remove wq_numa_possible_cpumask Lai Jiangshan
                                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-14  8:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Tejun Heo, Yasuaki Ishimatsu, Gu, Zheng, tangchen,
	Hiroyuki KAMEZAWA

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
Cc: tangchen <tangchen@cn.fujitsu.com>
Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c | 76 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 53 insertions(+), 23 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 19bca3e..5289892 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -266,8 +266,8 @@ struct workqueue_struct {
 
 static struct kmem_cache *pwq_cache;
 
-static cpumask_var_t *wq_numa_possible_cpumask;
-					/* possible CPUs of each node */
+static cpumask_var_t *wq_numa_present_cpumask;
+					/* present CPUs of each node */
 
 static bool wq_disable_numa;
 module_param_named(disable_numa, wq_disable_numa, bool, 0444);
@@ -3506,7 +3506,7 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	if (wq_numa_enabled) {
 		for_each_node(node) {
 			if (cpumask_subset(pool->attrs->cpumask,
-					   wq_numa_possible_cpumask[node])) {
+					   wq_numa_present_cpumask[node])) {
 				pool->node = node;
 				break;
 			}
@@ -3727,8 +3727,8 @@ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
 	if (cpumask_empty(cpumask))
 		goto use_dfl;
 
-	/* yeap, return possible CPUs in @node that @attrs wants */
-	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
+	/* yeap, return present CPUs in @node that @attrs wants */
+	cpumask_and(cpumask, attrs->cpumask, wq_numa_present_cpumask[node]);
 	return !cpumask_equal(cpumask, attrs->cpumask);
 
 use_dfl:
@@ -3876,8 +3876,8 @@ enomem:
 /**
  * wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
  * @wq: the target workqueue
- * @cpu: the CPU coming up or going down
- * @online: whether @cpu is coming up or going down
+ * @node: the node to be updated
+ * @cpu_off: the CPU going down
  *
  * This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and
  * %CPU_DOWN_FAILED.  @cpu is being hot[un]plugged, update NUMA affinity of
@@ -3895,11 +3895,9 @@ enomem:
  * affinity, it's the user's responsibility to flush the work item from
  * CPU_DOWN_PREPARE.
  */
-static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
-				   bool online)
+static void wq_update_unbound_numa(struct workqueue_struct *wq, int node,
+				   int cpu_off)
 {
-	int node = cpu_to_node(cpu);
-	int cpu_off = online ? -1 : cpu;
 	struct pool_workqueue *old_pwq = NULL, *pwq;
 	struct workqueue_attrs *target_attrs;
 	cpumask_t *cpumask;
@@ -4565,6 +4563,43 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 						  pool->attrs->cpumask) < 0);
 }
 
+static void wq_numa_notify_cpu_present_set(int cpu, int node)
+{
+	cpumask_set_cpu(cpu, wq_numa_present_cpumask[node]);
+}
+
+static void wq_numa_notify_cpu_present_cleared(int cpu, int node)
+{
+	struct workqueue_struct *wq;
+
+	cpumask_clear_cpu(cpu, wq_numa_present_cpumask[node]);
+
+	list_for_each_entry(wq, &workqueues, list)
+		wq_update_unbound_numa(wq, node, -1);
+}
+
+/*
+ * the memory system code doesn't have notification for cpu_present_mask
+ * changes, we fake one.
+ */
+static void wq_numa_check_present_cpumask_changes(int cpu)
+{
+	int node;
+
+	if (cpumask_test_cpu(cpu, wq_numa_present_cpumask[cpu_to_node(cpu)]))
+		return;
+
+	mutex_lock(&wq_pool_mutex);
+	for_each_node(node) {
+		if (cpumask_test_cpu(cpu, wq_numa_present_cpumask[node])) {
+			wq_numa_notify_cpu_present_cleared(cpu, node);
+			wq_numa_notify_cpu_present_set(cpu, cpu_to_node(cpu));
+			break;
+		}
+	}
+	mutex_unlock(&wq_pool_mutex);
+}
+
 /*
  * Workqueues should be brought up before normal priority CPU notifiers.
  * This will be registered high priority CPU notifier.
@@ -4588,6 +4623,8 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 			if (!create_worker(pool))
 				return NOTIFY_BAD;
 		}
+
+		wq_numa_check_present_cpumask_changes(cpu);
 		break;
 
 	case CPU_DOWN_FAILED:
@@ -4607,7 +4644,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 
 		/* update NUMA affinity of unbound workqueues */
 		list_for_each_entry(wq, &workqueues, list)
-			wq_update_unbound_numa(wq, cpu, true);
+			wq_update_unbound_numa(wq, node, -1);
 
 		mutex_unlock(&wq_pool_mutex);
 		break;
@@ -4636,7 +4673,7 @@ static int workqueue_cpu_down_callback(struct notifier_block *nfb,
 		/* update NUMA affinity of unbound workqueues */
 		mutex_lock(&wq_pool_mutex);
 		list_for_each_entry(wq, &workqueues, list)
-			wq_update_unbound_numa(wq, cpu, false);
+			wq_update_unbound_numa(wq, cpu_to_node(cpu), cpu);
 		mutex_unlock(&wq_pool_mutex);
 
 		/* wait for per-cpu unbinding to finish */
@@ -4854,17 +4891,10 @@ static void __init wq_numa_init(void)
 		BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
 				node_online(node) ? node : NUMA_NO_NODE));
 
-	for_each_possible_cpu(cpu) {
-		node = cpu_to_node(cpu);
-		if (WARN_ON(node == NUMA_NO_NODE)) {
-			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
-			/* happens iff arch is bonkers, let's just proceed */
-			return;
-		}
-		cpumask_set_cpu(cpu, tbl[node]);
-	}
+	for_each_present_cpu(cpu)
+		cpumask_set_cpu(cpu, tbl[cpu_to_node(cpu)]);
 
-	wq_numa_possible_cpumask = tbl;
+	wq_numa_present_cpumask = tbl;
 	hotplug_memory_notifier(wq_numa_callback, 0);
 	wq_numa_enabled = true;
 }
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 2/3 shit_B] workqueue: remove wq_numa_possible_cpumask
  2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
  2015-01-14  8:54                   ` [RFC PATCH 1/2 shit_A shit_B] workqueue: reset pool->node and unhash the pool when the node is offline Lai Jiangshan
  2015-01-14  8:54                   ` [RFC PATCH 2/2 shit_A] workqueue: update wq_numa when cpu_present_mask changed Lai Jiangshan
@ 2015-01-14  8:54                   ` Lai Jiangshan
  2015-01-14  8:54                   ` [RFC PATCH 3/3 shit_B] workqueue: directly update attrs of pools when cpu hot[un]plug Lai Jiangshan
                                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-14  8:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Tejun Heo, Yasuaki Ishimatsu, Gu, Zheng, tangchen,
	Hiroyuki KAMEZAWA

and use online cpumask instead.

This patch causes the per-node pwqs/pools/workers of the node
to be recreated, and the old ones to exit, whenever any cpu goes online
or offline. But it fixes the bug. The next patch removes
(almost all of) this unneeded rebuilding.

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
Cc: tangchen <tangchen@cn.fujitsu.com>
Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c | 33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 19bca3e..03ce500 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -266,9 +266,6 @@ struct workqueue_struct {
 
 static struct kmem_cache *pwq_cache;
 
-static cpumask_var_t *wq_numa_possible_cpumask;
-					/* possible CPUs of each node */
-
 static bool wq_disable_numa;
 module_param_named(disable_numa, wq_disable_numa, bool, 0444);
 
@@ -3506,7 +3503,7 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	if (wq_numa_enabled) {
 		for_each_node(node) {
 			if (cpumask_subset(pool->attrs->cpumask,
-					   wq_numa_possible_cpumask[node])) {
+					   cpumask_of_node(node))) {
 				pool->node = node;
 				break;
 			}
@@ -3727,8 +3724,6 @@ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
 	if (cpumask_empty(cpumask))
 		goto use_dfl;
 
-	/* yeap, return possible CPUs in @node that @attrs wants */
-	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
 	return !cpumask_equal(cpumask, attrs->cpumask);
 
 use_dfl:
@@ -4828,9 +4823,6 @@ static int wq_numa_callback(struct notifier_block *self,
 
 static void __init wq_numa_init(void)
 {
-	cpumask_var_t *tbl;
-	int node, cpu;
-
 	if (num_possible_nodes() <= 1)
 		return;
 
@@ -4842,29 +4834,6 @@ static void __init wq_numa_init(void)
 	wq_update_unbound_numa_attrs_buf = alloc_workqueue_attrs(GFP_KERNEL);
 	BUG_ON(!wq_update_unbound_numa_attrs_buf);
 
-	/*
-	 * We want masks of possible CPUs of each node which isn't readily
-	 * available.  Build one from cpu_to_node() which should have been
-	 * fully initialized by now.
-	 */
-	tbl = kzalloc(nr_node_ids * sizeof(tbl[0]), GFP_KERNEL);
-	BUG_ON(!tbl);
-
-	for_each_node(node)
-		BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
-				node_online(node) ? node : NUMA_NO_NODE));
-
-	for_each_possible_cpu(cpu) {
-		node = cpu_to_node(cpu);
-		if (WARN_ON(node == NUMA_NO_NODE)) {
-			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
-			/* happens iff arch is bonkers, let's just proceed */
-			return;
-		}
-		cpumask_set_cpu(cpu, tbl[node]);
-	}
-
-	wq_numa_possible_cpumask = tbl;
 	hotplug_memory_notifier(wq_numa_callback, 0);
 	wq_numa_enabled = true;
 }
-- 
2.1.0
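
The node selection in get_unbound_pool() after this patch boils down to a subset test against each node's current CPU mask. A userspace sketch of that test follows. pick_pool_node(), mask_t and the node_cpus[] array are assumptions for illustration; node_cpus[] plays the role of cpumask_of_node(node).

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES     4
#define NUMA_NO_NODE (-1)

/* One bit per CPU; a hypothetical stand-in for a kernel cpumask. */
typedef uint64_t mask_t;

/* Return the first node whose CPUs cover all of @wanted, else NUMA_NO_NODE. */
int pick_pool_node(mask_t wanted, const mask_t node_cpus[NR_NODES])
{
	int node;

	assert(wanted != 0);	/* an empty mask would match every node */
	for (node = 0; node < NR_NODES; node++) {
		/* subset test: no wanted CPU lies outside this node */
		if ((wanted & ~node_cpus[node]) == 0)
			return node;
	}
	return NUMA_NO_NODE;
}
```

Because the per-node masks now track online CPUs, the answer can change across hot[un]plug events, which is why the pools have to be re-evaluated at those points.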


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 3/3 shit_B] workqueue: directly update attrs of pools when cpu hot[un]plug
  2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
                                     ` (2 preceding siblings ...)
  2015-01-14  8:54                   ` [RFC PATCH 2/3 shit_B] workqueue: remove wq_numa_possible_cpumask Lai Jiangshan
@ 2015-01-14  8:54                   ` Lai Jiangshan
  2015-01-16  5:22                   ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Yasuaki Ishimatsu
  2015-01-23  6:13                   ` Izumi, Taku
  5 siblings, 0 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-14  8:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Tejun Heo, Yasuaki Ishimatsu, Gu, Zheng, tangchen,
	Hiroyuki KAMEZAWA

If the "attrs" of the per-node pool are allowed to be updated,
we just update them directly, reducing the rebuild overhead and
memory usage. But the overhead of set_cpus_allowed_ptr() cannot
be reduced yet.

Cc: Tejun Heo <tj@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
Cc: tangchen <tangchen@cn.fujitsu.com>
Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 03ce500..53d2757 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -166,9 +166,10 @@ struct worker_pool {
 
 	struct ida		worker_ida;	/* worker IDs for task name */
 
-	struct workqueue_attrs	*attrs;		/* I: worker attributes */
+	struct workqueue_attrs	*attrs;		/* A: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
 	int			refcnt;		/* PL: refcnt for unbound pools */
+	struct list_head	unbound_pwqs;	/* PL: list of pwq->unbound_pwq */
 
 	/*
 	 * The current concurrency level.  As it's likely to be accessed
@@ -202,6 +203,7 @@ struct pool_workqueue {
 	int			max_active;	/* L: max active works */
 	struct list_head	delayed_works;	/* L: delayed works */
 	struct list_head	pwqs_node;	/* WR: node on wq->pwqs */
+	struct list_head	unbound_pwq;	/* PL: node on pool->unbound_pwqs */
 	struct list_head	mayday_node;	/* MD: node on wq->maydays */
 
 	/*
@@ -3554,6 +3556,7 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
 	mutex_unlock(&wq->mutex);
 
 	mutex_lock(&wq_pool_mutex);
+	list_del(&pwq->unbound_pwq);
 	put_unbound_pool(pool);
 	mutex_unlock(&wq_pool_mutex);
 
@@ -3674,6 +3677,7 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
 	}
 
 	init_pwq(pwq, wq, pool);
+	list_add(&pwq->unbound_pwq, &pool->unbound_pwqs);
 	return pwq;
 }
 
@@ -3683,6 +3687,7 @@ static void free_unbound_pwq(struct pool_workqueue *pwq)
 	lockdep_assert_held(&wq_pool_mutex);
 
 	if (pwq) {
+		list_del(&pwq->unbound_pwq);
 		put_unbound_pool(pwq->pool);
 		kmem_cache_free(pwq_cache, pwq);
 	}
@@ -3869,6 +3874,58 @@ enomem:
 }
 
 /**
+ * pool_update_unbound_numa - update NUMA affinity of a pool for CPU hot[un]plug
+ * @pool: the target pool
+ * @cpu: the CPU coming up or going down
+ * @online: whether @cpu is coming up or going down
+ *
+ * This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and
+ * %CPU_DOWN_FAILED.  @cpu is being hot[un]plugged, update NUMA affinity of
+ * @pool when allowed.
+ */
+static void pool_update_unbound_numa(struct worker_pool *pool, int cpu, bool online)
+{
+	u32 hash;
+	struct pool_workqueue *pwq;
+	struct worker *worker;
+
+	if (pool->cpu >= 0 || pool->node != cpu_to_node(cpu))
+		return; /* it must not be an unbound pool of the node */
+	if (online && cpumask_test_cpu(cpu, pool->attrs->cpumask))
+		return; /* it already has the online cpu */
+	if (!online && !cpumask_test_cpu(cpu, pool->attrs->cpumask))
+		return; /* it doesn't have the cpu to be removed */
+	if (!online && cpumask_weight(pool->attrs->cpumask) == 1)
+		return; /* the last cpu can't be removed from the pool */
+
+	/* It is called from CPU hot[un]plug, wq->unbound_attrs is stable */
+	list_for_each_entry(pwq, &pool->unbound_pwqs, unbound_pwq) {
+		if (wqattrs_equal(pool->attrs, pwq->wq->unbound_attrs))
+			return; /* the pool serves for at least a default pwq */
+		if (online && !cpumask_test_cpu(cpu, pwq->wq->unbound_attrs->cpumask))
+			return; /* this wq doesn't allow us to add the cpu */
+	}
+
+	/* OK, all pwqs allow us to update the pool directly, let's go */
+	mutex_lock(&pool->attach_mutex);
+	if (online)
+		cpumask_set_cpu(cpu, pool->attrs->cpumask);
+	else
+		cpumask_clear_cpu(cpu, pool->attrs->cpumask);
+
+	/* rehash */
+	hash = wqattrs_hash(pool->attrs);
+	hash_del(&pool->hash_node);
+	hash_add(unbound_pool_hash, &pool->hash_node, hash);
+
+	/* update worker's cpumask */
+	for_each_pool_worker(worker, pool)
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
+						  pool->attrs->cpumask) < 0);
+	mutex_unlock(&pool->attach_mutex);
+}
+
+/**
  * wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
  * @wq: the target workqueue
  * @cpu: the CPU coming up or going down
@@ -4598,6 +4655,8 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 				restore_unbound_workers_cpumask(pool, cpu);
 
 			mutex_unlock(&pool->attach_mutex);
+
+			pool_update_unbound_numa(pool, cpu, true);
 		}
 
 		/* update NUMA affinity of unbound workqueues */
@@ -4621,6 +4680,8 @@ static int workqueue_cpu_down_callback(struct notifier_block *nfb,
 	int cpu = (unsigned long)hcpu;
 	struct work_struct unbind_work;
 	struct workqueue_struct *wq;
+	struct worker_pool *pool;
+	int pi;
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_DOWN_PREPARE:
@@ -4628,10 +4689,16 @@ static int workqueue_cpu_down_callback(struct notifier_block *nfb,
 		INIT_WORK_ONSTACK(&unbind_work, wq_unbind_fn);
 		queue_work_on(cpu, system_highpri_wq, &unbind_work);
 
-		/* update NUMA affinity of unbound workqueues */
 		mutex_lock(&wq_pool_mutex);
+
+		/* try to update NUMA affinity of unbound pool */
+		for_each_pool(pool, pi)
+			pool_update_unbound_numa(pool, cpu, false);
+
+		/* update NUMA affinity of unbound workqueues */
 		list_for_each_entry(wq, &workqueues, list)
 			wq_update_unbound_numa(wq, cpu, false);
+
 		mutex_unlock(&wq_pool_mutex);
 
 		/* wait for per-cpu unbinding to finish */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2015-01-14  2:47               ` Lai Jiangshan
  2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
@ 2015-01-14 13:57                 ` Tejun Heo
  2015-01-15  1:23                   ` Lai Jiangshan
  1 sibling, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2015-01-14 13:57 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Kamezawa Hiroyuki, linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

Hello, Lai.

On Wed, Jan 14, 2015 at 10:47:16AM +0800, Lai Jiangshan wrote:
> > Even if that involves slightly more code, that's the right thing to do at this point.
> 
> Right, but currently the workqueue would be the only user, and I don't know
> whom to ask to do it, so I may keep it in workqueue.c.

The problem is that working around this in workqueue effectively hides
what needs to be actively looked at and decided.  It currently isn't
even defined when such mappings can change or for which cpus.  Are
all offline cpus up for grabs or just !present ones?  These
are questions which can only be answered / determined from the NUMA side
and the sooner we deal with this properly the better.

> > It'd be
> > awesome if somebody more familiar with the numa side can chime in and
> > explain why this mapping change can't be avoided.
> 
> I'm also looking for someone to answer it.

Exactly.  Whoever is requiring NUMA node remapping should explain and
justify that; the model to handle it can only be determined
from that.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug
  2015-01-14 13:57                 ` [PATCH 1/2] workqueue: update numa affinity info at node hotplug Tejun Heo
@ 2015-01-15  1:23                   ` Lai Jiangshan
  0 siblings, 0 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-15  1:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Kamezawa Hiroyuki, linux-kernel, "Ishimatsu,
	Yasuaki/石松 靖章",
	Tang Chen, guz.fnst

On 01/14/2015 09:57 PM, Tejun Heo wrote:
> Hello, Lai.
> 
> On Wed, Jan 14, 2015 at 10:47:16AM +0800, Lai Jiangshan wrote:
>>> Even if that involves slightly more code, that's the right thing to do at this point.
>>
>> Right, but currently the workqueue would be the only user, and I don't know
>> whom to ask to do it, so I may keep it in workqueue.c.
> 
> The problem is that working around this in workqueue effectively hides
> what needs to be actively looked at and decided.  It currently isn't
> even defined when such mappings can change or for which cpus.  Are
> all offline cpus up for grabs or just !present ones?  These
> are questions which can only be answered / determined from the NUMA side
> and the sooner we deal with this properly the better.

So solution_B stays away from this spaghetti entirely.

> 
>>> It'd be
>>> awesome if somebody more familiar with the numa side can chime in and
>>> explain why this mapping change can't be avoided.
>>
>> I'm also looking for someone to answer it.
> 
> Exactly.  Whoever is requiring NUMA node remapping should explain and
> justify that; the model to handle it can only be determined
> from that.
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug
  2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
                                     ` (3 preceding siblings ...)
  2015-01-14  8:54                   ` [RFC PATCH 3/3 shit_B] workqueue: directly update attrs of pools when cpu hot[un]plug Lai Jiangshan
@ 2015-01-16  5:22                   ` Yasuaki Ishimatsu
  2015-01-16  8:04                     ` Lai Jiangshan
  2015-01-23  6:13                   ` Izumi, Taku
  5 siblings, 1 reply; 21+ messages in thread
From: Yasuaki Ishimatsu @ 2015-01-16  5:22 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Tejun Heo, Gu, Zheng, tangchen, Hiroyuki KAMEZAWA

Hi Lai,

Thank you for posting the patch-set.

I'll try it next Monday, so please wait a while.

Thanks,
Yasuaki Ishimatsu


(2015/01/14 17:54), Lai Jiangshan wrote:
> Hi, All
> 
> These patches are un-changelogged, un-compiled, un-booted and un-tested;
> they are just shits, and I even wish they were un-sent or blocked.
> 
> The patches include two -solutions-:
> 
> Shit_A:
>    workqueue: reset pool->node and unhash the pool when the node is
>      offline
>    update wq_numa when cpu_present_mask changed
> 
>   kernel/workqueue.c | 107 +++++++++++++++++++++++++++++++++++++++++------------
>   1 file changed, 84 insertions(+), 23 deletions(-)
> 
> 
> Shit_B:
>    workqueue: reset pool->node and unhash the pool when the node is
>      offline
>    workqueue: remove wq_numa_possible_cpumask
>    workqueue: directly update attrs of pools when cpu hot[un]plug
> 
>   kernel/workqueue.c | 135 +++++++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 101 insertions(+), 34 deletions(-)
> 
> 
> Patch 1 is the same in both solutions: reset pool->node and unhash the pool.
> It was suggested by TJ, and I found it a good first step toward fixing the bug.
> 
> The other patches handle wq_numa_possible_cpumask, which is where the solutions
> diverge.
> 
> Solution_A uses present_mask rather than possible_cpumask. It adds
> wq_numa_notify_cpu_present_set/cleared() for notifications of
> changes to cpu_present_mask.  But such notifications do not exist
> right now, so I fake one (wq_numa_check_present_cpumask_changes())
> to imitate them.  I hope the memory people add a real one.
> 
> Solution_B uses online_mask rather than possible_cpumask.
> This solution removes more coupling between numa_code and workqueue;
> it depends only on cpumask_of_node(node).
> 
> Patch2_of_Solution_B removes wq_numa_possible_cpumask and adds
> overhead on cpu hot[un]plug; Patch3 reduces this overhead.
> 
> Thanks,
> Lai
> 
> 
> Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
> Cc: tangchen <tangchen@cn.fujitsu.com>
> Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug
  2015-01-16  5:22                   ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Yasuaki Ishimatsu
@ 2015-01-16  8:04                     ` Lai Jiangshan
  0 siblings, 0 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-16  8:04 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: linux-kernel, Tejun Heo, Gu, Zheng, tangchen, Hiroyuki KAMEZAWA

On 01/16/2015 01:22 PM, Yasuaki Ishimatsu wrote:
> Hi Lai,
> 
> Thank you for posting the patch set.
> 
> I'll try it next Monday, so please wait a while.
> 

I think testing is a waste of effort before the maintainer makes a decision.
(Discussion and ideas are welcome.)

Before TJ decides, maybe you can do this first:

"Make numa code maintain the mapping to the best of its knowledge and
invoke notification callbacks when it changes. "
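
One way to read the quoted suggestion is as a small publish/subscribe interface: the code that owns the cpu-to-node mapping updates it and calls every registered callback only when an entry actually changes. The following is a minimal userspace sketch of that idea, not kernel code. All names in it (demo_numa_map, demo_map_register(), demo_map_set(), demo_count_changes()) are hypothetical and do not exist in the kernel or in the posted patches.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS	8
#define DEMO_MAX_CBS	4
#define DEMO_NO_NODE	(-1)

/* Hypothetical callback type: told which cpu moved from which node to which. */
typedef void (*demo_map_cb)(int cpu, int old_node, int new_node, void *data);

/* Hypothetical owner of the cpu-to-node mapping plus its subscribers. */
struct demo_numa_map {
	int cpu_to_node[DEMO_NR_CPUS];
	demo_map_cb cbs[DEMO_MAX_CBS];
	void *cb_data[DEMO_MAX_CBS];
	size_t nr_cbs;
};

/* Start with no cpu mapped to any node and no subscribers. */
void demo_map_init(struct demo_numa_map *map)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		map->cpu_to_node[cpu] = DEMO_NO_NODE;
	map->nr_cbs = 0;
}

/* Register a subscriber; returns 0 on success, -1 if the table is full. */
int demo_map_register(struct demo_numa_map *map, demo_map_cb cb, void *data)
{
	if (map->nr_cbs == DEMO_MAX_CBS)
		return -1;
	map->cbs[map->nr_cbs] = cb;
	map->cb_data[map->nr_cbs] = data;
	map->nr_cbs++;
	return 0;
}

/*
 * Update one mapping entry and notify subscribers only on a real change.
 * Returns 1 if the mapping changed, 0 if it was already set, -1 on a bad cpu.
 */
int demo_map_set(struct demo_numa_map *map, int cpu, int node)
{
	int old;

	if (cpu < 0 || cpu >= DEMO_NR_CPUS)
		return -1;

	old = map->cpu_to_node[cpu];
	if (old == node)
		return 0;

	map->cpu_to_node[cpu] = node;
	for (size_t i = 0; i < map->nr_cbs; i++)
		map->cbs[i](cpu, old, node, map->cb_data[i]);
	return 1;
}

/* Example subscriber: counts the changes it has been told about. */
void demo_count_changes(int cpu, int old_node, int new_node, void *data)
{
	(void)cpu;
	(void)old_node;
	(void)new_node;
	(*(int *)data)++;
}
```

In this sketch a subscriber such as the workqueue code would register once and then refresh its own per-node state from inside the callback, instead of caching a mapping that may go stale.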

Thanks,
Lai

> Thanks,
> Yasuaki Ishimatsu
> 
> 
> (2015/01/14 17:54), Lai Jiangshan wrote:
>> Hi, All
>>
>> This patches are un-changloged, un-compiled, un-booted, un-tested,
>> they are just shits, I even hope them un-sent or blocked.
>>
>> The patches include two -solutions-:
>>
>> Shit_A:
>>    workqueue: reset pool->node and unhash the pool when the node is
>>      offline
>>    update wq_numa when cpu_present_mask changed
>>
>>   kernel/workqueue.c | 107 +++++++++++++++++++++++++++++++++++++++++------------
>>   1 file changed, 84 insertions(+), 23 deletions(-)
>>
>>
>> Shit_B:
>>    workqueue: reset pool->node and unhash the pool when the node is
>>      offline
>>    workqueue: remove wq_numa_possible_cpumask
>>    workqueue: directly update attrs of pools when cpu hot[un]plug
>>
>>   kernel/workqueue.c | 135 +++++++++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 101 insertions(+), 34 deletions(-)
>>
>>
>> Both patch1 of the both solutions are: reset pool->node and unhash the pool,
>> it is suggested by TJ, I found it is a good leading-step for fixing the bug.
>>
>> The other patches are handling wq_numa_possible_cpumask where the solutions
>> diverge.
>>
>> Solution_A uses present_mask rather than possible_cpumask. It adds
>> wq_numa_notify_cpu_present_set/cleared() for notifications of
>> the changes of cpu_present_mask.  But the notifications are un-existed
>> right now, so I fake one (wq_numa_check_present_cpumask_changes())
>> to imitate them.  I hope the memory people add a real one.
>>
>> Solution_B uses online_mask rather than possible_cpumask.
>> this solution remove more coupling between numa_code and workqueue,
>> it just depends on cpumask_of_node(node).
>>
>> Patch2_of_Solution_B removes the wq_numa_possible_cpumask and add
>> overhead when cpu hot[un]plug, Patch3 reduce this overhead.
>>
>> Thanks,
>> Lai
>>
>>
>> Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
>> Cc: tangchen <tangchen@cn.fujitsu.com>
>> Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
>>
> 
> 
> .
> 



* RE: [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug
  2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
                                     ` (4 preceding siblings ...)
  2015-01-16  5:22                   ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Yasuaki Ishimatsu
@ 2015-01-23  6:13                   ` Izumi, Taku
  2015-01-23  8:18                     ` Lai Jiangshan
  5 siblings, 1 reply; 21+ messages in thread
From: Izumi, Taku @ 2015-01-23  6:13 UTC (permalink / raw)
  To: Lai, Jiangshan, linux-kernel
  Cc: Lai, Jiangshan, Tejun Heo, Ishimatsu, Yasuaki, Gu, Zheng, Tang,
	Chen, Kamezawa, Hiroyuki



> This patches are un-changloged, un-compiled, un-booted, un-tested,
> they are just shits, I even hope them un-sent or blocked.
> 
> The patches include two -solutions-:
> 
> Shit_A:
>   workqueue: reset pool->node and unhash the pool when the node is
>     offline
>   update wq_numa when cpu_present_mask changed
> 
>  kernel/workqueue.c | 107 +++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 84 insertions(+), 23 deletions(-)
> 
> 
> Shit_B:
>   workqueue: reset pool->node and unhash the pool when the node is
>     offline
>   workqueue: remove wq_numa_possible_cpumask
>   workqueue: directly update attrs of pools when cpu hot[un]plug
> 
>  kernel/workqueue.c | 135 +++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 101 insertions(+), 34 deletions(-)
> 

  I tried your patchsets.
  linux-3.18.3 + Shit_A:

    Build OK. 
    I tried to reproduce the problem that Ishimatsu had reported, but it did not occur.
    It seems that your patch fixes this problem.

  linux-3.18.3 + Shit_B:

    Build OK, but I encountered a kernel panic at boot time.

[    0.189000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    0.189000] IP: [<ffffffff8131ef96>] __list_add+0x16/0xc0
[    0.189000] PGD 0 
[    0.189000] Oops: 0000 [#1] SMP 
[    0.189000] Modules linked in:
[    0.189000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.3+ #3
[    0.189000] Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 01.81 12/03/2014
[    0.189000] task: ffff880869678000 ti: ffff880869664000 task.ti: ffff880869664000
[    0.189000] RIP: 0010:[<ffffffff8131ef96>]  [<ffffffff8131ef96>] __list_add+0x16/0xc0
[    0.189000] RSP: 0000:ffff880869667be8  EFLAGS: 00010296
[    0.189000] RAX: ffff88087f83cda8 RBX: ffff88087f83cd80 RCX: 0000000000000000
[    0.189000] RDX: 0000000000000000 RSI: ffff88086912bb98 RDI: ffff88087f83cd80
[    0.189000] RBP: ffff880869667c08 R08: 0000000000000000 R09: ffff88087f807480
[    0.189000] R10: ffffffff810911b6 R11: ffffffff810956ac R12: 0000000000000000
[    0.189000] R13: ffff88086912bb98 R14: 0000000000000400 R15: 0000000000000400
[    0.189000] FS:  0000000000000000(0000) GS:ffff88087fc00000(0000) knlGS:0000000000000000
[    0.189000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.189000] CR2: 0000000000000008 CR3: 0000000001998000 CR4: 00000000001407f0
[    0.189000] Stack:
[    0.189000]  000000000000000a ffff88086912b800 ffff88087f83cd00 ffff88087f80c000
[    0.189000]  ffff880869667c48 ffffffff810912c8 ffff880869667c28 ffff88087f803f00
[    0.189000]  00000000fffffff4 ffff88086964b760 ffff88086964b6a0 ffff88086964b740
[    0.189000] Call Trace:
[    0.189000]  [<ffffffff810912c8>] alloc_unbound_pwq+0x298/0x3b0
[    0.189000]  [<ffffffff81091ce8>] apply_workqueue_attrs+0x158/0x4c0
[    0.189000]  [<ffffffff81092424>] __alloc_workqueue_key+0x174/0x5b0
[    0.189000]  [<ffffffff813052a6>] ? alloc_cpumask_var_node+0x56/0x80
[    0.189000]  [<ffffffff81b21573>] init_workqueues+0x33d/0x40f
[    0.189000]  [<ffffffff81b21236>] ? ftrace_define_fields_workqueue_execute_start+0x6a/0x6a
[    0.189000]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[    0.189000]  [<ffffffff81b12f4d>] ? native_smp_prepare_cpus+0x34d/0x352
[    0.189000]  [<ffffffff81b0026d>] kernel_init_freeable+0xf5/0x23c
[    0.189000]  [<ffffffff81653370>] ? rest_init+0x80/0x80
[    0.189000]  [<ffffffff8165337e>] kernel_init+0xe/0xf0
[    0.189000]  [<ffffffff8166bcfc>] ret_from_fork+0x7c/0xb0
[    0.189000]  [<ffffffff81653370>] ? rest_init+0x80/0x80
[    0.189000] Code: ff b8 f4 ff ff ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 48 89 fb 48 83 ec 08 <4c> 8b 42 08 49 39 f0 75 2e 4d 8b 45 00 4d 39 c4 75 6c 4c 39 e3 
[    0.189000] RIP  [<ffffffff8131ef96>] __list_add+0x16/0xc0
[    0.189000]  RSP <ffff880869667be8>
[    0.189000] CR2: 0000000000000008
[    0.189000] ---[ end trace 58feee6875cf67cf ]---
[    0.189000] Kernel panic - not syncing: Fatal exception
[    0.189000] ---[ end Kernel panic - not syncing: Fatal exception

   
  Sincerely,
  Taku Izumi


> Both patch1 of the both solutions are: reset pool->node and unhash the pool,
> it is suggested by TJ, I found it is a good leading-step for fixing the bug.
> 
> The other patches are handling wq_numa_possible_cpumask where the solutions
> diverge.
> 
> Solution_A uses present_mask rather than possible_cpumask. It adds
> wq_numa_notify_cpu_present_set/cleared() for notifications of
> the changes of cpu_present_mask.  But the notifications are un-existed
> right now, so I fake one (wq_numa_check_present_cpumask_changes())
> to imitate them.  I hope the memory people add a real one.
> 
> Solution_B uses online_mask rather than possible_cpumask.
> this solution remove more coupling between numa_code and workqueue,
> it just depends on cpumask_of_node(node).
> 
> Patch2_of_Solution_B removes the wq_numa_possible_cpumask and add
> overhead when cpu hot[un]plug, Patch3 reduce this overhead.
> 
> Thanks,
> Lai
> 
> 
> Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
> Cc: tangchen <tangchen@cn.fujitsu.com>
> Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
> --
> 2.1.0
> 


* Re: [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug
  2015-01-23  6:13                   ` Izumi, Taku
@ 2015-01-23  8:18                     ` Lai Jiangshan
  0 siblings, 0 replies; 21+ messages in thread
From: Lai Jiangshan @ 2015-01-23  8:18 UTC (permalink / raw)
  To: "Izumi, Taku/泉 拓", linux-kernel
  Cc: Tejun Heo, "Ishimatsu,
	Yasuaki/石松 靖章",
	"Gu, Zheng/顾 政",
	"Tang, Chen/汤 晨",
	"Kamezawa, Hiroyuki/亀澤 寛之"

On 01/23/2015 02:13 PM, Izumi, Taku/泉 拓 wrote:
> 
>> This patches are un-changloged, un-compiled, un-booted, un-tested,
>> they are just shits, I even hope them un-sent or blocked.
>>
>> The patches include two -solutions-:
>>
>> Shit_A:
>>   workqueue: reset pool->node and unhash the pool when the node is
>>     offline
>>   update wq_numa when cpu_present_mask changed
>>
>>  kernel/workqueue.c | 107 +++++++++++++++++++++++++++++++++++++++++------------
>>  1 file changed, 84 insertions(+), 23 deletions(-)
>>
>>
>> Shit_B:
>>   workqueue: reset pool->node and unhash the pool when the node is
>>     offline
>>   workqueue: remove wq_numa_possible_cpumask
>>   workqueue: directly update attrs of pools when cpu hot[un]plug
>>
>>  kernel/workqueue.c | 135 +++++++++++++++++++++++++++++++++++++++--------------
>>  1 file changed, 101 insertions(+), 34 deletions(-)
>>
> 
>   I tried your patchsets.
>   linux-3.18.3 + Shit_A:
> 
>     Build OK. 
>     I tried to reproduce the problem that Ishimatsu had reported, but it did not occur.
>     It seems that your patch fixes this problem.
> 
>   linux-3.18.3 + Shit_B:
> 
>     Build OK, but I encountered a kernel panic at boot time.

I forgot to initialize pool->unbound_pwqs.

Even so, I prefer Solution_B.
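
The oops quoted below appears consistent with that diagnosis: it faults in __list_add with CR2 = 0x8 and RDX = 0, which is what one would expect if the `next` pointer read from a zero-filled list head is NULL and the code then touches `next->prev` (offset 8 on a 64-bit build). The following is a minimal userspace sketch of the failure mode and the fix, not the actual patch. The struct demo_pool, demo_pool_setup(), and list_add_checked() names are hypothetical; only list_head and INIT_LIST_HEAD mirror the kernel's names, and they are reimplemented here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace copy of the kernel's doubly linked list head layout. */
struct list_head {
	struct list_head *next;	/* offset 0 */
	struct list_head *prev;	/* offset sizeof(void *), i.e. 8 on 64-bit */
};

/* Hypothetical stand-in for the pool; only the list member matters here. */
struct demo_pool {
	int node;
	struct list_head unbound_pwqs;
};

/* An empty list is a head that points at itself in both directions. */
void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* A zero-filled head has NULL links, so it is not a valid empty list. */
bool list_head_is_initialized(const struct list_head *head)
{
	return head->next != NULL && head->prev != NULL;
}

/*
 * Insert @entry right after @head. Unlike the kernel's list_add(), this
 * sketch refuses to operate on an uninitialized head instead of faulting.
 */
bool list_add_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	if (!list_head_is_initialized(head))
		return false;	/* unchecked code would dereference next->prev */

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
	return true;
}

/* Zero the pool as an allocator would, then optionally init the list. */
void demo_pool_setup(struct demo_pool *pool, bool init_list)
{
	memset(pool, 0, sizeof(*pool));
	if (init_list)
		INIT_LIST_HEAD(&pool->unbound_pwqs);
}
```

Calling demo_pool_setup() with init_list set to false models the forgotten initialization: the first insertion is rejected here, where the real kernel code would take the NULL dereference instead.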

> 
> [    0.189000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [    0.189000] IP: [<ffffffff8131ef96>] __list_add+0x16/0xc0
> [    0.189000] PGD 0 
> [    0.189000] Oops: 0000 [#1] SMP 
> [    0.189000] Modules linked in:
> [    0.189000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.3+ #3
> [    0.189000] Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 01.81 12/03/2014
> [    0.189000] task: ffff880869678000 ti: ffff880869664000 task.ti: ffff880869664000
> [    0.189000] RIP: 0010:[<ffffffff8131ef96>]  [<ffffffff8131ef96>] __list_add+0x16/0xc0
> [    0.189000] RSP: 0000:ffff880869667be8  EFLAGS: 00010296
> [    0.189000] RAX: ffff88087f83cda8 RBX: ffff88087f83cd80 RCX: 0000000000000000
> [    0.189000] RDX: 0000000000000000 RSI: ffff88086912bb98 RDI: ffff88087f83cd80
> [    0.189000] RBP: ffff880869667c08 R08: 0000000000000000 R09: ffff88087f807480
> [    0.189000] R10: ffffffff810911b6 R11: ffffffff810956ac R12: 0000000000000000
> [    0.189000] R13: ffff88086912bb98 R14: 0000000000000400 R15: 0000000000000400
> [    0.189000] FS:  0000000000000000(0000) GS:ffff88087fc00000(0000) knlGS:0000000000000000
> [    0.189000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.189000] CR2: 0000000000000008 CR3: 0000000001998000 CR4: 00000000001407f0
> [    0.189000] Stack:
> [    0.189000]  000000000000000a ffff88086912b800 ffff88087f83cd00 ffff88087f80c000
> [    0.189000]  ffff880869667c48 ffffffff810912c8 ffff880869667c28 ffff88087f803f00
> [    0.189000]  00000000fffffff4 ffff88086964b760 ffff88086964b6a0 ffff88086964b740
> [    0.189000] Call Trace:
> [    0.189000]  [<ffffffff810912c8>] alloc_unbound_pwq+0x298/0x3b0
> [    0.189000]  [<ffffffff81091ce8>] apply_workqueue_attrs+0x158/0x4c0
> [    0.189000]  [<ffffffff81092424>] __alloc_workqueue_key+0x174/0x5b0
> [    0.189000]  [<ffffffff813052a6>] ? alloc_cpumask_var_node+0x56/0x80
> [    0.189000]  [<ffffffff81b21573>] init_workqueues+0x33d/0x40f
> [    0.189000]  [<ffffffff81b21236>] ? ftrace_define_fields_workqueue_execute_start+0x6a/0x6a
> [    0.189000]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
> [    0.189000]  [<ffffffff81b12f4d>] ? native_smp_prepare_cpus+0x34d/0x352
> [    0.189000]  [<ffffffff81b0026d>] kernel_init_freeable+0xf5/0x23c
> [    0.189000]  [<ffffffff81653370>] ? rest_init+0x80/0x80
> [    0.189000]  [<ffffffff8165337e>] kernel_init+0xe/0xf0
> [    0.189000]  [<ffffffff8166bcfc>] ret_from_fork+0x7c/0xb0
> [    0.189000]  [<ffffffff81653370>] ? rest_init+0x80/0x80
> [    0.189000] Code: ff b8 f4 ff ff ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 48 89 fb 48 83 ec 08 <4c> 8b 42 08 49 39 f0 75 2e 4d 8b 45 00 4d 39 c4 75 6c 4c 39 e3 
> [    0.189000] RIP  [<ffffffff8131ef96>] __list_add+0x16/0xc0
> [    0.189000]  RSP <ffff880869667be8>
> [    0.189000] CR2: 0000000000000008
> [    0.189000] ---[ end trace 58feee6875cf67cf ]---
> [    0.189000] Kernel panic - not syncing: Fatal exception
> [    0.189000] ---[ end Kernel panic - not syncing: Fatal exception
> 
>    
>   Sincerely,
>   Taku Izumi
> 
> 
>> Both patch1 of the both solutions are: reset pool->node and unhash the pool,
>> it is suggested by TJ, I found it is a good leading-step for fixing the bug.
>>
>> The other patches are handling wq_numa_possible_cpumask where the solutions
>> diverge.
>>
>> Solution_A uses present_mask rather than possible_cpumask. It adds
>> wq_numa_notify_cpu_present_set/cleared() for notifications of
>> the changes of cpu_present_mask.  But the notifications are un-existed
>> right now, so I fake one (wq_numa_check_present_cpumask_changes())
>> to imitate them.  I hope the memory people add a real one.
>>
>> Solution_B uses online_mask rather than possible_cpumask.
>> this solution remove more coupling between numa_code and workqueue,
>> it just depends on cpumask_of_node(node).
>>
>> Patch2_of_Solution_B removes the wq_numa_possible_cpumask and add
>> overhead when cpu hot[un]plug, Patch3 reduce this overhead.
>>
>> Thanks,
>> Lai
>>
>>
>> Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
>> Cc: tangchen <tangchen@cn.fujitsu.com>
>> Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
>> --
>> 2.1.0
>>



end of thread, other threads:[~2015-01-23  8:17 UTC | newest]

Thread overview: 21+ messages
2014-12-16 16:36 [PATCH 0/2] workqueue: fix a bug when numa mapping is changed v4 Kamezawa Hiroyuki
2014-12-16 16:45 ` [PATCH 1/2] workqueue: update numa affinity info at node hotplug Kamezawa Hiroyuki
2014-12-17  1:36   ` Lai Jiangshan
2014-12-17  3:22     ` Kamezawa Hiroyuki
2014-12-17  4:56       ` Kamezawa Hiroyuki
2014-12-25 20:11         ` Tejun Heo
2015-01-13  7:19           ` Lai Jiangshan
2015-01-13 15:22             ` Tejun Heo
2015-01-14  2:47               ` Lai Jiangshan
2015-01-14  8:54                 ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Lai Jiangshan
2015-01-14  8:54                   ` [RFC PATCH 1/2 shit_A shit_B] workqueue: reset pool->node and unhash the pool when the node is offline Lai Jiangshan
2015-01-14  8:54                   ` [RFC PATCH 2/2 shit_A] workqueue: update wq_numa when cpu_present_mask changed Lai Jiangshan
2015-01-14  8:54                   ` [RFC PATCH 2/3 shit_B] workqueue: remove wq_numa_possible_cpumask Lai Jiangshan
2015-01-14  8:54                   ` [RFC PATCH 3/3 shit_B] workqueue: directly update attrs of pools when cpu hot[un]plug Lai Jiangshan
2015-01-16  5:22                   ` [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug Yasuaki Ishimatsu
2015-01-16  8:04                     ` Lai Jiangshan
2015-01-23  6:13                   ` Izumi, Taku
2015-01-23  8:18                     ` Lai Jiangshan
2015-01-14 13:57                 ` [PATCH 1/2] workqueue: update numa affinity info at node hotplug Tejun Heo
2015-01-15  1:23                   ` Lai Jiangshan
2014-12-16 16:51 ` [PATCH 2/2] workqueue: update cpumask at CPU_ONLINE if necessary Kamezawa Hiroyuki
