* [PATCH v8 0/7] Make cpuid <-> nodeid mapping persistent
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Dou Liyang

[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue caches
it in wq_numa_possible_cpumask in wq_numa_init().

When a node goes online or offline, the cpuid <-> nodeid mapping is established or
destroyed, so the mapping changes whenever node hotplug happens. But workqueue does
not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and likewise for
the other cpus that moved.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, the pool could be mapped to an offline
node, which leads to a memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

        ......

        return worker;
}
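
The sequence above can be modelled in user space. This is only a sketch under
stated assumptions: cached_node_of_cpu[] stands in for wq_numa_possible_cpumask,
and model_boot(), model_hotplug() and pick_pool_node() are hypothetical names that
do not exist in the kernel. The cpu ranges are the ones from the tables above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  120
#define NR_NODES 6

/* Hypothetical stand-in for wq_numa_possible_cpumask: one node per cpu,
 * filled once at "boot" and never refreshed afterwards. */
static int cached_node_of_cpu[NR_CPUS];
static bool node_online[NR_NODES];

/* Boot-time layout from the example: each node owns two 15-cpu ranges. */
static void model_boot(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cached_node_of_cpu[cpu] = (cpu % 60) / 15;
	for (int node = 0; node < NR_NODES; node++)
		node_online[node] = node < 4;
}

/* Hot-remove node2 and node3, hot-add node4 and node5. The cache is left
 * untouched, which is the behaviour being modelled. */
static void model_hotplug(void)
{
	node_online[2] = node_online[3] = false;
	node_online[4] = node_online[5] = true;
}

/* Mirrors the lookup in get_unbound_pool(): trust the cached mapping. */
static int pick_pool_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cached_node_of_cpu[cpu];
}
```

After model_boot() and model_hotplug(), pick_pool_node(30) still returns 2 although
node_online[2] is false, which corresponds to the node passed to kzalloc_node().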


[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is set up at boot time. This mapping is persistent and won't change.

2. apicid <-> nodeid mapping is set up using info in 1. The mapping is set up at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is set up at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, then reused for
   other hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
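
Why mapping 4 inherits the instability of mapping 3 can be shown with a small
user-space model. It is a sketch, not kernel code: cpu_hotadd(), cpu_hotremove(),
node_of_apicid() and node_of_cpu() are hypothetical helpers, and the rule "two APIC
ids per node" is an assumption made only to keep the example short.

```c
#include <assert.h>

#define NR_CPUS   8
#define NO_APICID (-1)

/* Mapping 3 (cpuid <-> apicid) as described above: a logical id is handed
 * out lowest-free-first and recycled on hot-remove. */
static int apicid_of_cpu[NR_CPUS] = {
	NO_APICID, NO_APICID, NO_APICID, NO_APICID,
	NO_APICID, NO_APICID, NO_APICID, NO_APICID,
};

/* Mapping 2 (apicid <-> nodeid). Assumption: two APIC ids per node. */
static int node_of_apicid(int apicid)
{
	return apicid / 2;
}

static int cpu_hotadd(int apicid)
{
	assert(apicid >= 0);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (apicid_of_cpu[cpu] == NO_APICID) {
			apicid_of_cpu[cpu] = apicid;
			return cpu;
		}
	}
	return -1;	/* no free logical id */
}

static void cpu_hotremove(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	apicid_of_cpu[cpu] = NO_APICID;
}

/* Mapping 4 is the composition of mappings 3 and 2. */
static int node_of_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (apicid_of_cpu[cpu] == NO_APICID)
		return -1;	/* logical id currently unused */
	return node_of_apicid(apicid_of_cpu[cpu]);
}
```

Adding apicid 0 and 2, removing cpu 1, then adding apicid 6 reuses logical id 1, so
node_of_cpu(1) moves from node 1 to node 3 even though mapping 2 never changed.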

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
   the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
   registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these APIs to let the caller
   control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.
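
The idea behind step 2 can be sketched as follows. The array name cpuid_to_apicid
is taken from the patch title in this series; the helper persistent_cpuid() and the
counter nr_assigned are hypothetical and only illustrate one way a persistent table
could behave. They are not the code from the patches.

```c
#include <assert.h>

#define NR_CPUS 8

/* Array name taken from the patch title; contents and helper are a model. */
static int cpuid_to_apicid[NR_CPUS];
static int nr_assigned;		/* hypothetical: number of slots handed out */

/* Return the logical id for apicid, assigning one only on first sight.
 * Slots are never released, so a re-added CPU gets its old id back. */
static int persistent_cpuid(int apicid)
{
	assert(apicid >= 0);
	for (int cpu = 0; cpu < nr_assigned; cpu++) {
		if (cpuid_to_apicid[cpu] == apicid)
			return cpu;
	}
	if (nr_assigned >= NR_CPUS)
		return -1;	/* table full */
	cpuid_to_apicid[nr_assigned] = apicid;
	return nr_assigned++;
}
```

Because a slot is looked up before a new one is assigned, asking again for an apicid
that was seen earlier returns the same logical id, regardless of what was added or
removed in between.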


For previous discussion, please refer to:
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989
https://lkml.org/lkml/2015/5/14/244
https://lkml.org/lkml/2015/7/7/200
https://lkml.org/lkml/2015/9/27/209
https://lkml.org/lkml/2016/5/19/212

Change log v7 -> v8:
1. Provide the mechanism to validate processors in the ACPI tables.
2. Provide the interface to validate the proc_id when setting the mapping. 

Change log v6 -> v7:
1. Fix arm64 build failure.

Change log v5 -> v6:
1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.

Change log v4 -> v5:
1. Remove useless code in patch 1.
2. Small improvement of commit message.

Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that I tried to build zonelists
   before per cpu areas were initialized.

Change log v2 -> v3:
1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
2. Build zonelists for memory-less nodes so that memory allocator will fall 
   back to proper nodes automatically.

Change log v1 -> v2:
1. Split code movement and actual changes. Add patch 1.
2. Synchronize best near online node record when node hotplug happens. In patch 2.
3. Fix some comment.

Dou Liyang (2):
  Provide the mechanism to validate processors in the ACPI tables
  Provide the interface to validate the proc_id which they give

Gu Zheng (4):
  x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
    boot time.
  x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
    persistent cpuid <-> apicid mapping.
  x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
    booting.

Tang Chen (1):
  x86, memhp, numa: Online memory-less nodes at boot time.

 arch/ia64/kernel/acpi.c       |   3 +-
 arch/x86/include/asm/mpspec.h |   1 +
 arch/x86/kernel/acpi/boot.c   |  10 ++--
 arch/x86/kernel/apic/apic.c   |  85 +++++++++++++++++++++++++---
 arch/x86/mm/numa.c            |  27 +++++----
 drivers/acpi/acpi_processor.c | 105 ++++++++++++++++++++++++++++++++++-
 drivers/acpi/bus.c            |   3 +
 drivers/acpi/processor_core.c | 126 +++++++++++++++++++++++++++++++++++-------
 include/linux/acpi.h          |   5 ++
 9 files changed, 314 insertions(+), 51 deletions(-)

-- 
2.5.5

* [PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time.
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Tang Chen, Zhu Guihua,
	Dou Liyang

From: Tang Chen <tangchen@cn.fujitsu.com>

Currently, x86 does not support memory-less nodes. A node without memory
is not onlined, and the cpus on it are mapped to other online nodes with
memory in init_cpu_to_node(). The reason for doing this is to ensure that
each cpu is mapped to a node with memory, so that it is able to allocate
local memory.

But we don't have to do it this way.

In this series of patches, we construct the cpu <-> node mapping for all
possible cpus at boot time, which is a 1-1 mapping. It means each cpu is
mapped to the node it belongs to, and the mapping never changes. If a node
has only cpus but no memory, the cpus on it are mapped to a memory-less
node, and that memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time, then builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node, the
allocation automatically falls back to the proper zones in the zonelists.
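
The fallback behaviour described above can be pictured with a user-space model.
It is a sketch under assumptions: struct model_node, its fallback[] order and
alloc_page_from() are hypothetical, and the per-node page counts are made up. Real
zonelists are ordered per zone and built by the memory allocator.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 4

/* Hypothetical per-node state: free pages and a fallback order. */
struct model_node {
	long free_pages;
	int fallback[NR_NODES];	/* nodes to try, nearest first; self is first */
};

static struct model_node nodes[NR_NODES] = {
	{ .free_pages = 100, .fallback = { 0, 1, 2, 3 } },
	{ .free_pages = 100, .fallback = { 1, 0, 3, 2 } },
	{ .free_pages = 0,   .fallback = { 2, 3, 0, 1 } },	/* memory-less */
	{ .free_pages = 50,  .fallback = { 3, 2, 1, 0 } },
};

/* Take one page for a cpu on node nid; return the node that supplied it. */
static int alloc_page_from(int nid)
{
	assert(nid >= 0 && nid < NR_NODES);
	for (size_t i = 0; i < NR_NODES; i++) {
		int n = nodes[nid].fallback[i];

		if (nodes[n].free_pages > 0) {
			nodes[n].free_pages--;
			return n;
		}
	}
	return -1;	/* every node is exhausted */
}
```

A cpu that stays mapped to memory-less node 2 gets its page from node 3, the first
entry in its fallback order that has memory, without the cpu being remapped.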

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9c086c5..2a87a28 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -723,22 +723,19 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-	int n, val;
-	int min_val = INT_MAX;
-	int best_node = -1;
+	unsigned long zones_size[MAX_NR_ZONES] = {0};
+	unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-	for_each_online_node(n) {
-		val = node_distance(node, n);
+	/* Allocate and initialize node data. Memory-less node is now online.*/
+	alloc_node_data(nid);
+	free_area_init_node(nid, zones_size, 0, zholes_size);
 
-		if (val < min_val) {
-			min_val = val;
-			best_node = n;
-		}
-	}
-
-	return best_node;
+	/*
+	 * All zonelists will be built later in start_kernel() after per cpu
+	 * areas are initialized.
+	 */
 }
 
 /*
@@ -767,8 +764,10 @@ void __init init_cpu_to_node(void)
 
 		if (node == NUMA_NO_NODE)
 			continue;
+
 		if (!node_online(node))
-			node = find_near_online_node(node);
+			init_memory_less_node(node);
+
 		numa_set_node(cpu, node);
 	}
 }
-- 
2.5.5

* [PATCH v8 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue caches
it in wq_numa_possible_cpumask in wq_numa_init().

When a node goes online or offline, the cpuid <-> nodeid mapping is established or
destroyed, so the mapping changes whenever node hotplug happens. But workqueue does
not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and likewise for
the other cpus that moved.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, the pool could be mapped to an offline
node, which leads to a memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

        ......

        return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is set up at boot time. This mapping is persistent and won't change.

2. apicid <-> nodeid mapping is set up using info in 1. The mapping is set up at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is set up at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, then reused for
   other hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
   the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
   registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these APIs to let the caller
   control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.

This patch finishes step 1.
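
The shape of step 1, an internal function with an extra flag plus a wrapper that
keeps the old signature, can be sketched in user space. The names
model_register_cpu() and model_register_enabled_cpu() and the bookkeeping arrays are
hypothetical. The model only shows the possible/present split and leaves out the
counters and limit checks of the real function.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

static bool cpu_possible[NR_CPUS];
static bool cpu_present[NR_CPUS];
static int apicid_of_cpu[NR_CPUS];
static int nr_registered;	/* hypothetical: next free logical id */

/* Every registered CPU becomes "possible"; only an enabled one also becomes
 * "present". Returns the logical id, or -1 when the table is full. */
static int model_register_cpu(int apicid, bool enabled)
{
	int cpu;

	assert(apicid >= 0);
	if (nr_registered >= NR_CPUS)
		return -1;
	cpu = nr_registered++;
	apicid_of_cpu[cpu] = apicid;
	cpu_possible[cpu] = true;
	if (enabled)
		cpu_present[cpu] = true;
	return cpu;
}

/* Existing callers keep the old behaviour through a thin wrapper. */
static int model_register_enabled_cpu(int apicid)
{
	return model_register_cpu(apicid, true);
}
```

Keeping the wrapper means call sites that only ever register enabled cpus need no
change, while a new caller can pass false to record a disabled cpu as possible.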

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 60078a6..8e3c377 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 	apic_write(APIC_LVT1, value);
 }
 
-int generic_processor_info(int apicid, int version)
+static int __generic_processor_info(int apicid, int version, bool enabled)
 {
 	int cpu, max = nr_cpu_ids;
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2032,7 +2032,8 @@ int generic_processor_info(int apicid, int version)
 			   " Processor %d/0x%x ignored.\n",
 			   thiscpu, apicid);
 
-		disabled_cpus++;
+		if (enabled)
+			disabled_cpus++;
 		return -ENODEV;
 	}
 
@@ -2049,7 +2050,8 @@ int generic_processor_info(int apicid, int version)
 			" reached. Keeping one slot for boot cpu."
 			"  Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
 
-		disabled_cpus++;
+		if (enabled)
+			disabled_cpus++;
 		return -ENODEV;
 	}
 
@@ -2060,11 +2062,14 @@ int generic_processor_info(int apicid, int version)
 			"ACPI: NR_CPUS/possible_cpus limit of %i reached."
 			"  Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
 
-		disabled_cpus++;
+		if (enabled)
+			disabled_cpus++;
 		return -EINVAL;
 	}
 
-	num_processors++;
+	if (enabled)
+		num_processors++;
+
 	if (apicid == boot_cpu_physical_apicid) {
 		/*
 		 * x86_bios_cpu_apicid is required to have processors listed
@@ -2106,7 +2111,8 @@ int generic_processor_info(int apicid, int version)
 			apic_version[boot_cpu_physical_apicid], cpu, version);
 	}
 
-	physid_set(apicid, phys_cpu_present_map);
+	if (enabled)
+		physid_set(apicid, phys_cpu_present_map);
 	if (apicid > max_physical_apicid)
 		max_physical_apicid = apicid;
 
@@ -2119,11 +2125,17 @@ int generic_processor_info(int apicid, int version)
 		apic->x86_32_early_logical_apicid(cpu);
 #endif
 	set_cpu_possible(cpu, true);
-	set_cpu_present(cpu, true);
+	if (enabled)
+		set_cpu_present(cpu, true);
 
 	return cpu;
 }
 
+int generic_processor_info(int apicid, int version)
+{
+	return __generic_processor_info(apicid, version, true);
+}
+
 int hard_smp_processor_id(void)
 {
 	return read_apic_id();
-- 
2.5.5

* [PATCH v8 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.
@ 2016-07-19  7:28   ` Dou Liyang
  0 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

[Problem]

cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

        ......

        return worker;
}
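
The failure described above can be modelled in user space. The following is only
a sketch: the arrays and helper names (cached_cpu_to_node, node_online,
pool_node_for_cpu and so on) are hypothetical stand-ins for
wq_numa_possible_cpumask and the node state, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8
#define MAX_CPUS  120

/* Boot-time snapshot, standing in for wq_numa_possible_cpumask (assumption). */
static int cached_cpu_to_node[MAX_CPUS];
/* Current node state, changed by the simulated node hotplug. */
static bool node_online[MAX_NODES];

/* Record the boot-time layout from the example table: nodes 0-3. */
static void snapshot_at_boot(void)
{
	for (int cpu = 0; cpu < MAX_CPUS; cpu++)
		cached_cpu_to_node[cpu] = (cpu % 60) / 15;
	for (int node = 0; node < MAX_NODES; node++)
		node_online[node] = node < 4;
}

/* Hot-remove nodes 2 and 3, hot-add nodes 4 and 5. The cache is not updated. */
static void replace_nodes(void)
{
	node_online[2] = node_online[3] = false;
	node_online[4] = node_online[5] = true;
}

/* Like the loop in get_unbound_pool(): trust the cached mapping. */
static int pool_node_for_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	return cached_cpu_to_node[cpu];
}

/* True if an allocation for this cpu's pool would target an offline node. */
static bool allocation_hits_offline_node(int cpu)
{
	return !node_online[pool_node_for_cpu(cpu)];
}
```

After snapshot_at_boot() and replace_nodes(), cpu30 still resolves to node 2,
which is offline, so a node-local allocation for its pool has nowhere to go.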

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the nodeid <-> pxm
   mapping is set up at boot time. This mapping is persistent and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. It is set up at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent, i.e. a given apicid always maps to the same nodeid.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time. cpuids are
   allocated lowest first, released at CPU hotremove time, and reused for other
   hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
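
To illustrate point 3, here is a minimal user-space sketch of a "lowest free id
first" allocator. The names (id_in_use, hotadd_cpu, hotremove_cpu) and the table
size are hypothetical and only model the policy described above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IDS 8

static bool id_in_use[NR_IDS];
static int  id_to_apicid[NR_IDS];

/* Hot-add: hand out the lowest free logical id (the pre-patch policy). */
static int hotadd_cpu(int apicid)
{
	for (int id = 0; id < NR_IDS; id++) {
		if (!id_in_use[id]) {
			id_in_use[id] = true;
			id_to_apicid[id] = apicid;
			return id;
		}
	}
	return -1;	/* no free logical id */
}

/* Hot-remove: release the logical id so that it can be reused. */
static void hotremove_cpu(int id)
{
	assert(id >= 0 && id < NR_IDS);
	id_in_use[id] = false;
}
```

If apicid 0x20 gets id 1, is hot-removed, and apicid 0x30 is then hot-added,
id 1 now refers to a different physical CPU. That is the non-persistence
described in point 3.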

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.
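
As a rough sketch of that composition (not the actual init_cpu_to_node() code),
the node of a cpu is found by going through its apicid. The table contents and
all demo_* names below are made up for illustration.

```c
#include <assert.h>

#define DEMO_NR_CPUS    4
#define DEMO_NR_APICIDS 64
#define DEMO_NODE_NONE  (-1)

/* cpuid -> apicid, filled with example values. */
static int demo_cpuid_to_apicid[DEMO_NR_CPUS] = { 0x00, 0x02, 0x20, 0x22 };
/* apicid -> nodeid, derived from firmware information in the real kernel. */
static int demo_apicid_to_node[DEMO_NR_APICIDS];

static void demo_init_apicid_to_node(void)
{
	for (int i = 0; i < DEMO_NR_APICIDS; i++)
		demo_apicid_to_node[i] = DEMO_NODE_NONE;
	demo_apicid_to_node[0x00] = 0;
	demo_apicid_to_node[0x02] = 0;
	demo_apicid_to_node[0x20] = 1;
	demo_apicid_to_node[0x22] = 1;
}

/* cpuid -> nodeid is the composition of the two tables above. */
static int demo_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	int apicid = demo_cpuid_to_apicid[cpu];

	if (apicid < 0 || apicid >= DEMO_NR_APICIDS)
		return DEMO_NODE_NONE;
	return demo_apicid_to_node[apicid];
}
```

If both tables are stable, the composed cpuid -> nodeid result is stable too,
which is why the series concentrates on obtaining every cpu's apicid.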

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable the apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mappings. And also modify
   the way cpuid is calculated. Establish all possible cpuid <-> apicid mappings when
   registering local apics. Store the mappings in this array.

3. Enable _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these APIs to let the caller
   control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.

This patch finishes step 1.
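
The effect of the new "enabled" parameter can be sketched in user space as
follows. This is a simplified model with hypothetical demo_* names. Unlike the
real function, the caller passes the cpu id directly, and only the possible mask,
the present mask and the processor count are modelled.

```c
#include <stdbool.h>

#define DEMO_NR_CPUS 8

static bool demo_possible[DEMO_NR_CPUS];
static bool demo_present[DEMO_NR_CPUS];
static int  demo_num_processors;

/*
 * Register a cpu. Every registered cpu becomes "possible", but only an
 * enabled one is marked "present" and counted as a processor.
 */
static int demo_register_cpu(int cpu, bool enabled)
{
	if (cpu < 0 || cpu >= DEMO_NR_CPUS)
		return -1;

	demo_possible[cpu] = true;
	if (enabled) {
		demo_present[cpu] = true;
		demo_num_processors++;
	}
	return cpu;
}
```

A disabled cpu therefore still reserves its place among the possible cpus
without being reported as present.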

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 60078a6..8e3c377 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 	apic_write(APIC_LVT1, value);
 }
 
-int generic_processor_info(int apicid, int version)
+static int __generic_processor_info(int apicid, int version, bool enabled)
 {
 	int cpu, max = nr_cpu_ids;
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2032,7 +2032,8 @@ int generic_processor_info(int apicid, int version)
 			   " Processor %d/0x%x ignored.\n",
 			   thiscpu, apicid);
 
-		disabled_cpus++;
+		if (enabled)
+			disabled_cpus++;
 		return -ENODEV;
 	}
 
@@ -2049,7 +2050,8 @@ int generic_processor_info(int apicid, int version)
 			" reached. Keeping one slot for boot cpu."
 			"  Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
 
-		disabled_cpus++;
+		if (enabled)
+			disabled_cpus++;
 		return -ENODEV;
 	}
 
@@ -2060,11 +2062,14 @@ int generic_processor_info(int apicid, int version)
 			"ACPI: NR_CPUS/possible_cpus limit of %i reached."
 			"  Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
 
-		disabled_cpus++;
+		if (enabled)
+			disabled_cpus++;
 		return -EINVAL;
 	}
 
-	num_processors++;
+	if (enabled)
+		num_processors++;
+
 	if (apicid == boot_cpu_physical_apicid) {
 		/*
 		 * x86_bios_cpu_apicid is required to have processors listed
@@ -2106,7 +2111,8 @@ int generic_processor_info(int apicid, int version)
 			apic_version[boot_cpu_physical_apicid], cpu, version);
 	}
 
-	physid_set(apicid, phys_cpu_present_map);
+	if (enabled)
+		physid_set(apicid, phys_cpu_present_map);
 	if (apicid > max_physical_apicid)
 		max_physical_apicid = apicid;
 
@@ -2119,11 +2125,17 @@ int generic_processor_info(int apicid, int version)
 		apic->x86_32_early_logical_apicid(cpu);
 #endif
 	set_cpu_possible(cpu, true);
-	set_cpu_present(cpu, true);
+	if (enabled)
+		set_cpu_present(cpu, true);
 
 	return cpu;
 }
 
+int generic_processor_info(int apicid, int version)
+{
+	return __generic_processor_info(apicid, version, true);
+}
+
 int hard_smp_processor_id(void)
 {
 	return read_apic_id();
-- 
2.5.5

* [PATCH v8 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.
  2016-07-19  7:28 ` Dou Liyang
@ 2016-07-19  7:28   ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so that
when node online/offline happens, caches based on the cpuid <-> nodeid mapping, such as
wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. Before this patch,
generic_processor_info() simply finds the next unused cpuid, which is also why
the cpuid <-> nodeid mapping changes with node hotplug.

After this patch, we first check whether the apicid already has a cpuid in
cpuid_to_apicid[] and reuse it. Otherwise we find the next unused cpuid, map it
to the apicid, and store the mapping in cpuid_to_apicid[], so that the
cpuid <-> apicid mapping will be persistent.

And finally we will use this array to make cpuid <-> nodeid persistent.

The cpuid <-> apicid mapping is established at local apic registration time.
But currently non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mappings when
registering local apics.
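
The allocation policy described above can be tried out in user space with a
reduced model of allocate_logical_cpuid(). The demo_* names and the limit of 4
ids are assumptions made for the sketch. The warning and the BSP handling of the
real function are left out.

```c
#define DEMO_NR_CPU_IDS 4

/* Number of ids handed out so far. Id 0 is reserved for the BSP. */
static int demo_nr_logical_cpuids = 1;

/* Logical cpu id -> apicid, -1 meaning "not assigned yet". */
static int demo_cpuid_to_apicid[DEMO_NR_CPU_IDS] = { -1, -1, -1, -1 };

static int demo_allocate_logical_cpuid(int apicid)
{
	/* An apicid seen before keeps the id it was given the first time. */
	for (int i = 0; i < demo_nr_logical_cpuids; i++) {
		if (demo_cpuid_to_apicid[i] == apicid)
			return i;
	}

	/* No room for a new id. */
	if (demo_nr_logical_cpuids >= DEMO_NR_CPU_IDS)
		return -1;

	/* Ids are never released, so the next new apicid gets the next id. */
	demo_cpuid_to_apicid[demo_nr_logical_cpuids] = apicid;
	return demo_nr_logical_cpuids++;
}
```

Because entries are never cleared, asking again for an apicid that was
registered earlier returns the same cpuid, no matter how many other cpus were
added in between.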

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  6 ++---
 arch/x86/kernel/apic/apic.c   | 61 ++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE	BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9414f84..37248c3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -174,15 +174,13 @@ static int acpi_register_lapic(int id, u8 enabled)
 		return -EINVAL;
 	}
 
-	if (!enabled) {
+	if (!enabled)
 		++disabled_cpus;
-		return -EINVAL;
-	}
 
 	if (boot_cpu_physical_apicid != -1U)
 		ver = apic_version[boot_cpu_physical_apicid];
 
-	return generic_processor_info(id, ver);
+	return __generic_processor_info(id, ver, enabled);
 }
 
 static int __init
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8e3c377..366fbbc 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 	apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals the currently allocated max logical CPU ID plus 1.
+ * All allocated CPU IDs should be in [0, nr_logical_cpuids), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+	int i;
+
+	/*
+	 * cpuid <-> apicid mapping is persistent, so when a cpu is up,
+	 * check if the kernel has allocated a cpuid for it.
+	 */
+	for (i = 0; i < nr_logical_cpuids; i++) {
+		if (cpuid_to_apicid[i] == apicid)
+			return i;
+	}
+
+	/* Allocate a new cpuid. */
+	if (nr_logical_cpuids >= nr_cpu_ids) {
+		WARN_ONCE(1, "Only %d processors supported. "
+			     "Processor %d/0x%x and the rest are ignored.\n",
+			     nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+		return -1;
+	}
+
+	cpuid_to_apicid[nr_logical_cpuids] = apicid;
+	return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
 	int cpu, max = nr_cpu_ids;
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2079,8 +2125,17 @@ static int __generic_processor_info(int apicid, int version, bool enabled)
 		 * for BSP.
 		 */
 		cpu = 0;
-	} else
-		cpu = cpumask_next_zero(-1, cpu_present_mask);
+
+		/* Logical cpuid 0 is reserved for BSP. */
+		cpuid_to_apicid[0] = apicid;
+	} else {
+		cpu = allocate_logical_cpuid(apicid);
+		if (cpu < 0) {
+			if (enabled)
+				disabled_cpus++;
+			return -EINVAL;
+		}
+	}
 
 	/*
 	 * This can happen on physical hotplug. The sanity check at boot time
-- 
2.5.5



* [PATCH v8 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  2016-07-19  7:28 ` Dou Liyang
@ 2016-07-19  7:28   ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so that
when node online/offline happens, caches based on the cpuid <-> nodeid mapping, such as
wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm        (persistent)
2. apicid (physical cpu id)   <->   nodeid     (persistent)
3. cpuid (logical cpu id)     <->   apicid     (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid     (not persistent)

So, in order to set up a persistent cpuid <-> nodeid mapping for all possible CPUs,
we should:
1. Set up the cpuid <-> apicid mapping for all possible CPUs, which has been done in
   steps 1 and 2.
2. Set up the cpuid <-> nodeid mapping for all possible CPUs. But before that, we should
   obtain all apicids from MADT.

All processors' apicids can be obtained by the _MAT method or from MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter is added to the MADT APIs so that the caller
is able to control whether disabled processors are ignored.
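
A minimal user-space sketch of the new parameter, assuming a simplified entry
layout: struct demo_lapic_entry, DEMO_ENTRY_ENABLED and the return codes are
hypothetical stand-ins for the ACPI structures and error values.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ENTRY_ENABLED 0x1u	/* stand-in for ACPI_MADT_ENABLED */
#define DEMO_ERR_DISABLED  (-1)	/* entry is disabled and was skipped */
#define DEMO_ERR_MISMATCH  (-2)	/* entry belongs to another processor */

struct demo_lapic_entry {
	uint32_t acpi_id;
	uint32_t apic_id;
	uint32_t flags;
};

/*
 * Look up the apic id for acpi_id in one entry. With ignore_disabled set,
 * disabled entries are skipped as before. Otherwise their id is returned.
 */
static int demo_map_lapic_id(const struct demo_lapic_entry *entry,
			     uint32_t acpi_id, uint32_t *apic_id,
			     bool ignore_disabled)
{
	if (ignore_disabled && !(entry->flags & DEMO_ENTRY_ENABLED))
		return DEMO_ERR_DISABLED;

	if (entry->acpi_id != acpi_id)
		return DEMO_ERR_MISMATCH;

	*apic_id = entry->apic_id;
	return 0;
}
```

Existing callers would keep passing true and see no change, while a caller
that builds the mapping for all possible cpus would pass false.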

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 57 +++++++++++++++++++++++++++----------------
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..e85b19a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 	 *  Extra Processor objects may be enumerated on MP systems with
 	 *  less than the max # of CPUs. They should be ignored _iff
 	 *  they are physically not present.
+	 *
+	 *  NOTE: Even if the processor has a cpuid, it may not be present because
+	 *  cpuid <-> apicid mapping is persistent now.
 	 */
-	if (invalid_logical_cpuid(pr->id)) {
+	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
 		int ret = acpi_processor_hotadd_init(pr);
 		if (ret)
 			return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-		 u32 acpi_id, phys_cpuid_t *apic_id)
+		 u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
 	struct acpi_madt_local_apic *lapic =
 		container_of(entry, struct acpi_madt_local_apic, header);
 
-	if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+		bool ignore_disabled)
 {
 	struct acpi_madt_local_x2apic *apic =
 		container_of(entry, struct acpi_madt_local_x2apic, header);
 
-	if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+		bool ignore_disabled)
 {
 	struct acpi_madt_local_sapic *lsapic =
 		container_of(entry, struct acpi_madt_local_sapic, header);
 
-	if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (device_declaration) {
@@ -87,12 +89,13 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
  * Retrieve the ARM CPU physical identifier (MPIDR)
  */
 static int map_gicc_mpidr(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr,
+		bool ignore_disabled)
 {
 	struct acpi_madt_generic_interrupt *gicc =
 	    container_of(entry, struct acpi_madt_generic_interrupt, header);
 
-	if (!(gicc->flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(gicc->flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	/* device_declaration means Device object in DSDT, in the
@@ -108,7 +111,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
 	return -EINVAL;
 }
 
-static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
+static phys_cpuid_t map_madt_entry(int type, u32 acpi_id, bool ignore_disabled)
 {
 	unsigned long madt_end, entry;
 	phys_cpuid_t phys_id = PHYS_CPUID_INVALID;	/* CPU hardware ID */
@@ -128,16 +131,20 @@ static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
 		struct acpi_subtable_header *header =
 			(struct acpi_subtable_header *)entry;
 		if (header->type == ACPI_MADT_TYPE_LOCAL_APIC) {
-			if (!map_lapic_id(header, acpi_id, &phys_id))
+			if (!map_lapic_id(header, acpi_id, &phys_id,
+					  ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC) {
-			if (!map_x2apic_id(header, type, acpi_id, &phys_id))
+			if (!map_x2apic_id(header, type, acpi_id, &phys_id,
+					   ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC) {
-			if (!map_lsapic_id(header, type, acpi_id, &phys_id))
+			if (!map_lsapic_id(header, type, acpi_id, &phys_id,
+					   ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT) {
-			if (!map_gicc_mpidr(header, type, acpi_id, &phys_id))
+			if (!map_gicc_mpidr(header, type, acpi_id, &phys_id,
+					    ignore_disabled))
 				break;
 		}
 		entry += header->length;
@@ -145,7 +152,8 @@ static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
 	return phys_id;
 }
 
-static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id,
+				  bool ignore_disabled)
 {
 	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
 	union acpi_object *obj;
@@ -166,30 +174,37 @@ static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
 
 	header = (struct acpi_subtable_header *)obj->buffer.pointer;
 	if (header->type == ACPI_MADT_TYPE_LOCAL_APIC)
-		map_lapic_id(header, acpi_id, &phys_id);
+		map_lapic_id(header, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC)
-		map_lsapic_id(header, type, acpi_id, &phys_id);
+		map_lsapic_id(header, type, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC)
-		map_x2apic_id(header, type, acpi_id, &phys_id);
+		map_x2apic_id(header, type, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT)
-		map_gicc_mpidr(header, type, acpi_id, &phys_id);
+		map_gicc_mpidr(header, type, acpi_id, &phys_id,
+			       ignore_disabled);
 
 exit:
 	kfree(buffer.pointer);
 	return phys_id;
 }
 
-phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t __acpi_get_phys_id(acpi_handle handle, int type,
+				       u32 acpi_id, bool ignore_disabled)
 {
 	phys_cpuid_t phys_id;
 
-	phys_id = map_mat_entry(handle, type, acpi_id);
+	phys_id = map_mat_entry(handle, type, acpi_id, ignore_disabled);
 	if (invalid_phys_cpuid(phys_id))
-		phys_id = map_madt_entry(type, acpi_id);
+		phys_id = map_madt_entry(type, acpi_id, ignore_disabled);
 
 	return phys_id;
 }
 
+phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+{
+	return __acpi_get_phys_id(handle, type, acpi_id, true);
+}
+
 int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id)
 {
 #ifdef CONFIG_SMP
-- 
2.5.5



-		map_gicc_mpidr(header, type, acpi_id, &phys_id);
+		map_gicc_mpidr(header, type, acpi_id, &phys_id,
+			       ignore_disabled);
 
 exit:
 	kfree(buffer.pointer);
 	return phys_id;
 }
 
-phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t __acpi_get_phys_id(acpi_handle handle, int type,
+				       u32 acpi_id, bool ignore_disabled)
 {
 	phys_cpuid_t phys_id;
 
-	phys_id = map_mat_entry(handle, type, acpi_id);
+	phys_id = map_mat_entry(handle, type, acpi_id, ignore_disabled);
 	if (invalid_phys_cpuid(phys_id))
-		phys_id = map_madt_entry(type, acpi_id);
+		phys_id = map_madt_entry(type, acpi_id, ignore_disabled);
 
 	return phys_id;
 }
 
+phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+{
+	return __acpi_get_phys_id(handle, type, acpi_id, true);
+}
+
 int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id)
 {
 #ifdef CONFIG_SMP
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.
  2016-07-19  7:28 ` Dou Liyang
@ 2016-07-19  7:28   ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when a node goes online or offline, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, do not cause any problems.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return the apicid of non-present
   or disabled cpus.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 4.

This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional ACPI namespace walk for processors.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
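
A minimal user-space sketch of the idea behind step 4, assuming a simplified
model in which every processor object already carries its logical cpuid and
its NUMA node. All names here (model_processor, model_cpu_to_node,
model_set_processor_mapping, model_sample) are hypothetical and do not exist
in the kernel; the patch itself obtains these values through the ACPI
namespace walk and acpi_map_cpu2node(). What the sketch tries to show is that
the enabled flag is not consulted, so disabled processors get a node too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8
#define MODEL_NO_NODE (-1)

/* Hypothetical stand-in for one processor object in the ACPI namespace. */
struct model_processor {
	int cpu;	/* logical cpuid, -1 if none could be assigned */
	int node;	/* NUMA node derived from the proximity domain */
	bool enabled;	/* state of the MADT "enabled" flag */
};

/* Model of the cpu -> node table: filled once at boot, then left alone. */
static int model_cpu_to_node[MODEL_NR_CPUS];

/* Illustrative input: two enabled cpus, one disabled cpu, one unmappable. */
static const struct model_processor model_sample[] = {
	{ 0, 0, true },
	{ 1, 0, true },
	{ 2, 1, false },
	{ -1, 1, false },
};

/*
 * Visit every processor object, enabled or not, and record its node.
 * Returns the number of cpus that received a mapping.
 */
static int model_set_processor_mapping(const struct model_processor *procs,
				       size_t count)
{
	int mapped = 0;
	size_t i;

	assert(procs != NULL || count == 0);

	for (i = 0; i < MODEL_NR_CPUS; i++)
		model_cpu_to_node[i] = MODEL_NO_NODE;

	for (i = 0; i < count; i++) {
		/* Assumption of this sketch: unmappable entries are skipped. */
		if (procs[i].cpu < 0 || procs[i].cpu >= MODEL_NR_CPUS)
			continue;
		/* The enabled flag is deliberately ignored here. */
		model_cpu_to_node[procs[i].cpu] = procs[i].node;
		mapped++;
	}
	return mapped;
}
```

With model_sample as input, cpu 2 is assigned node 1 although it is disabled,
and the entry without a cpuid is left out. How the real walk reacts to a
failed callback is not modeled.
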
 arch/ia64/kernel/acpi.c       |  3 +-
 arch/x86/kernel/acpi/boot.c   |  4 ++-
 drivers/acpi/acpi_processor.c |  5 ++++
 drivers/acpi/bus.c            |  3 ++
 drivers/acpi/processor_core.c | 65 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h          |  2 ++
 6 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..bb36515 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
 	/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 #endif
 	return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int additional_cpus __initdata = -1;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 37248c3..0900264f 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include <acpi/processor.h>
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
 	int nid;
@@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 		numa_set_node(cpu, nid);
 	}
 #endif
+	return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
 {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+	return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
 	unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 262ca31..d8b7272 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1124,6 +1124,9 @@ static int __init acpi_init(void)
 	acpi_sleep_proc_init();
 	acpi_wakeup_device_init();
 	acpi_debugger_init();
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+	acpi_set_processor_mapping();
+#endif
 	return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 824b98b..69fb027 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -261,6 +261,71 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
+{
+	int type;
+	u32 acpi_id;
+	acpi_status status;
+	acpi_object_type acpi_type;
+	unsigned long long tmp;
+	union acpi_object object = { 0 };
+	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+	status = acpi_get_type(handle, &acpi_type);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	switch (acpi_type) {
+	case ACPI_TYPE_PROCESSOR:
+		status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+		if (ACPI_FAILURE(status))
+			return false;
+		acpi_id = object.processor.proc_id;
+		break;
+	case ACPI_TYPE_DEVICE:
+		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+		if (ACPI_FAILURE(status))
+			return false;
+		acpi_id = tmp;
+		break;
+	default:
+		return false;
+	}
+
+	type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
+
+	*phys_id = __acpi_get_phys_id(handle, type, acpi_id, false);
+	*cpuid = acpi_map_cpuid(*phys_id, acpi_id);
+	if (*cpuid == -1)
+		return false;
+
+	return true;
+}
+
+static acpi_status __init
+set_processor_node_mapping(acpi_handle handle, u32 lvl, void *context,
+			   void **rv)
+{
+	phys_cpuid_t phys_id;
+	int cpu_id;
+
+	if (!map_processor(handle, &phys_id, &cpu_id))
+		return AE_ERROR;
+
+	acpi_map_cpu2node(handle, cpu_id, phys_id);
+	return AE_OK;
+}
+
+void __init acpi_set_processor_mapping(void)
+{
+	/* Set persistent cpu <-> node mapping for all processors. */
+	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+			    ACPI_UINT32_MAX, set_processor_node_mapping,
+			    NULL, NULL, NULL);
+}
+#endif
+
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
 static int get_ioapic_id(struct acpi_subtable_header *entry, u32 gsi_base,
 			 u64 *phys_addr, int *ioapic_id)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 288fac5..53b3014 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -258,6 +258,8 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
 int acpi_unmap_cpu(int cpu);
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid);
+void __init acpi_set_processor_mapping(void);
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
-- 
2.5.5



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 6/7] Provide the mechanism to validate processors in the ACPI tables
  2016-07-19  7:28 ` Dou Liyang
@ 2016-07-19  7:28   ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Dou Liyang

[Problem]

When we make the cpuid <-> nodeid mapping persistent, it uses the DSDT.
The ACPI tables are like user input in that respect: they may be
unreasonable, and the kernel must not crash when they are.

For example, the mapping of proc_id to pxm in some machines' ACPI tables
looks like this:

proc_id   |    pxm
--------------------
0       <->     0
1       <->     0
2       <->     1
3       <->     1
89      <->     0
89      <->     0
89      <->     0
89      <->     1
89      <->     1
89      <->     2
89      <->     3
.....

We cannot tell which entry is correct for proc_id 89, so we may map a cpu
to the wrong node. When pages are allocated, this may cause a kernel panic.

So we should provide a mechanism to validate the ACPI tables, just as a
web application validates user input.

The rule is that processor objects which share a duplicate ID are not
valid.

[Solution]

We add a validation function, like this:

 foreach Processor in DSDT
   proc_id = get_ACPI_Processor_number(Processor)
   if (the proc_id already exists)
     mark both of them as unreasonable;

The function records the unique and the duplicate processor IDs.

Duplicate processor IDs such as 89 are regarded as unreasonable IDs, which
means that the processor objects in question are not valid.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
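
The bookkeeping described in [Solution] can be sketched in plain user-space C
as follows. This is only an illustration under stated assumptions: the
model_* names and the fixed MODEL_MAX_IDS bound are hypothetical, and the
sketch adds per-array bounds checks of its own rather than mirroring the
exact limits used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_IDS 16	/* hypothetical capacity, stands in for NR_CPUS */

static int model_unique_ids[MODEL_MAX_IDS];
static int model_nr_unique;
static int model_duplicate_ids[MODEL_MAX_IDS];
static int model_nr_duplicate;

/* Linear search, as in the patch; fine for a one-off boot-time scan. */
static bool model_contains(const int *ids, int count, int id)
{
	int i;

	assert(count >= 0 && count <= MODEL_MAX_IDS);

	for (i = 0; i < count; i++) {
		if (ids[i] == id)
			return true;
	}
	return false;
}

/* Record one processor ID seen while walking the processor objects. */
static void model_ids_update(int proc_id)
{
	/* Already known to be a duplicate: nothing more to record. */
	if (model_contains(model_duplicate_ids, model_nr_duplicate, proc_id))
		return;

	/* Second sighting: the ID moves into the duplicate set. */
	if (model_contains(model_unique_ids, model_nr_unique, proc_id)) {
		if (model_nr_duplicate < MODEL_MAX_IDS)
			model_duplicate_ids[model_nr_duplicate++] = proc_id;
		return;
	}

	/* First sighting: remember it as unique for now. */
	if (model_nr_unique < MODEL_MAX_IDS)
		model_unique_ids[model_nr_unique++] = proc_id;
}

/* True when proc_id was seen more than once and should not be trusted. */
static bool model_id_is_duplicate(int proc_id)
{
	return model_contains(model_duplicate_ids, model_nr_duplicate, proc_id);
}
```

Feeding the IDs 0, 1, 2, 3, 89, 89, 89 from the table above leaves five
entries in the unique set and one, 89, in the duplicate set, regardless of
how many more times 89 shows up.
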
 drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0c15828..346fbfc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler processor_container_handler = {
 	.attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+static void processor_validated_ids_update(int proc_id)
+{
+	int i;
+
+	if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
+		return;
+
+	/*
+	 * Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+	 * already in the IDs, do nothing.
+	 */
+	for (i = 0; i < nr_duplicate_ids; i++) {
+		if (duplicate_processor_ids[i] == proc_id)
+			return;
+	}
+
+	/*
+	 * Secondly, compare the proc_id with unique IDs, if the proc_id is in
+	 * the IDs, put it in the duplicate IDs.
+	 */
+	for (i = 0; i < nr_unique_ids; i++) {
+		if (unique_processor_ids[i] == proc_id) {
+			duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+			nr_duplicate_ids++;
+			return;
+		}
+	}
+
+	/*
+	 * Lastly, the proc_id is a unique ID, put it in the unique IDs.
+	 */
+	unique_processor_ids[nr_unique_ids] = proc_id;
+	nr_unique_ids++;
+}
+
+static acpi_status acpi_processor_ids_walk(acpi_handle handle,
+						u32 lvl,
+						void *context,
+						void **rv)
+{
+	acpi_status status;
+	union acpi_object object = { 0 };
+	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+	status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+	if (ACPI_FAILURE(status))
+		acpi_handle_info(handle, "Not get the processor object\n");
+	else
+		processor_validated_ids_update(object.processor.proc_id);
+
+	return AE_OK;
+}
+
+static void acpi_processor_duplication_valiate(void)
+{
+	/* Search all processor nodes in ACPI namespace */
+	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+						ACPI_UINT32_MAX,
+						acpi_processor_ids_walk,
+						NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+	acpi_processor_duplication_valiate();
 	acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
 	acpi_scan_add_handler(&processor_container_handler);
 }
-- 
2.5.5



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 7/7] Provide the interface to validate the proc_id which they give
  2016-07-19  7:28 ` Dou Liyang
@ 2016-07-19  7:28   ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Dou Liyang

To find out whether a proc_id is unreasonable, callers can use the
acpi_processor_validate_proc_id() function. It searches the duplicate
IDs: if the proc_id is found there, it returns true to the caller, which
means the proc_id is a duplicate; false means the proc_id is usable.

When we establish all possible cpuid <-> nodeid mappings, we use the
proc_id from the ACPI tables.

We validate the proc_id as soon as we get it. If the result is true, we
stop the mapping for that processor.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
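
A small user-space sketch of how such a check can gate the mapping step. The
names (model_dup_ids, model_node_of, model_validate_proc_id,
model_map_processor) are hypothetical, and the sketch assumes, purely for
brevity, that the proc_id can double as the cpu index. Note the polarity,
which follows the description above: true means "duplicate", so the caller
refuses to map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 128
#define MODEL_NO_NODE (-1)

/* Hypothetical result of an earlier scan: IDs that were seen more than once. */
static const int model_dup_ids[] = { 89 };

/* Hypothetical cpu -> node table; the proc_id is used as the index here. */
static int model_node_of[MODEL_NR_CPUS];

static void model_reset_nodes(void)
{
	size_t i;

	for (i = 0; i < MODEL_NR_CPUS; i++)
		model_node_of[i] = MODEL_NO_NODE;
}

/* Same polarity as the description: true means duplicate, i.e. not usable. */
static bool model_validate_proc_id(int proc_id)
{
	size_t i;

	for (i = 0; i < sizeof(model_dup_ids) / sizeof(model_dup_ids[0]); i++) {
		if (model_dup_ids[i] == proc_id)
			return true;
	}
	return false;
}

/* Returns true if the mapping was recorded, false if it was refused. */
static bool model_map_processor(int proc_id, int node)
{
	assert(node >= 0);	/* callers pass a real node number */

	if (model_validate_proc_id(proc_id))
		return false;	/* ambiguous ID: leave the table untouched */
	if (proc_id < 0 || proc_id >= MODEL_NR_CPUS)
		return false;

	model_node_of[proc_id] = node;
	return true;
}
```

After model_reset_nodes(), mapping proc_id 3 to node 1 succeeds, while any
attempt to map proc_id 89 is refused and its table entry stays unset.
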
 drivers/acpi/acpi_processor.c | 16 ++++++++++++++++
 drivers/acpi/processor_core.c |  4 ++++
 include/linux/acpi.h          |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
 						NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+	int i;
+
+	/*
+	 * compare the proc_id with duplicate IDs, if the proc_id is already
+	 * in the duplicate IDs, return true, otherwise, return false.
+	 */
+	for (i = 0; i < nr_duplicate_ids; i++) {
+		if (duplicate_processor_ids[i] == proc_id)
+			return true;
+	}
+
+	return false;
+}
+
 void __init acpi_processor_init(void)
 {
 	acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 69fb027..b8fad20 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -282,6 +282,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
 		if (ACPI_FAILURE(status))
 			return false;
 		acpi_id = object.processor.proc_id;
+
+		/* validate the acpi_id */
+		if(acpi_processor_validate_proc_id(acpi_id))
+			return false;
 		break;
 	case ACPI_TYPE_DEVICE:
 		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 53b3014..94ceae1 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -254,6 +254,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
 	return phys_id == PHYS_CPUID_INVALID;
 }
 
+/*validate the processor object's proc_id*/
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 7/7] Provide the interface to validate the proc_id which they give
@ 2016-07-19  7:28   ` Dou Liyang
  0 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-19  7:28 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Dou Liyang

When we want to identify whether the proc_id is unreasonable or not, we
can call the "acpi_processor_validate_proc_id" function. It will search
in the duplicate IDs. If we find the proc_id in the IDs, we return true
to the call function. Conversely, false represents available.

When we establish all possible cpuid <-> nodeid mapping, we will use the
proc_id from ACPI table.

We do validation when we get the proc_id. If the result is true, we will
stop the mapping.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 16 ++++++++++++++++
 drivers/acpi/processor_core.c |  4 ++++
 include/linux/acpi.h          |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
 						NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+	int i;
+
+	/*
+	 * compare the proc_id with duplicate IDs, if the proc_id is already
+	 * in the duplicate IDs, return true, otherwise, return false.
+	 */
+	for (i = 0; i < nr_duplicate_ids; i++) {
+		if (duplicate_processor_ids[i] == proc_id)
+			return true;
+	}
+
+	return false;
+}
+
 void __init acpi_processor_init(void)
 {
 	acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 69fb027..b8fad20 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -282,6 +282,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
 		if (ACPI_FAILURE(status))
 			return false;
 		acpi_id = object.processor.proc_id;
+
+		/* validate the acpi_id */
+		if(acpi_processor_validate_proc_id(acpi_id))
+			return false;
 		break;
 	case ACPI_TYPE_DEVICE:
 		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 53b3014..94ceae1 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -254,6 +254,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
 	return phys_id == PHYS_CPUID_INVALID;
 }
 
+/* Validate the processor object's proc_id */
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time.
  2016-07-19  7:28   ` Dou Liyang
@ 2016-07-19 18:50     ` Tejun Heo
  -1 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2016-07-19 18:50 UTC (permalink / raw)
  To: Dou Liyang
  Cc: cl, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael, x86, linux-acpi,
	linux-kernel, linux-mm, Tang Chen, Zhu Guihua

Hello,

On Tue, Jul 19, 2016 at 03:28:02PM +0800, Dou Liyang wrote:
> In this series of patches, we are going to construct cpu <-> node mapping
> for all possible cpus at boot time, which is a 1-1 mapping. It means the

1-1 mapping means that each cpu is mapped to its own private node
which isn't the case.  Just call it a persistent mapping?
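
To illustrate the terminology point, here is a minimal userspace sketch
(not kernel code) of a many-to-one cpu <-> node mapping. The names
toy_node_of() and toy_mapping_is_one_to_one() are hypothetical, and the
cpu ranges are assumed from the example table in the cover letter
(node 0: 0-14, 60-74; node 1: 15-29, 75-89; and so on).

```c
#include <stdbool.h>

#define TOY_NR_CPUS 120

/*
 * Hypothetical helper: node id for a cpu id, following the cover letter's
 * example layout (four nodes, 15 cpus per range, two ranges per node).
 * The result depends only on the cpu id, so it never changes at runtime,
 * which is the "persistent" property the series is after.
 */
int toy_node_of(int cpu)
{
	if (cpu < 0 || cpu >= TOY_NR_CPUS)
		return -1;
	return (cpu % 60) / 15;
}

/* True only if no two cpus share a node, i.e. a strict 1-1 mapping. */
bool toy_mapping_is_one_to_one(void)
{
	for (int a = 0; a < TOY_NR_CPUS; a++)
		for (int b = a + 1; b < TOY_NR_CPUS; b++)
			if (toy_node_of(a) == toy_node_of(b))
				return false;
	return true;
}
```

In this sketch many cpus resolve to the same node, so the mapping is
many-to-one; what stays fixed is the answer for any given cpu.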

> cpu will be mapped to the node it belongs to, and will never be changed.
> If a node has only cpus but no memory, the cpus on it will be mapped to
> a memory-less node. And the memory-less node should be onlined.
> 
> This patch allocates pgdats for all memory-less nodes and onlines them at
> boot time. Then build zonelists for these nodes. As a result, when cpus
> on these memory-less nodes try to allocate memory from local node, it
> will automatically fall back to the proper zones in the zonelists.

Yeah, I think this is a much better approach for memory-less nodes.
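
The fallback behaviour described in the quoted paragraph can be pictured
with a small userspace toy model. This is only a sketch under stated
assumptions: struct toy_node and toy_alloc_page() are hypothetical
stand-ins, not the kernel's pg_data_t or page allocator, and the fallback
order is supplied by hand rather than built from node distances.

```c
#include <stddef.h>

#define TOY_NR_NODES 4

/*
 * Hypothetical stand-in for a node: a free page count plus a fallback
 * order (preferred node first, terminated by -1).
 */
struct toy_node {
	size_t free_pages;
	int fallback[TOY_NR_NODES + 1];
};

/*
 * Walk the local node's fallback order and take a page from the first
 * node that has one. Returns the node that supplied the page, or -1 if
 * every listed node is empty.
 */
int toy_alloc_page(struct toy_node *nodes, int local)
{
	for (int i = 0; i <= TOY_NR_NODES && nodes[local].fallback[i] >= 0; i++) {
		int nid = nodes[local].fallback[i];

		if (nodes[nid].free_pages > 0) {
			nodes[nid].free_pages--;
			return nid;
		}
	}
	return -1;
}
```

A memory-less node is simply one whose free_pages is zero: it is still a
valid "local" node for its cpus, and requests fall through to the next
entry in its list.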

> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 7/7] Provide the interface to validate the proc_id which they give
  2016-07-19  7:28   ` Dou Liyang
@ 2016-07-19 18:53     ` Tejun Heo
  -1 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2016-07-19 18:53 UTC (permalink / raw)
  To: Dou Liyang
  Cc: cl, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael, x86, linux-acpi,
	linux-kernel, linux-mm

On Tue, Jul 19, 2016 at 03:28:08PM +0800, Dou Liyang wrote:
> When we want to identify whether the proc_id is unreasonable or not, we
> can call the "acpi_processor_validate_proc_id" function. It will search
> in the duplicate IDs. If we find the proc_id in the IDs, we return true
> to the call function. Conversely, false represents available.
> 
> When we establish all possible cpuid <-> nodeid mapping, we will use the
> proc_id from ACPI table.
> 
> We do validation when we get the proc_id. If the result is true, we will
> stop the mapping.

The patch title probably should include "acpi:" header.  I can't tell
much about the specifics of the acpi changes but I think this is the
right approach for handling cpu hotplugs.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.
  2016-07-19  7:28   ` Dou Liyang
@ 2016-07-19 20:06     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2016-07-19 20:06 UTC (permalink / raw)
  To: Dou Liyang
  Cc: cl, tj, mika.j.penttila, mingo, akpm, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael, x86, linux-acpi,
	linux-kernel, linux-mm, Gu Zheng, Tang Chen, Zhu Guihua

On Tuesday, July 19, 2016 03:28:06 PM Dou Liyang wrote:
> From: Gu Zheng <guz.fnst@cn.fujitsu.com>
> 
> The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
> when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
> wq_numa_possible_cpumask will not cause any problem.
> It contains 4 steps:
> 1. Enable apic registration flow to handle both enabled and disabled cpus.
> 2. Introduce a new array storing all possible cpuid <-> apicid mapping.
> 3. Enable _MAT and MADT related apis to return non-present or disabled cpus' apicid.
> 4. Establish all possible cpuid <-> nodeid mapping.
> 
> This patch finishes step 4.
> 
> This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
> processors at boot time via an additional acpi namespace walk for processors.
> 
> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
> ---
>  arch/ia64/kernel/acpi.c       |  3 +-
>  arch/x86/kernel/acpi/boot.c   |  4 ++-
>  drivers/acpi/acpi_processor.c |  5 ++++
>  drivers/acpi/bus.c            |  3 ++
>  drivers/acpi/processor_core.c | 65 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h          |  2 ++
>  6 files changed, 80 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
> index b1698bc..bb36515 100644
> --- a/arch/ia64/kernel/acpi.c
> +++ b/arch/ia64/kernel/acpi.c
> @@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
>   *  ACPI based hotplug CPU support
>   */
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
> -static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
> +int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>  {
>  #ifdef CONFIG_ACPI_NUMA
>  	/*
> @@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>  #endif
>  	return 0;
>  }
> +EXPORT_SYMBOL(acpi_map_cpu2node);
>  
>  int additional_cpus __initdata = -1;
>  
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 37248c3..0900264f 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>  #include <acpi/processor.h>
>  
> -static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
> +int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>  {
>  #ifdef CONFIG_ACPI_NUMA
>  	int nid;
> @@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>  		numa_set_node(cpu, nid);
>  	}
>  #endif
> +	return 0;
>  }
> +EXPORT_SYMBOL(acpi_map_cpu2node);
>  
>  int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
>  {
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index e85b19a..0c15828 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
>  
>  void __weak arch_unregister_cpu(int cpu) {}
>  
> +int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
> +{
> +	return -ENODEV;
> +}
> +
>  static int acpi_processor_hotadd_init(struct acpi_processor *pr)
>  {
>  	unsigned long long sta;
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 262ca31..d8b7272 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -1124,6 +1124,9 @@ static int __init acpi_init(void)
>  	acpi_sleep_proc_init();
>  	acpi_wakeup_device_init();
>  	acpi_debugger_init();
> +#ifdef CONFIG_ACPI_HOTPLUG_CPU
> +	acpi_set_processor_mapping();
> +#endif

This doesn't look nice.

What about providing an empty definition of acpi_set_processor_mapping()
for CONFIG_ACPI_HOTPLUG_CPU unset?
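
A minimal sketch of the stub pattern suggested here, written as
standalone C rather than as the actual kernel change. TOY_HOTPLUG_CPU
stands in for CONFIG_ACPI_HOTPLUG_CPU, and toy_set_processor_mapping(),
toy_mapping_calls and toy_init() are hypothetical names; in the kernel the
two variants would normally live in a header such as include/linux/acpi.h.

```c
/* Counts how often the real implementation ran (illustration only). */
int toy_mapping_calls;

#ifdef TOY_HOTPLUG_CPU
/* Real implementation, built only when the option is enabled. */
static void toy_set_processor_mapping(void)
{
	toy_mapping_calls++;	/* the namespace walk would happen here */
}
#else
/* Empty stub: the call compiles to nothing, so callers need no #ifdef. */
static inline void toy_set_processor_mapping(void) { }
#endif

/* Stand-in for the init path: the call site is unconditional. */
int toy_init(void)
{
	toy_set_processor_mapping();
	return 0;
}
```

With the stub in place the #ifdef moves out of the caller and into one
spot next to the declaration.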

>  	return 0;
>  }

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 7/7] Provide the interface to validate the proc_id which they give
  2016-07-19 18:53     ` Tejun Heo
  (?)
@ 2016-07-20  0:55       ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-20  0:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cl, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael, x86, linux-acpi,
	linux-kernel, linux-mm



On 2016-07-20 02:53, Tejun Heo wrote:
> On Tue, Jul 19, 2016 at 03:28:08PM +0800, Dou Liyang wrote:
>> When we want to identify whether the proc_id is unreasonable or not, we
>> can call the "acpi_processor_validate_proc_id" function. It will search
>> in the duplicate IDs. If we find the proc_id in the IDs, we return true
>> to the call function. Conversely, false represents available.
>>
>> When we establish all possible cpuid <-> nodeid mapping, we will use the
>> proc_id from ACPI table.
>>
>> We do validation when we get the proc_id. If the result is true, we will
>> stop the mapping.
> The patch title probably should include "acpi:" header.  I can't tell
> much about the specifics of the acpi changes but I think this is the
> right approach for handling cpu hotplugs.

I will change the title in the next version.

Thanks.
> Thanks.
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.
  2016-07-19 20:06     ` Rafael J. Wysocki
  (?)
@ 2016-07-20  1:25       ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-20  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: cl, tj, mika.j.penttila, mingo, akpm, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael, x86, linux-acpi,
	linux-kernel, linux-mm, Gu Zheng, Tang Chen, Zhu Guihua



On 2016-07-20 04:06, Rafael J. Wysocki wrote:
> On Tuesday, July 19, 2016 03:28:06 PM Dou Liyang wrote:
>> From: Gu Zheng <guz.fnst@cn.fujitsu.com>
>>
>> The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
>> when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
>> wq_numa_possible_cpumask will not cause any problem.
>> It contains 4 steps:
>> 1. Enable apic registration flow to handle both enabled and disabled cpus.
>> 2. Introduce a new array storing all possible cpuid <-> apicid mapping.
>> 3. Enable _MAT and MADT related apis to return non-present or disabled cpus' apicid.
>> 4. Establish all possible cpuid <-> nodeid mapping.
>>
>> This patch finishes step 4.
>>
>> This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
>> processors at boot time via an additional acpi namespace walk for processors.
>>
>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
>> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
>> ---
>>   arch/ia64/kernel/acpi.c       |  3 +-
>>   arch/x86/kernel/acpi/boot.c   |  4 ++-
>>   drivers/acpi/acpi_processor.c |  5 ++++
>>   drivers/acpi/bus.c            |  3 ++
>>   drivers/acpi/processor_core.c | 65 +++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/acpi.h          |  2 ++
>>   6 files changed, 80 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
>> index b1698bc..bb36515 100644
>> --- a/arch/ia64/kernel/acpi.c
>> +++ b/arch/ia64/kernel/acpi.c
>> @@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
>>    *  ACPI based hotplug CPU support
>>    */
>>   #ifdef CONFIG_ACPI_HOTPLUG_CPU
>> -static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>> +int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>   {
>>   #ifdef CONFIG_ACPI_NUMA
>>   	/*
>> @@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>   #endif
>>   	return 0;
>>   }
>> +EXPORT_SYMBOL(acpi_map_cpu2node);
>>   
>>   int additional_cpus __initdata = -1;
>>   
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index 37248c3..0900264f 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
>>   #ifdef CONFIG_ACPI_HOTPLUG_CPU
>>   #include <acpi/processor.h>
>>   
>> -static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>> +int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>   {
>>   #ifdef CONFIG_ACPI_NUMA
>>   	int nid;
>> @@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>   		numa_set_node(cpu, nid);
>>   	}
>>   #endif
>> +	return 0;
>>   }
>> +EXPORT_SYMBOL(acpi_map_cpu2node);
>>   
>>   int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
>>   {
>> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
>> index e85b19a..0c15828 100644
>> --- a/drivers/acpi/acpi_processor.c
>> +++ b/drivers/acpi/acpi_processor.c
>> @@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
>>   
>>   void __weak arch_unregister_cpu(int cpu) {}
>>   
>> +int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>   static int acpi_processor_hotadd_init(struct acpi_processor *pr)
>>   {
>>   	unsigned long long sta;
>> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
>> index 262ca31..d8b7272 100644
>> --- a/drivers/acpi/bus.c
>> +++ b/drivers/acpi/bus.c
>> @@ -1124,6 +1124,9 @@ static int __init acpi_init(void)
>>   	acpi_sleep_proc_init();
>>   	acpi_wakeup_device_init();
>>   	acpi_debugger_init();
>> +#ifdef CONFIG_ACPI_HOTPLUG_CPU
>> +	acpi_set_processor_mapping();
>> +#endif
> This doesn't look nice.
>
> What about providing an empty definition of acpi_set_processor_mapping()
> for CONFIG_ACPI_HOTPLUG_CPU unset?

Good, I will do it.

Thanks,
Dou

>
>>   	return 0;
>>   }
> Thanks,
> Rafael
>
>
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.
@ 2016-07-20  1:25       ` Dou Liyang
  0 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-20  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: cl, tj, mika.j.penttila, mingo, akpm, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael, x86, linux-acpi,
	linux-kernel, linux-mm, Gu Zheng, Tang Chen, Zhu Guihua



On 2016-07-20 04:06, Rafael J. Wysocki wrote:
> On Tuesday, July 19, 2016 03:28:06 PM Dou Liyang wrote:
>> From: Gu Zheng <guz.fnst@cn.fujitsu.com>
>>
>> The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
>> when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
>> wq_numa_possible_cpumask will not cause any problem.
>> It contains 4 steps:
>> 1. Enable apic registration flow to handle both enabled and disabled cpus.
>> 2. Introduce a new array storing all possible cpuid <-> apicid mapping.
>> 3. Enable _MAT and MADT related apis to return non-present or disabled cpus' apicid.
>> 4. Establish all possible cpuid <-> nodeid mapping.
>>
>> This patch finishes step 4.
>>
>> This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
>> processors at boot time via an additional acpi namespace walk for processors.
>>
>> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
>> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
>> ---
>>   arch/ia64/kernel/acpi.c       |  3 +-
>>   arch/x86/kernel/acpi/boot.c   |  4 ++-
>>   drivers/acpi/acpi_processor.c |  5 ++++
>>   drivers/acpi/bus.c            |  3 ++
>>   drivers/acpi/processor_core.c | 65 +++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/acpi.h          |  2 ++
>>   6 files changed, 80 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
>> index b1698bc..bb36515 100644
>> --- a/arch/ia64/kernel/acpi.c
>> +++ b/arch/ia64/kernel/acpi.c
>> @@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
>>    *  ACPI based hotplug CPU support
>>    */
>>   #ifdef CONFIG_ACPI_HOTPLUG_CPU
>> -static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>> +int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>   {
>>   #ifdef CONFIG_ACPI_NUMA
>>   	/*
>> @@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>   #endif
>>   	return 0;
>>   }
>> +EXPORT_SYMBOL(acpi_map_cpu2node);
>>   
>>   int additional_cpus __initdata = -1;
>>   
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index 37248c3..0900264f 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
>>   #ifdef CONFIG_ACPI_HOTPLUG_CPU
>>   #include <acpi/processor.h>
>>   
>> -static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>> +int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>   {
>>   #ifdef CONFIG_ACPI_NUMA
>>   	int nid;
>> @@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>>   		numa_set_node(cpu, nid);
>>   	}
>>   #endif
>> +	return 0;
>>   }
>> +EXPORT_SYMBOL(acpi_map_cpu2node);
>>   
>>   int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
>>   {
>> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
>> index e85b19a..0c15828 100644
>> --- a/drivers/acpi/acpi_processor.c
>> +++ b/drivers/acpi/acpi_processor.c
>> @@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
>>   
>>   void __weak arch_unregister_cpu(int cpu) {}
>>   
>> +int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>   static int acpi_processor_hotadd_init(struct acpi_processor *pr)
>>   {
>>   	unsigned long long sta;
>> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
>> index 262ca31..d8b7272 100644
>> --- a/drivers/acpi/bus.c
>> +++ b/drivers/acpi/bus.c
>> @@ -1124,6 +1124,9 @@ static int __init acpi_init(void)
>>   	acpi_sleep_proc_init();
>>   	acpi_wakeup_device_init();
>>   	acpi_debugger_init();
>> +#ifdef CONFIG_ACPI_HOTPLUG_CPU
>> +	acpi_set_processor_mapping();
>> +#endif
> This doesn't look nice.
>
> What about providing an empty definition of acpi_set_processor_mapping()
> for CONFIG_ACPI_HOTPLUG_CPU unset?

Good,  I  will do it.

Thanks,
Dou

>
>>   	return 0;
>>   }
> Thanks,
> Rafael
>
>
>



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time.
  2016-07-19 18:50     ` Tejun Heo
@ 2016-07-20  1:52       ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-20  1:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cl, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael, x86, linux-acpi,
	linux-kernel, linux-mm, Tang Chen, Zhu Guihua

[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]



On 2016-07-20 02:50, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 19, 2016 at 03:28:02PM +0800, Dou Liyang wrote:
>> In this series of patches, we are going to construct cpu <-> node mapping
>> for all possible cpus at boot time, which is a 1-1 mapping. It means the
> 1-1 mapping means that each cpu is mapped to its own private node
> which isn't the case.  Just call it a persistent mapping?

Yes, each cpu belongs to exactly one node, and that mapping is persistent.
However, the reverse does not hold: a node can contain more than one cpu.

I will modify it.


>
>> cpu will be mapped to the node it belongs to, and will never be changed.
>> If a node has only cpus but no memory, the cpus on it will be mapped to
>> a memory-less node. And the memory-less node should be onlined.
>>
>> This patch allocates pgdats for all memory-less nodes and onlines them at
>> boot time, then builds zonelists for these nodes. As a result, when cpus
>> on these memory-less nodes try to allocate memory from the local node, the
>> allocation automatically falls back to the proper zones in the zonelists.
> Yeah, I think this is a much better approach for memory-less nodes.
>
>> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>

Thanks,

Dou
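
The fallback behaviour described in the quoted changelog can be pictured with a small user-space sketch. It is a simplified model, not kernel code: `NR_NODES`, `free_pages`, `fallback` and `alloc_page_on()` are hypothetical names, and the per-node order table merely stands in for a zonelist.

```c
#define NR_NODES 4

/* Free pages per node; node 2 is memory-less (it has cpus but no memory). */
long free_pages[NR_NODES] = { 8, 8, 0, 8 };

/* Per-node fallback order, local node first (a stand-in for a zonelist). */
static const int fallback[NR_NODES][NR_NODES] = {
	{ 0, 1, 2, 3 },
	{ 1, 0, 3, 2 },
	{ 2, 3, 1, 0 },
	{ 3, 2, 0, 1 },
};

/* Returns the node that satisfied the request, or -1 if none could. */
int alloc_page_on(int local_node)
{
	for (int i = 0; i < NR_NODES; i++) {
		int nid = fallback[local_node][i];

		if (free_pages[nid] > 0) {
			free_pages[nid]--;
			return nid;
		}
	}
	return -1;
}
```

A cpu on the memory-less node 2 asks for local memory, finds none, and is served from the next node in its list, so the caller never has to special-case a node without memory.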





[-- Attachment #2: Type: text/html, Size: 2619 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time.
  2016-07-19 18:50     ` Tejun Heo
  (?)
@ 2016-07-20  2:28       ` Dou Liyang
  -1 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-07-20  2:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cl, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael, x86, linux-acpi,
	linux-kernel, linux-mm

On 2016-07-20 02:50, Tejun Heo wrote:

> Hello,
>
> On Tue, Jul 19, 2016 at 03:28:02PM +0800, Dou Liyang wrote:
>> In this series of patches, we are going to construct cpu <-> node mapping
>> for all possible cpus at boot time, which is a 1-1 mapping. It means the
> 1-1 mapping means that each cpu is mapped to its own private node
> which isn't the case.  Just call it a persistent mapping?

Yes, each cpu belongs to exactly one node, and that mapping is persistent.
However, the reverse does not hold: a node can contain more than one cpu.

I will modify it.

Thanks.
Dou
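
To make the "persistent, but not 1-1" point concrete, here is a minimal sketch that reproduces the example table from the cover letter (node 0: cpus 0-14 and 60-74, node 1: 15-29 and 75-89, and so on). The function names and the arithmetic shortcut are assumptions for illustration; the kernel derives the mapping from ACPI tables, not from a formula.

```c
#define NR_CPUS  120
#define NR_NODES 4

/* Many-to-one: every cpu has exactly one node, fixed for its lifetime. */
int cpu_to_node_sketch(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS)
		return -1;
	/* cpus 0-59 and 60-119 follow the same 15-cpu-per-node pattern */
	return (cpu % 60) / 15;
}

/* The reverse direction is one-to-many: count the cpus on a node. */
int cpus_on_node(int node)
{
	int count = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_to_node_sketch(cpu) == node)
			count++;
	return count;
}
```

Here cpu 30 and cpu 90 both map to node 2, and node 2 holds 30 cpus, so the mapping is a function from cpus to nodes rather than a 1-1 correspondence.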


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-07-20  2:30 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-19  7:28 [PATCH v8 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
2016-07-19  7:28 ` [PATCH v8 1/7] x86, memhp, numa: Online memory-less nodes at boot time Dou Liyang
2016-07-19 18:50   ` Tejun Heo
2016-07-20  1:52     ` Dou Liyang
2016-07-20  2:28     ` Dou Liyang
2016-07-19  7:28 ` [PATCH v8 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus " Dou Liyang
2016-07-19  7:28 ` [PATCH v8 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping Dou Liyang
2016-07-19  7:28 ` [PATCH v8 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Dou Liyang
2016-07-19  7:28 ` [PATCH v8 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Dou Liyang
2016-07-19 20:06   ` Rafael J. Wysocki
2016-07-20  1:25     ` Dou Liyang
2016-07-19  7:28 ` [PATCH v8 6/7] Provide the mechanism to validate processors in the ACPI tables Dou Liyang
2016-07-19  7:28 ` [PATCH v8 7/7] Provide the interface to validate the proc_id which they give Dou Liyang
2016-07-19 18:53   ` Tejun Heo
2016-07-20  0:55     ` Dou Liyang
