linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent
@ 2016-08-25  8:35 Dou Liyang
  2016-08-25  8:35 ` [PATCH v12 1/7] x86, memhp, numa: Online memory-less nodes at boot time Dou Liyang
                   ` (8 more replies)
  0 siblings, 9 replies; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:35 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Dou Liyang

[Summary]

Use ACPI tables: MADT, DSDT.
1. Create cpuid in order based on Local Apic ID in MADT(apicid).
2. Obtain the nodeid by the proc_id in DSDT.
3. Make the cpuid <-> nodeid mapping persistent.

The mapping relations:

proc_id in DSDT <--> Processor ID in MADT(acpiid) <--> Local Apic ID in MADT(apicid) 
        ^                                                        ^ 
        |                                                        |
        v                                                        v 
   pxm in DSDT                                                 cpuid 
        ^ 
        |
        v
     nodeid

[Problem]

cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.

        ......

        return worker;
}


[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is setup at boot time. This mapping is persistent, won't change.

2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, reused for other
   hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registeration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
   the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
   registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT relative apis to return non-presnet or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these apis to let the caller
   control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.


For previous discussion, please refer to:
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989
https://lkml.org/lkml/2015/5/14/244
https://lkml.org/lkml/2015/7/7/200
https://lkml.org/lkml/2015/9/27/209
https://lkml.org/lkml/2016/5/19/212
https://lkml.org/lkml/2016/7/19/181
https://lkml.org/lkml/2016/7/25/99
https://lkml.org/lkml/2016/7/26/52
https://lkml.org/lkml/2016/8/8/96

Change log v11 -> v12:
1. Rebase
2. Add a short summary

Change log v10 -> v11:
1. Reduce the number of repeat judgment of online/offline
2. Seperate out the functionality in the enable or disable situation

Change log v9 -> v10:
1. Providing an empty definition of acpi_set_processor_mapping() for 
CONFIG_ACPI_HOTPLUG_CPU unset. In patch 5.
2. Fix auto build test ERROR on ia64/next. In patch 5.
3. Fix some comment.

Change log v8 -> v9:
1. Providing an empty definition of acpi_set_processor_mapping() for 
CONFIG_ACPI_HOTPLUG_CPU unset.

Change log v7 -> v8:
1. Provide the mechanism to validate processors in the ACPI tables.
2. Provide the interface to validate the proc_id when setting the mapping. 

Change log v6 -> v7:
1. Fix arm64 build failure.

Change log v5 -> v6:
1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.

Change log v4 -> v5:
1. Remove useless code in patch 1.
2. Small improvement of commit message.

Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that I tried to build zonelists
   before per cpu areas were initialized.

Change log v2 -> v3:
1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
2. Build zonelists for memory-less nodes so that memory allocator will fall 
   back to proper nodes automatically.

Change log v1 -> v2:
1. Split code movement and actual changes. Add patch 1.
2. Synchronize best near online node record when node hotplug happens. In patch 2.
3. Fix some comment.

Dou Liyang (2):
  acpi: Provide the mechanism to validate processors in the ACPI tables
  acpi: Provide the interface to validate the proc_id

Gu Zheng (4):
  x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
    boot time.
  x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
    persistent cpuid <-> apicid mapping.
  x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
    booting.

Tang Chen (1):
  x86, memhp, numa: Online memory-less nodes at boot time.

 arch/ia64/kernel/acpi.c       |   3 +-
 arch/x86/include/asm/mpspec.h |   1 +
 arch/x86/kernel/acpi/boot.c   |  11 ++--
 arch/x86/kernel/apic/apic.c   |  77 +++++++++++++++++++++++--
 arch/x86/mm/numa.c            |  27 +++++----
 drivers/acpi/acpi_processor.c | 105 ++++++++++++++++++++++++++++++++-
 drivers/acpi/bus.c            |   1 +
 drivers/acpi/processor_core.c | 131 +++++++++++++++++++++++++++++++++++-------
 include/linux/acpi.h          |   6 ++
 9 files changed, 311 insertions(+), 51 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v12 1/7] x86, memhp, numa: Online memory-less nodes at boot time.
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
@ 2016-08-25  8:35 ` Dou Liyang
  2016-09-22 19:09   ` [tip:x86/apic] x86/numa: " tip-bot for Tang Chen
  2016-08-25  8:35 ` [PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus " Dou Liyang
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:35 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Tang Chen, Zhu Guihua,
	Dou Liyang

From: Tang Chen <tangchen@cn.fujitsu.com>

For now, x86 does not support memory-less node. A node without memory
will not be onlined, and the cpus on it will be mapped to the other
online nodes with memory in init_cpu_to_node(). The reason of doing this
is to ensure each cpu has mapped to a node with memory, so that it will
be able to allocate local memory for that cpu.

But we don't have to do it in this way.

In this series of patches, we are going to construct cpu <-> node mapping
for all possible cpus at boot time, which is a persistent mapping. It means
that the cpu will be mapped to the node which it belongs to, and will never
be changed. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node. And the memory-less node should be onlined.

This patch allocate pgdats for all memory-less nodes and online them at
boot time. Then build zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from local node, it
will automatically fall back to the proper zones in the zonelists.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/mm/numa.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index fb68210..3f35b48 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -722,22 +722,19 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-	int n, val;
-	int min_val = INT_MAX;
-	int best_node = -1;
+	unsigned long zones_size[MAX_NR_ZONES] = {0};
+	unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-	for_each_online_node(n) {
-		val = node_distance(node, n);
+	/* Allocate and initialize node data. Memory-less node is now online.*/
+	alloc_node_data(nid);
+	free_area_init_node(nid, zones_size, 0, zholes_size);
 
-		if (val < min_val) {
-			min_val = val;
-			best_node = n;
-		}
-	}
-
-	return best_node;
+	/*
+	 * All zonelists will be built later in start_kernel() after per cpu
+	 * areas are initialized.
+	 */
 }
 
 /*
@@ -766,8 +763,10 @@ void __init init_cpu_to_node(void)
 
 		if (node == NUMA_NO_NODE)
 			continue;
+
 		if (!node_online(node))
-			node = find_near_online_node(node);
+			init_memory_less_node(node);
+
 		numa_set_node(cpu, node);
 	}
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
  2016-08-25  8:35 ` [PATCH v12 1/7] x86, memhp, numa: Online memory-less nodes at boot time Dou Liyang
@ 2016-08-25  8:35 ` Dou Liyang
  2016-08-25  8:57   ` Dou Liyang
  2016-09-22 19:10   ` [tip:x86/apic] x86/acpi: " tip-bot for Gu Zheng
  2016-08-25  8:35 ` [PATCH v12 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping Dou Liyang
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:35 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

[Problem]

cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.

        ......

        return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is setup at boot time. This mapping is persistent, won't change.

2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, reused for other
   hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registeration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
   the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
   registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT relative apis to return non-presnet or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these apis to let the caller
   control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.

This patch finished step 1.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/kernel/apic/apic.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index cea4fc1..e5612a9 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2024,7 +2024,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 	apic_write(APIC_LVT1, value);
 }
 
-int generic_processor_info(int apicid, int version)
+static int __generic_processor_info(int apicid, int version, bool enabled)
 {
 	int cpu, max = nr_cpu_ids;
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2090,7 +2090,6 @@ int generic_processor_info(int apicid, int version)
 		return -EINVAL;
 	}
 
-	num_processors++;
 	if (apicid == boot_cpu_physical_apicid) {
 		/*
 		 * x86_bios_cpu_apicid is required to have processors listed
@@ -2113,6 +2112,7 @@ int generic_processor_info(int apicid, int version)
 
 		pr_warning("APIC: Package limit reached. Processor %d/0x%x ignored.\n",
 			   thiscpu, apicid);
+
 		disabled_cpus++;
 		return -ENOSPC;
 	}
@@ -2132,7 +2132,6 @@ int generic_processor_info(int apicid, int version)
 			apic_version[boot_cpu_physical_apicid], cpu, version);
 	}
 
-	physid_set(apicid, phys_cpu_present_map);
 	if (apicid > max_physical_apicid)
 		max_physical_apicid = apicid;
 
@@ -2145,11 +2144,22 @@ int generic_processor_info(int apicid, int version)
 		apic->x86_32_early_logical_apicid(cpu);
 #endif
 	set_cpu_possible(cpu, true);
-	set_cpu_present(cpu, true);
+
+	if (enabled) {
+		num_processors++;
+		physid_set(apicid, phys_cpu_present_map);
+		set_cpu_present(cpu, true);
+	} else
+		disabled_cpus++;
 
 	return cpu;
 }
 
+int generic_processor_info(int apicid, int version)
+{
+	return __generic_processor_info(apicid, version, true);
+}
+
 int hard_smp_processor_id(void)
 {
 	return read_apic_id();
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v12 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
  2016-08-25  8:35 ` [PATCH v12 1/7] x86, memhp, numa: Online memory-less nodes at boot time Dou Liyang
  2016-08-25  8:35 ` [PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus " Dou Liyang
@ 2016-08-25  8:35 ` Dou Liyang
  2016-09-22 19:10   ` [tip:x86/apic] x86/acpi: Introduce persistent storage for " tip-bot for Gu Zheng
  2016-08-25  8:35 ` [PATCH v12 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Dou Liyang
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:35 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-presnet or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.

And finally we will use this array to make cpuid <-> nodeid persistent.

cpuid <-> apicid mapping is established at local apic registeration time.
But non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  7 +----
 arch/x86/kernel/apic/apic.c   | 61 ++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE	BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 90d84c3..abd939c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -176,15 +176,10 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
 		return -EINVAL;
 	}
 
-	if (!enabled) {
-		++disabled_cpus;
-		return -EINVAL;
-	}
-
 	if (boot_cpu_physical_apicid != -1U)
 		ver = apic_version[boot_cpu_physical_apicid];
 
-	cpu = generic_processor_info(id, ver);
+	cpu = __generic_processor_info(id, ver, enabled);
 	if (cpu >= 0)
 		early_per_cpu(x86_cpu_to_acpiid, cpu) = acpiid;
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index e5612a9..7aa9863 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2024,7 +2024,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 	apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals to current allocated max logical CPU ID plus 1.
+ * All allocated CPU ID should be in [0, nr_logical_cpuidi), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+	int i;
+
+	/*
+	 * cpuid <-> apicid mapping is persistent, so when a cpu is up,
+	 * check if the kernel has allocated a cpuid for it.
+	 */
+	for (i = 0; i < nr_logical_cpuids; i++) {
+		if (cpuid_to_apicid[i] == apicid)
+			return i;
+	}
+
+	/* Allocate a new cpuid. */
+	if (nr_logical_cpuids >= nr_cpu_ids) {
+		WARN_ONCE(1, "Only %d processors supported."
+			     "Processor %d/0x%x and the rest are ignored.\n",
+			     nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+		return -1;
+	}
+
+	cpuid_to_apicid[nr_logical_cpuids] = apicid;
+	return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
 	int cpu, max = nr_cpu_ids;
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2099,8 +2145,17 @@ static int __generic_processor_info(int apicid, int version, bool enabled)
 		 * for BSP.
 		 */
 		cpu = 0;
-	} else
-		cpu = cpumask_next_zero(-1, cpu_present_mask);
+
+		/* Logical cpuid 0 is reserved for BSP. */
+		cpuid_to_apicid[0] = apicid;
+	} else {
+		cpu = allocate_logical_cpuid(apicid);
+		if (cpu < 0) {
+
+			disabled_cpus++;
+			return -EINVAL;
+		}
+	}
 
 	/*
 	 * This can happen on physical hotplug. The sanity check at boot time
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v12 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
                   ` (2 preceding siblings ...)
  2016-08-25  8:35 ` [PATCH v12 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping Dou Liyang
@ 2016-08-25  8:35 ` Dou Liyang
  2016-09-22 19:11   ` [tip:x86/apic] x86/acpi: Enable MADT APIs to return disabled apicids tip-bot for Gu Zheng
  2016-08-25  8:35 ` [PATCH v12 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Dou Liyang
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:35 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-presnet or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm        (persistent)
2. apicid (physical cpu id)   <->   nodeid     (persistent)
3. cpuid (logical cpu id)     <->   apicid     (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid     (not persistent)

So, in order to setup persistent cpuid <-> nodeid mapping for all possible CPUs,
we should:
1. Setup cpuid <-> apicid mapping for all possible CPUs, which has been done in step 1, 2.
2. Setup cpuid <-> nodeid mapping for all possible CPUs. But before that, we should
   obtain all apicids from MADT.

All processors' apicids can be obtained by _MAT method or from MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter will be added to MADT APIs so that caller
is able to control if disabled processors are ignored.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 60 +++++++++++++++++++++++++++----------------
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..e85b19a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 	 *  Extra Processor objects may be enumerated on MP systems with
 	 *  less than the max # of CPUs. They should be ignored _iff
 	 *  they are physically not present.
+	 *
+	 *  NOTE: Even if the processor has a cpuid, it may not present because
+	 *  cpuid <-> apicid mapping is persistent now.
 	 */
-	if (invalid_logical_cpuid(pr->id)) {
+	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
 		int ret = acpi_processor_hotadd_init(pr);
 		if (ret)
 			return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 9125d7d..fd59ae8 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-		 u32 acpi_id, phys_cpuid_t *apic_id)
+		 u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
 	struct acpi_madt_local_apic *lapic =
 		container_of(entry, struct acpi_madt_local_apic, header);
 
-	if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+		bool ignore_disabled)
 {
 	struct acpi_madt_local_x2apic *apic =
 		container_of(entry, struct acpi_madt_local_x2apic, header);
 
-	if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+		bool ignore_disabled)
 {
 	struct acpi_madt_local_sapic *lsapic =
 		container_of(entry, struct acpi_madt_local_sapic, header);
 
-	if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (device_declaration) {
@@ -87,12 +89,13 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
  * Retrieve the ARM CPU physical identifier (MPIDR)
  */
 static int map_gicc_mpidr(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr,
+		bool ignore_disabled)
 {
 	struct acpi_madt_generic_interrupt *gicc =
 	    container_of(entry, struct acpi_madt_generic_interrupt, header);
 
-	if (!(gicc->flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(gicc->flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	/* device_declaration means Device object in DSDT, in the
@@ -109,7 +112,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
 }
 
 static phys_cpuid_t map_madt_entry(struct acpi_table_madt *madt,
-				   int type, u32 acpi_id)
+				   int type, u32 acpi_id, bool ignore_disabled)
 {
 	unsigned long madt_end, entry;
 	phys_cpuid_t phys_id = PHYS_CPUID_INVALID;	/* CPU hardware ID */
@@ -127,16 +130,20 @@ static phys_cpuid_t map_madt_entry(struct acpi_table_madt *madt,
 		struct acpi_subtable_header *header =
 			(struct acpi_subtable_header *)entry;
 		if (header->type == ACPI_MADT_TYPE_LOCAL_APIC) {
-			if (!map_lapic_id(header, acpi_id, &phys_id))
+			if (!map_lapic_id(header, acpi_id, &phys_id,
+					  ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC) {
-			if (!map_x2apic_id(header, type, acpi_id, &phys_id))
+			if (!map_x2apic_id(header, type, acpi_id, &phys_id,
+					   ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC) {
-			if (!map_lsapic_id(header, type, acpi_id, &phys_id))
+			if (!map_lsapic_id(header, type, acpi_id, &phys_id,
+					   ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT) {
-			if (!map_gicc_mpidr(header, type, acpi_id, &phys_id))
+			if (!map_gicc_mpidr(header, type, acpi_id, &phys_id,
+					    ignore_disabled))
 				break;
 		}
 		entry += header->length;
@@ -156,14 +163,15 @@ phys_cpuid_t __init acpi_map_madt_entry(u32 acpi_id)
 	if (!madt)
 		return PHYS_CPUID_INVALID;
 
-	rv = map_madt_entry(madt, 1, acpi_id);
+	rv = map_madt_entry(madt, 1, acpi_id, true);
 
 	early_acpi_os_unmap_memory(madt, tbl_size);
 
 	return rv;
 }
 
-static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id,
+				  bool ignore_disabled)
 {
 	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
 	union acpi_object *obj;
@@ -184,30 +192,38 @@ static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
 
 	header = (struct acpi_subtable_header *)obj->buffer.pointer;
 	if (header->type == ACPI_MADT_TYPE_LOCAL_APIC)
-		map_lapic_id(header, acpi_id, &phys_id);
+		map_lapic_id(header, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC)
-		map_lsapic_id(header, type, acpi_id, &phys_id);
+		map_lsapic_id(header, type, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC)
-		map_x2apic_id(header, type, acpi_id, &phys_id);
+		map_x2apic_id(header, type, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT)
-		map_gicc_mpidr(header, type, acpi_id, &phys_id);
+		map_gicc_mpidr(header, type, acpi_id, &phys_id,
+			       ignore_disabled);
 
 exit:
 	kfree(buffer.pointer);
 	return phys_id;
 }
 
-phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t __acpi_get_phys_id(acpi_handle handle, int type,
+				       u32 acpi_id, bool ignore_disabled)
 {
 	phys_cpuid_t phys_id;
 
-	phys_id = map_mat_entry(handle, type, acpi_id);
+	phys_id = map_mat_entry(handle, type, acpi_id, ignore_disabled);
 	if (invalid_phys_cpuid(phys_id))
-		phys_id = map_madt_entry(get_madt_table(), type, acpi_id);
+		phys_id = map_madt_entry(get_madt_table(), type, acpi_id,
+					   ignore_disabled);
 
 	return phys_id;
 }
 
+phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+{
+	return __acpi_get_phys_id(handle, type, acpi_id, true);
+}
+
 int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id)
 {
 #ifdef CONFIG_SMP
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v12 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
                   ` (3 preceding siblings ...)
  2016-08-25  8:35 ` [PATCH v12 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Dou Liyang
@ 2016-08-25  8:35 ` Dou Liyang
  2016-09-22 19:11   ` [tip:x86/apic] x86/acpi: " tip-bot for Gu Zheng
  2016-08-25  8:35 ` [PATCH v12 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables Dou Liyang
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:35 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng, Tang Chen,
	Zhu Guihua, Dou Liyang

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-presnet or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 4.

This patch set the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional acpi namespace walk for processors.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/ia64/kernel/acpi.c       |  3 +-
 arch/x86/kernel/acpi/boot.c   |  4 ++-
 drivers/acpi/acpi_processor.c |  5 ++++
 drivers/acpi/bus.c            |  1 +
 drivers/acpi/processor_core.c | 67 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h          |  3 ++
 6 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 92b7bc9..6534871 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
 	/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 #endif
 	return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int additional_cpus __initdata = -1;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index abd939c..807037c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -700,7 +700,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include <acpi/processor.h>
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
 	int nid;
@@ -711,7 +711,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 		numa_set_node(cpu, nid);
 	}
 #endif
+	return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
 {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+	return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
 	unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 85b7d07..a760dac 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1193,6 +1193,7 @@ static int __init acpi_init(void)
 	acpi_wakeup_device_init();
 	acpi_debugger_init();
 	acpi_setup_sb_notify_handler();
+	acpi_set_processor_mapping();
 	return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index fd59ae8..7827c71 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -280,6 +280,73 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
+{
+	int type;
+	u32 acpi_id;
+	acpi_status status;
+	acpi_object_type acpi_type;
+	unsigned long long tmp;
+	union acpi_object object = { 0 };
+	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+	status = acpi_get_type(handle, &acpi_type);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	switch (acpi_type) {
+	case ACPI_TYPE_PROCESSOR:
+		status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+		if (ACPI_FAILURE(status))
+			return false;
+		acpi_id = object.processor.proc_id;
+		break;
+	case ACPI_TYPE_DEVICE:
+		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+		if (ACPI_FAILURE(status))
+			return false;
+		acpi_id = tmp;
+		break;
+	default:
+		return false;
+	}
+
+	type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
+
+	*phys_id = __acpi_get_phys_id(handle, type, acpi_id, false);
+	*cpuid = acpi_map_cpuid(*phys_id, acpi_id);
+	if (*cpuid == -1)
+		return false;
+
+	return true;
+}
+
+static acpi_status __init
+set_processor_node_mapping(acpi_handle handle, u32 lvl, void *context,
+			   void **rv)
+{
+	phys_cpuid_t phys_id;
+	int cpu_id;
+
+	if (!map_processor(handle, &phys_id, &cpu_id))
+		return AE_ERROR;
+
+	acpi_map_cpu2node(handle, cpu_id, phys_id);
+	return AE_OK;
+}
+
+void __init acpi_set_processor_mapping(void)
+{
+	/* Set persistent cpu <-> node mapping for all processors. */
+	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+			    ACPI_UINT32_MAX, set_processor_node_mapping,
+			    NULL, NULL, NULL);
+}
+#else
+void __init acpi_set_processor_mapping(void) {}
+#endif /* CONFIG_ACPI_HOTPLUG_CPU */
+
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
 static int get_ioapic_id(struct acpi_subtable_header *entry, u32 gsi_base,
 			 u64 *phys_addr, int *ioapic_id)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4d8452c..ea67776 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -271,8 +271,11 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
 int acpi_unmap_cpu(int cpu);
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid);
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
+void __init acpi_set_processor_mapping(void);
+
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
 int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
 #endif
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v12 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
                   ` (4 preceding siblings ...)
  2016-08-25  8:35 ` [PATCH v12 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Dou Liyang
@ 2016-08-25  8:35 ` Dou Liyang
  2016-09-22 19:12   ` [tip:x86/apic] acpi: Provide " tip-bot for Dou Liyang
  2016-08-25  8:35 ` [PATCH v12 7/7] acpi: Provide the interface to validate the proc_id Dou Liyang
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:35 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Dou Liyang

[Problem]

When we set cpuid <-> nodeid mapping to be persistent, it will use the DSDT
As we know, the ACPI tables are just like user's input in that respect, and
we don't crash if user's input is unreasonable.

Such as, the mapping of the proc_id and pxm in some machine's ACPI table is
like this:

proc_id   |    pxm
--------------------
0       <->     0
1       <->     0
2       <->     1
3       <->     1
89      <->     0
89      <->     0
89      <->     0
89      <->     1
89      <->     1
89      <->     2
89      <->     3
.....

We can't be sure which one is correct to the proc_id 89. We may map a wrong
node to a cpu. When pages are allocated, this may cause a kernal panic.

So, we should provide mechanisms to validate the ACPI tables, just like we
do validation to check user's input in web project.

The mechanism is that the processor objects which have the duplicate IDs
are not valid.

[Solution]

We add a validation function, like this:

foreach Processor in DSDT
	proc_id= get_ACPI_Processor_number(Processor)
	if(the proc_id has alreadly existed )
		mark both of them as being unreasonable;

The function will record the unique or duplicate processor IDs.

The duplicate processor IDs such as 89 are regarded as the unreasonable IDs
which mean that the processor objects in question are not valid.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0c15828..346fbfc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler processor_container_handler = {
 	.attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+static void processor_validated_ids_update(int proc_id)
+{
+	int i;
+
+	if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
+		return;
+
+	/*
+	 * Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+	 * already in the IDs, do nothing.
+	 */
+	for (i = 0; i < nr_duplicate_ids; i++) {
+		if (duplicate_processor_ids[i] == proc_id)
+			return;
+	}
+
+	/*
+	 * Secondly, compare the proc_id with unique IDs, if the proc_id is in
+	 * the IDs, put it in the duplicate IDs.
+	 */
+	for (i = 0; i < nr_unique_ids; i++) {
+		if (unique_processor_ids[i] == proc_id) {
+			duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+			nr_duplicate_ids++;
+			return;
+		}
+	}
+
+	/*
+	 * Lastly, the proc_id is a unique ID, put it in the unique IDs.
+	 */
+	unique_processor_ids[nr_unique_ids] = proc_id;
+	nr_unique_ids++;
+}
+
+static acpi_status acpi_processor_ids_walk(acpi_handle handle,
+						u32 lvl,
+						void *context,
+						void **rv)
+{
+	acpi_status status;
+	union acpi_object object = { 0 };
+	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+	status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+	if (ACPI_FAILURE(status))
+		acpi_handle_info(handle, "Not get the processor object\n");
+	else
+		processor_validated_ids_update(object.processor.proc_id);
+
+	return AE_OK;
+}
+
+static void acpi_processor_duplication_valiate(void)
+{
+	/* Search all processor nodes in ACPI namespace */
+	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+						ACPI_UINT32_MAX,
+						acpi_processor_ids_walk,
+						NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+	acpi_processor_duplication_valiate();
 	acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
 	acpi_scan_add_handler(&processor_container_handler);
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v12 7/7] acpi: Provide the interface to validate the proc_id
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
                   ` (5 preceding siblings ...)
  2016-08-25  8:35 ` [PATCH v12 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables Dou Liyang
@ 2016-08-25  8:35 ` Dou Liyang
  2016-09-22 19:12   ` [tip:x86/apic] acpi: Validate processor id when mapping the processor tip-bot for Dou Liyang
  2016-08-25  9:08 ` [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
  2016-09-02  6:57 ` Dou Liyang
  8 siblings, 1 reply; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:35 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm, Dou Liyang

When we want to identify whether the proc_id is unreasonable or not, we
can call the "acpi_processor_validate_proc_id" function. It will search
in the duplicate IDs. If we find the proc_id in the IDs, we return true
to the call function. Conversely, the false represents available.

When we establish all possible cpuid <-> nodeid mapping to handle the
cpu hotplugs, we will use the proc_id from ACPI table.

We do validation when we get the proc_id. If the result is true, we
will stop the mapping.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 drivers/acpi/acpi_processor.c | 16 ++++++++++++++++
 drivers/acpi/processor_core.c |  4 ++++
 include/linux/acpi.h          |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
 						NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+	int i;
+
+	/*
+	 * compare the proc_id with duplicate IDs, if the proc_id is already
+	 * in the duplicate IDs, return true, otherwise, return false.
+	 */
+	for (i = 0; i < nr_duplicate_ids; i++) {
+		if (duplicate_processor_ids[i] == proc_id)
+			return true;
+	}
+
+	return false;
+}
+
 void __init acpi_processor_init(void)
 {
 	acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 7827c71..bf72097 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -301,6 +301,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
 		if (ACPI_FAILURE(status))
 			return false;
 		acpi_id = object.processor.proc_id;
+
+		/* validate the acpi_id */
+		if(acpi_processor_validate_proc_id(acpi_id))
+			return false;
 		break;
 	case ACPI_TYPE_DEVICE:
 		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index ea67776..929ff8f 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -267,6 +267,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
 	return phys_id == PHYS_CPUID_INVALID;
 }
 
+/* Validate the processor object's proc_id */
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.
  2016-08-25  8:35 ` [PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus " Dou Liyang
@ 2016-08-25  8:57   ` Dou Liyang
  2016-09-22 19:10   ` [tip:x86/apic] x86/acpi: " tip-bot for Gu Zheng
  1 sibling, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  8:57 UTC (permalink / raw)
  To: tglx
  Cc: tj, rjw, akpm, isimatu.yasuaki, kamezawa.hiroyu, izumi.taku,
	rafael, x86, linux-acpi, linux-kernel, linux-mm

Hi tglx,

At 08/25/2016 04:35 PM, Dou Liyang wrote:
>  arch/x86/kernel/apic/apic.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index cea4fc1..e5612a9 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2024,7 +2024,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
>  	apic_write(APIC_LVT1, value);
>  }
>
> -int generic_processor_info(int apicid, int version)
> +static int __generic_processor_info(int apicid, int version, bool enabled)
>  {
>  	int cpu, max = nr_cpu_ids;
>  	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
> @@ -2090,7 +2090,6 @@ int generic_processor_info(int apicid, int version)
>  		return -EINVAL;
>  	}
>
> -	num_processors++;
>  	if (apicid == boot_cpu_physical_apicid) {

I move the "num_processors++" below.
Because I think that if "apicid == boot_cpu_physical_apicid" is true,
The "disabled_cpus" will plus one that may conflict with the
"num_processors++"

Is my thought right?

>  		/*
>  		 * x86_bios_cpu_apicid is required to have processors listed
> @@ -2113,6 +2112,7 @@ int generic_processor_info(int apicid, int version)
>
>  		pr_warning("APIC: Package limit reached. Processor %d/0x%x ignored.\n",
>  			   thiscpu, apicid);
> +
>  		disabled_cpus++;
>  		return -ENOSPC;
>  	}
> @@ -2132,7 +2132,6 @@ int generic_processor_info(int apicid, int version)
>  			apic_version[boot_cpu_physical_apicid], cpu, version);
>  	}
>
> -	physid_set(apicid, phys_cpu_present_map);
>  	if (apicid > max_physical_apicid)
>  		max_physical_apicid = apicid;
>
> @@ -2145,11 +2144,22 @@ int generic_processor_info(int apicid, int version)
>  		apic->x86_32_early_logical_apicid(cpu);
>  #endif
>  	set_cpu_possible(cpu, true);
> -	set_cpu_present(cpu, true);
> +
> +	if (enabled) {
> +		num_processors++;
> +		physid_set(apicid, phys_cpu_present_map);
> +		set_cpu_present(cpu, true);
> +	} else
> +		disabled_cpus++;
>

I remove all the "if (enabled)" code and do the unified
judgment here.

Thanks,
Dou

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
                   ` (6 preceding siblings ...)
  2016-08-25  8:35 ` [PATCH v12 7/7] acpi: Provide the interface to validate the proc_id Dou Liyang
@ 2016-08-25  9:08 ` Dou Liyang
  2016-09-02  6:57 ` Dou Liyang
  8 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-08-25  9:08 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm

Hi all,

These patches are used to fixing the memory allocation failure.
and it's fine from the ACPI perspective.

I hope that RJ<rjw@rjwysocki.net> can apply them.

Due to these patches are also related to x86 and mm,
so, I need the ACKs from the x86 and mm maintainers.   :)

Thanks,
Dou.

At 08/25/2016 04:35 PM, Dou Liyang wrote:
> [Summary]
>
> Use ACPI tables: MADT, DSDT.
> 1. Create cpuid in order based on Local Apic ID in MADT(apicid).
> 2. Obtain the nodeid by the proc_id in DSDT.
> 3. Make the cpuid <-> nodeid mapping persistent.
>
> The mapping relations:
>
> proc_id in DSDT <--> Processor ID in MADT(acpiid) <--> Local Apic ID in MADT(apicid)
>         ^                                                        ^
>         |                                                        |
>         v                                                        v 
>    pxm in DSDT                                                 cpuid
>         ^
>         |
>         v
>      nodeid
>
> [Problem]
>
> cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
> the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.
>
> When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
> which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
> workqueue does not update wq_numa_possible_cpumask.
>
> So here is the problem:
>
> Assume we have the following cpuid <-> nodeid in the beginning:
>
>   Node | CPU
> ------------------------
> node 0 |  0-14, 60-74
> node 1 | 15-29, 75-89
> node 2 | 30-44, 90-104
> node 3 | 45-59, 105-119
>
> and we hot-remove node2 and node3, it becomes:
>
>   Node | CPU
> ------------------------
> node 0 |  0-14, 60-74
> node 1 | 15-29, 75-89
>
> and we hot-add node4 and node5, it becomes:
>
>   Node | CPU
> ------------------------
> node 0 |  0-14, 60-74
> node 1 | 15-29, 75-89
> node 4 | 30-59
> node 5 | 90-119
>
> But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
>
> When a pool workqueue is initialized, if its cpumask belongs to a node, its
> pool->node will be mapped to that node. And memory used by this workqueue will
> also be allocated on that node.
>
> static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
> ...
>         /* if cpumask is contained inside a NUMA node, we belong to that node */
>         if (wq_numa_enabled) {
>                 for_each_node(node) {
>                         if (cpumask_subset(pool->attrs->cpumask,
>                                            wq_numa_possible_cpumask[node])) {
>                                 pool->node = node;
>                                 break;
>                         }
>                 }
>         }
>
> Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
> which will lead to memory allocation failure:
>
>  SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
>   cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
>   node 0: slabs: 6172, objs: 259224, free: 245741
>   node 1: slabs: 3261, objs: 136962, free: 127656
>
> It happens here:
>
> create_worker(struct worker_pool *pool)
>  |--> worker = alloc_worker(pool->node);
>
> static struct worker *alloc_worker(int node)
> {
>         struct worker *worker;
>
>         worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.
>
>         ......
>
>         return worker;
> }
>
>
> [Solution]
>
> There are four mappings in the kernel:
> 1. nodeid (logical node id)   <->   pxm
> 2. apicid (physical cpu id)   <->   nodeid
> 3. cpuid (logical cpu id)     <->   apicid
> 4. cpuid (logical cpu id)     <->   nodeid
>
> 1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
>    mapping is setup at boot time. This mapping is persistent, won't change.
>
> 2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
>    time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
>    persistent.
>
> 3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
>    allocated, lower ids first, and released at CPU hotremove time, reused for other
>    hotadded CPUs. So this mapping is not persistent.
>
> 4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
>    cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
>
> To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
> cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
> cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
> mapping. So the key point is obtaining all cpus' apicid.
>
> apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
> MADT (Multiple APIC Description Table). So we finish the job in the following steps:
>
> 1. Enable apic registeration flow to handle both enabled and disabled cpus.
>    This is done by introducing an extra parameter to generic_processor_info to let the
>    caller control if disabled cpus are ignored.
>
> 2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
>    the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
>    registering local apic. Store the mapping in this array.
>
> 3. Enable _MAT and MADT relative apis to return non-presnet or disabled cpus' apicid.
>    This is also done by introducing an extra parameter to these apis to let the caller
>    control if disabled cpus are ignored.
>
> 4. Establish all possible cpuid <-> nodeid mapping.
>    This is done via an additional acpi namespace walk for processors.
>
>
> For previous discussion, please refer to:
> https://lkml.org/lkml/2015/2/27/145
> https://lkml.org/lkml/2015/3/25/989
> https://lkml.org/lkml/2015/5/14/244
> https://lkml.org/lkml/2015/7/7/200
> https://lkml.org/lkml/2015/9/27/209
> https://lkml.org/lkml/2016/5/19/212
> https://lkml.org/lkml/2016/7/19/181
> https://lkml.org/lkml/2016/7/25/99
> https://lkml.org/lkml/2016/7/26/52
> https://lkml.org/lkml/2016/8/8/96
>
> Change log v11 -> v12:
> 1. Rebase
> 2. Add a short summary
>
> Change log v10 -> v11:
> 1. Reduce the number of repeat judgment of online/offline
> 2. Seperate out the functionality in the enable or disable situation
>
> Change log v9 -> v10:
> 1. Providing an empty definition of acpi_set_processor_mapping() for
> CONFIG_ACPI_HOTPLUG_CPU unset. In patch 5.
> 2. Fix auto build test ERROR on ia64/next. In patch 5.
> 3. Fix some comment.
>
> Change log v8 -> v9:
> 1. Providing an empty definition of acpi_set_processor_mapping() for
> CONFIG_ACPI_HOTPLUG_CPU unset.
>
> Change log v7 -> v8:
> 1. Provide the mechanism to validate processors in the ACPI tables.
> 2. Provide the interface to validate the proc_id when setting the mapping.
>
> Change log v6 -> v7:
> 1. Fix arm64 build failure.
>
> Change log v5 -> v6:
> 1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.
>
> Change log v4 -> v5:
> 1. Remove useless code in patch 1.
> 2. Small improvement of commit message.
>
> Change log v3 -> v4:
> 1. Fix the kernel panic at boot time. The cause is that I tried to build zonelists
>    before per cpu areas were initialized.
>
> Change log v2 -> v3:
> 1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
> 2. Build zonelists for memory-less nodes so that memory allocator will fall
>    back to proper nodes automatically.
>
> Change log v1 -> v2:
> 1. Split code movement and actual changes. Add patch 1.
> 2. Synchronize best near online node record when node hotplug happens. In patch 2.
> 3. Fix some comment.
>
> Dou Liyang (2):
>   acpi: Provide the mechanism to validate processors in the ACPI tables
>   acpi: Provide the interface to validate the proc_id
>
> Gu Zheng (4):
>   x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
>     boot time.
>   x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
>     persistent cpuid <-> apicid mapping.
>   x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
>   x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
>     booting.
>
> Tang Chen (1):
>   x86, memhp, numa: Online memory-less nodes at boot time.
>
>  arch/ia64/kernel/acpi.c       |   3 +-
>  arch/x86/include/asm/mpspec.h |   1 +
>  arch/x86/kernel/acpi/boot.c   |  11 ++--
>  arch/x86/kernel/apic/apic.c   |  77 +++++++++++++++++++++++--
>  arch/x86/mm/numa.c            |  27 +++++----
>  drivers/acpi/acpi_processor.c | 105 ++++++++++++++++++++++++++++++++-
>  drivers/acpi/bus.c            |   1 +
>  drivers/acpi/processor_core.c | 131 +++++++++++++++++++++++++++++++++++-------
>  include/linux/acpi.h          |   6 ++
>  9 files changed, 311 insertions(+), 51 deletions(-)
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent
  2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
                   ` (7 preceding siblings ...)
  2016-08-25  9:08 ` [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
@ 2016-09-02  6:57 ` Dou Liyang
  2016-09-13 11:33   ` Dou Liyang
  8 siblings, 1 reply; 33+ messages in thread
From: Dou Liyang @ 2016-09-02  6:57 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm

Ping...

At 08/25/2016 04:35 PM, Dou Liyang wrote:
> [Summary]
>
> Use ACPI tables: MADT, DSDT.
> 1. Create cpuid in order based on Local Apic ID in MADT(apicid).
> 2. Obtain the nodeid by the proc_id in DSDT.
> 3. Make the cpuid <-> nodeid mapping persistent.
>
> The mapping relations:
>
> proc_id in DSDT <--> Processor ID in MADT(acpiid) <--> Local Apic ID in MADT(apicid)
>         ^                                                        ^
>         |                                                        |
>         v                                                        v 
>    pxm in DSDT                                                 cpuid
>         ^
>         |
>         v
>      nodeid
>
> [Problem]
>
> cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
> the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.
>
> When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
> which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
> workqueue does not update wq_numa_possible_cpumask.
>
> So here is the problem:
>
> Assume we have the following cpuid <-> nodeid in the beginning:
>
>   Node | CPU
> ------------------------
> node 0 |  0-14, 60-74
> node 1 | 15-29, 75-89
> node 2 | 30-44, 90-104
> node 3 | 45-59, 105-119
>
> and we hot-remove node2 and node3, it becomes:
>
>   Node | CPU
> ------------------------
> node 0 |  0-14, 60-74
> node 1 | 15-29, 75-89
>
> and we hot-add node4 and node5, it becomes:
>
>   Node | CPU
> ------------------------
> node 0 |  0-14, 60-74
> node 1 | 15-29, 75-89
> node 4 | 30-59
> node 5 | 90-119
>
> But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
>
> When a pool workqueue is initialized, if its cpumask belongs to a node, its
> pool->node will be mapped to that node. And memory used by this workqueue will
> also be allocated on that node.
>
> static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
> ...
>         /* if cpumask is contained inside a NUMA node, we belong to that node */
>         if (wq_numa_enabled) {
>                 for_each_node(node) {
>                         if (cpumask_subset(pool->attrs->cpumask,
>                                            wq_numa_possible_cpumask[node])) {
>                                 pool->node = node;
>                                 break;
>                         }
>                 }
>         }
>
> Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
> which will lead to memory allocation failure:
>
>  SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
>   cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
>   node 0: slabs: 6172, objs: 259224, free: 245741
>   node 1: slabs: 3261, objs: 136962, free: 127656
>
> It happens here:
>
> create_worker(struct worker_pool *pool)
>  |--> worker = alloc_worker(pool->node);
>
> static struct worker *alloc_worker(int node)
> {
>         struct worker *worker;
>
>         worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.
>
>         ......
>
>         return worker;
> }
>
>
> [Solution]
>
> There are four mappings in the kernel:
> 1. nodeid (logical node id)   <->   pxm
> 2. apicid (physical cpu id)   <->   nodeid
> 3. cpuid (logical cpu id)     <->   apicid
> 4. cpuid (logical cpu id)     <->   nodeid
>
> 1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
>    mapping is setup at boot time. This mapping is persistent, won't change.
>
> 2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
>    time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
>    persistent.
>
> 3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
>    allocated, lower ids first, and released at CPU hotremove time, reused for other
>    hotadded CPUs. So this mapping is not persistent.
>
> 4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
>    cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
>
> To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
> cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
> cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
> mapping. So the key point is obtaining all cpus' apicid.
>
> apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
> MADT (Multiple APIC Description Table). So we finish the job in the following steps:
>
> 1. Enable apic registeration flow to handle both enabled and disabled cpus.
>    This is done by introducing an extra parameter to generic_processor_info to let the
>    caller control if disabled cpus are ignored.
>
> 2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
>    the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
>    registering local apic. Store the mapping in this array.
>
> 3. Enable _MAT and MADT relative apis to return non-presnet or disabled cpus' apicid.
>    This is also done by introducing an extra parameter to these apis to let the caller
>    control if disabled cpus are ignored.
>
> 4. Establish all possible cpuid <-> nodeid mapping.
>    This is done via an additional acpi namespace walk for processors.
>
>
> For previous discussion, please refer to:
> https://lkml.org/lkml/2015/2/27/145
> https://lkml.org/lkml/2015/3/25/989
> https://lkml.org/lkml/2015/5/14/244
> https://lkml.org/lkml/2015/7/7/200
> https://lkml.org/lkml/2015/9/27/209
> https://lkml.org/lkml/2016/5/19/212
> https://lkml.org/lkml/2016/7/19/181
> https://lkml.org/lkml/2016/7/25/99
> https://lkml.org/lkml/2016/7/26/52
> https://lkml.org/lkml/2016/8/8/96
>
> Change log v11 -> v12:
> 1. Rebase
> 2. Add a short summary
>
> Change log v10 -> v11:
> 1. Reduce the number of repeat judgment of online/offline
> 2. Seperate out the functionality in the enable or disable situation
>
> Change log v9 -> v10:
> 1. Providing an empty definition of acpi_set_processor_mapping() for
> CONFIG_ACPI_HOTPLUG_CPU unset. In patch 5.
> 2. Fix auto build test ERROR on ia64/next. In patch 5.
> 3. Fix some comment.
>
> Change log v8 -> v9:
> 1. Providing an empty definition of acpi_set_processor_mapping() for
> CONFIG_ACPI_HOTPLUG_CPU unset.
>
> Change log v7 -> v8:
> 1. Provide the mechanism to validate processors in the ACPI tables.
> 2. Provide the interface to validate the proc_id when setting the mapping.
>
> Change log v6 -> v7:
> 1. Fix arm64 build failure.
>
> Change log v5 -> v6:
> 1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.
>
> Change log v4 -> v5:
> 1. Remove useless code in patch 1.
> 2. Small improvement of commit message.
>
> Change log v3 -> v4:
> 1. Fix the kernel panic at boot time. The cause is that I tried to build zonelists
>    before per cpu areas were initialized.
>
> Change log v2 -> v3:
> 1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
> 2. Build zonelists for memory-less nodes so that memory allocator will fall
>    back to proper nodes automatically.
>
> Change log v1 -> v2:
> 1. Split code movement and actual changes. Add patch 1.
> 2. Synchronize best near online node record when node hotplug happens. In patch 2.
> 3. Fix some comment.
>
> Dou Liyang (2):
>   acpi: Provide the mechanism to validate processors in the ACPI tables
>   acpi: Provide the interface to validate the proc_id
>
> Gu Zheng (4):
>   x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
>     boot time.
>   x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
>     persistent cpuid <-> apicid mapping.
>   x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
>   x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
>     booting.
>
> Tang Chen (1):
>   x86, memhp, numa: Online memory-less nodes at boot time.
>
>  arch/ia64/kernel/acpi.c       |   3 +-
>  arch/x86/include/asm/mpspec.h |   1 +
>  arch/x86/kernel/acpi/boot.c   |  11 ++--
>  arch/x86/kernel/apic/apic.c   |  77 +++++++++++++++++++++++--
>  arch/x86/mm/numa.c            |  27 +++++----
>  drivers/acpi/acpi_processor.c | 105 ++++++++++++++++++++++++++++++++-
>  drivers/acpi/bus.c            |   1 +
>  drivers/acpi/processor_core.c | 131 +++++++++++++++++++++++++++++++++++-------
>  include/linux/acpi.h          |   6 ++
>  9 files changed, 311 insertions(+), 51 deletions(-)
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent
  2016-09-02  6:57 ` Dou Liyang
@ 2016-09-13 11:33   ` Dou Liyang
  0 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-09-13 11:33 UTC (permalink / raw)
  To: cl, tj, mika.j.penttila, mingo, akpm, rjw, hpa, yasu.isimatu,
	isimatu.yasuaki, kamezawa.hiroyu, izumi.taku, gongzhaogang,
	len.brown, lenb, tglx, chen.tang, rafael
  Cc: x86, linux-acpi, linux-kernel, linux-mm

Ping...

At 09/02/2016 02:57 PM, Dou Liyang wrote:
> Ping...
>
> At 08/25/2016 04:35 PM, Dou Liyang wrote:
>> [Summary]
>>
>> Use ACPI tables: MADT, DSDT.
>> 1. Create cpuid in order based on Local Apic ID in MADT(apicid).
>> 2. Obtain the nodeid by the proc_id in DSDT.
>> 3. Make the cpuid <-> nodeid mapping persistent.
>>
>> The mapping relations:
>>
>> proc_id in DSDT <--> Processor ID in MADT(acpiid) <--> Local Apic ID
>> in MADT(apicid)
>>         ^                                                        ^
>>         |                                                        |
>>         v                                                        v 
>>    pxm in DSDT                                                 cpuid
>>         ^
>>         |
>>         v
>>      nodeid
>>
>> [Problem]
>>
>> cpuid <-> nodeid mapping is firstly established at boot time. And
>> workqueue caches
>> the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.
>>
>> When doing node online/offline, cpuid <-> nodeid mapping is
>> established/destroyed,
>> which means, cpuid <-> nodeid mapping will change if node hotplug
>> happens. But
>> workqueue does not update wq_numa_possible_cpumask.
>>
>> So here is the problem:
>>
>> Assume we have the following cpuid <-> nodeid in the beginning:
>>
>>   Node | CPU
>> ------------------------
>> node 0 |  0-14, 60-74
>> node 1 | 15-29, 75-89
>> node 2 | 30-44, 90-104
>> node 3 | 45-59, 105-119
>>
>> and we hot-remove node2 and node3, it becomes:
>>
>>   Node | CPU
>> ------------------------
>> node 0 |  0-14, 60-74
>> node 1 | 15-29, 75-89
>>
>> and we hot-add node4 and node5, it becomes:
>>
>>   Node | CPU
>> ------------------------
>> node 0 |  0-14, 60-74
>> node 1 | 15-29, 75-89
>> node 4 | 30-59
>> node 5 | 90-119
>>
>> But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and
>> the like.
>>
>> When a pool workqueue is initialized, if its cpumask belongs to a
>> node, its
>> pool->node will be mapped to that node. And memory used by this
>> workqueue will
>> also be allocated on that node.
>>
>> static struct worker_pool *get_unbound_pool(const struct
>> workqueue_attrs *attrs){
>> ...
>>         /* if cpumask is contained inside a NUMA node, we belong to
>> that node */
>>         if (wq_numa_enabled) {
>>                 for_each_node(node) {
>>                         if (cpumask_subset(pool->attrs->cpumask,
>>
>> wq_numa_possible_cpumask[node])) {
>>                                 pool->node = node;
>>                                 break;
>>                         }
>>                 }
>>         }
>>
>> Since wq_numa_possible_cpumask is not updated, it could be mapped to
>> an offline node,
>> which will lead to memory allocation failure:
>>
>>  SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
>>   cache: kmalloc-192, object size: 192, buffer size: 192, default
>> order: 1, min order: 0
>>   node 0: slabs: 6172, objs: 259224, free: 245741
>>   node 1: slabs: 3261, objs: 136962, free: 127656
>>
>> It happens here:
>>
>> create_worker(struct worker_pool *pool)
>>  |--> worker = alloc_worker(pool->node);
>>
>> static struct worker *alloc_worker(int node)
>> {
>>         struct worker *worker;
>>
>>         worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); -->
>> Here, useing the wrong node.
>>
>>         ......
>>
>>         return worker;
>> }
>>
>>
>> [Solution]
>>
>> There are four mappings in the kernel:
>> 1. nodeid (logical node id)   <->   pxm
>> 2. apicid (physical cpu id)   <->   nodeid
>> 3. cpuid (logical cpu id)     <->   apicid
>> 4. cpuid (logical cpu id)     <->   nodeid
>>
>> 1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and
>> nodeid <-> pxm
>>    mapping is setup at boot time. This mapping is persistent, won't
>> change.
>>
>> 2. apicid <-> nodeid mapping is setup using info in 1. The mapping is
>> setup at boot
>>    time and CPU hotadd time, and cleared at CPU hotremove time. This
>> mapping is also
>>    persistent.
>>
>> 3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time.
>> cpuid is
>>    allocated, lower ids first, and released at CPU hotremove time,
>> reused for other
>>    hotadded CPUs. So this mapping is not persistent.
>>
>> 4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd
>> time, and
>>    cleared at CPU hotremove time. As a result of 3, this mapping is
>> not persistent.
>>
>> To fix this problem, we establish cpuid <-> nodeid mapping for all the
>> possible
>> cpus at boot time, and make it persistent. And according to
>> init_cpu_to_node(),
>> cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and
>> cpuid <-> apicid
>> mapping. So the key point is obtaining all cpus' apicid.
>>
>> apicid can be obtained by _MAT (Multiple APIC Table Entry) method or
>> found in
>> MADT (Multiple APIC Description Table). So we finish the job in the
>> following steps:
>>
>> 1. Enable apic registeration flow to handle both enabled and disabled
>> cpus.
>>    This is done by introducing an extra parameter to
>> generic_processor_info to let the
>>    caller control if disabled cpus are ignored.
>>
>> 2. Introduce a new array storing all possible cpuid <-> apicid
>> mapping. And also modify
>>    the way cpuid is calculated. Establish all possible cpuid <->
>> apicid mapping when
>>    registering local apic. Store the mapping in this array.
>>
>> 3. Enable _MAT and MADT relative apis to return non-presnet or
>> disabled cpus' apicid.
>>    This is also done by introducing an extra parameter to these apis
>> to let the caller
>>    control if disabled cpus are ignored.
>>
>> 4. Establish all possible cpuid <-> nodeid mapping.
>>    This is done via an additional acpi namespace walk for processors.
>>
>>
>> For previous discussion, please refer to:
>> https://lkml.org/lkml/2015/2/27/145
>> https://lkml.org/lkml/2015/3/25/989
>> https://lkml.org/lkml/2015/5/14/244
>> https://lkml.org/lkml/2015/7/7/200
>> https://lkml.org/lkml/2015/9/27/209
>> https://lkml.org/lkml/2016/5/19/212
>> https://lkml.org/lkml/2016/7/19/181
>> https://lkml.org/lkml/2016/7/25/99
>> https://lkml.org/lkml/2016/7/26/52
>> https://lkml.org/lkml/2016/8/8/96
>>
>> Change log v11 -> v12:
>> 1. Rebase
>> 2. Add a short summary
>>
>> Change log v10 -> v11:
>> 1. Reduce the number of repeat judgment of online/offline
>> 2. Seperate out the functionality in the enable or disable situation
>>
>> Change log v9 -> v10:
>> 1. Providing an empty definition of acpi_set_processor_mapping() for
>> CONFIG_ACPI_HOTPLUG_CPU unset. In patch 5.
>> 2. Fix auto build test ERROR on ia64/next. In patch 5.
>> 3. Fix some comment.
>>
>> Change log v8 -> v9:
>> 1. Providing an empty definition of acpi_set_processor_mapping() for
>> CONFIG_ACPI_HOTPLUG_CPU unset.
>>
>> Change log v7 -> v8:
>> 1. Provide the mechanism to validate processors in the ACPI tables.
>> 2. Provide the interface to validate the proc_id when setting the
>> mapping.
>>
>> Change log v6 -> v7:
>> 1. Fix arm64 build failure.
>>
>> Change log v5 -> v6:
>> 1. Define func acpi_map_cpu2node() for x86 and ia64 respectively.
>>
>> Change log v4 -> v5:
>> 1. Remove useless code in patch 1.
>> 2. Small improvement of commit message.
>>
>> Change log v3 -> v4:
>> 1. Fix the kernel panic at boot time. The cause is that I tried to
>> build zonelists
>>    before per cpu areas were initialized.
>>
>> Change log v2 -> v3:
>> 1. Online memory-less nodes at boot time to map cpus of memory-less
>> nodes.
>> 2. Build zonelists for memory-less nodes so that memory allocator will
>> fall
>>    back to proper nodes automatically.
>>
>> Change log v1 -> v2:
>> 1. Split code movement and actual changes. Add patch 1.
>> 2. Synchronize best near online node record when node hotplug happens.
>> In patch 2.
>> 3. Fix some comment.
>>
>> Dou Liyang (2):
>>   acpi: Provide the mechanism to validate processors in the ACPI tables
>>   acpi: Provide the interface to validate the proc_id
>>
>> Gu Zheng (4):
>>   x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
>>     boot time.
>>   x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
>>     persistent cpuid <-> apicid mapping.
>>   x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
>>   x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
>>     booting.
>>
>> Tang Chen (1):
>>   x86, memhp, numa: Online memory-less nodes at boot time.
>>
>>  arch/ia64/kernel/acpi.c       |   3 +-
>>  arch/x86/include/asm/mpspec.h |   1 +
>>  arch/x86/kernel/acpi/boot.c   |  11 ++--
>>  arch/x86/kernel/apic/apic.c   |  77 +++++++++++++++++++++++--
>>  arch/x86/mm/numa.c            |  27 +++++----
>>  drivers/acpi/acpi_processor.c | 105 ++++++++++++++++++++++++++++++++-
>>  drivers/acpi/bus.c            |   1 +
>>  drivers/acpi/processor_core.c | 131
>> +++++++++++++++++++++++++++++++++++-------
>>  include/linux/acpi.h          |   6 ++
>>  9 files changed, 311 insertions(+), 51 deletions(-)
>>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [tip:x86/apic] x86/numa: Online memory-less nodes at boot time
  2016-08-25  8:35 ` [PATCH v12 1/7] x86, memhp, numa: Online memory-less nodes at boot time Dou Liyang
@ 2016-09-22 19:09   ` tip-bot for Tang Chen
  0 siblings, 0 replies; 33+ messages in thread
From: tip-bot for Tang Chen @ 2016-09-22 19:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: zhugh.fnst, tglx, linux-kernel, tangchen, mingo, douly.fnst, hpa

Commit-ID:  2532fc318db0e1fe68e01407ee27634c76916e44
Gitweb:     http://git.kernel.org/tip/2532fc318db0e1fe68e01407ee27634c76916e44
Author:     Tang Chen <tangchen@cn.fujitsu.com>
AuthorDate: Thu, 25 Aug 2016 16:35:14 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 21 Sep 2016 21:18:38 +0200

x86/numa: Online memory-less nodes at boot time

For now, x86 does not support memory-less node. A node without memory
will not be onlined, and the cpus on it will be mapped to the other
online nodes with memory in init_cpu_to_node(). The reason of doing this
is to ensure each cpu has mapped to a node with memory, so that it will
be able to allocate local memory for that cpu.

But we don't have to do it in this way.

In this series of patches, we are going to construct cpu <-> node mapping
for all possible cpus at boot time, which is a persistent mapping. It means
that the cpu will be mapped to the node which it belongs to, and will never
be changed. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node. And the memory-less node should be onlined.

Allocate pgdats for all memory-less nodes and online them at boot
time. Then build zonelists for these nodes. As a result, when cpus on these
memory-less nodes try to allocate memory from local node, it will
automatically fall back to the proper zones in the zonelists.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-2-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/mm/numa.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index fb68210..3f35b48 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -722,22 +722,19 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-	int n, val;
-	int min_val = INT_MAX;
-	int best_node = -1;
+	unsigned long zones_size[MAX_NR_ZONES] = {0};
+	unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-	for_each_online_node(n) {
-		val = node_distance(node, n);
+	/* Allocate and initialize node data. Memory-less node is now online.*/
+	alloc_node_data(nid);
+	free_area_init_node(nid, zones_size, 0, zholes_size);
 
-		if (val < min_val) {
-			min_val = val;
-			best_node = n;
-		}
-	}
-
-	return best_node;
+	/*
+	 * All zonelists will be built later in start_kernel() after per cpu
+	 * areas are initialized.
+	 */
 }
 
 /*
@@ -766,8 +763,10 @@ void __init init_cpu_to_node(void)
 
 		if (node == NUMA_NO_NODE)
 			continue;
+
 		if (!node_online(node))
-			node = find_near_online_node(node);
+			init_memory_less_node(node);
+
 		numa_set_node(cpu, node);
 	}
 }

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [tip:x86/apic] x86/acpi: Enable acpi to register all possible cpus at boot time
  2016-08-25  8:35 ` [PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus " Dou Liyang
  2016-08-25  8:57   ` Dou Liyang
@ 2016-09-22 19:10   ` tip-bot for Gu Zheng
  1 sibling, 0 replies; 33+ messages in thread
From: tip-bot for Gu Zheng @ 2016-09-22 19:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, linux-kernel, tangchen, hpa, mingo, zhugh.fnst, guz.fnst,
	douly.fnst

Commit-ID:  f7c28833c252031bc68a29e26a18a661797cf3a3
Gitweb:     http://git.kernel.org/tip/f7c28833c252031bc68a29e26a18a661797cf3a3
Author:     Gu Zheng <guz.fnst@cn.fujitsu.com>
AuthorDate: Thu, 25 Aug 2016 16:35:15 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 21 Sep 2016 21:18:38 +0200

x86/acpi: Enable acpi to register all possible cpus at boot time

cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.

        ......

        return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is setup at boot time. This mapping is persistent, won't change.

2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, reused for other
   hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registeration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
   the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
   registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these apis to let the caller
   control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.

This patch finished step 1.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-3-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/apic.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 779dae5..a8c94bb 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2021,7 +2021,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 	apic_write(APIC_LVT1, value);
 }
 
-int generic_processor_info(int apicid, int version)
+static int __generic_processor_info(int apicid, int version, bool enabled)
 {
 	int cpu, max = nr_cpu_ids;
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2087,7 +2087,6 @@ int generic_processor_info(int apicid, int version)
 		return -EINVAL;
 	}
 
-	num_processors++;
 	if (apicid == boot_cpu_physical_apicid) {
 		/*
 		 * x86_bios_cpu_apicid is required to have processors listed
@@ -2110,6 +2109,7 @@ int generic_processor_info(int apicid, int version)
 
 		pr_warning("APIC: Package limit reached. Processor %d/0x%x ignored.\n",
 			   thiscpu, apicid);
+
 		disabled_cpus++;
 		return -ENOSPC;
 	}
@@ -2128,7 +2128,6 @@ int generic_processor_info(int apicid, int version)
 			boot_cpu_apic_version, cpu, version);
 	}
 
-	physid_set(apicid, phys_cpu_present_map);
 	if (apicid > max_physical_apicid)
 		max_physical_apicid = apicid;
 
@@ -2141,11 +2140,23 @@ int generic_processor_info(int apicid, int version)
 		apic->x86_32_early_logical_apicid(cpu);
 #endif
 	set_cpu_possible(cpu, true);
-	set_cpu_present(cpu, true);
+
+	if (enabled) {
+		num_processors++;
+		physid_set(apicid, phys_cpu_present_map);
+		set_cpu_present(cpu, true);
+	} else {
+		disabled_cpus++;
+	}
 
 	return cpu;
 }
 
+int generic_processor_info(int apicid, int version)
+{
+	return __generic_processor_info(apicid, version, true);
+}
+
 int hard_smp_processor_id(void)
 {
 	return read_apic_id();

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-08-25  8:35 ` [PATCH v12 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping Dou Liyang
@ 2016-09-22 19:10   ` tip-bot for Gu Zheng
  2016-10-04  6:02     ` Yinghai Lu
  0 siblings, 1 reply; 33+ messages in thread
From: tip-bot for Gu Zheng @ 2016-09-22 19:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tangchen, linux-kernel, guz.fnst, douly.fnst, mingo, zhugh.fnst,
	hpa, tglx

Commit-ID:  8f54969dc8d6704632b42cbb5e47730cd75cc713
Gitweb:     http://git.kernel.org/tip/8f54969dc8d6704632b42cbb5e47730cd75cc713
Author:     Gu Zheng <guz.fnst@cn.fujitsu.com>
AuthorDate: Thu, 25 Aug 2016 16:35:16 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 21 Sep 2016 21:18:38 +0200

x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping

The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.

And finally we will use this array to make cpuid <-> nodeid persistent.

cpuid <-> apicid mapping is established at local apic registeration time.
But non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  7 +----
 arch/x86/kernel/apic/apic.c   | 60 ++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index c2f94dc..3200704 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE	BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 0447e31..7d668d1 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -176,15 +176,10 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
 		return -EINVAL;
 	}
 
-	if (!enabled) {
-		++disabled_cpus;
-		return -EINVAL;
-	}
-
 	if (boot_cpu_physical_apicid != -1U)
 		ver = boot_cpu_apic_version;
 
-	cpu = generic_processor_info(id, ver);
+	cpu = __generic_processor_info(id, ver, enabled);
 	if (cpu >= 0)
 		early_per_cpu(x86_cpu_to_acpiid, cpu) = acpiid;
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a8c94bb..2dc01c3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2021,7 +2021,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 	apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals to current allocated max logical CPU ID plus 1.
+ * All allocated CPU ID should be in [0, nr_logical_cpuidi), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+	int i;
+
+	/*
+	 * cpuid <-> apicid mapping is persistent, so when a cpu is up,
+	 * check if the kernel has allocated a cpuid for it.
+	 */
+	for (i = 0; i < nr_logical_cpuids; i++) {
+		if (cpuid_to_apicid[i] == apicid)
+			return i;
+	}
+
+	/* Allocate a new cpuid. */
+	if (nr_logical_cpuids >= nr_cpu_ids) {
+		WARN_ONCE(1, "Only %d processors supported."
+			     "Processor %d/0x%x and the rest are ignored.\n",
+			     nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+		return -1;
+	}
+
+	cpuid_to_apicid[nr_logical_cpuids] = apicid;
+	return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
 	int cpu, max = nr_cpu_ids;
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2096,8 +2142,16 @@ static int __generic_processor_info(int apicid, int version, bool enabled)
 		 * for BSP.
 		 */
 		cpu = 0;
-	} else
-		cpu = cpumask_next_zero(-1, cpu_present_mask);
+
+		/* Logical cpuid 0 is reserved for BSP. */
+		cpuid_to_apicid[0] = apicid;
+	} else {
+		cpu = allocate_logical_cpuid(apicid);
+		if (cpu < 0) {
+			disabled_cpus++;
+			return -EINVAL;
+		}
+	}
 
 	/*
 	 * This can happen on physical hotplug. The sanity check at boot time

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [tip:x86/apic] x86/acpi: Enable MADT APIs to return disabled apicids
  2016-08-25  8:35 ` [PATCH v12 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Dou Liyang
@ 2016-09-22 19:11   ` tip-bot for Gu Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: tip-bot for Gu Zheng @ 2016-09-22 19:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tangchen, linux-kernel, mingo, guz.fnst, hpa, tglx, zhugh.fnst,
	douly.fnst

Commit-ID:  8ad893faf2eaedb710a3073afbb5d569df2c3e41
Gitweb:     http://git.kernel.org/tip/8ad893faf2eaedb710a3073afbb5d569df2c3e41
Author:     Gu Zheng <guz.fnst@cn.fujitsu.com>
AuthorDate: Thu, 25 Aug 2016 16:35:17 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 21 Sep 2016 21:18:39 +0200

x86/acpi: Enable MADT APIs to return disabled apicids

The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm        (persistent)
2. apicid (physical cpu id)   <->   nodeid     (persistent)
3. cpuid (logical cpu id)     <->   apicid     (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid     (not persistent)

So, in order to setup persistent cpuid <-> nodeid mapping for all possible CPUs,
we should:
1. Setup cpuid <-> apicid mapping for all possible CPUs, which has been done in step 1, 2.
2. Setup cpuid <-> nodeid mapping for all possible CPUs. But before that, we should
   obtain all apicids from MADT.

All processors' apicids can be obtained by _MAT method or from MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter will be added to MADT APIs so that caller
is able to control if disabled processors are ignored.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-5-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 60 +++++++++++++++++++++++++++----------------
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..02b84aa 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 	 *  Extra Processor objects may be enumerated on MP systems with
 	 *  less than the max # of CPUs. They should be ignored _iff
 	 *  they are physically not present.
+	 *
+	 *  NOTE: Even if the processor has a cpuid, it may not be present
+	 *  because cpuid <-> apicid mapping is persistent now.
 	 */
-	if (invalid_logical_cpuid(pr->id)) {
+	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
 		int ret = acpi_processor_hotadd_init(pr);
 		if (ret)
 			return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 9125d7d..fd59ae8 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-		 u32 acpi_id, phys_cpuid_t *apic_id)
+		 u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
 	struct acpi_madt_local_apic *lapic =
 		container_of(entry, struct acpi_madt_local_apic, header);
 
-	if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+		bool ignore_disabled)
 {
 	struct acpi_madt_local_x2apic *apic =
 		container_of(entry, struct acpi_madt_local_x2apic, header);
 
-	if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+		bool ignore_disabled)
 {
 	struct acpi_madt_local_sapic *lsapic =
 		container_of(entry, struct acpi_madt_local_sapic, header);
 
-	if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	if (device_declaration) {
@@ -87,12 +89,13 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
  * Retrieve the ARM CPU physical identifier (MPIDR)
  */
 static int map_gicc_mpidr(struct acpi_subtable_header *entry,
-		int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
+		int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr,
+		bool ignore_disabled)
 {
 	struct acpi_madt_generic_interrupt *gicc =
 	    container_of(entry, struct acpi_madt_generic_interrupt, header);
 
-	if (!(gicc->flags & ACPI_MADT_ENABLED))
+	if (ignore_disabled && !(gicc->flags & ACPI_MADT_ENABLED))
 		return -ENODEV;
 
 	/* device_declaration means Device object in DSDT, in the
@@ -109,7 +112,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
 }
 
 static phys_cpuid_t map_madt_entry(struct acpi_table_madt *madt,
-				   int type, u32 acpi_id)
+				   int type, u32 acpi_id, bool ignore_disabled)
 {
 	unsigned long madt_end, entry;
 	phys_cpuid_t phys_id = PHYS_CPUID_INVALID;	/* CPU hardware ID */
@@ -127,16 +130,20 @@ static phys_cpuid_t map_madt_entry(struct acpi_table_madt *madt,
 		struct acpi_subtable_header *header =
 			(struct acpi_subtable_header *)entry;
 		if (header->type == ACPI_MADT_TYPE_LOCAL_APIC) {
-			if (!map_lapic_id(header, acpi_id, &phys_id))
+			if (!map_lapic_id(header, acpi_id, &phys_id,
+					  ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC) {
-			if (!map_x2apic_id(header, type, acpi_id, &phys_id))
+			if (!map_x2apic_id(header, type, acpi_id, &phys_id,
+					   ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC) {
-			if (!map_lsapic_id(header, type, acpi_id, &phys_id))
+			if (!map_lsapic_id(header, type, acpi_id, &phys_id,
+					   ignore_disabled))
 				break;
 		} else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT) {
-			if (!map_gicc_mpidr(header, type, acpi_id, &phys_id))
+			if (!map_gicc_mpidr(header, type, acpi_id, &phys_id,
+					    ignore_disabled))
 				break;
 		}
 		entry += header->length;
@@ -156,14 +163,15 @@ phys_cpuid_t __init acpi_map_madt_entry(u32 acpi_id)
 	if (!madt)
 		return PHYS_CPUID_INVALID;
 
-	rv = map_madt_entry(madt, 1, acpi_id);
+	rv = map_madt_entry(madt, 1, acpi_id, true);
 
 	early_acpi_os_unmap_memory(madt, tbl_size);
 
 	return rv;
 }
 
-static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id,
+				  bool ignore_disabled)
 {
 	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
 	union acpi_object *obj;
@@ -184,30 +192,38 @@ static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
 
 	header = (struct acpi_subtable_header *)obj->buffer.pointer;
 	if (header->type == ACPI_MADT_TYPE_LOCAL_APIC)
-		map_lapic_id(header, acpi_id, &phys_id);
+		map_lapic_id(header, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC)
-		map_lsapic_id(header, type, acpi_id, &phys_id);
+		map_lsapic_id(header, type, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC)
-		map_x2apic_id(header, type, acpi_id, &phys_id);
+		map_x2apic_id(header, type, acpi_id, &phys_id, ignore_disabled);
 	else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT)
-		map_gicc_mpidr(header, type, acpi_id, &phys_id);
+		map_gicc_mpidr(header, type, acpi_id, &phys_id,
+			       ignore_disabled);
 
 exit:
 	kfree(buffer.pointer);
 	return phys_id;
 }
 
-phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t __acpi_get_phys_id(acpi_handle handle, int type,
+				       u32 acpi_id, bool ignore_disabled)
 {
 	phys_cpuid_t phys_id;
 
-	phys_id = map_mat_entry(handle, type, acpi_id);
+	phys_id = map_mat_entry(handle, type, acpi_id, ignore_disabled);
 	if (invalid_phys_cpuid(phys_id))
-		phys_id = map_madt_entry(get_madt_table(), type, acpi_id);
+		phys_id = map_madt_entry(get_madt_table(), type, acpi_id,
+					   ignore_disabled);
 
 	return phys_id;
 }
 
+phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+{
+	return __acpi_get_phys_id(handle, type, acpi_id, true);
+}
+
 int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id)
 {
 #ifdef CONFIG_SMP

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [tip:x86/apic] x86/acpi: Set persistent cpuid <-> nodeid mapping when booting
  2016-08-25  8:35 ` [PATCH v12 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Dou Liyang
@ 2016-09-22 19:11   ` tip-bot for Gu Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: tip-bot for Gu Zheng @ 2016-09-22 19:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: guz.fnst, tglx, tangchen, zhugh.fnst, linux-kernel, douly.fnst,
	mingo, hpa

Commit-ID:  dc6db24d2476cd09c0ecf2b8d80313539f737a89
Gitweb:     http://git.kernel.org/tip/dc6db24d2476cd09c0ecf2b8d80313539f737a89
Author:     Gu Zheng <guz.fnst@cn.fujitsu.com>
AuthorDate: Thu, 25 Aug 2016 16:35:18 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 21 Sep 2016 21:18:39 +0200

x86/acpi: Set persistent cpuid <-> nodeid mapping when booting

The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 4.

This patch set the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional acpi namespace walk for processors.

[ tglx: Remove the unneeded exports ]

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-6-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/ia64/kernel/acpi.c       |  2 +-
 arch/x86/kernel/acpi/boot.c   |  3 +-
 drivers/acpi/acpi_processor.c |  5 ++++
 drivers/acpi/bus.c            |  1 +
 drivers/acpi/processor_core.c | 68 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h          |  3 ++
 6 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 92b7bc9..9273e03 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
 	/*
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7d668d1..fc88410 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -702,7 +702,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include <acpi/processor.h>
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
 	int nid;
@@ -713,6 +713,7 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 		numa_set_node(cpu, nid);
 	}
 #endif
+	return 0;
 }
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 02b84aa..f9f23fd 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+	return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
 	unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 85b7d07..a760dac 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1193,6 +1193,7 @@ static int __init acpi_init(void)
 	acpi_wakeup_device_init();
 	acpi_debugger_init();
 	acpi_setup_sb_notify_handler();
+	acpi_set_processor_mapping();
 	return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index fd59ae8..8801976 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -280,6 +280,74 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool __init
+map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
+{
+	int type;
+	u32 acpi_id;
+	acpi_status status;
+	acpi_object_type acpi_type;
+	unsigned long long tmp;
+	union acpi_object object = { 0 };
+	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+	status = acpi_get_type(handle, &acpi_type);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	switch (acpi_type) {
+	case ACPI_TYPE_PROCESSOR:
+		status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+		if (ACPI_FAILURE(status))
+			return false;
+		acpi_id = object.processor.proc_id;
+		break;
+	case ACPI_TYPE_DEVICE:
+		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+		if (ACPI_FAILURE(status))
+			return false;
+		acpi_id = tmp;
+		break;
+	default:
+		return false;
+	}
+
+	type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
+
+	*phys_id = __acpi_get_phys_id(handle, type, acpi_id, false);
+	*cpuid = acpi_map_cpuid(*phys_id, acpi_id);
+	if (*cpuid == -1)
+		return false;
+
+	return true;
+}
+
+static acpi_status __init
+set_processor_node_mapping(acpi_handle handle, u32 lvl, void *context,
+			   void **rv)
+{
+	phys_cpuid_t phys_id;
+	int cpu_id;
+
+	if (!map_processor(handle, &phys_id, &cpu_id))
+		return AE_ERROR;
+
+	acpi_map_cpu2node(handle, cpu_id, phys_id);
+	return AE_OK;
+}
+
+void __init acpi_set_processor_mapping(void)
+{
+	/* Set persistent cpu <-> node mapping for all processors. */
+	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+			    ACPI_UINT32_MAX, set_processor_node_mapping,
+			    NULL, NULL, NULL);
+}
+#else
+void __init acpi_set_processor_mapping(void) {}
+#endif /* CONFIG_ACPI_HOTPLUG_CPU */
+
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
 static int get_ioapic_id(struct acpi_subtable_header *entry, u32 gsi_base,
 			 u64 *phys_addr, int *ioapic_id)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c9a596b..5b4f9ac 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -271,8 +271,11 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
 int acpi_unmap_cpu(int cpu);
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid);
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
+void acpi_set_processor_mapping(void);
+
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
 int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
 #endif

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [tip:x86/apic] acpi: Provide mechanism to validate processors in the ACPI tables
  2016-08-25  8:35 ` [PATCH v12 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables Dou Liyang
@ 2016-09-22 19:12   ` tip-bot for Dou Liyang
  0 siblings, 0 replies; 33+ messages in thread
From: tip-bot for Dou Liyang @ 2016-09-22 19:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: hpa, linux-kernel, mingo, tglx, douly.fnst

Commit-ID:  8e089eaa1999def4bb954caa91941f29b0672b6a
Gitweb:     http://git.kernel.org/tip/8e089eaa1999def4bb954caa91941f29b0672b6a
Author:     Dou Liyang <douly.fnst@cn.fujitsu.com>
AuthorDate: Thu, 25 Aug 2016 16:35:19 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 21 Sep 2016 21:18:39 +0200

acpi: Provide mechanism to validate processors in the ACPI tables

[Problem]

When we set cpuid <-> nodeid mapping to be persistent, it will use the DSDT
As we know, the ACPI tables are just like user's input in that respect, and
we don't crash if user's input is unreasonable.

Such as, the mapping of the proc_id and pxm in some machine's ACPI table is
like this:

proc_id   |    pxm
--------------------
0       <->     0
1       <->     0
2       <->     1
3       <->     1
89      <->     0
89      <->     0
89      <->     0
89      <->     1
89      <->     1
89      <->     2
89      <->     3
.....

We can't be sure which one is correct to the proc_id 89. We may map a wrong
node to a cpu. When pages are allocated, this may cause a kernal panic.

So, we should provide mechanisms to validate the ACPI tables, just like we
do validation to check user's input in web project.

The mechanism is that the processor objects which have the duplicate IDs
are not valid.

[Solution]

We add a validation function, like this:

foreach Processor in DSDT
	proc_id = get_ACPI_Processor_number(Processor)
	if (proc_id exists )
		mark both of them as being unreasonable;

The function will record the unique or duplicate processor IDs.

The duplicate processor IDs such as 89 are regarded as the unreasonable IDs
which mean that the processor objects in question are not valid.

[ tglx: Add __init[data] annotations ]

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-7-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/acpi/acpi_processor.c | 79 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index f9f23fd..f27c709 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler processor_container_handler = {
 	.attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids __initdata;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids __initdata;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] __initdata = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] __initdata = {
+	[0 ... NR_CPUS - 1] = -1,
+};
+
+static void __init processor_validated_ids_update(int proc_id)
+{
+	int i;
+
+	if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
+		return;
+
+	/*
+	 * Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+	 * already in the IDs, do nothing.
+	 */
+	for (i = 0; i < nr_duplicate_ids; i++) {
+		if (duplicate_processor_ids[i] == proc_id)
+			return;
+	}
+
+	/*
+	 * Secondly, compare the proc_id with unique IDs, if the proc_id is in
+	 * the IDs, put it in the duplicate IDs.
+	 */
+	for (i = 0; i < nr_unique_ids; i++) {
+		if (unique_processor_ids[i] == proc_id) {
+			duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+			nr_duplicate_ids++;
+			return;
+		}
+	}
+
+	/*
+	 * Lastly, the proc_id is a unique ID, put it in the unique IDs.
+	 */
+	unique_processor_ids[nr_unique_ids] = proc_id;
+	nr_unique_ids++;
+}
+
+static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
+						  u32 lvl,
+						  void *context,
+						  void **rv)
+{
+	acpi_status status;
+	union acpi_object object = { 0 };
+	struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+	status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+	if (ACPI_FAILURE(status))
+		acpi_handle_info(handle, "Not get the processor object\n");
+	else
+		processor_validated_ids_update(object.processor.proc_id);
+
+	return AE_OK;
+}
+
+static void __init acpi_processor_check_duplicates(void)
+{
+	/* Search all processor nodes in ACPI namespace */
+	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+						ACPI_UINT32_MAX,
+						acpi_processor_ids_walk,
+						NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+	acpi_processor_check_duplicates();
 	acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
 	acpi_scan_add_handler(&processor_container_handler);
 }

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [tip:x86/apic] acpi: Validate processor id when mapping the processor
  2016-08-25  8:35 ` [PATCH v12 7/7] acpi: Provide the interface to validate the proc_id Dou Liyang
@ 2016-09-22 19:12   ` tip-bot for Dou Liyang
  0 siblings, 0 replies; 33+ messages in thread
From: tip-bot for Dou Liyang @ 2016-09-22 19:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, mingo, linux-kernel, douly.fnst, hpa

Commit-ID:  fd74da217df7d4bd25e95411da64e0b92762842e
Gitweb:     http://git.kernel.org/tip/fd74da217df7d4bd25e95411da64e0b92762842e
Author:     Dou Liyang <douly.fnst@cn.fujitsu.com>
AuthorDate: Thu, 25 Aug 2016 16:35:20 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 21 Sep 2016 21:18:40 +0200

acpi: Validate processor id when mapping the processor

When we want to identify whether the proc_id is unreasonable or not, we
can call the "acpi_processor_validate_proc_id" function. It will search
in the duplicate IDs. If we find the proc_id in the IDs, we return true
to the call function. Conversely, the false represents available.

When we establish all possible cpuid <-> nodeid mapping to handle the
cpu hotplugs, we will use the proc_id from ACPI table.

We do validation when we get the proc_id. If the result is true, we
will stop the mapping.

[ tglx: Mark the new function __init ]

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-8-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/acpi/acpi_processor.c | 15 +++++++++++++++
 drivers/acpi/processor_core.c |  4 ++++
 include/linux/acpi.h          |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index f27c709..3de3b6b 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,21 @@ static void __init acpi_processor_check_duplicates(void)
 						NULL, NULL, NULL);
 }
 
+bool __init acpi_processor_validate_proc_id(int proc_id)
+{
+	int i;
+
+	/*
+	 * compare the proc_id with duplicate IDs, if the proc_id is already
+	 * in the duplicate IDs, return true, otherwise, return false.
+	 */
+	for (i = 0; i < nr_duplicate_ids; i++) {
+		if (duplicate_processor_ids[i] == proc_id)
+			return true;
+	}
+	return false;
+}
+
 void __init acpi_processor_init(void)
 {
 	acpi_processor_check_duplicates();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 8801976..9ac265f 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -302,6 +302,10 @@ map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
 		if (ACPI_FAILURE(status))
 			return false;
 		acpi_id = object.processor.proc_id;
+
+		/* validate the acpi_id */
+		if(acpi_processor_validate_proc_id(acpi_id))
+			return false;
 		break;
 	case ACPI_TYPE_DEVICE:
 		status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 5b4f9ac..7f307f3 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -267,6 +267,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
 	return phys_id == PHYS_CPUID_INVALID;
 }
 
+/* Validate the processor object's proc_id */
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-09-22 19:10   ` [tip:x86/apic] x86/acpi: Introduce persistent storage for " tip-bot for Gu Zheng
@ 2016-10-04  6:02     ` Yinghai Lu
  2016-10-05 14:04       ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: Yinghai Lu @ 2016-10-04  6:02 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Gu Zheng, Tang Chen, Ingo Molnar,
	douly.fnst, zhugh.fnst, Thomas Gleixner, H. Peter Anvin
  Cc: linux-tip-commits, Tony Luck

On Thu, Sep 22, 2016 at 12:10 PM, tip-bot for Gu Zheng <tipbot@zytor.com> wrote:
>
> x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
>
> The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
> when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
> wq_numa_possible_cpumask will not cause any problem.
> It contains 4 steps:
> 1. Enable apic registeration flow to handle both enabled and disabled cpus.
> 2. Introduce a new array storing all possible cpuid <-> apicid mapping.
> 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
> 4. Establish all possible cpuid <-> nodeid mapping.
>
> This patch finishes step 2.
>
> In this patch, we introduce a new static array named cpuid_to_apicid[],
> which is large enough to store info for all possible cpus.
>
> And then, we modify the cpuid calculation. In generic_processor_info(),
> it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
> mapping changes with node hotplug.
>
> After this patch, we find the next unused cpuid, map it to an apicid,
> and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
> mapping will be persistent.
>
> And finally we will use this array to make cpuid <-> nodeid persistent.
>
> cpuid <-> apicid mapping is established at local apic registeration time.
> But non-present or disabled cpus are ignored.
>
> In this patch, we establish all possible cpuid <-> apicid mapping when
> registering local apic.

Hi,

This one cause one regression on 8 sockets system: MLC from intel does not run
anymore.

the root cause is : cpu index used to be 0-447.
with this patch, cpu index change to 0, 2-448.

The MADT from system is like:
[   42.107902] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.120125] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.132361] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.144598] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.156836] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
...
[   47.552852] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   47.565088] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   47.577322] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   47.589561] ACPI: X2APIC (uid[0x00] apic_id[0x00] enabled)
[   47.600899] ACPI: X2APIC (uid[0x02] apic_id[0x02] enabled)
[   47.612234] ACPI: X2APIC (uid[0x04] apic_id[0x04] enabled)
...

init_cpu_node become:
[   55.477160] init_cpu_to_node:
[   55.483280] cpu 0 -> apicid 0x0 -> node 0
[   55.491558] cpu 1 -> apicid 0xff -> node 1
[   55.500017] cpu 2 -> apicid 0x2 -> node 0
[   55.508296] cpu 3 -> apicid 0x4 -> node 0
[   55.516575] cpu 4 -> apicid 0x6 -> node 0
...

looks like problem is

acpi_parse_lapic==>acpi_register_lapic==>__generic_processor_info==>allocate_logical_cpuid

it will take lapic_id[0xff] take cpu index 1.

Then will have not /dev/cpu/1/msr, that will make the MLC not happy.

Following change could workaround the problem at this point.

Index: linux-2.6/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
+++ linux-2.6/arch/x86/kernel/acpi/boot.c
@@ -163,10 +163,11 @@ static int __init acpi_parse_madt(struct
  * @id: local apic id to register
  * @acpiid: ACPI id to register
  * @enabled: this cpu is enabled or not
+ * @disabled_id: not used apic id
  *
  * Returns the logic cpu number which maps to the local apic
  */
-static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
+static int acpi_register_lapic(int id, u32 acpiid, u8 enabled, int disabled_id)
 {
     unsigned int ver = 0;
     int cpu;
@@ -176,6 +177,11 @@ static int acpi_register_lapic(int id, u
         return -EINVAL;
     }

+    if (!enabled && (id == disabled_id)) {
+        ++disabled_cpus;
+        return -EINVAL;
+    }
+
     if (boot_cpu_physical_apicid != -1U)
         ver = boot_cpu_apic_version;

@@ -213,7 +219,7 @@ acpi_parse_x2apic(struct acpi_subtable_h
     if (!apic->apic_id_valid(apic_id) && enabled)
         printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
     else
-        acpi_register_lapic(apic_id, processor->uid, enabled);
+        acpi_register_lapic(apic_id, processor->uid, enabled, -1);
 #else
     printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
 #endif
@@ -242,7 +248,7 @@ acpi_parse_lapic(struct acpi_subtable_he
      */
     acpi_register_lapic(processor->id,    /* APIC ID */
                 processor->processor_id, /* ACPI ID */
-                processor->lapic_flags & ACPI_MADT_ENABLED);
+                processor->lapic_flags & ACPI_MADT_ENABLED, 0xff);

     return 0;
 }
@@ -261,7 +267,7 @@ acpi_parse_sapic(struct acpi_subtable_he

     acpi_register_lapic((processor->id << 8) | processor->eid,/* APIC ID */
                 processor->processor_id, /* ACPI ID */
-                processor->lapic_flags & ACPI_MADT_ENABLED);
+                processor->lapic_flags & ACPI_MADT_ENABLED, -1);

     return 0;
 }
@@ -725,7 +731,7 @@ int acpi_map_cpu(acpi_handle handle, phy
 {
     int cpu;

-    cpu = acpi_register_lapic(physid, U32_MAX, ACPI_MADT_ENABLED);
+    cpu = acpi_register_lapic(physid, U32_MAX, ACPI_MADT_ENABLED, -1);
     if (cpu < 0) {
         pr_info(PREFIX "Unable to map lapic to logical cpu number\n");
         return cpu;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-04  6:02     ` Yinghai Lu
@ 2016-10-05 14:04       ` Thomas Gleixner
  2016-10-06  4:53         ` Yinghai Lu
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2016-10-05 14:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linux Kernel Mailing List, Gu Zheng, Tang Chen, Ingo Molnar,
	douly.fnst, zhugh.fnst, H. Peter Anvin, linux-tip-commits,
	Tony Luck

On Mon, 3 Oct 2016, Yinghai Lu wrote:
> init_cpu_node become:
> [   55.477160] init_cpu_to_node:
> [   55.483280] cpu 0 -> apicid 0x0 -> node 0
> [   55.491558] cpu 1 -> apicid 0xff -> node 1
> [   55.500017] cpu 2 -> apicid 0x2 -> node 0
> [   55.508296] cpu 3 -> apicid 0x4 -> node 0
> [   55.516575] cpu 4 -> apicid 0x6 -> node 0
> ...
> 
> looks like problem is
> 
> acpi_parse_lapic==>acpi_register_lapic==>__generic_processor_info==>allocate_logical_cpuid
> 
> it will take lapic_id[0xff] take cpu index 1.
> 
> Then will have not /dev/cpu/1/msr, that will make the MLC not happy.

That's pretty irrelevant whether that MLC thing - whatever it is - is
unhappy or not. If it cannot deal with something missing, it's surely not
the kernels problem. Unplug the cpu and the file will be missing as well.

> -static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
> +static int acpi_register_lapic(int id, u32 acpiid, u8 enabled, int disabled_id)
>  {
>      unsigned int ver = 0;
>      int cpu;

This is whitespace damaged. The patch does not apply. Can you finaly after
doing 10 years of kernel development start being more careful?

> @@ -176,6 +177,11 @@ static int acpi_register_lapic(int id, u
>          return -EINVAL;
>      }
> 
> +    if (!enabled && (id == disabled_id)) {
> +        ++disabled_cpus;
> +        return -EINVAL;
> +    }

Why would you need that disabled_id thing at all? The proper fix is to let
the apic driver detect the issue and this boils down to a 5 lines
change. Does the patch below fix the issue for you?

Thanks,

	tglx

8<----------------
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2076,6 +2076,11 @@ int __generic_processor_info(int apicid,
 	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
 				phys_cpu_present_map);
 
+	if (!apic->apic_id_valid(apicid)) {
+		disabled_cpus++;
+		return -EINVAL;
+	}
+
 	/*
 	 * boot_cpu_physical_apicid is designed to have the apicid
 	 * returned by read_apic_id(), i.e, the apicid of the

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-05 14:04       ` Thomas Gleixner
@ 2016-10-06  4:53         ` Yinghai Lu
  2016-10-06  8:06           ` Dou Liyang
  0 siblings, 1 reply; 33+ messages in thread
From: Yinghai Lu @ 2016-10-06  4:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linux Kernel Mailing List, Gu Zheng, Tang Chen, Ingo Molnar,
	douly.fnst, zhugh.fnst, H. Peter Anvin, linux-tip-commits,
	Tony Luck

On Wed, Oct 5, 2016 at 7:04 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> @@ -176,6 +177,11 @@ static int acpi_register_lapic(int id, u
>>          return -EINVAL;
>>      }
>>
>> +    if (!enabled && (id == disabled_id)) {
>> +        ++disabled_cpus;
>> +        return -EINVAL;
>> +    }
>
> Why would you need that disabled_id thing at all? The proper fix is to let
> the apic driver detect the issue and this boils down to a 5 lines
> change. Does the patch below fix the issue for you?
> 8<----------------
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2076,6 +2076,11 @@ int __generic_processor_info(int apicid,
>         bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
>                                 phys_cpu_present_map);
>
> +       if (!apic->apic_id_valid(apicid)) {
> +               disabled_cpus++;
> +               return -EINVAL;
> +       }
> +
>         /*
>          * boot_cpu_physical_apicid is designed to have the apicid
>          * returned by read_apic_id(), i.e, the apicid of the
>
>

No, That does not fix the issue.

the system have x2apic pre_enabled from BIOS, so at the time
apic is set to &apic_x2apic_cluster.

early_acpi_boot_init ==> early_acpi_process_madt ==> acpi_parse_madt
==> default_acpi_madt_oem_check

default_acpi_madt_oem_check
      ==> apic_x2apic_cluster/x2apic_acpi_madt_oem_check ==> x2apic_enabled
      ==> apic = &apic_x2apic_cluster

and
static int x2apic_apic_id_valid(int apicid)
{
        return 1;
}

To make your change work, may need to update x2apic_apic_id_valid to

static int x2apic_apic_id_valid(int apicid)
{
        if (apicid == 0xff || apicid == -1)
            return 0;

        return 1;
}


Thanks

Yinghai

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-06  4:53         ` Yinghai Lu
@ 2016-10-06  8:06           ` Dou Liyang
  2016-10-06 21:20             ` Yinghai Lu
  2016-10-07 13:07             ` Thomas Gleixner
  0 siblings, 2 replies; 33+ messages in thread
From: Dou Liyang @ 2016-10-06  8:06 UTC (permalink / raw)
  To: Yinghai Lu, Thomas Gleixner
  Cc: Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	linux-tip-commits, Tony Luck

Hi Yinghai,

At 10/06/2016 12:53 PM, Yinghai Lu wrote:
> On Wed, Oct 5, 2016 at 7:04 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> @@ -176,6 +177,11 @@ static int acpi_register_lapic(int id, u
>>>          return -EINVAL;
>>>      }
>>>
>>> +    if (!enabled && (id == disabled_id)) {
>>> +        ++disabled_cpus;
>>> +        return -EINVAL;
>>> +    }
>>
>> Why would you need that disabled_id thing at all? The proper fix is to let
>> the apic driver detect the issue and this boils down to a 5 lines
>> change. Does the patch below fix the issue for you?
>> 8<----------------
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -2076,6 +2076,11 @@ int __generic_processor_info(int apicid,
>>         bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
>>                                 phys_cpu_present_map);
>>
>> +       if (!apic->apic_id_valid(apicid)) {
>> +               disabled_cpus++;
>> +               return -EINVAL;
>> +       }
>> +
>>         /*
>>          * boot_cpu_physical_apicid is designed to have the apicid
>>          * returned by read_apic_id(), i.e, the apicid of the
>>
>>
>
> No, That does not fix the issue.
>
> the system have x2apic pre_enabled from BIOS, so at the time
> apic is set to &apic_x2apic_cluster.
>
> early_acpi_boot_init ==> early_acpi_process_madt ==> acpi_parse_madt
> ==> default_acpi_madt_oem_check
>
> default_acpi_madt_oem_check
>       ==> apic_x2apic_cluster/x2apic_acpi_madt_oem_check ==> x2apic_enabled
>       ==> apic = &apic_x2apic_cluster
>
> and
> static int x2apic_apic_id_valid(int apicid)
> {
>         return 1;
> }
>
> To make your change work, may need to update x2apic_apic_id_valid to
>
> static int x2apic_apic_id_valid(int apicid)
> {
>         if (apicid == 0xff || apicid == -1)
>             return 0;
>

I seem to remember that in x2APIC Spec the x2APIC ID may be at 255 or
greater.
If we do that judgment, it may be affect x2APIC's work in some other places.

I saw the MADT, the main reason may be that we define 0xff to acpi_id
in LAPIC mode.
As you said, it was like:
[   42.107902] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.120125] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.132361] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
...

How about doing the acpi_id check when we parse it in
acpi_parse_lapic().

8<----------------

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -233,6 +233,11 @@ acpi_parse_lapic(struct acpi_subtable_header * 
header, const unsigned long end)

         acpi_table_print_madt_entry(header);

+       if (processor->id >= 255) {
+               ++disabled_cpus;
+               return -EINVAL;
+       }
+
         /*
          * We need to register disabled CPU as well to permit
          * counting disabled CPUs. This allows us to size


Thanks

Dou

>         return 1;
> }
>
>
> Thanks
>
> Yinghai
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-06  8:06           ` Dou Liyang
@ 2016-10-06 21:20             ` Yinghai Lu
  2016-10-07  4:35               ` Dou Liyang
                                 ` (2 more replies)
  2016-10-07 13:07             ` Thomas Gleixner
  1 sibling, 3 replies; 33+ messages in thread
From: Yinghai Lu @ 2016-10-06 21:20 UTC (permalink / raw)
  To: Dou Liyang
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck

On Thu, Oct 6, 2016 at 1:06 AM, Dou Liyang <douly.fnst@cn.fujitsu.com> wrote:

> I seem to remember that in x2APIC Spec the x2APIC ID may be at 255 or
> greater.

Good to know. Maybe later when one package have more cores like 30 cores etc.

> If we do that judgment, it may be affect x2APIC's work in some other places.
>
> I saw the MADT, the main reason may be that we define 0xff to acpi_id
> in LAPIC mode.
> As you said, it was like:
> [   42.107902] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
> [   42.120125] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
> [   42.132361] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
> ...
>
> How about doing the acpi_id check when we parse it in
> acpi_parse_lapic().
>
> 8<----------------
>
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -233,6 +233,11 @@ acpi_parse_lapic(struct acpi_subtable_header * header,
> const unsigned long end)
>
>         acpi_table_print_madt_entry(header);
>
> +       if (processor->id >= 255) {
> +               ++disabled_cpus;
> +               return -EINVAL;
> +       }
> +
>         /*
>          * We need to register disabled CPU as well to permit
>          * counting disabled CPUs. This allows us to size

Yes, that should work. but should do the same thing for x2apic

in acpi_parse_x2apic should have

> +       if (processor->local_apic_id == -1) {
> +               ++disabled_cpus;
> +               return -EINVAL;
> +       }

that is the reason why i want to extend acpi_register_lapic()
to take extra disabled_id (one is 0xff and another is 0xffffffff)
so could save some lines.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-06 21:20             ` Yinghai Lu
@ 2016-10-07  4:35               ` Dou Liyang
  2016-10-07 12:50                 ` Thomas Gleixner
  2016-10-07 11:02               ` Thomas Gleixner
  2016-10-07 11:04               ` Thomas Gleixner
  2 siblings, 1 reply; 33+ messages in thread
From: Dou Liyang @ 2016-10-07  4:35 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck, Rafael J. Wysocki,
	lenb, Zheng, Lv, robert.moore, linux-acpi

Hi Yinghai

At 10/07/2016 05:20 AM, Yinghai Lu wrote:
> On Thu, Oct 6, 2016 at 1:06 AM, Dou Liyang <douly.fnst@cn.fujitsu.com> wrote:
>
>> I seem to remember that in x2APIC Spec the x2APIC ID may be at 255 or
>> greater.
>
> Good to know. Maybe later when one package have more cores like 30 cores etc.
>
>> If we do that judgment, it may be affect x2APIC's work in some other places.
>>
>> I saw the MADT, the main reason may be that we define 0xff to acpi_id
>> in LAPIC mode.
>> As you said, it was like:
>> [   42.107902] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
>> [   42.120125] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
>> [   42.132361] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
>> ...
>>
>> How about doing the acpi_id check when we parse it in
>> acpi_parse_lapic().
>>
>> 8<----------------
>>
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -233,6 +233,11 @@ acpi_parse_lapic(struct acpi_subtable_header * header,
>> const unsigned long end)
>>
>>         acpi_table_print_madt_entry(header);
>>
>> +       if (processor->id >= 255) {
>> +               ++disabled_cpus;
>> +               return -EINVAL;
>> +       }
>> +
>>         /*
>>          * We need to register disabled CPU as well to permit
>>          * counting disabled CPUs. This allows us to size
>
> Yes, that should work. but should do the same thing for x2apic
>
> in acpi_parse_x2apic should have
>
>> +       if (processor->local_apic_id == -1) {
>> +               ++disabled_cpus;
>> +               return -EINVAL;
>> +       }
>
> that is the reason why i want to extend acpi_register_lapic()
> to take extra disabled_id (one is 0xff and another is 0xffffffff)
> so could save some lines.
>

Yes, I understood.
But I think adding an extra disabled_id is not a good way for
validating the apic_id. If the disabled_id is not just one id(-1 or
255), may be two or more, even be a range. what should we do for
extending our code?

Firstly, I am not sure that the "-1" could appear in the MADT, even if
the ACPI tables is unreasonable.

Seondly, I guess if we need the check, there are some reserved methods
in the kernel, such as "default_apic_id_valid", "x2apic_apic_id_valid"
and so on. we should extend all of them and use them for check.


CC'ed: Rafael and Lv

May I ask a question?

Is it possible that the "-1/oxffffffff" could appear in the MADT which 
is one of the ACPI tables?


> Thanks
>
> Yinghai
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-06 21:20             ` Yinghai Lu
  2016-10-07  4:35               ` Dou Liyang
@ 2016-10-07 11:02               ` Thomas Gleixner
  2016-10-07 11:04               ` Thomas Gleixner
  2 siblings, 0 replies; 33+ messages in thread
From: Thomas Gleixner @ 2016-10-07 11:02 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Dou Liyang, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck

On Thu, 6 Oct 2016, Yinghai Lu wrote:
> On Thu, Oct 6, 2016 at 1:06 AM, Dou Liyang <douly.fnst@cn.fujitsu.com> wrote:
> Yes, that should work. but should do the same thing for x2apic
> 
> in acpi_parse_x2apic should have
> 
> > +       if (processor->local_apic_id == -1) {
> > +               ++disabled_cpus;
> > +               return -EINVAL;
> > +       }
> 
> that is the reason why i want to extend acpi_register_lapic()
> to take extra disabled_id (one is 0xff and another is 0xffffffff)
> so could save some lines.

That just makes the code harder to follow and is not really worth the
trouble.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-06 21:20             ` Yinghai Lu
  2016-10-07  4:35               ` Dou Liyang
  2016-10-07 11:02               ` Thomas Gleixner
@ 2016-10-07 11:04               ` Thomas Gleixner
  2 siblings, 0 replies; 33+ messages in thread
From: Thomas Gleixner @ 2016-10-07 11:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Dou Liyang, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck

On Thu, 6 Oct 2016, Yinghai Lu wrote:
> Yes, that should work. but should do the same thing for x2apic
> 
> in acpi_parse_x2apic should have
> 
> > +       if (processor->local_apic_id == -1) {
> > +               ++disabled_cpus;
> > +               return -EINVAL;
> > +       }

So that means, that x2apic_apic_id_valid() is wrong as well ....

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-07  4:35               ` Dou Liyang
@ 2016-10-07 12:50                 ` Thomas Gleixner
  2016-10-07 13:00                   ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2016-10-07 12:50 UTC (permalink / raw)
  To: Dou Liyang
  Cc: Yinghai Lu, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck, Rafael J. Wysocki,
	lenb, Zheng, Lv, robert.moore, linux-acpi

On Fri, 7 Oct 2016, Dou Liyang wrote:
> Is it possible that the "-1/oxffffffff" could appear in the MADT which is one
> of the ACPI tables?

According to the SDM the x2apic id is a 32bit ID, so 0xffffffff is a
legitimate value.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-07 12:50                 ` Thomas Gleixner
@ 2016-10-07 13:00                   ` Thomas Gleixner
  2016-10-07 18:55                     ` Yinghai Lu
  2016-10-08  5:22                     ` Dou Liyang
  0 siblings, 2 replies; 33+ messages in thread
From: Thomas Gleixner @ 2016-10-07 13:00 UTC (permalink / raw)
  To: Dou Liyang
  Cc: Yinghai Lu, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck, Rafael J. Wysocki,
	lenb, Zheng, Lv, robert.moore, linux-acpi

On Fri, 7 Oct 2016, Thomas Gleixner wrote:
> On Fri, 7 Oct 2016, Dou Liyang wrote:
> > Is it possible that the "-1/oxffffffff" could appear in the MADT which is one
> > of the ACPI tables?
> 
> According to the SDM the x2apic id is a 32bit ID, so 0xffffffff is a
> legitimate value.

The ACPI spec says that bit 0 of the x2apic flags field tells whether the
logical processor is present or not. So the proper check for x2apic is that
flag.

The lapic structure has the same flag, but the kernel ignores the flags for
both lapic and x2apic.

I'm going to apply the minimal fix of checking for id == 0xff in
acpi_lapic_parse() for now, but this needs to be revisited and fixed
proper.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-06  8:06           ` Dou Liyang
  2016-10-06 21:20             ` Yinghai Lu
@ 2016-10-07 13:07             ` Thomas Gleixner
  2016-10-08  4:14               ` Dou Liyang
  1 sibling, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2016-10-07 13:07 UTC (permalink / raw)
  To: Dou Liyang
  Cc: Yinghai Lu, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck

On Thu, 6 Oct 2016, Dou Liyang wrote:
> 
> +       if (processor->id >= 255) {
> +               ++disabled_cpus;

Incrementing disabled_cpus here is simply wrong because 0xff is an invalid
local APIC id. So we can simply return -EINVAL and be done with it.

> +               return -EINVAL;

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-07 13:00                   ` Thomas Gleixner
@ 2016-10-07 18:55                     ` Yinghai Lu
  2016-10-08  5:22                     ` Dou Liyang
  1 sibling, 0 replies; 33+ messages in thread
From: Yinghai Lu @ 2016-10-07 18:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dou Liyang, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck, Rafael J. Wysocki,
	lenb, Zheng, Lv, Bob Moore, ACPI Devel Maling List

On Fri, Oct 7, 2016 at 6:00 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 7 Oct 2016, Thomas Gleixner wrote:
>> On Fri, 7 Oct 2016, Dou Liyang wrote:
>> > Is it possible that the "-1/oxffffffff" could appear in the MADT which is one
>> > of the ACPI tables?
>>
>> According to the SDM the x2apic id is a 32bit ID, so 0xffffffff is a
>> legitimate value.
>
> The ACPI spec says that bit 0 of the x2apic flags field tells whether the
> logical processor is present or not. So the proper check for x2apic is that
> flag.
>
> The lapic structure has the same flag, but the kernel ignores the flags for
> both lapic and x2apic.
>
> I'm going to apply the minimal fix of checking for id == 0xff in
> acpi_lapic_parse() for now, but this needs to be revisited and fixed
> proper.

Good to me. Thanks for fixing it.

Yinghai

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-07 13:07             ` Thomas Gleixner
@ 2016-10-08  4:14               ` Dou Liyang
  0 siblings, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-10-08  4:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Yinghai Lu, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck

Hi tglx,

At 10/07/2016 09:07 PM, Thomas Gleixner wrote:
> On Thu, 6 Oct 2016, Dou Liyang wrote:
>>
>> +       if (processor->id >= 255) {
>> +               ++disabled_cpus;
>
> Incrementing disabled_cpus here is simply wrong because 0xff is an invalid
> local APIC id. So we can simply return -EINVAL and be done with it.
>

Yes, It is.

>> +               return -EINVAL;
>
> Thanks,
>
> 	tglx
>
>

Thanks,

	Dou

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
  2016-10-07 13:00                   ` Thomas Gleixner
  2016-10-07 18:55                     ` Yinghai Lu
@ 2016-10-08  5:22                     ` Dou Liyang
  1 sibling, 0 replies; 33+ messages in thread
From: Dou Liyang @ 2016-10-08  5:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Yinghai Lu, Linux Kernel Mailing List, Ingo Molnar,
	H. Peter Anvin, linux-tip-commits, Tony Luck, Rafael J. Wysocki,
	lenb, Zheng, Lv, robert.moore, linux-acpi

Hi tglx,

At 10/07/2016 09:00 PM, Thomas Gleixner wrote:
> On Fri, 7 Oct 2016, Thomas Gleixner wrote:
>> On Fri, 7 Oct 2016, Dou Liyang wrote:
>>> Is it possible that the "-1/oxffffffff" could appear in the MADT which is one
>>> of the ACPI tables?
>>
>> According to the SDM the x2apic id is a 32bit ID, so 0xffffffff is a
>> legitimate value.

Yes, I see.

>
> The ACPI spec says that bit 0 of the x2apic flags field tells whether the
> logical processor is present or not. So the proper check for x2apic is that
> flag.
>
> The lapic structure has the same flag, but the kernel ignores the flags for
> both lapic and x2apic.

It seems the kernel uses the flags in this sentence:

	enabled = processor->lapic_flags & ACPI_MADT_ENABLED;


>
> I'm going to apply the minimal fix of checking for id == 0xff in
> acpi_lapic_parse() for now, but this needs to be revisited and fixed
> proper.

Yes, I will do it.


Thanks

	Dou.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-10-08  5:23 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-25  8:35 [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
2016-08-25  8:35 ` [PATCH v12 1/7] x86, memhp, numa: Online memory-less nodes at boot time Dou Liyang
2016-09-22 19:09   ` [tip:x86/apic] x86/numa: " tip-bot for Tang Chen
2016-08-25  8:35 ` [PATCH v12 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus " Dou Liyang
2016-08-25  8:57   ` Dou Liyang
2016-09-22 19:10   ` [tip:x86/apic] x86/acpi: " tip-bot for Gu Zheng
2016-08-25  8:35 ` [PATCH v12 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping Dou Liyang
2016-09-22 19:10   ` [tip:x86/apic] x86/acpi: Introduce persistent storage for " tip-bot for Gu Zheng
2016-10-04  6:02     ` Yinghai Lu
2016-10-05 14:04       ` Thomas Gleixner
2016-10-06  4:53         ` Yinghai Lu
2016-10-06  8:06           ` Dou Liyang
2016-10-06 21:20             ` Yinghai Lu
2016-10-07  4:35               ` Dou Liyang
2016-10-07 12:50                 ` Thomas Gleixner
2016-10-07 13:00                   ` Thomas Gleixner
2016-10-07 18:55                     ` Yinghai Lu
2016-10-08  5:22                     ` Dou Liyang
2016-10-07 11:02               ` Thomas Gleixner
2016-10-07 11:04               ` Thomas Gleixner
2016-10-07 13:07             ` Thomas Gleixner
2016-10-08  4:14               ` Dou Liyang
2016-08-25  8:35 ` [PATCH v12 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Dou Liyang
2016-09-22 19:11   ` [tip:x86/apic] x86/acpi: Enable MADT APIs to return disabled apicids tip-bot for Gu Zheng
2016-08-25  8:35 ` [PATCH v12 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Dou Liyang
2016-09-22 19:11   ` [tip:x86/apic] x86/acpi: " tip-bot for Gu Zheng
2016-08-25  8:35 ` [PATCH v12 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables Dou Liyang
2016-09-22 19:12   ` [tip:x86/apic] acpi: Provide " tip-bot for Dou Liyang
2016-08-25  8:35 ` [PATCH v12 7/7] acpi: Provide the interface to validate the proc_id Dou Liyang
2016-09-22 19:12   ` [tip:x86/apic] acpi: Validate processor id when mapping the processor tip-bot for Dou Liyang
2016-08-25  9:08 ` [PATCH v12 0/7] Make cpuid <-> nodeid mapping persistent Dou Liyang
2016-09-02  6:57 ` Dou Liyang
2016-09-13 11:33   ` Dou Liyang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).