* [RFC 0/4] Parse ACPI/PPTT for cache information
@ 2017-08-05  0:11 ` Jeremy Linton
  0 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-08-05  0:11 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-acpi, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	will.deacon, catalin.marinas

ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
used to describe the processor and cache topologies. Ideally it is
used to extend/override information provided by the hardware, but
right now ARM64 is entirely dependent on firmware provided tables.

This series parses the table for the cache topology only. It's quite
trivial to add processor/cluster/???/socket level parsing as well,
but that information isn't as useful as the already provided NUMA
SRAT/SLIT information, which provides relative distances. The one
useful thing is the number of physical sockets, but due to the
way arm64 considers "clusters" to be sockets, a larger discussion
is required here.

An example of lstopo with this patch:

[root@mammon-juno-rh ~]# lstopo-no-graphics
Machine (8072MB)
  Package L#0 + L2 L#0 (1024KB)
    L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
    L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
    L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
    L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
  Package L#1 + L2 L#1 (2048KB)
    L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
    L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
  HostBridge L#0
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 1095:3132
            Block(Disk) L#0 "sda"
        PCIBridge
          PCI 11ab:4380
            Net L#1 "enp8s0"

Jeremy Linton (4):
  drivers: base: cacheinfo: Add support for ACPI based firmware tables
  arm64: cacheinfo: Add support for ACPI/PPTT generated topology
  ACPI/PPTT: Add Processor Properties Topology Table parsing
  ACPI: Enable PPTT support on ARM64

 arch/arm64/kernel/cacheinfo.c |  23 ++-
 drivers/acpi/arm64/Kconfig    |   3 +
 drivers/acpi/arm64/Makefile   |   1 +
 drivers/acpi/arm64/pptt.c     | 389 ++++++++++++++++++++++++++++++++++++++++++
 drivers/base/cacheinfo.c      |  15 +-
 include/linux/cacheinfo.h     |   1 +
 6 files changed, 422 insertions(+), 10 deletions(-)
 create mode 100644 drivers/acpi/arm64/pptt.c

-- 
2.9.4


* [RFC 1/4] drivers: base: cacheinfo: Add support for ACPI based firmware tables
  2017-08-05  0:11 ` Jeremy Linton
@ 2017-08-05  0:11   ` Jeremy Linton
  -1 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-08-05  0:11 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-acpi, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	will.deacon, catalin.marinas

The /sys cache entries should support ACPI/PPTT generated cache
topology information. Let's detect ACPI systems and call
an arch-specific cache_setup_acpi() routine to update the
hardware-probed cache topology.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/base/cacheinfo.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)
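
A minimal userspace sketch of the dispatch this patch sets up, assuming
GCC or Clang on an ELF target for the weak attribute. All demo_* names
and struct fw_state are hypothetical stand-ins for the kernel symbols,
and EOPNOTSUPP replaces the kernel-internal ENOTSUPP, which userspace
headers do not provide:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; not the real kernel symbols. */
struct fw_state {
	bool acpi_disabled;	/* mirrors the kernel's acpi_disabled flag */
	bool have_dt;		/* mirrors of_have_populated_dt() */
};

/*
 * Weak default, analogous to the __weak cache_setup_acpi() in the patch:
 * a strong definition linked in elsewhere overrides this one.
 */
__attribute__((weak)) int demo_cache_setup_acpi(unsigned int cpu)
{
	(void)cpu;
	return -EOPNOTSUPP;
}

static int demo_cache_setup_of_node(unsigned int cpu)
{
	(void)cpu;
	return 0;	/* pretend the device tree lookup succeeded */
}

/* ACPI takes priority; the device tree is consulted only when ACPI is off. */
int demo_shared_cpu_map_setup(const struct fw_state *fw, unsigned int cpu)
{
	int ret = 0;

	assert(fw != NULL);

	if (!fw->acpi_disabled)
		ret = demo_cache_setup_acpi(cpu);
	else if (fw->have_dt)
		ret = demo_cache_setup_of_node(cpu);

	return ret;
}
```

With ACPI enabled and no strong override linked in, the sketch returns
the "not supported" error from the weak default; an architecture that
provides its own definition replaces it at link time.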

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index eb3af27..44fa374 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -215,6 +215,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 }
 #endif
 
+int __weak cache_setup_acpi(unsigned int cpu)
+{
+	return -ENOTSUPP;
+}
+
 static int cache_shared_cpu_map_setup(unsigned int cpu)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
@@ -225,11 +230,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
 	if (this_cpu_ci->cpu_map_populated)
 		return 0;
 
-	if (of_have_populated_dt())
+	if (!acpi_disabled)
+		ret = cache_setup_acpi(cpu);
+	else if (of_have_populated_dt())
 		ret = cache_setup_of_node(cpu);
-	else if (!acpi_disabled)
-		/* No cache property/hierarchy support yet in ACPI */
-		ret = -ENOTSUPP;
+
 	if (ret)
 		return ret;
 
@@ -286,7 +291,7 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 
 static void cache_override_properties(unsigned int cpu)
 {
-	if (of_have_populated_dt())
+	if (acpi_disabled && of_have_populated_dt())
 		return cache_of_override_properties(cpu);
 }
 
-- 
2.9.4


* [RFC 2/4] arm64: cacheinfo: Add support for ACPI/PPTT generated topology
  2017-08-05  0:11 ` Jeremy Linton
@ 2017-08-05  0:11   ` Jeremy Linton
  -1 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-08-05  0:11 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-acpi, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	will.deacon, catalin.marinas

The ACPI specification now includes a tree-based description
of the cache hierarchy. On arm64 the first step is ensuring
that we allocate sufficient levels to contain all the
individual cache descriptions beyond what is described
by the individual cores. Let's initially just stub that
out with a routine which indicates that there aren't further
levels beyond what is reported by the cores.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/cacheinfo.c | 23 ++++++++++++++++++-----
 include/linux/cacheinfo.h     |  1 +
 2 files changed, 19 insertions(+), 5 deletions(-)
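
For reference, the level and leaf accounting in __init_cache_level()
can be modelled in userspace roughly as follows. The loop body between
the two hunks is not shown in the diff, so stopping at the first level
that reports no cache is an assumption, and every demo_* name is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

enum demo_cache_type {
	DEMO_NOCACHE,
	DEMO_INST,
	DEMO_DATA,
	DEMO_SEPARATE,	/* split instruction and data caches at one level */
	DEMO_UNIFIED,
};

struct demo_cache_counts {
	unsigned int levels;
	unsigned int leaves;
};

/*
 * hw[] holds the cache type the core reports for level 1, 2, ...
 * fw_level is the deepest level the firmware (DT or PPTT) describes.
 */
struct demo_cache_counts demo_count_caches(const enum demo_cache_type *hw,
					   size_t nr_hw, unsigned int fw_level)
{
	struct demo_cache_counts c = { 0, 0 };
	size_t i;

	assert(hw != NULL || nr_hw == 0);

	for (i = 0; i < nr_hw; i++) {
		if (hw[i] == DEMO_NOCACHE)
			break;
		c.levels++;
		/* a split level contributes two leaves, anything else one */
		c.leaves += (hw[i] == DEMO_SEPARATE) ? 2 : 1;
	}

	/* firmware-only levels are counted as one unified leaf each */
	if (c.levels < fw_level) {
		c.leaves += fw_level - c.levels;
		c.levels = fw_level;
	}

	return c;
}
```

A core reporting split L1 caches and a unified L2 gives 2 levels and 3
leaves; if the firmware also describes an L3, that becomes 3 levels and
4 leaves. With the stub in this patch returning 0, the firmware never
adds levels.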

diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 380f2e2..2e2cf0d 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -17,6 +17,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/acpi.h>
 #include <linux/cacheinfo.h>
 #include <linux/of.h>
 
@@ -44,9 +45,17 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 	this_leaf->type = type;
 }
 
+#ifndef CONFIG_ACPI
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	/*ACPI kernels should be built with PPTT support*/
+	return 0;
+}
+#endif
+
 static int __init_cache_level(unsigned int cpu)
 {
-	unsigned int ctype, level, leaves, of_level;
+	unsigned int ctype, level, leaves, fw_level;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 
 	for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
@@ -59,15 +68,19 @@ static int __init_cache_level(unsigned int cpu)
 		leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
 	}
 
-	of_level = of_find_last_cache_level(cpu);
-	if (level < of_level) {
+	if (acpi_disabled)
+		fw_level = of_find_last_cache_level(cpu);
+	else
+		fw_level = acpi_find_last_cache_level(cpu);
+
+	if (level < fw_level) {
 		/*
 		 * some external caches not specified in CLIDR_EL1
 		 * the information may be available in the device tree
 		 * only unified external caches are considered here
 		 */
-		leaves += (of_level - level);
-		level = of_level;
+		leaves += (fw_level - level);
+		level = fw_level;
 	}
 
 	this_cpu_ci->num_levels = level;
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 6a524bf..e9233c7 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -98,6 +98,7 @@ int func(unsigned int cpu)					\
 struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
+int acpi_find_last_cache_level(unsigned int cpu);
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
-- 
2.9.4


* [RFC 3/4] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-08-05  0:11 ` Jeremy Linton
@ 2017-08-05  0:11   ` Jeremy Linton
  -1 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-08-05  0:11 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-acpi, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	will.deacon, catalin.marinas

ACPI 6.2 adds a new table, which describes how processing units
are related to each other in a tree-like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level(), as well as fill out the individual
core's cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch-specific
code.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/arm64/pptt.c | 389 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 389 insertions(+)
 create mode 100644 drivers/acpi/arm64/pptt.c
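
A rough userspace sketch of the two ideas the parser relies on:
references are byte offsets from the start of the table that must be
bounds checked before use, and cache nodes form a chain through their
next-level reference. The 8-byte demo_cache layout, the demo_* names,
and the depth limit are assumptions for illustration; they are not the
ACPICA structures and the loop guard is not part of this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TYPE_CACHE	1u
#define DEMO_MAX_DEPTH	16	/* arbitrary guard against reference loops */

/* Simplified, hypothetical node; the real PPTT cache node has more fields. */
struct demo_cache {
	uint8_t type;
	uint8_t length;
	uint16_t reserved;
	uint32_t next_level;	/* byte offset of the next cache, 0 if none */
};

/* Copy out the node at offset ref; return 1 on success, 0 if invalid. */
static int demo_fetch(const uint8_t *table, uint32_t table_len, uint32_t ref,
		      struct demo_cache *out)
{
	struct demo_cache c;

	/* there isn't a subtable at reference 0 */
	if (ref == 0)
		return 0;

	/* subtract rather than add so a huge ref cannot wrap the check */
	if (table_len < sizeof(c) || ref > table_len - sizeof(c))
		return 0;

	memcpy(&c, table + ref, sizeof(c));

	if (c.length < sizeof(c) || c.length > table_len - ref)
		return 0;

	*out = c;
	return 1;
}

/* Count the cache levels reachable from ref; -1 if the chain loops. */
int demo_cache_depth(const uint8_t *table, uint32_t table_len, uint32_t ref)
{
	struct demo_cache c;
	int depth = 0;

	assert(table != NULL);

	while (demo_fetch(table, table_len, ref, &c)) {
		if (c.type != DEMO_TYPE_CACHE)
			break;
		if (++depth > DEMO_MAX_DEPTH)
			return -1;
		ref = c.next_level;
	}

	return depth;
}
```

Using memcpy() instead of casting into the buffer sidesteps alignment
concerns in the sketch; the kernel code casts directly because the
mapped table is treated as packed firmware data.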

diff --git a/drivers/acpi/arm64/pptt.c b/drivers/acpi/arm64/pptt.c
new file mode 100644
index 0000000..e1ab77d
--- /dev/null
+++ b/drivers/acpi/arm64/pptt.c
@@ -0,0 +1,389 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (!pptt_ref)
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+		      sizeof(u32) * resource);
+
+	return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * given a pptt resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found, and continue the search.
+ * Once the entire cache branch has been walked return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if (*found != NULL)
+				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified, and we find that level/type, abort
+ * processing and return the acpi_pptt_cache structure.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * we are looking for the max depth. Since it's potentially
+		 * possible for a given node to have resources with differing
+		 * depths, verify that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/*
+ * Find the subtable entry describing the provided processor
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
+			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
+				 acpi_cpu_id, cpu_node->acpi_processor_id);
+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
+				/* found the correct entry */
+				pr_debug("match found!\n");
+				return (struct acpi_pptt_processor *)entry;
+			}
+		}
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+	return NULL;
+}
+
+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
+
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_6_2_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_6_2_CACHE_TYPE_INSTR;
+	default:
+		pr_err("Unknown cache type, assume unified\n");
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (!cpu_node)
+		return NULL;
+
+	do {
+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while ((cpu_node) && (!found));
+
+	return found;
+}
+
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache)
+{
+	this_leaf->of_node = (struct device_node *)found_cache;
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->size;
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->line_size;
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->associativity;
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_6_2_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_6_2_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		default:
+			pr_err("Unknown ACPI cache policy %d\n",
+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
+		}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_6_2_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_RW_ALLOCATE:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
+			break;
+		default:
+			pr_err("Unknown ACPI cache allocation policy %d\n",
+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
+		}
+}
+
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level);
+		pr_debug("found = %p\n", found_cache);
+		if (found_cache)
+			update_cache_properties(this_leaf, found_cache);
+
+		index++;
+	}
+}
+
+/*
+ * simply assign an ACPI cache entry to each known CPU cache entry;
+ * determining which entries are shared is done later
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return 0;
+}
-- 
2.9.4
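
A small userspace sketch of how the attributes byte decodes, using the
bit positions implied by the ACPI_6_2_* constants in the patch. The
mask values (0x03, 0x0C, 0x10) are assumed to match the
ACPI_PPTT_MASK_* definitions, the demo_* names are hypothetical, and
treating type value 3 as unified is an assumption based on the ACPI 6.2
field description rather than anything in this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MASK_ALLOC		0x03u	/* bits 1:0, allocation type */
#define DEMO_MASK_TYPE		0x0Cu	/* bits 3:2, cache type */
#define DEMO_MASK_POLICY	0x10u	/* bit 4, write policy */

enum demo_type { DEMO_DATA, DEMO_INSTR, DEMO_UNIFIED };

struct demo_attrs {
	enum demo_type type;
	int write_through;	/* 0 means write-back */
	int read_alloc;
	int write_alloc;
};

/* Decode one attributes byte; return 0, or -1 for an unknown allocation. */
int demo_decode(uint8_t attributes, struct demo_attrs *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));

	switch ((attributes & DEMO_MASK_TYPE) >> 2) {
	case 0:
		out->type = DEMO_DATA;
		break;
	case 1:
		out->type = DEMO_INSTR;
		break;
	default:	/* 2 and 3 are both taken to mean unified */
		out->type = DEMO_UNIFIED;
		break;
	}

	out->write_through = (attributes & DEMO_MASK_POLICY) != 0;

	switch (attributes & DEMO_MASK_ALLOC) {
	case 0:
		out->read_alloc = 1;
		break;
	case 1:
		out->write_alloc = 1;
		break;
	case 2:
		out->read_alloc = 1;
		out->write_alloc = 1;
		break;
	default:	/* value 3 has no constant in the patch */
		return -1;
	}

	return 0;
}
```

For example, 0x1A decodes as a unified, write-through cache that
allocates on both reads and writes. Note that the patch compares the
masked type for equality with (1<<3), so a table using type value 3
would not match there.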



* [RFC 3/4] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-08-05  0:11   ` Jeremy Linton
  0 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-08-05  0:11 UTC (permalink / raw)
  To: linux-arm-kernel

ACPI 6.2 adds a new table, which describes how processing units
are related to each other in a tree-like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level(), as well as fill out the individual
core's cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch-specific
code.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/arm64/pptt.c | 389 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 389 insertions(+)
 create mode 100644 drivers/acpi/arm64/pptt.c

diff --git a/drivers/acpi/arm64/pptt.c b/drivers/acpi/arm64/pptt.c
new file mode 100644
index 0000000..e1ab77d
--- /dev/null
+++ b/drivers/acpi/arm64/pptt.c
@@ -0,0 +1,389 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (!pptt_ref)
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+		      sizeof(u32) * resource);
+
+	return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * given a pptt resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found, and continue the search.
+ * Once the entire cache branch has been walked return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if (*found != NULL)
+				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified and found, record that acpi_pptt_cache
+ * structure; it is returned after all of this node's resources are walked.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * we are looking for the max depth. Since it's possible
+		 * for a given node to have resources with differing
+		 * depths, verify that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/*
+ * Find the subtable entry describing the provided processor
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
+			pr_debug("checking phy_cpu_id %u against acpi id %u\n",
+				 acpi_cpu_id, cpu_node->acpi_processor_id);
+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
+				/* found the correct entry */
+				pr_debug("match found!\n");
+				return (struct acpi_pptt_processor *)entry;
+			}
+		}
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+	return NULL;
+}
+
+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
+
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_6_2_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_6_2_CACHE_TYPE_INSTR;
+	default:
+		pr_err("Unknown cache type, assume unified\n");
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %u's level %u cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (!cpu_node)
+		return NULL;
+
+	do {
+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while ((cpu_node) && (!found));
+
+	return found;
+}
+
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache)
+{
+	this_leaf->of_node = (struct device_node *)found_cache;
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->size;
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->line_size;
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->associativity;
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_6_2_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_6_2_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		default:
+			pr_err("Unknown ACPI cache policy %d\n",
+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
+		}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_6_2_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_RW_ALLOCATE:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
+			break;
+		default:
+			pr_err("Unknown ACPI cache allocation policy %d\n",
+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
+		}
+}
+
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level);
+		pr_debug("found = %p\n", found_cache);
+		if (found_cache)
+			update_cache_properties(this_leaf, found_cache);
+
+		index++;
+	}
+}
+
+/*
+ * simply assign an ACPI cache entry to each known CPU cache entry;
+ * determining which entries are shared is done later
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return 0;
+}
-- 
2.9.4
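
For readers following the parser above, here is a minimal user-space sketch of the two ideas it relies on: validating a table-relative reference before dereferencing it, and following the next-level reference to count cache levels. It is not the kernel code. The structure layouts, NODE_CACHE, MAX_CACHE_DEPTH, fetch_entry() and cache_depth() are all hypothetical names made up for illustration, and the depth cap and minimum-length check are additions of this sketch, not something the patch does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_CACHE      1	/* hypothetical type code for a cache node */
#define MAX_CACHE_DEPTH 8	/* assumed cap so a cyclic chain terminates */

/* Simplified stand-ins for the ACPI structures; layouts are illustrative. */
struct sub_hdr {
	uint8_t type;
	uint8_t length;
};

struct cache_node {
	struct sub_hdr hdr;
	uint16_t reserved;
	uint32_t next_level;	/* table-relative offset, 0 means "none" */
};

/*
 * Return a pointer to the entry at offset 'ref', or NULL if the reference
 * is 0 or the entry does not fit entirely inside the table.
 */
static const uint8_t *fetch_entry(const uint8_t *table, uint32_t table_len,
				  uint32_t ref)
{
	struct sub_hdr hdr;

	if (ref == 0)
		return NULL;
	/* subtract on the right-hand side so a huge 'ref' cannot wrap */
	if (table_len < sizeof(hdr) || ref > table_len - sizeof(hdr))
		return NULL;

	memcpy(&hdr, table + ref, sizeof(hdr));
	if (hdr.length < sizeof(hdr) || hdr.length > table_len - ref)
		return NULL;

	return table + ref;
}

/* Count how many cache nodes are chained together starting at 'ref'. */
static int cache_depth(const uint8_t *table, uint32_t table_len, uint32_t ref)
{
	struct cache_node node;
	const uint8_t *entry;
	int depth = 0;

	while (depth < MAX_CACHE_DEPTH &&
	       (entry = fetch_entry(table, table_len, ref)) != NULL) {
		memcpy(&node.hdr, entry, sizeof(node.hdr));
		if (node.hdr.type != NODE_CACHE ||
		    node.hdr.length < sizeof(node))
			break;
		/* copy out instead of casting, to avoid unaligned access */
		memcpy(&node, entry, sizeof(node));
		depth++;
		ref = node.next_level;
	}
	return depth;
}
```

Calling cache_depth() with the offset of an L1 node whose next_level points at an L2 node would be expected to return 2, and any offset that points outside the buffer yields 0 rather than an out-of-bounds read.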

* [RFC 4/4] ACPI: Enable PPTT support on ARM64
  2017-08-05  0:11 ` Jeremy Linton
@ 2017-08-05  0:11   ` Jeremy Linton
  -1 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-08-05  0:11 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-acpi, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	will.deacon, catalin.marinas

Now that we have a parser, and arm64 has appropriate
hooks in place to utilize it, let's enable PPTT
parsing in the arm64 ACPI build.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/arm64/Kconfig  | 3 +++
 drivers/acpi/arm64/Makefile | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 5a6f80f..74b855a 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -7,3 +7,6 @@ config ACPI_IORT
 
 config ACPI_GTDT
 	bool
+
+config ACPI_PPTT
+	bool
\ No newline at end of file
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 1017def..b6dee6b 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_ACPI_IORT) 	+= iort.o
 obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
+obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
-- 
2.9.4


* Re: [RFC 0/4] Parse ACPI/PPTT for cache information
  2017-08-05  0:11 ` Jeremy Linton
@ 2017-08-07 10:20   ` Hanjun Guo
  -1 siblings, 0 replies; 26+ messages in thread
From: Hanjun Guo @ 2017-08-07 10:20 UTC (permalink / raw)
  To: Jeremy Linton, linux-arm-kernel
  Cc: linux-acpi, sudeep.holla, lorenzo.pieralisi, will.deacon,
	catalin.marinas, wangxiongfeng (C),
	linuxarm

+Cc Xiongfeng (who is also working on the PPTT but focusing on
CPU topology)

Hi Jeremy,

On 2017/8/5 8:11, Jeremy Linton wrote:
> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
> used to describe the processor and cache topologies. Ideally it is
> used to extend/override information provided by the hardware, but
> right now ARM64 is entirely dependent on firmware provided tables.
> 
> This patch parses the table for the cache topology only. Its quite
> trivial to add processor/cluster/???/socket level parsing as well,
> but that information isn't as useful as the already provided NUMA
> SRAT/SLIT information which provides relative distances. The one
> useful thing, is the number of physical sockets but due to the
> way arm64 considers "clusters" to be sockets, a larger discussion
> is required here.

I think we need the socket to represent the true topology of
the SoC, which means that considering clusters to be sockets is
wrong on ARM64 server platforms; I think a "socket" needs to have
a memory controller attached.

Take the D05 for example: there are two physical SoC sockets on
the board, with two CPU dies (each with a memory controller) on each
physical socket, and 4 clusters on each CPU die.

When clusters are treated as sockets (which is what the code does now),
16 "sockets" are presented to the OS as scheduler input, but there are
only 4 NUMA nodes, which confuses the scheduler a lot...
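
The arithmetic behind those D05 numbers can be spelled out with a small sketch. The struct and function names are hypothetical, and the sketch assumes one NUMA node per die (each die having its own memory controller), which is how the counts above work out.

```c
/* Hypothetical board description; field names are illustrative only. */
struct board_topology {
	unsigned int physical_sockets;
	unsigned int dies_per_socket;	/* each die has a memory controller */
	unsigned int clusters_per_die;
};

/* Assumption: one NUMA node per die. */
static unsigned int count_numa_nodes(const struct board_topology *b)
{
	return b->physical_sockets * b->dies_per_socket;
}

/* What the OS sees as "sockets" when every cluster is reported as one. */
static unsigned int count_cluster_sockets(const struct board_topology *b)
{
	return count_numa_nodes(b) * b->clusters_per_die;
}
```

With 2 physical sockets, 2 dies per socket and 4 clusters per die, this gives 2 * 2 = 4 NUMA nodes and 4 * 4 = 16 cluster-level "sockets".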

Xiongfeng has been working on CPU topology based on PPTT, and the code
is under internal review. If it's OK with you, we can send it out
for review comments to see if we can join our efforts, or
we can work on top of your patches, as you like :)

> 
> An example of lstopo with this patch:
> 
> [root@mammon-juno-rh ~]# lstopo-no-graphics
> Machine (8072MB)
>    Package L#0 + L2 L#0 (1024KB)
>      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>    Package L#1 + L2 L#1 (2048KB)
>      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
>      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
>    HostBridge L#0
>      PCIBridge
>        PCIBridge
>          PCIBridge
>            PCI 1095:3132
>              Block(Disk) L#0 "sda"
>          PCIBridge
>            PCI 11ab:4380
>              Net L#1 "enp8s0"
> 
> Jeremy Linton (4):
>    drivers: base: cacheinfo: Add support for ACPI based firmware tables
>    arm64: cacheinfo: Add support for ACPI/PPTT generated topology
>    ACPI/PPTT: Add Processor Properties Topology Table parsing
>    ACPI: Enable PPTT support on ARM64
> 
>   arch/arm64/kernel/cacheinfo.c |  23 ++-
>   drivers/acpi/arm64/Kconfig    |   3 +
>   drivers/acpi/arm64/Makefile   |   1 +
>   drivers/acpi/arm64/pptt.c     | 389 ++++++++++++++++++++++++++++++++++++++++++

I think PPTT is not ARM64-only; it can be used on x86 too.
Shall we put the code in drivers/acpi?

Rafael worked a lot on the PPTT proposal for the
spec, so I think he can comment on this :)

Rafael, what do you think?

Thanks
Hanjun

* Re: [RFC 0/4] Parse ACPI/PPTT for cache information
  2017-08-07 10:20   ` Hanjun Guo
@ 2017-08-07 17:10     ` Jeffrey Hugo
  -1 siblings, 0 replies; 26+ messages in thread
From: Jeffrey Hugo @ 2017-08-07 17:10 UTC (permalink / raw)
  To: Hanjun Guo, Jeremy Linton, linux-arm-kernel
  Cc: lorenzo.pieralisi, catalin.marinas, will.deacon, linuxarm,
	linux-acpi, sudeep.holla, wangxiongfeng (C),
	Austin Christ

On 8/7/2017 4:20 AM, Hanjun Guo wrote:
> +Cc Xiongfeng (who is also working on the PPTT but focusing on
> CPU topology)
> 
> Hi Jeremy,
> 
> On 2017/8/5 8:11, Jeremy Linton wrote:
>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
>> used to describe the processor and cache topologies. Ideally it is
>> used to extend/override information provided by the hardware, but
>> right now ARM64 is entirely dependent on firmware provided tables.
>>
>> This patch parses the table for the cache topology only. Its quite
>> trivial to add processor/cluster/???/socket level parsing as well,
>> but that information isn't as useful as the already provided NUMA
>> SRAT/SLIT information which provides relative distances. The one
>> useful thing, is the number of physical sockets but due to the
>> way arm64 considers "clusters" to be sockets, a larger discussion
>> is required here.
> 
> I think we need the socket to represent the true topology of
> the SoC, which means that considering clusters to be sockets is
> wrong on ARM64 server platforms, a "socket" needs to be a memory
> controller attached I think.
> 
> Take D05 for example, there are two physical SoC sockets on
> the board but with two CPU DIE (with memory controller) on each
> physical socket, and 4 clusters on each CPU DIE.
> 
> When considering clusters as sockets (that's the code for now),
> there are 16 "sockets" to represent to OS for schedule input,
> but only 4 NUMA nodes, which are confusing the scheduler a lot...
> 
> Xiongfeng was working on the CPU topology based on PPTT, and the code
> is under internal review, if it's OK for you, we can send them out
> for review comments to see if we can join our effort together, or
> we can work on top of your patches, as you like :)
> 
>>
>> An example of lstopo with this patch:
>>
>> [root@mammon-juno-rh ~]# lstopo-no-graphics
>> Machine (8072MB)
>>    Package L#0 + L2 L#0 (1024KB)
>>      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>    Package L#1 + L2 L#1 (2048KB)
>>      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
>>      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
>>    HostBridge L#0
>>      PCIBridge
>>        PCIBridge
>>          PCIBridge
>>            PCI 1095:3132
>>              Block(Disk) L#0 "sda"
>>          PCIBridge
>>            PCI 11ab:4380
>>              Net L#1 "enp8s0"
>>
>> Jeremy Linton (4):
>>    drivers: base: cacheinfo: Add support for ACPI based firmware tables
>>    arm64: cacheinfo: Add support for ACPI/PPTT generated topology
>>    ACPI/PPTT: Add Processor Properties Topology Table parsing
>>    ACPI: Enable PPTT support on ARM64
>>
>>   arch/arm64/kernel/cacheinfo.c |  23 ++-
>>   drivers/acpi/arm64/Kconfig    |   3 +
>>   drivers/acpi/arm64/Makefile   |   1 +
>>   drivers/acpi/arm64/pptt.c     | 389 ++++++++++++++++++++++++++++++++++++++++++
> 
> I think PPTT is not ARM64 only, can be used for x86 too,
> shall we locate them on drivers/acpi?

Austin and I have been working on the CPU topology.  Sounds like we have 
some overlap with your work.  We have a working prototype at 
https://source.codeaurora.org/quic/server/kernel/log/?h=jhugo/pptt but 
are still doing some cleanup and fixes for our needs.

drivers/acpi makes sense to us, and was our working assumption.

> 
> Rafael was working a lot on the PPTT proposal for the
> spec, I think he can comment on this :)
> 
> Rafael, what do you think?
> 
> Thanks
> Hanjun


-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC 0/4] Parse ACPI/PPTT for cache information
  2017-08-07 10:20   ` Hanjun Guo
@ 2017-08-07 17:33     ` Jeremy Linton
  -1 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-08-07 17:33 UTC (permalink / raw)
  To: Hanjun Guo, linux-arm-kernel
  Cc: linux-acpi, sudeep.holla, lorenzo.pieralisi, will.deacon,
	catalin.marinas, wangxiongfeng (C),
	linuxarm


Hi,

On 08/07/2017 05:20 AM, Hanjun Guo wrote:
> +Cc Xiongfeng (who is also working on the PPTT but focusing on
> CPU topology)
> 
> Hi Jeremy,
> 
> On 2017/8/5 8:11, Jeremy Linton wrote:
>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
>> used to describe the processor and cache topologies. Ideally it is
>> used to extend/override information provided by the hardware, but
>> right now ARM64 is entirely dependent on firmware provided tables.
>>
>> This patch parses the table for the cache topology only. Its quite
>> trivial to add processor/cluster/???/socket level parsing as well,
>> but that information isn't as useful as the already provided NUMA
>> SRAT/SLIT information which provides relative distances. The one
>> useful thing, is the number of physical sockets but due to the
>> way arm64 considers "clusters" to be sockets, a larger discussion
>> is required here.
> 
> I think we need the socket to represent the true topology of
> the SoC, which means that considering clusters to be sockets is
> wrong on ARM64 server platforms, a "socket" needs to be a memory
> controller attached I think.

If I understand correctly, you're suggesting that the socket isn't really
the physical socket, but a grouping at the memory controller level?

My general take was that thread/core/socket is now insufficient, as even
x86 designs now have more levels of hierarchy than that. Mapping those
layers, and the NUMA weighting, into something meaningful for Linux was
more than I wanted to start with, particularly since the PPTT spec is
silent about memory controller attachments at particular node levels, and
has a few other possible shortcomings.



> 
> Take D05 for example, there are two physical SoC sockets on
> the board but with two CPU DIE (with memory controller) on each
> physical socket, and 4 clusters on each CPU DIE.
> 
> When considering clusters as sockets (that's the code for now),
> there are 16 "sockets" to represent to OS for schedule input,
> but only 4 NUMA nodes, which are confusing the scheduler a lot...
> 
> Xiongfeng was working on the CPU topology based on PPTT, and the code
> is under internal review, if it's OK for you, we can send them out
> for review comments to see if we can join our effort together, or
> we can work on top of your patches, as you like :)
> 
>>
>> An example of lstopo with this patch:
>>
>> [root@mammon-juno-rh ~]# lstopo-no-graphics
>> Machine (8072MB)
>>    Package L#0 + L2 L#0 (1024KB)
>>      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>    Package L#1 + L2 L#1 (2048KB)
>>      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
>>      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
>>    HostBridge L#0
>>      PCIBridge
>>        PCIBridge
>>          PCIBridge
>>            PCI 1095:3132
>>              Block(Disk) L#0 "sda"
>>          PCIBridge
>>            PCI 11ab:4380
>>              Net L#1 "enp8s0"
>>
>> Jeremy Linton (4):
>>    drivers: base: cacheinfo: Add support for ACPI based firmware tables
>>    arm64: cacheinfo: Add support for ACPI/PPTT generated topology
>>    ACPI/PPTT: Add Processor Properties Topology Table parsing
>>    ACPI: Enable PPTT support on ARM64
>>
>>   arch/arm64/kernel/cacheinfo.c |  23 ++-
>>   drivers/acpi/arm64/Kconfig    |   3 +
>>   drivers/acpi/arm64/Makefile   |   1 +
>>   drivers/acpi/arm64/pptt.c     | 389 ++++++++++++++++++++++++++++++++++++++++++
> 
> I think PPTT is not ARM64 only, can be used for x86 too,
> shall we locate them on drivers/acpi?

Sure. But I was working on the assumption that the table would only really
be useful on arm64. On x86 the table is unnecessary and generally would
have to be dynamically generated by the firmware (as it should be on
arm) anyway. Put another way, does anyone want to use it on another
platform?


> 
> Rafael was working a lot on the PPTT proposal for the
> spec, I think he can comment on this :)
>
> Rafael, what do you think?
> 
> Thanks
> Hanjun


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 0/4] Parse ACPI/PPTT for cache information
  2017-08-07 17:10     ` Jeffrey Hugo
@ 2017-08-08  4:19       ` Xiongfeng Wang
  -1 siblings, 0 replies; 26+ messages in thread
From: Xiongfeng Wang @ 2017-08-08  4:19 UTC (permalink / raw)
  To: Jeffrey Hugo, hanjun.guo, Jeremy Linton, linux-arm-kernel
  Cc: lorenzo.pieralisi, austinwc, catalin.marinas, john.garry,
	will.deacon, linuxarm, linux-acpi, sudeep.holla

+Cc John Garry (who is working on the PPTT with me)

Hi Jeffrey,

On 2017/8/8 1:10, Jeffrey Hugo wrote:
> On 8/7/2017 4:20 AM, Hanjun Guo wrote:
>> +Cc Xiongfeng (who is also working on the PPTT but focusing on
>> CPU topology)
>>
>> Hi Jeremy,
>>
>> On 2017/8/5 8:11, Jeremy Linton wrote:
>>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
>>> used to describe the processor and cache topologies. Ideally it is
>>> used to extend/override information provided by the hardware, but
>>> right now ARM64 is entirely dependent on firmware provided tables.
>>>
>>> This patch parses the table for the cache topology only. Its quite
>>> trivial to add processor/cluster/???/socket level parsing as well,
>>> but that information isn't as useful as the already provided NUMA
>>> SRAT/SLIT information which provides relative distances. The one
>>> useful thing, is the number of physical sockets but due to the
>>> way arm64 considers "clusters" to be sockets, a larger discussion
>>> is required here.
>>
>> I think we need the socket to represent the true topology of
>> the SoC, which means that considering clusters to be sockets is
>> wrong on ARM64 server platforms, a "socket" needs to be a memory
>> controller attached I think.
>>
>> Take D05 for example, there are two physical SoC sockets on
>> the board but with two CPU DIE (with memory controller) on each
>> physical socket, and 4 clusters on each CPU DIE.
>>
>> When considering clusters as sockets (that's the code for now),
>> there are 16 "sockets" to represent to OS for schedule input,
>> but only 4 NUMA nodes, which are confusing the scheduler a lot...
>>
>> Xiongfeng was working on the CPU topology based on PPTT, and the code
>> is under internal review, if it's OK for you, we can send them out
>> for review comments to see if we can join our effort together, or
>> we can work on top of your patches, as you like :)
>>
>>>
>>> An example of lstopo with this patch:
>>>
>>> [root@mammon-juno-rh ~]# lstopo-no-graphics
>>> Machine (8072MB)
>>>    Package L#0 + L2 L#0 (1024KB)
>>>      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>>      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>>      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>>      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>>    Package L#1 + L2 L#1 (2048KB)
>>>      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
>>>      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
>>>    HostBridge L#0
>>>      PCIBridge
>>>        PCIBridge
>>>          PCIBridge
>>>            PCI 1095:3132
>>>              Block(Disk) L#0 "sda"
>>>          PCIBridge
>>>            PCI 11ab:4380
>>>              Net L#1 "enp8s0"
>>>
>>> Jeremy Linton (4):
>>>    drivers: base: cacheinfo: Add support for ACPI based firmware tables
>>>    arm64: cacheinfo: Add support for ACPI/PPTT generated topology
>>>    ACPI/PPTT: Add Processor Properties Topology Table parsing
>>>    ACPI: Enable PPTT support on ARM64
>>>
>>>   arch/arm64/kernel/cacheinfo.c |  23 ++-
>>>   drivers/acpi/arm64/Kconfig    |   3 +
>>>   drivers/acpi/arm64/Makefile   |   1 +
>>>   drivers/acpi/arm64/pptt.c     | 389 ++++++++++++++++++++++++++++++++++++++++++
>>
>> I think PPTT is not ARM64 only, can be used for x86 too,
>> shall we locate them on drivers/acpi?
> 
> Austin and I have been working on the CPU topology.  Sounds like we have some overlap with your work.  We have a working prototype at https://source.codeaurora.org/quic/server/kernel/log/?h=jhugo/pptt but are still doing some cleanup and fixes for our needs.
> 
> Drivers/acpi makes sense to us, and was our working assumption.
> 
John and I have also been working on the CPU topology. Our code is available at https://github.com/fenghusthu/acpi_pptt .

I have tested it on the D05 board and it works.

Maybe we can work together, combine our code, and arrive at a better code structure.


>>
>> Rafael was working a lot on the PPTT proposal for the
>> spec, I think he can comment on this :)
>>
>> Rafael, what do you think?
>>
>> Thanks
>> Hanjun
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 0/4] Parse ACPI/PPTT for cache information
       [not found]     ` <CAGHbJ3AL6cPribriuV0G2TvQCx+Qi9URpnpSi=UVRjf_cv_vLg@mail.gmail.com>
@ 2017-08-21  3:15         ` Xiongfeng Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Xiongfeng Wang @ 2017-08-21  3:15 UTC (permalink / raw)
  To: Hanjun Guo, Jeremy Linton
  Cc: linux-arm-kernel, linux-acpi, Sudeep Holla, Lorenzo Pieralisi,
	Will Deacon, Catalin Marinas, Linuxarm, Rafael J. Wysocki

Hi Hanjun

On 2017/8/9 22:08, Hanjun Guo wrote:
> Hi Jeremy,
> 
> On 8 August 2017 at 01:33, Jeremy Linton <jeremy.linton@arm.com> wrote:
>>
>> Hi,
>>
>> On 08/07/2017 05:20 AM, Hanjun Guo wrote:
>>>
>>> +Cc Xiongfeng (who is also working on the PPTT but focusing on
>>> CPU topology)
>>>
>>> Hi Jeremy,
>>>
>>> On 2017/8/5 8:11, Jeremy Linton wrote:
>>>>
>>>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
>>>> used to describe the processor and cache topologies. Ideally it is
>>>> used to extend/override information provided by the hardware, but
>>>> right now ARM64 is entirely dependent on firmware provided tables.
>>>>
>>>> This patch parses the table for the cache topology only. Its quite
>>>> trivial to add processor/cluster/???/socket level parsing as well,
>>>> but that information isn't as useful as the already provided NUMA
>>>> SRAT/SLIT information which provides relative distances. The one
>>>> useful thing, is the number of physical sockets but due to the
>>>> way arm64 considers "clusters" to be sockets, a larger discussion
>>>> is required here.
>>>
>>>
>>> I think we need the socket to represent the true topology of
>>> the SoC, which means that considering clusters to be sockets is
>>> wrong on ARM64 server platforms, a "socket" needs to be a memory
>>> controller attached I think.
>>
I agree with you that clusters should not be considered sockets. Cores in a
cluster may share an L2 cache and should stay in the same local sched_domain for
better performance. This is done in the function cpu_coregroup_mask(), which uses
cpu_topology[cpu].core_sibling to build a sched_domain. The core_sibling information
is also exposed in sysfs and treated as the cores in a socket by "lscpu".

I think we may need to add the members 'physical_package_id' and 'cluster_sibling' to
struct cpu_topology, so that the cores in a cluster, represented by 'core_sibling', are placed
in the same local sched_domain, and the cores in a package, represented by 'cluster_sibling',
are treated as a socket by 'lscpu'.
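
As a rough illustration of the idea above (a userspace sketch only, not kernel
code): the struct below is a hypothetical stand-in for struct cpu_topology with
the two proposed members added, populated for a D05-like layout of 2 packages
x 2 dies x 4 clusters. The 4 cores per cluster, the linear CPU numbering, the
64-bit masks used in place of cpumask_t, and the helper names build_topology()
and count_groups() are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed D05-like layout; CORES_PER_CLUSTER is a guess for illustration. */
enum {
	NR_PACKAGES       = 2,
	DIES_PER_PACKAGE  = 2,
	CLUSTERS_PER_DIE  = 4,
	CORES_PER_CLUSTER = 4,
	NR_CPUS = NR_PACKAGES * DIES_PER_PACKAGE *
		  CLUSTERS_PER_DIE * CORES_PER_CLUSTER
};

/* Hypothetical stand-in for struct cpu_topology, not the kernel definition. */
struct cpu_topology_sketch {
	int core_id;
	int cluster_id;
	int physical_package_id;	/* proposed new member */
	uint64_t core_sibling;		/* cores sharing this CPU's cluster */
	uint64_t cluster_sibling;	/* proposed: cores sharing this CPU's package */
};

static struct cpu_topology_sketch topo[NR_CPUS];

/* Fill in ids and sibling masks, assuming CPUs are numbered linearly. */
static void build_topology(void)
{
	int cpu, other;

	assert(NR_CPUS <= 64);	/* masks are a single 64-bit word */

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		topo[cpu].core_id = cpu % CORES_PER_CLUSTER;
		topo[cpu].cluster_id = cpu / CORES_PER_CLUSTER;
		topo[cpu].physical_package_id = cpu / (NR_CPUS / NR_PACKAGES);
		topo[cpu].core_sibling = 0;
		topo[cpu].cluster_sibling = 0;
	}

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		for (other = 0; other < NR_CPUS; other++) {
			if (topo[cpu].cluster_id == topo[other].cluster_id)
				topo[cpu].core_sibling |= UINT64_C(1) << other;
			if (topo[cpu].physical_package_id ==
			    topo[other].physical_package_id)
				topo[cpu].cluster_sibling |= UINT64_C(1) << other;
		}
	}
}

/*
 * Count distinct sibling groups: by cluster_sibling when use_package is
 * non-zero (what a tool like lscpu would report as sockets under this
 * proposal), otherwise by core_sibling (the local scheduling groups).
 */
static int count_groups(int use_package)
{
	uint64_t seen = 0;
	int cpu, groups = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (seen & (UINT64_C(1) << cpu))
			continue;
		seen |= use_package ? topo[cpu].cluster_sibling
				    : topo[cpu].core_sibling;
		groups++;
	}
	return groups;
}
```

With this layout, count_groups(0) after build_topology() gives the 16 cluster
groups that are reported as "sockets" today, while count_groups(1) gives the 2
physical packages.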


Thanks,
Xiongfeng Wang


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 4/4] ACPI: Enable PPTT support on ARM64
  2017-08-05  0:11   ` Jeremy Linton
@ 2017-09-05  7:22     ` Xiongfeng Wang
  -1 siblings, 0 replies; 26+ messages in thread
From: Xiongfeng Wang @ 2017-09-05  7:22 UTC (permalink / raw)
  To: Jeremy Linton, linux-arm-kernel
  Cc: lorenzo.pieralisi, catalin.marinas, will.deacon, linux-acpi,
	hanjun.guo, sudeep.holla

Hi Jeremy,

On 2017/8/5 8:11, Jeremy Linton wrote:
> Now that we have a parser, and arm64 has appropriate
> hooks in place to utilize it, lets enable PPTT
> parsing in the arm64 ACPI build.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  drivers/acpi/arm64/Kconfig  | 3 +++
>  drivers/acpi/arm64/Makefile | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index 5a6f80f..74b855a 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -7,3 +7,6 @@ config ACPI_IORT
>  
>  config ACPI_GTDT
>  	bool
> +
> +config ACPI_PPTT
> +	bool

We may need to select ACPI_PPTT in arch/arm64/Kconfig, just like ACPI_GTDT

> \ No newline at end of file
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 1017def..b6dee6b 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -1,2 +1,3 @@
>  obj-$(CONFIG_ACPI_IORT) 	+= iort.o
>  obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
> +obj-$(CONFIG_ACPI_GTDT) 	+= pptt.o
This should be CONFIG_ACPI_PPTT rather than CONFIG_ACPI_GTDT:

obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
> 


Thanks,
Xiongfeng Wang


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 3/4] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-08-05  0:11   ` Jeremy Linton
@ 2017-09-06 10:34     ` Xiongfeng Wang
  -1 siblings, 0 replies; 26+ messages in thread
From: Xiongfeng Wang @ 2017-09-06 10:34 UTC (permalink / raw)
  To: Jeremy Linton, linux-arm-kernel
  Cc: lorenzo.pieralisi, catalin.marinas, will.deacon, linux-acpi,
	hanjun.guo, sudeep.holla

Hi Jeremy,

On 2017/8/5 8:11, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in tree like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  drivers/acpi/arm64/pptt.c | 389 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 389 insertions(+)
>  create mode 100644 drivers/acpi/arm64/pptt.c
> 
> diff --git a/drivers/acpi/arm64/pptt.c b/drivers/acpi/arm64/pptt.c
> new file mode 100644
> index 0000000..e1ab77d
> --- /dev/null
> +++ b/drivers/acpi/arm64/pptt.c
> @@ -0,0 +1,389 @@
> +/*
> + * Copyright (C) 2017, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + */
> +
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (!pptt_ref)
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
> +		      sizeof(u32) * resource);
> +
> +	return fetch_pptt_subtable(table_hdr, ref);
> +}
> +
> +/*
> + * given a pptt resource, verify that it is a cache node, then walk
> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;
> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> +			if (*found != NULL)
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * we are looking for the max depth. Since its potentially
> +		 * possible for a given node to have resources with differing
> +		 * depths verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages. Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + sizeof(struct acpi_table_pptt));
> +
> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");
> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}
> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);
> +	}
> +	return NULL;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +
> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
> +
> +static u8 acpi_cache_type(enum cache_type type)
> +{
> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_6_2_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_6_2_CACHE_TYPE_INSTR;
> +	default:
> +		pr_err("Unknown cache type, assume unified\n");
I think this should be 'pr_debug' rather than 'pr_err'.
CLIDR usually describes only the L1 or L2 cache of a core, so it often
reports fewer cache levels than the PPTT does. Any level that CLIDR does
not describe gets the cache_type CACHE_TYPE_NOCACHE, which ends up in this
default case and prints an error. This situation is very common, so I think
'pr_debug' would be the better choice.
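
To illustrate the point, here is a minimal userspace sketch (not kernel code)
of how the per-level Ctype fields in CLIDR decode, assuming the architectural
layout of 3 bits per level for levels 1 to 7. The helper names clidr_ctype()
and clidr_levels() and the sample register value are hypothetical; the enum
mirrors the kernel's enum cache_type.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's enum cache_type from <linux/cacheinfo.h>. */
enum cache_type {
	CACHE_TYPE_NOCACHE = 0,
	CACHE_TYPE_INST = 1 << 0,
	CACHE_TYPE_DATA = 1 << 1,
	CACHE_TYPE_SEPARATE = CACHE_TYPE_INST | CACHE_TYPE_DATA,
	CACHE_TYPE_UNIFIED = 1 << 2,
};

#define CLIDR_MAX_LEVEL 7

/* Hypothetical helper: decode the 3-bit Ctype field for one cache level. */
static enum cache_type clidr_ctype(uint64_t clidr, int level)
{
	unsigned int ctype;

	assert(level >= 1 && level <= CLIDR_MAX_LEVEL);
	ctype = (clidr >> (3 * (level - 1))) & 0x7;

	/* Encodings above "unified" are reserved; treat them as no cache. */
	if (ctype > CACHE_TYPE_UNIFIED)
		return CACHE_TYPE_NOCACHE;
	return (enum cache_type)ctype;
}

/* Hypothetical helper: count the levels CLIDR describes before the first gap. */
static int clidr_levels(uint64_t clidr)
{
	int level;

	for (level = 1; level <= CLIDR_MAX_LEVEL; level++)
		if (clidr_ctype(clidr, level) == CACHE_TYPE_NOCACHE)
			break;
	return level - 1;
}
```

With a sample value of 0x23 (separate L1 I/D caches plus a unified L2),
clidr_levels() reports 2 and level 3 decodes as CACHE_TYPE_NOCACHE. If the
PPTT also describes an L3, the lookup for that leaf is the one that reaches
the default case.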

> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (!cpu_node)
> +		return NULL;
> +
> +	do {
> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while ((cpu_node) && (!found));
> +
> +	return found;
> +}
> +
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
> +
> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +	} else {
> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
> +
> +	return number_of_levels;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache)
> +{
> +	this_leaf->of_node = (struct device_node *)found_cache;
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->size;
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_6_2_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_6_2_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache policy %d\n",
> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
> +		}
> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache allocation policy %d\n",
> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
> +		}
> +}
> +
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level);
> +		pr_debug("found = %p\n", found_cache);
> +		if (found_cache)
> +			update_cache_properties(this_leaf, found_cache);
> +
> +		index++;
> +	}
> +}
> +
> +/*
> + * Simply assign an ACPI cache entry to each known CPU cache entry;
> + * determining which entries are shared is done later.
> + */
> +int cache_setup_acpi(unsigned int cpu)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_setup_acpi_cpu(table, cpu);
> +	acpi_put_table(table);
> +
> +	return 0;
> +}
> 
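
For reference, the bounds checking that fetch_pptt_subtable() performs on a
table-relative reference can be sketched in userspace as below. This is only
an illustration under stated assumptions: resolve_ref() and struct
subtable_header are hypothetical names, the comparisons are written in a
subtraction form to avoid 32-bit wraparound, and the sketch additionally
rejects a length smaller than the header, which the quoted patch does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the generic ACPI subtable header: type, then length. */
struct subtable_header {
	uint8_t type;
	uint8_t length;
};

/*
 * Hypothetical helper: turn a byte offset from the start of the table into
 * a subtable pointer, or NULL if the subtable would not fit in the table.
 */
static const struct subtable_header *
resolve_ref(const uint8_t *table, uint32_t table_len, uint32_t ref)
{
	const struct subtable_header *entry;

	/* Reference 0 means "no subtable". */
	if (!ref)
		return NULL;

	/* The header itself must lie inside the table. */
	if (ref > table_len || table_len - ref < sizeof(*entry))
		return NULL;

	entry = (const struct subtable_header *)(table + ref);

	/* The whole subtable must lie inside the table as well. */
	if (entry->length < sizeof(*entry) || entry->length > table_len - ref)
		return NULL;

	return entry;
}
```

A reference that points at a header whose declared length runs past the end
of the table is rejected in the same way as a reference past the end.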

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC 3/4] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-09-06 10:34     ` Xiongfeng Wang
@ 2017-09-06 20:39       ` Jeremy Linton
  -1 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-09-06 20:39 UTC (permalink / raw)
  To: Xiongfeng Wang, linux-arm-kernel
  Cc: lorenzo.pieralisi, catalin.marinas, will.deacon, linux-acpi,
	hanjun.guo, sudeep.holla

Hi,


On 09/06/2017 05:34 AM, Xiongfeng Wang wrote:
> Hi Jeremy,
> 
> On 2017/8/5 8:11, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in tree like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> cores cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   drivers/acpi/arm64/pptt.c | 389 ++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 389 insertions(+)
>>   create mode 100644 drivers/acpi/arm64/pptt.c
>>
>> diff --git a/drivers/acpi/arm64/pptt.c b/drivers/acpi/arm64/pptt.c
>> new file mode 100644
>> index 0000000..e1ab77d
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/pptt.c
>> @@ -0,0 +1,389 @@
>> +/*
>> + * Copyright (C) 2017, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>> + * which is optionally used to describe the processor and cache topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + */
>> +
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +
>> +	/* there isn't a subtable at reference 0 */
>> +	if (!pptt_ref)
>> +		return NULL;
>> +
>> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>> +		return NULL;
>> +
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>> +
>> +	if (pptt_ref + entry->length > table_hdr->length)
>> +		return NULL;
>> +
>> +	return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *node, int resource)
>> +{
>> +	u32 ref;
>> +
>> +	if (resource >= node->number_of_priv_resources)
>> +		return NULL;
>> +
>> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>> +		      sizeof(u32) * resource);
>> +
>> +	return fetch_pptt_subtable(table_hdr, ref);
>> +}
>> +
>> +/*
>> + * given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +				int local_level,
>> +				struct acpi_subtable_header *res,
>> +				struct acpi_pptt_cache **found,
>> +				int level, int type)
>> +{
>> +	struct acpi_pptt_cache *cache;
>> +
>> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +		return 0;
>> +
>> +	cache = (struct acpi_pptt_cache *) res;
>> +	while (cache) {
>> +		local_level++;
>> +
>> +		if ((local_level == level) &&
>> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>> +			if (*found != NULL)
>> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>> +
>> +			pr_debug("Found cache @ level %d\n", level);
>> +			*found = cache;
>> +			/*
>> +			 * continue looking at this node's resource list
>> +			 * to verify that we don't find a duplicate
>> +			 * cache node.
>> +			 */
>> +		}
>> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +	}
>> +	return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, and then
>> + * for each cache node, count how many levels exist below (logically above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *cpu_node,
>> +	int *starting_level, int level, int type)
>> +{
>> +	struct acpi_subtable_header *res;
>> +	int number_of_levels = *starting_level;
>> +	int resource = 0;
>> +	struct acpi_pptt_cache *ret = NULL;
>> +	int local_level;
>> +
>> +	/* walk down from processor node */
>> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>> +		resource++;
>> +
>> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +						   res, &ret, level, type);
>> +		/*
>> +		 * we are looking for the max depth. Since it's potentially
>> +		 * possible for a given node to have resources with differing
>> +		 * depths verify that the depth we have found is the largest.
>> +		 */
>> +		if (number_of_levels < local_level)
>> +			number_of_levels = local_level;
>> +	}
>> +	if (number_of_levels > *starting_level)
>> +		*starting_level = number_of_levels;
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * given a processor node containing a processing unit, walk into it and count
>> + * how many levels exist solely for it, and then walk up each level until we hit
>> + * the root node (ignore the package level because it may be possible to have
>> + * caches that exist across packages). Count the number of cache levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +			     struct acpi_pptt_processor *cpu_node)
>> +{
>> +	int total_levels = 0;
>> +
>> +	do {
>> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +	} while (cpu_node);
>> +
>> +	return total_levels;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +	struct acpi_table_header *table_hdr,
>> +	u32 acpi_cpu_id)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + sizeof(struct acpi_table_pptt));
>> +
>> +	/* find the processor structure associated with this cpuid */
>> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
>> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
>> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +				/* found the correct entry */
>> +				pr_debug("match found!\n");
>> +				return (struct acpi_pptt_processor *)entry;
>> +			}
>> +		}
>> +
>> +		if (entry->length == 0) {
>> +			pr_err("Invalid zero length subtable\n");
>> +			break;
>> +		}
>> +		entry = (struct acpi_subtable_header *)
>> +			((u8 *)entry + entry->length);
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
>> +{
>> +	int number_of_levels = 0;
>> +	struct acpi_pptt_processor *cpu;
>> +
>> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (cpu)
>> +		number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
>> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
>> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
>> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
>> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
>> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
>> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
>> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
>> +
>> +static u8 acpi_cache_type(enum cache_type type)
>> +{
>> +	switch (type) {
>> +	case CACHE_TYPE_DATA:
>> +		pr_debug("Looking for data cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_DATA;
>> +	case CACHE_TYPE_INST:
>> +		pr_debug("Looking for instruction cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_INSTR;
>> +	default:
>> +		pr_err("Unknown cache type, assume unified\n");
> I think this should be 'pr_debug' rather than 'pr_err'.
> CLIDR usually describes only the L1 or L2 cache of a core, so it often
> reports fewer cache levels than the PPTT does. Any level that CLIDR does
> not describe gets the cache_type CACHE_TYPE_NOCACHE, which ends up in this
> default case and prints an error. This situation is very common, so I think
> 'pr_debug' would be the better choice.

That is a good point, thanks.
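
A minimal userspace sketch of what that change could look like, assuming the
mapping quoted above stays otherwise unchanged. debug_log() is a hypothetical
stand-in for pr_debug(), and this is not taken from any later revision of the
patch.

```c
#include <assert.h>
#include <stdio.h>

/* Mirrors the kernel's enum cache_type from <linux/cacheinfo.h>. */
enum cache_type {
	CACHE_TYPE_NOCACHE = 0,
	CACHE_TYPE_INST = 1 << 0,
	CACHE_TYPE_DATA = 1 << 1,
	CACHE_TYPE_SEPARATE = CACHE_TYPE_INST | CACHE_TYPE_DATA,
	CACHE_TYPE_UNIFIED = 1 << 2,
};

/* Values copied from the quoted patch. */
#define ACPI_6_2_CACHE_TYPE_DATA	(0x0)
#define ACPI_6_2_CACHE_TYPE_INSTR	(1 << 2)
#define ACPI_6_2_CACHE_TYPE_UNIFIED	(1 << 3)

/* Hypothetical stand-in for pr_debug() so the sketch builds in userspace. */
#define debug_log(...) fprintf(stderr, __VA_ARGS__)

static unsigned char acpi_cache_type(enum cache_type type)
{
	switch (type) {
	case CACHE_TYPE_DATA:
		debug_log("Looking for data cache\n");
		return ACPI_6_2_CACHE_TYPE_DATA;
	case CACHE_TYPE_INST:
		debug_log("Looking for instruction cache\n");
		return ACPI_6_2_CACHE_TYPE_INSTR;
	default:
		/* Common for levels CLIDR does not describe, so stay quiet. */
		debug_log("Unknown cache type, assume unified\n");
		/* fall through */
	case CACHE_TYPE_UNIFIED:
		debug_log("Looking for unified cache\n");
		return ACPI_6_2_CACHE_TYPE_UNIFIED;
	}
}

/* Self-check: an undescribed level is looked up as a unified cache. */
static void acpi_cache_type_selfcheck(void)
{
	assert(acpi_cache_type(CACHE_TYPE_NOCACHE) == ACPI_6_2_CACHE_TYPE_UNIFIED);
	assert(acpi_cache_type(CACHE_TYPE_UNIFIED) == ACPI_6_2_CACHE_TYPE_UNIFIED);
}
```

The return values are the same as in the quoted code; only the log level of
the default case differs.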


> 
>> +	case CACHE_TYPE_UNIFIED:
>> +		pr_debug("Looking for unified cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
>> +	}
>> +}
>> +
>> +/* find the ACPI node describing the cache type/level for the given CPU */
>> +static struct acpi_pptt_cache *acpi_find_cache_node(
>> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>> +	enum cache_type type, unsigned int level)
>> +{
>> +	int total_levels = 0;
>> +	struct acpi_pptt_cache *found = NULL;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	u8 acpi_type = acpi_cache_type(type);
>> +
>> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
>> +		 acpi_cpu_id, level, acpi_type);
>> +
>> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (!cpu_node)
>> +		return NULL;
>> +
>> +	do {
>> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
>> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +	} while ((cpu_node) && (!found));
>> +
>> +	return found;
>> +}
>> +
>> +int acpi_find_last_cache_level(unsigned int cpu)
>> +{
>> +	u32 acpi_cpu_id;
>> +	struct acpi_table_header *table;
>> +	int number_of_levels = 0;
>> +	acpi_status status;
>> +
>> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
>> +
>> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +	} else {
>> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
>> +		acpi_put_table(table);
>> +	}
>> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +/*
>> + * The ACPI spec implies that the fields in the cache structures are used to
>> + * extend and correct the information probed from the hardware. In the case
>> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
>> + */
>> +static void update_cache_properties(struct cacheinfo *this_leaf,
>> +				    struct acpi_pptt_cache *found_cache)
>> +{
>> +	this_leaf->of_node = (struct device_node *)found_cache;
>> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>> +		this_leaf->size = found_cache->size;
>> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
>> +		this_leaf->coherency_line_size = found_cache->line_size;
>> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
>> +		this_leaf->number_of_sets = found_cache->number_of_sets;
>> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
>> +		this_leaf->ways_of_associativity = found_cache->associativity;
>> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
>> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>> +		case ACPI_6_2_CACHE_POLICY_WT:
>> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
>> +			break;
>> +		case ACPI_6_2_CACHE_POLICY_WB:
>> +			this_leaf->attributes = CACHE_WRITE_BACK;
>> +			break;
>> +		default:
>> +			pr_err("Unknown ACPI cache policy %d\n",
>> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
>> +		}
>> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
>> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
>> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
>> +			break;
>> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
>> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
>> +			break;
>> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
>> +			this_leaf->attributes |=
>> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
>> +			break;
>> +		default:
>> +			pr_err("Unknown ACPI cache allocation policy %d\n",
>> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
>> +		}
>> +}
>> +
>> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>> +				 unsigned int cpu)
>> +{
>> +	struct acpi_pptt_cache *found_cache;
>> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +	struct cacheinfo *this_leaf;
>> +	unsigned int index = 0;
>> +
>> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
>> +		this_leaf = this_cpu_ci->info_list + index;
>> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						   this_leaf->type,
>> +						   this_leaf->level);
>> +		pr_debug("found = %p\n", found_cache);
>> +		if (found_cache)
>> +			update_cache_properties(this_leaf, found_cache);
>> +
>> +		index++;
>> +	}
>> +}
>> +
>> +/*
>> + * Simply assign an ACPI cache entry to each known CPU cache entry;
>> + * determining which entries are shared is done later.
>> + */
>> +int cache_setup_acpi(unsigned int cpu)
>> +{
>> +	struct acpi_table_header *table;
>> +	acpi_status status;
>> +
>> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
>> +
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +		return -ENOENT;
>> +	}
>> +
>> +	cache_setup_acpi_cpu(table, cpu);
>> +	acpi_put_table(table);
>> +
>> +	return 0;
>> +}
>>
> 
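
The subtable walk in acpi_find_processor_node() above can be sketched in user space roughly as follows. This is an illustration, not the patch's code: the sk_ names are hypothetical, the two-byte header (type, then total length) is an assumption standing in for struct acpi_subtable_header, and the bounds checks are somewhat stricter than the quoted loop (they also reject an entry whose length runs past the end of the table).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header shared by every entry: type, then total length. */
struct sk_subtable_header {
	uint8_t type;
	uint8_t length;
};

/*
 * Return a pointer to the first entry of the wanted type, or NULL.
 * table/table_len describe a raw byte blob; first is the offset of the
 * first entry (i.e. just past the fixed table header).
 */
static const uint8_t *sk_find_entry(const uint8_t *table, size_t table_len,
				    size_t first, uint8_t wanted_type)
{
	size_t off = first;

	assert(table != NULL);
	if (first > table_len)
		return NULL;

	/* Invariant: off <= table_len, so the subtraction cannot wrap. */
	while (table_len - off >= sizeof(struct sk_subtable_header)) {
		struct sk_subtable_header hdr;

		memcpy(&hdr, table + off, sizeof(hdr));

		/* A zero (or too short) length would never advance. */
		if (hdr.length < sizeof(hdr))
			return NULL;
		/* The whole entry must lie inside the table. */
		if (hdr.length > table_len - off)
			return NULL;
		if (hdr.type == wanted_type)
			return table + off;

		off += hdr.length;
	}
	return NULL;
}
```

Validating the length before advancing is what keeps a malformed firmware table from causing an endless loop or an out-of-bounds read.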



* [RFC 3/4] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-09-06 20:39       ` Jeremy Linton
  0 siblings, 0 replies; 26+ messages in thread
From: Jeremy Linton @ 2017-09-06 20:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,


On 09/06/2017 05:34 AM, Xiongfeng Wang wrote:
> Hi Jeremy,
> 
> On 2017/8/5 8:11, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in a tree-like fashion. Caches are
>> also sprinkled throughout the tree, and the table describes their
>> properties in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level(), as well as fill out the individual
>> core's cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch-specific
>> code.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   drivers/acpi/arm64/pptt.c | 389 ++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 389 insertions(+)
>>   create mode 100644 drivers/acpi/arm64/pptt.c
>>
>> diff --git a/drivers/acpi/arm64/pptt.c b/drivers/acpi/arm64/pptt.c
>> new file mode 100644
>> index 0000000..e1ab77d
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/pptt.c
>> @@ -0,0 +1,389 @@
>> +/*
>> + * Copyright (C) 2017, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>> + * which is optionally used to describe the processor and cache topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + */
>> +
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +
>> +	/* there isn't a subtable at reference 0 */
>> +	if (!pptt_ref)
>> +		return NULL;
>> +
>> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>> +		return NULL;
>> +
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>> +
>> +	if (pptt_ref + entry->length > table_hdr->length)
>> +		return NULL;
>> +
>> +	return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *node, int resource)
>> +{
>> +	u32 ref;
>> +
>> +	if (resource >= node->number_of_priv_resources)
>> +		return NULL;
>> +
>> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>> +		      sizeof(u32) * resource);
>> +
>> +	return fetch_pptt_subtable(table_hdr, ref);
>> +}
>> +
>> +/*
>> + * given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +				int local_level,
>> +				struct acpi_subtable_header *res,
>> +				struct acpi_pptt_cache **found,
>> +				int level, int type)
>> +{
>> +	struct acpi_pptt_cache *cache;
>> +
>> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +		return 0;
>> +
>> +	cache = (struct acpi_pptt_cache *) res;
>> +	while (cache) {
>> +		local_level++;
>> +
>> +		if ((local_level == level) &&
>> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>> +			if (*found != NULL)
>> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>> +
>> +			pr_debug("Found cache @ level %d\n", level);
>> +			*found = cache;
>> +			/*
>> +			 * continue looking at this node's resource list
>> +			 * to verify that we don't find a duplicate
>> +			 * cache node.
>> +			 */
>> +		}
>> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +	}
>> +	return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, and then
>> + * for each cache node, count how many levels exist below (logically above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *cpu_node,
>> +	int *starting_level, int level, int type)
>> +{
>> +	struct acpi_subtable_header *res;
>> +	int number_of_levels = *starting_level;
>> +	int resource = 0;
>> +	struct acpi_pptt_cache *ret = NULL;
>> +	int local_level;
>> +
>> +	/* walk down from processor node */
>> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>> +		resource++;
>> +
>> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +						   res, &ret, level, type);
>> +		/*
>> +		 * We are looking for the max depth. Since it is possible
>> +		 * for a given node to have resources with differing depths,
>> +		 * verify that the depth we have found is the largest.
>> +		 */
>> +		if (number_of_levels < local_level)
>> +			number_of_levels = local_level;
>> +	}
>> +	if (number_of_levels > *starting_level)
>> +		*starting_level = number_of_levels;
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Given a processor node containing a processing unit, walk into it and count
>> + * how many levels exist solely for it, and then walk up each level until we hit
>> + * the root node (ignore the package level because it may be possible to have
>> + * caches that exist across packages). Count the number of cache levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +			     struct acpi_pptt_processor *cpu_node)
>> +{
>> +	int total_levels = 0;
>> +
>> +	do {
>> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +	} while (cpu_node);
>> +
>> +	return total_levels;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +	struct acpi_table_header *table_hdr,
>> +	u32 acpi_cpu_id)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + sizeof(struct acpi_table_pptt));
>> +
>> +	/* find the processor structure associated with this cpuid */
>> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
>> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
>> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +				/* found the correct entry */
>> +				pr_debug("match found!\n");
>> +				return (struct acpi_pptt_processor *)entry;
>> +			}
>> +		}
>> +
>> +		if (entry->length == 0) {
>> +			pr_err("Invalid zero length subtable\n");
>> +			break;
>> +		}
>> +		entry = (struct acpi_subtable_header *)
>> +			((u8 *)entry + entry->length);
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
>> +{
>> +	int number_of_levels = 0;
>> +	struct acpi_pptt_processor *cpu;
>> +
>> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (cpu)
>> +		number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
>> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
>> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
>> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
>> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
>> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
>> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
>> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
>> +
>> +static u8 acpi_cache_type(enum cache_type type)
>> +{
>> +	switch (type) {
>> +	case CACHE_TYPE_DATA:
>> +		pr_debug("Looking for data cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_DATA;
>> +	case CACHE_TYPE_INST:
>> +		pr_debug("Looking for instruction cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_INSTR;
>> +	default:
>> +		pr_err("Unknown cache type, assume unified\n");
> I think we may need a 'pr_debug' here instead of 'pr_err'.
> The CLIDR register usually describes only the L1 or L2 cache of a core,
> so CLIDR often describes fewer cache levels than the PPTT does.
> A cache_type not described by CLIDR will be CACHE_TYPE_NOCACHE, which will
> cause an error message to be printed here. This case is very common, so I
> think a 'pr_debug' may be better.

That is a good point, thanks.


> 
>> +	case CACHE_TYPE_UNIFIED:
>> +		pr_debug("Looking for unified cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
>> +	}
>> +}
>> +
>> +/* find the ACPI node describing the cache type/level for the given CPU */
>> +static struct acpi_pptt_cache *acpi_find_cache_node(
>> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>> +	enum cache_type type, unsigned int level)
>> +{
>> +	int total_levels = 0;
>> +	struct acpi_pptt_cache *found = NULL;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	u8 acpi_type = acpi_cache_type(type);
>> +
>> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
>> +		 acpi_cpu_id, level, acpi_type);
>> +
>> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (!cpu_node)
>> +		return NULL;
>> +
>> +	do {
>> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
>> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +	} while ((cpu_node) && (!found));
>> +
>> +	return found;
>> +}
>> +
>> +int acpi_find_last_cache_level(unsigned int cpu)
>> +{
>> +	u32 acpi_cpu_id;
>> +	struct acpi_table_header *table;
>> +	int number_of_levels = 0;
>> +	acpi_status status;
>> +
>> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
>> +
>> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +	} else {
>> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
>> +		acpi_put_table(table);
>> +	}
>> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +/*
>> + * The ACPI spec implies that the fields in the cache structures are used to
>> + * extend and correct the information probed from the hardware. In the case
>> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
>> + */
>> +static void update_cache_properties(struct cacheinfo *this_leaf,
>> +				    struct acpi_pptt_cache *found_cache)
>> +{
>> +	this_leaf->of_node = (struct device_node *)found_cache;
>> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>> +		this_leaf->size = found_cache->size;
>> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
>> +		this_leaf->coherency_line_size = found_cache->line_size;
>> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
>> +		this_leaf->number_of_sets = found_cache->number_of_sets;
>> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
>> +		this_leaf->ways_of_associativity = found_cache->associativity;
>> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
>> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>> +		case ACPI_6_2_CACHE_POLICY_WT:
>> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
>> +			break;
>> +		case ACPI_6_2_CACHE_POLICY_WB:
>> +			this_leaf->attributes = CACHE_WRITE_BACK;
>> +			break;
>> +		default:
>> +			pr_err("Unknown ACPI cache policy %d\n",
>> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
>> +		}
>> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
>> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
>> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
>> +			break;
>> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
>> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
>> +			break;
>> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
>> +			this_leaf->attributes |=
>> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
>> +			break;
>> +		default:
>> +			pr_err("Unknown ACPI cache allocation policy %d\n",
>> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
>> +		}
>> +}
>> +
>> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>> +				 unsigned int cpu)
>> +{
>> +	struct acpi_pptt_cache *found_cache;
>> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +	struct cacheinfo *this_leaf;
>> +	unsigned int index = 0;
>> +
>> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
>> +		this_leaf = this_cpu_ci->info_list + index;
>> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						   this_leaf->type,
>> +						   this_leaf->level);
>> +		pr_debug("found = %p\n", found_cache);
>> +		if (found_cache)
>> +			update_cache_properties(this_leaf, found_cache);
>> +
>> +		index++;
>> +	}
>> +}
>> +
>> +/*
>> + * Simply assign an ACPI cache entry to each known CPU cache entry;
>> + * determining which entries are shared is done later.
>> + */
>> +int cache_setup_acpi(unsigned int cpu)
>> +{
>> +	struct acpi_table_header *table;
>> +	acpi_status status;
>> +
>> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
>> +
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +		return -ENOENT;
>> +	}
>> +
>> +	cache_setup_acpi_cpu(table, cpu);
>> +	acpi_put_table(table);
>> +
>> +	return 0;
>> +}
>>
> 
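
update_cache_properties() above splits the one-byte attributes field into a write policy and an allocation type. A minimal user-space sketch of that decoding, under stated assumptions: the policy and allocation encodings are copied from the ACPI_6_2_CACHE_* defines in the patch, the three mask values are assumptions chosen to match the bit positions those defines imply, and the sk_ names and output flags are hypothetical. Unlike the patch, the sketch ignores the per-field "valid" flags and decodes both fields unconditionally.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field masks over the attributes byte. */
#define SK_MASK_ALLOCATION_TYPE 0x03
#define SK_MASK_CACHE_TYPE      0x0C
#define SK_MASK_WRITE_POLICY    0x10

/* Encodings copied from the ACPI_6_2_CACHE_* defines in the patch. */
#define SK_POLICY_WB      0x0
#define SK_POLICY_WT      (1 << 4)
#define SK_READ_ALLOCATE  0x0
#define SK_WRITE_ALLOCATE 0x01
#define SK_RW_ALLOCATE    0x02

/* Hypothetical output flags, standing in for the cacheinfo attributes. */
#define SK_WRITE_THROUGH (1u << 0)
#define SK_WRITE_BACK    (1u << 1)
#define SK_READ_ALLOC    (1u << 2)
#define SK_WRITE_ALLOC   (1u << 3)

static unsigned int sk_decode_attributes(uint8_t attributes)
{
	unsigned int out = 0;

	switch (attributes & SK_MASK_WRITE_POLICY) {
	case SK_POLICY_WT:
		out |= SK_WRITE_THROUGH;
		break;
	case SK_POLICY_WB:
		out |= SK_WRITE_BACK;
		break;
	}

	switch (attributes & SK_MASK_ALLOCATION_TYPE) {
	case SK_READ_ALLOCATE:
		out |= SK_READ_ALLOC;
		break;
	case SK_WRITE_ALLOCATE:
		out |= SK_WRITE_ALLOC;
		break;
	case SK_RW_ALLOCATE:
		out |= SK_READ_ALLOC | SK_WRITE_ALLOC;
		break;
	default:
		/* 0x03 is not handled by the quoted patch; left unset here. */
		break;
	}
	return out;
}

/* Usage example: write-through (bit 4) with write allocate (0x01). */
void sk_decode_example(void)
{
	assert(sk_decode_attributes(0x11) ==
	       (SK_WRITE_THROUGH | SK_WRITE_ALLOC));
}
```

Masking before the switch matters because the cache type bits share the same byte; without the mask a unified write-back cache would not match any case.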


end of thread, other threads:[~2017-09-06 20:39 UTC | newest]

Thread overview: 26+ messages
2017-08-05  0:11 [RFC 0/4] Parse ACPI/PPTT for cache information Jeremy Linton
2017-08-05  0:11 ` Jeremy Linton
2017-08-05  0:11 ` [RFC 1/4] drivers: base: cacheinfo: Add support for ACPI based firmware tables Jeremy Linton
2017-08-05  0:11   ` Jeremy Linton
2017-08-05  0:11 ` [RFC 2/4] arm64: cacheinfo: Add support for ACPI/PPTT generated topology Jeremy Linton
2017-08-05  0:11   ` Jeremy Linton
2017-08-05  0:11 ` [RFC 3/4] ACPI/PPTT: Add Processor Properties Topology Table parsing Jeremy Linton
2017-08-05  0:11   ` Jeremy Linton
2017-09-06 10:34   ` Xiongfeng Wang
2017-09-06 10:34     ` Xiongfeng Wang
2017-09-06 20:39     ` Jeremy Linton
2017-09-06 20:39       ` Jeremy Linton
2017-08-05  0:11 ` [RFC 4/4] ACPI: Enable PPTT support on ARM64 Jeremy Linton
2017-08-05  0:11   ` Jeremy Linton
2017-09-05  7:22   ` Xiongfeng Wang
2017-09-05  7:22     ` Xiongfeng Wang
2017-08-07 10:20 ` [RFC 0/4] Parse ACPI/PPTT for cache information Hanjun Guo
2017-08-07 10:20   ` Hanjun Guo
2017-08-07 17:10   ` Jeffrey Hugo
2017-08-07 17:10     ` Jeffrey Hugo
2017-08-08  4:19     ` Xiongfeng Wang
2017-08-08  4:19       ` Xiongfeng Wang
2017-08-07 17:33   ` Jeremy Linton
2017-08-07 17:33     ` Jeremy Linton
     [not found]     ` <CAGHbJ3AL6cPribriuV0G2TvQCx+Qi9URpnpSi=UVRjf_cv_vLg@mail.gmail.com>
2017-08-21  3:15       ` Xiongfeng Wang
2017-08-21  3:15         ` Xiongfeng Wang
