* [PATCH v3 0/7] Support PPTT for ARM64
@ 2017-10-12 19:48 ` Jeremy Linton
From: Jeremy Linton @ 2017-10-12 19:48 UTC
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Jeremy Linton

ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
used to describe the processor and cache topology. Ideally it is
used to extend/override information provided by the hardware, but
right now ARM64 is entirely dependent on firmware-provided tables.

This patch set parses the table for both the cache topology and the
CPU topology. For the latter, we also add a topology_cod_id() macro
and a package_id for arm64. Initially the physical ID will match the
cluster ID, but we update the users of the cluster ID to use the new
macro. When we enable ACPI/PPTT for arm64, we map the socket to the
physical ID, as the rest of the kernel expects.

For example, on Juno:
[root@mammon-juno-rh topology]# lstopo-no-graphics
  Package L#0
    L2 L#0 (1024KB)
      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
    L2 L#1 (2048KB)
      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
  HostBridge L#0
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 1095:3132
            Block(Disk) L#0 "sda"
        PCIBridge
          PCI 1002:68f9
            GPU L#1 "renderD128"
            GPU L#2 "card0"
            GPU L#3 "controlD64"
        PCIBridge
          PCI 11ab:4380
            Net L#4 "enp8s0"

v2->v3:

Remove the valid bit check on leaf nodes. Simply being a leaf node is
  now sufficient for a node's processor ID to be matched against the
  ACPI processor IDs (obtained from the MADT).
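A minimal sketch of the leaf-node rule, assuming a simplified in-memory model instead of the raw table: nodes sit in an array, parent is an array index with 0 meaning "no parent" (mirroring the PPTT's use of 0 as a null reference), and struct mock_node, is_leaf() and find_cpu_node() are hypothetical names, not kernel APIs. A processor node is a leaf when no other node names it as its parent, and only leaves are compared against the MADT-derived ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a PPTT processor node. */
struct mock_node {
	uint32_t parent;	/* index of the parent node; 0 means "none" */
	uint32_t acpi_id;	/* ACPI processor ID, as listed in the MADT */
};

/* Index 0 is reserved so that a parent of 0 can mean "no parent". */
static int is_leaf(const struct mock_node *nodes, size_t count, uint32_t idx)
{
	for (size_t i = 1; i < count; i++)
		if (nodes[i].parent == idx)
			return 0;	/* another node names idx as its parent */
	return 1;
}

/* Return the index of the leaf with the given ACPI ID, or 0 if none. */
static uint32_t find_cpu_node(const struct mock_node *nodes, size_t count,
			      uint32_t acpi_id)
{
	assert(nodes != NULL && count > 0);
	for (uint32_t i = 1; i < count; i++)
		if (is_leaf(nodes, count, i) && nodes[i].acpi_id == acpi_id)
			return i;
	return 0;
}
```

With a package node whose ACPI ID happens to be 1 and two CPU leaves below it, find_cpu_node() skips the package and returns the leaf whose ID is 1.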

Use the ACPI processor ID for the "level 0" ID. This makes the
  core/thread IDs visible in /sys more human-readable if the firmware
  uses small, consecutive values for processor IDs.
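The level-to-ID mapping can be sketched on the same kind of hypothetical array model (struct mock_node, MOCK_PHYSICAL_PACKAGE and mock_topology_id() are illustrative, not kernel names). The walk climbs at most level parents and stops early at a node flagged as a physical package; level 0 reports the leaf's ACPI processor ID, while higher levels report the ancestor's array index, standing in for its byte offset in the table.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_PHYSICAL_PACKAGE 0x1u	/* stand-in for the package flag */

/* Hypothetical, simplified stand-in for a PPTT processor node. */
struct mock_node {
	uint32_t flags;
	uint32_t parent;	/* index of the parent node; 0 means "none" */
	uint32_t acpi_id;	/* ACPI processor ID */
};

static uint32_t mock_topology_id(const struct mock_node *nodes,
				 uint32_t cpu, int level)
{
	uint32_t idx = cpu;
	int remaining = level;

	assert(cpu != 0 && level >= 0);
	/* Climb at most 'level' parents, but never past a physical package. */
	while (remaining > 0 &&
	       !(nodes[idx].flags & MOCK_PHYSICAL_PACKAGE) &&
	       nodes[idx].parent != 0) {
		idx = nodes[idx].parent;
		remaining--;
	}
	/* Level 0: the firmware's (often small, consecutive) processor ID. */
	if (level == 0)
		return nodes[idx].acpi_id;
	/* Higher levels: the node's index stands in for its table offset. */
	return idx;
}
```

Asking for a level above the package simply returns the package, so every CPU in one package gets the same ID however high the requested level is.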

Added PPTT to the list of injectable ACPI tables.

Fix a bug, introduced in v2 by a misused git rebase/fixup, that kept
  the code from using the processor node as intended.

v1->v2:

The parser keys off the acpi_pptt_processor node to determine
  unique caches, rather than the acpi_pptt_cache node referenced by the
  processor node. This supports PPTT tables in which several CPU nodes
  reference ("share") the same cache node even though the cache itself
  is not shared.
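A sketch of why keying off the processor node matters, under a deliberately simplified model in which each hypothetical mock_node carries the depth of the single cache chain hanging off it. Two CPUs that point at the same cache description (cache_ref) are not treated as sharing it; they share a level only if the same processor node owns that level. cache_owner() and cache_shared() are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical node: depth of the single cache chain hanging off it. */
struct mock_node {
	uint32_t parent;	/* index of the parent node; 0 means "none" */
	int cache_levels;	/* cache levels this node contributes */
	uint32_t cache_ref;	/* cache description it points at (ignored) */
};

/* Return the processor node that owns cache 'level' for 'cpu', or 0. */
static uint32_t cache_owner(const struct mock_node *nodes, uint32_t cpu,
			    int level)
{
	int seen = 0;

	assert(cpu != 0 && level > 0);
	for (uint32_t idx = cpu; idx != 0; idx = nodes[idx].parent) {
		seen += nodes[idx].cache_levels;
		if (seen >= level)
			return idx;
	}
	return 0;
}

/* CPUs share a level only if the same processor node owns it. */
static int cache_shared(const struct mock_node *nodes, uint32_t a,
			uint32_t b, int level)
{
	uint32_t owner = cache_owner(nodes, a, level);

	return owner != 0 && owner == cache_owner(nodes, b, level);
}
```

Two CPUs under one cluster that both reference the same L1 description therefore get private L1 caches but a shared L2 owned by the cluster node.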

Normalize the socket, cluster and thread mapping so that it matches
  Linux's traditional mapping for the physical ID and thread ID.
  Adding explicit scheduler knowledge of clusters (rather than just
  their cache-sharing attributes) is left for a future patch.

Jeremy Linton (7):
  ACPI/PPTT: Add Processor Properties Topology Table parsing
  ACPI: Enable PPTT support on ARM64
  drivers: base: cacheinfo: arm64: Add support for ACPI based firmware
    tables
  Topology: Add cluster on die macros and arm64 decoding
  arm64: Fixup users of topology_physical_package_id
  arm64: topology: Enable ACPI/PPTT based CPU topology.
  ACPI: Add PPTT to injectable table list

 arch/arm64/Kconfig                |   1 +
 arch/arm64/include/asm/topology.h |   4 +-
 arch/arm64/kernel/cacheinfo.c     |  23 +-
 arch/arm64/kernel/topology.c      |  62 ++++-
 drivers/acpi/Makefile             |   1 +
 drivers/acpi/arm64/Kconfig        |   3 +
 drivers/acpi/pptt.c               | 486 ++++++++++++++++++++++++++++++++++++++
 drivers/acpi/tables.c             |   3 +-
 drivers/base/cacheinfo.c          |  17 +-
 drivers/cpufreq/arm_big_little.c  |   2 +-
 drivers/firmware/psci_checker.c   |   2 +-
 include/linux/cacheinfo.h         |  11 +-
 include/linux/topology.h          |   4 +
 13 files changed, 599 insertions(+), 20 deletions(-)
 create mode 100644 drivers/acpi/pptt.c

-- 
2.13.5



* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-12 19:48 ` Jeremy Linton
From: Jeremy Linton @ 2017-10-12 19:48 UTC
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Jeremy Linton

ACPI 6.2 adds a new table that describes how processing units
are related to each other in a tree-like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.
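Because subtables refer to one another by byte offset from the start of the table, every reference has to be validated before it is dereferenced. The sketch below mirrors the checks that fetch_pptt_subtable() performs in this patch; struct mock_subtable_header and resolve_ref() are hypothetical stand-ins for the ACPICA types, and the widening to uint64_t is an extra precaution of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-byte subtable header: type, then total length. */
struct mock_subtable_header {
	uint8_t type;
	uint8_t length;
};

/*
 * Resolve a table-relative reference. Returns NULL for the null
 * reference (0) or when the subtable would extend past table_len.
 * Assumes 'table' really holds at least table_len bytes.
 */
static const struct mock_subtable_header *
resolve_ref(const uint8_t *table, uint32_t table_len, uint32_t ref)
{
	const struct mock_subtable_header *entry;

	assert(table != NULL);
	if (ref == 0)
		return NULL;
	/* The header must fit before its length field can be trusted. */
	if ((uint64_t)ref + sizeof(*entry) > table_len)
		return NULL;
	entry = (const struct mock_subtable_header *)(table + ref);
	/* ...and so must the whole subtable it claims to describe. */
	if ((uint64_t)ref + entry->length > table_len)
		return NULL;
	return entry;
}
```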

Add the code to parse the cache hierarchy and report the total
number of cache levels for a given core using
acpi_find_last_cache_level(), as well as to fill out each core's
cache information with cache_setup_acpi() once the cpu_cacheinfo
structure has been populated by the arch-specific code.
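The level count reported by acpi_find_last_cache_level() can be modelled as follows: for the CPU node and each ancestor, take the deepest cache chain among that node's private resources, and add those depths together. The types and functions here (struct mock_cache, struct mock_cpu_node, chain_depth(), count_cache_levels()) are hypothetical simplifications, and the sketch assumes every chain is acyclic.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MAX_RES 4

/* Hypothetical cache node: only the link to the next level matters here. */
struct mock_cache {
	uint32_t next;			/* next-level cache index; 0 = none */
};

/* Hypothetical processor node with its private resources. */
struct mock_cpu_node {
	uint32_t parent;		/* parent node index; 0 = none */
	int nres;			/* number of private resources */
	uint32_t res[MOCK_MAX_RES];	/* first cache of each chain */
};

/* Number of caches in the chain starting at 'first' (assumed acyclic). */
static int chain_depth(const struct mock_cache *caches, uint32_t first)
{
	int depth = 0;

	for (uint32_t c = first; c != 0; c = caches[c].next)
		depth++;
	return depth;
}

/* Sum, over 'cpu' and its ancestors, of the deepest chain at each node. */
static int count_cache_levels(const struct mock_cpu_node *nodes,
			      const struct mock_cache *caches, uint32_t cpu)
{
	int total = 0;

	assert(cpu != 0);
	for (uint32_t n = cpu; n != 0; n = nodes[n].parent) {
		int deepest = 0;

		assert(nodes[n].nres <= MOCK_MAX_RES);
		for (int r = 0; r < nodes[n].nres; r++) {
			int d = chain_depth(caches, nodes[n].res[r]);

			if (d > deepest)
				deepest = d;
		}
		total += deepest;
	}
	return total;
}
```

A CPU node holding separate L1d and L1i resources under a cluster node holding one L2 yields 1 + 1 = 2 levels, since the two L1 chains sit side by side rather than on top of each other.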

Further, report peers in the topology using setup_acpi_cpu_topology(),
which returns a unique ID for each processing unit at a given level
in the tree. These unique IDs can then be used to match related
processing units that exist as threads, COD (clusters on die),
within a given package, etc.
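Once each CPU has an ID for a given level, grouping peers is a matter of comparing IDs. A minimal sketch, assuming at most 64 CPUs so that a uint64_t can serve as the mask; build_peer_masks() is a hypothetical helper, not the kernel's cpumask code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given one topology ID per CPU at some level, set bit j of masks[i]
 * when CPUs i and j report the same ID. A CPU with a valid ID is always
 * its own peer; negative IDs (errors such as -ENOENT) match nothing.
 */
static void build_peer_masks(const int *ids, int ncpus, uint64_t *masks)
{
	assert(ncpus > 0 && ncpus <= 64);
	for (int i = 0; i < ncpus; i++) {
		masks[i] = 0;
		for (int j = 0; j < ncpus; j++)
			if (ids[i] >= 0 && ids[i] == ids[j])
				masks[i] |= UINT64_C(1) << j;
	}
}
```

For instance, cluster-level IDs {40, 40, 40, 40, 200, 200} (made-up offsets loosely following the Juno layout in the cover letter) give a mask of 0xF for CPUs 0-3 and 0x30 for CPUs 4-5.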

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 485 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
new file mode 100644
index 000000000000..c86715fed4a7
--- /dev/null
+++ b/drivers/acpi/pptt.c
@@ -0,1 +1,485 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (!pptt_ref)
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+		      sizeof(u32) * resource);
+
+	return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * given a pptt resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found, and continue the search.
+ * Once the entire cache branch has been walked return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if (*found != NULL)
+				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified and we find that level/type, record
+ * it and return the matching acpi_pptt_cache structure.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from the processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * we are looking for the max depth. Since it's possible
+		 * for a given node to have resources with differing depths,
+		 * verify that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/* determine if the given node is a leaf node */
+static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
+			       struct acpi_pptt_processor *node)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	u32 node_entry;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->parent == node_entry))
+			return 0;
+		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
+	}
+	return 1;
+}
+
+/*
+ * Find the subtable entry describing the provided processor
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
+			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
+				 acpi_cpu_id, cpu_node->acpi_processor_id);
+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
+				/* found the correct entry */
+				pr_debug("match found!\n");
+				return (struct acpi_pptt_processor *)entry;
+			}
+		}
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return NULL;
+}
+
+/*
+ * Given an acpi_pptt_processor node, walk up until we identify the
+ * package that the node is associated with or we run out of levels
+ * to request.
+ */
+static struct acpi_pptt_processor *acpi_find_processor_package_id(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu,
+	int level)
+{
+	struct acpi_pptt_processor *prev_node;
+
+	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
+		pr_debug("level %d\n", level);
+		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
+		if (prev_node == NULL)
+			break;
+		cpu = prev_node;
+		level--;
+	}
+	return cpu;
+}
+
+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
+
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_6_2_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_6_2_CACHE_TYPE_INSTR;
+	default:
+		pr_debug("Unknown cache type, assume unified\n");
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level,
+	struct acpi_pptt_processor **node)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (!cpu_node)
+		return NULL;
+
+	do {
+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
+		*node = cpu_node;
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while ((cpu_node) && (!found));
+
+	return found;
+}
+
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache,
+				    struct acpi_pptt_processor *cpu_node)
+{
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->size;
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->line_size;
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->associativity;
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_6_2_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_6_2_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		default:
+			pr_err("Unknown ACPI cache policy %d\n",
+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
+		}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_6_2_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_RW_ALLOCATE:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
+			break;
+		default:
+			pr_err("Unknown ACPI cache allocation policy %d\n",
+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
+		}
+}
+
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+	struct acpi_pptt_processor *cpu_node = NULL;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level,
+						   &cpu_node);
+		pr_debug("found = %p %p\n", found_cache, cpu_node);
+		if (found_cache)
+			update_cache_properties(this_leaf,
+						found_cache,
+						cpu_node);
+
+		index++;
+	}
+}
+
+static int topology_setup_acpi_cpu(struct acpi_table_header *table,
+				    unsigned int cpu, int level)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+
+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+	if (cpu_node) {
+		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
+		/* Only the first level has a guaranteed id */
+		if (level == 0)
+			return cpu_node->acpi_processor_id;
+		return (int)((u8 *)cpu_node - (u8 *)table);
+	}
+	pr_err_once("PPTT table found, but unable to locate core for %d\n",
+		    cpu);
+	return -ENOENT;
+}
+
+/*
+ * simply assign an ACPI cache entry to each known CPU cache entry;
+ * determining which entries are shared is done later.
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return status;
+}
+
+/*
+ * Determine a topology unique ID for each thread/core/cluster/socket/etc.
+ * This ID can then be used to group peers.
+ */
+int setup_acpi_cpu_topology(unsigned int cpu, int level)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = topology_setup_acpi_cpu(table, cpu, level);
+	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
+		 cpu, level, retval);
+	acpi_put_table(table);
+
+	return retval;
+}
-- 
2.13.5

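For reference, the attributes byte that update_cache_properties() and acpi_pptt_walk_cache() inspect can be decoded as in the sketch below, using the bit positions implied by the ACPI_6_2_* defines in the patch (bits 1:0 allocation type, bits 3:2 cache type, bit 4 write policy). The MOCK_* names and decode_attributes() are hypothetical. The default: arms also accept the value 3 in the two-bit fields, treating it as the unified or read/write-allocate case; that handling is an assumption of this sketch, to be checked against the ACPI 6.2 specification.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MASK_ALLOC   0x03u	/* bits 1:0: allocation type */
#define MOCK_MASK_TYPE    0x0Cu	/* bits 3:2: cache type */
#define MOCK_MASK_POLICY  0x10u	/* bit 4: write policy */

enum mock_type { MOCK_DATA, MOCK_INSTR, MOCK_UNIFIED };

struct mock_cache_attrs {
	enum mock_type type;
	int write_through;	/* 0 = write-back, 1 = write-through */
	int read_alloc;
	int write_alloc;
};

static struct mock_cache_attrs decode_attributes(uint8_t attr)
{
	struct mock_cache_attrs out = { MOCK_DATA, 0, 0, 0 };

	switch ((attr & MOCK_MASK_TYPE) >> 2) {
	case 0:
		out.type = MOCK_DATA;
		break;
	case 1:
		out.type = MOCK_INSTR;
		break;
	default:		/* 2, or 3 treated as an alternate encoding */
		out.type = MOCK_UNIFIED;
		break;
	}
	out.write_through = (attr & MOCK_MASK_POLICY) != 0;
	switch (attr & MOCK_MASK_ALLOC) {
	case 0:
		out.read_alloc = 1;
		break;
	case 1:
		out.write_alloc = 1;
		break;
	default:		/* 2, or 3 treated as an alternate encoding */
		out.read_alloc = out.write_alloc = 1;
		break;
	}
	return out;
}
```

For example, 0x0A decodes as a unified, write-back cache with both read and write allocation, and 0x14 as a write-through instruction cache with read allocation.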
^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-12 19:48   ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-12 19:48 UTC (permalink / raw)
  To: linux-arm-kernel

ACPI 6.2 adds a new table, which describes how processing units
are related to each other in tree like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level() as well as fill out the individual
cores cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch specific
code.

Further, report peers in the topology using setup_acpi_cpu_topology()
to report a unique ID for each processing unit at a given level
in the tree. These unique id's can then be used to match related
processing units which exist as threads, COD (clusters
on die), within a given package, etc.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 485 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
new file mode 100644
index 000000000000..c86715fed4a7
--- /dev/null
+++ b/drivers/acpi/pptt.c
@@ -0,1 +1,485 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (!pptt_ref)
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+		      sizeof(u32) * resource);
+
+	return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * given a pptt resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found, and continue the search.
+ * Once the entire cache branch has been walked return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if (*found != NULL)
+				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified, and we find that level/type, abort
+ * processing and return the acpi_pptt_cache structure.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from the processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * we are looking for the max depth. Since its potentially
+		 * possible for a given node to have resources with differing
+		 * depths verify that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/* determine if the given node is a leaf node */
+static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
+			       struct acpi_pptt_processor *node)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	u32 node_entry;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	while (((unsigned long)entry) + sizeof(*entry) < table_end && entry->length) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->parent == node_entry))
+			return 0;
+		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
+	}
+	return 1;
+}
+
+/*
+ * Find the subtable entry describing the provided processor
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
+			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
+				 acpi_cpu_id, cpu_node->acpi_processor_id);
+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
+				/* found the correct entry */
+				pr_debug("match found!\n");
+				return (struct acpi_pptt_processor *)entry;
+			}
+		}
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return NULL;
+}
+
+/*
+ * Given an acpi_pptt_processor node, walk up until we identify the
+ * package that the node is associated with or we run out of levels
+ * to request.
+ */
+static struct acpi_pptt_processor *acpi_find_processor_package_id(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu,
+	int level)
+{
+	struct acpi_pptt_processor *prev_node;
+
+	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
+		pr_debug("level %d\n", level);
+		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
+		if (prev_node == NULL)
+			break;
+		cpu = prev_node;
+		level--;
+	}
+	return cpu;
+}
+
+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
+
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_6_2_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_6_2_CACHE_TYPE_INSTR;
+	default:
+		pr_debug("Unknown cache type, assume unified\n");
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level,
+	struct acpi_pptt_processor **node)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (!cpu_node)
+		return NULL;
+
+	do {
+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
+		*node = cpu_node;
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while ((cpu_node) && (!found));
+
+	return found;
+}
+
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache,
+				    struct acpi_pptt_processor *cpu_node)
+{
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->size;
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->line_size;
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->associativity;
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_6_2_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_6_2_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		default:
+			pr_err("Unknown ACPI cache policy %d\n",
+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
+		}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_6_2_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_RW_ALLOCATE:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
+			break;
+		default:
+			pr_err("Unknown ACPI cache allocation policy %d\n",
+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
+		}
+}
+
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+	struct acpi_pptt_processor *cpu_node = NULL;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level,
+						   &cpu_node);
+		pr_debug("found = %p %p\n", found_cache, cpu_node);
+		if (found_cache)
+			update_cache_properties(this_leaf,
+						found_cache,
+						cpu_node);
+
+		index++;
+	}
+}
+
+static int topology_setup_acpi_cpu(struct acpi_table_header *table,
+				    unsigned int cpu, int level)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+
+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+	if (cpu_node) {
+		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
+		/* Only the first level has a guaranteed id */
+		if (level == 0)
+			return cpu_node->acpi_processor_id;
+		return (int)((u8 *)cpu_node - (u8 *)table);
+	}
+	pr_err_once("PPTT table found, but unable to locate core for %d\n",
+		    cpu);
+	return -ENOENT;
+}
+
+/*
+ * Simply assign an ACPI cache entry to each known CPU cache entry;
+ * determining which entries are shared is done later.
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return 0;
+}
+
+/*
+ * Determine a topology unique ID for each thread/core/cluster/socket/etc.
+ * This ID can then be used to group peers.
+ */
+int setup_acpi_cpu_topology(unsigned int cpu, int level)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = topology_setup_acpi_cpu(table, cpu, level);
+	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
+		 cpu, level, retval);
+	acpi_put_table(table);
+
+	return retval;
+}
-- 
2.13.5
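For readers who want to experiment with the lookup logic in acpi_pptt_leaf_node(), acpi_find_processor_node() and acpi_find_processor_package_id(), the following is a minimal userspace sketch of the same idea. It assumes a hypothetical, simplified subtable layout: struct ex_proc_node, struct ex_table and every ex_* function are made up for illustration and are not the ACPICA definitions. Nodes reference their parent by byte offset from the start of the table, a node counts as a leaf when no other processor node names it as its parent, and above level 0 the node's table offset serves as its ID, as topology_setup_acpi_cpu() does. The scan stops on a zero or overlong subtable length so that a malformed table cannot make it loop forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, loosely modelled on a PPTT processor subtable. */
#define EX_TYPE_PROCESSOR	0
#define EX_FLAG_PHYS_PACKAGE	0x1u

struct ex_proc_node {
	uint8_t type;
	uint8_t length;		/* length of this subtable in bytes */
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;	/* byte offset of the parent node, 0 == none */
	uint32_t acpi_id;
};

struct ex_table {
	uint32_t length;	/* total table length in bytes */
	uint32_t reserved;
	struct ex_proc_node nodes[4];
};

/*
 * Return the processor subtable after `prev` (NULL = first), or NULL at the
 * end. A zero or overlong length ends the walk instead of looping forever.
 */
const struct ex_proc_node *ex_next_proc(const struct ex_table *t,
					const struct ex_proc_node *prev)
{
	const uint8_t *end = (const uint8_t *)t + t->length;
	const uint8_t *p = prev ? (const uint8_t *)prev + prev->length
				: (const uint8_t *)t->nodes;

	while ((size_t)(end - p) >= 2) {	/* room for type + length */
		uint8_t type = p[0], len = p[1];

		if (len == 0 || (size_t)(end - p) < len)
			return NULL;
		if (type == EX_TYPE_PROCESSOR && len >= sizeof(struct ex_proc_node))
			return (const struct ex_proc_node *)p;
		p += len;
	}
	return NULL;
}

/* Byte offset of a node; this is what child nodes store in `parent`. */
uint32_t ex_offset(const struct ex_table *t, const struct ex_proc_node *n)
{
	return (uint32_t)((const uint8_t *)n - (const uint8_t *)t);
}

/* A node is a leaf when no processor node names it as its parent. */
int ex_is_leaf(const struct ex_table *t, const struct ex_proc_node *node)
{
	for (const struct ex_proc_node *n = ex_next_proc(t, NULL); n; n = ex_next_proc(t, n))
		if (n->parent == ex_offset(t, node))
			return 0;
	return 1;
}

/* Leaf node for an ACPI processor id, cf. acpi_find_processor_node(). */
const struct ex_proc_node *ex_find_cpu(const struct ex_table *t, uint32_t acpi_id)
{
	for (const struct ex_proc_node *n = ex_next_proc(t, NULL); n; n = ex_next_proc(t, n))
		if (n->acpi_id == acpi_id && ex_is_leaf(t, n))
			return n;
	return NULL;
}

/*
 * Walk up `level` parents, stopping early at a physical package (cf.
 * acpi_find_processor_package_id()). Level 0 reports the ACPI id; higher
 * levels report the node's table offset, which is unique per node.
 */
int ex_topology_id(const struct ex_table *t, uint32_t acpi_id, int level)
{
	const struct ex_proc_node *n = ex_find_cpu(t, acpi_id);
	int steps = level;

	if (!n)
		return -1;
	while (steps && !(n->flags & EX_FLAG_PHYS_PACKAGE) && n->parent != 0 &&
	       n->parent + sizeof(*n) <= t->length) {
		n = (const struct ex_proc_node *)((const uint8_t *)t + n->parent);
		steps--;
	}
	return level == 0 ? (int)n->acpi_id : (int)ex_offset(t, n);
}

/* package (offset 8) -> cluster (24) -> CPUs with ACPI ids 0 (40), 1 (56). */
void ex_build(struct ex_table *t)
{
	const uint8_t len = sizeof(struct ex_proc_node);
	const uint32_t pkg = (uint32_t)offsetof(struct ex_table, nodes);
	const uint32_t cluster = pkg + len;

	*t = (struct ex_table){ .length = sizeof(*t) };
	t->nodes[0] = (struct ex_proc_node){ EX_TYPE_PROCESSOR, len, 0, EX_FLAG_PHYS_PACKAGE, 0, 100 };
	t->nodes[1] = (struct ex_proc_node){ EX_TYPE_PROCESSOR, len, 0, 0, pkg, 200 };
	t->nodes[2] = (struct ex_proc_node){ EX_TYPE_PROCESSOR, len, 0, 0, cluster, 0 };
	t->nodes[3] = (struct ex_proc_node){ EX_TYPE_PROCESSOR, len, 0, 0, cluster, 1 };
}
```

With the table from ex_build(), asking for level 1 of CPU 1 returns the cluster node's offset (24), while a large level stops at the package node (offset 8) because its physical-package flag is set.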

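acpi_cache_type() and update_cache_properties() translate the PPTT cache attributes byte. A small decoder sketch follows, assuming the ACPI 6.2 layout as I read it: allocation type in bits 1:0, cache type in bits 3:2 and write policy in bit 4, where the value 3 is an alternative encoding of "read and write allocate" and of "unified" respectively. The ex_* names are hypothetical. One thing this highlights: the posted code matches only ACPI_6_2_CACHE_RW_ALLOCATE (0x02) and ACPI_6_2_CACHE_TYPE_UNIFIED (1<<3), so firmware that uses the alternative encodings may be worth checking.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute bit fields as read from ACPI 6.2 (mask names are hypothetical). */
#define EX_MASK_ALLOCATION	0x03u	/* bits 1:0 */
#define EX_MASK_CACHE_TYPE	0x0Cu	/* bits 3:2 */
#define EX_MASK_POLICY		0x10u	/* bit 4: 0 = write-back, 1 = write-through */

enum ex_cache_type { EX_DATA, EX_INSTR, EX_UNIFIED };

struct ex_cache_attrs {
	enum ex_cache_type type;
	int write_through;
	int read_allocate;
	int write_allocate;
};

struct ex_cache_attrs ex_decode_attributes(uint8_t attributes)
{
	struct ex_cache_attrs a = { EX_DATA, 0, 0, 0 };

	switch ((attributes & EX_MASK_CACHE_TYPE) >> 2) {
	case 0:
		a.type = EX_DATA;
		break;
	case 1:
		a.type = EX_INSTR;
		break;
	default:		/* 2 and 3 both encode "unified" */
		a.type = EX_UNIFIED;
		break;
	}
	a.write_through = (attributes & EX_MASK_POLICY) != 0;

	switch (attributes & EX_MASK_ALLOCATION) {
	case 0:
		a.read_allocate = 1;
		break;
	case 1:
		a.write_allocate = 1;
		break;
	default:		/* 2 and 3 both encode read + write allocate */
		a.read_allocate = 1;
		a.write_allocate = 1;
		break;
	}
	return a;
}
```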
* [PATCH v3 2/7] ACPI: Enable PPTT support on ARM64
  2017-10-12 19:48 ` Jeremy Linton
@ 2017-10-12 19:48   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-12 19:48 UTC (permalink / raw)
  To: linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, Jeremy Linton, ahs3, viresh.kumar,
	hanjun.guo, sudeep.holla, austinwc, wangxiongfeng2,
	linux-arm-kernel

Now that we have a PPTT parser, in preparation for its use
on arm64, let's build it.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/Kconfig         | 1 +
 drivers/acpi/Makefile      | 1 +
 drivers/acpi/arm64/Kconfig | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..68c9d1289735 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -7,6 +7,7 @@ config ARM64
 	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
 	select ACPI_MCFG if ACPI
 	select ACPI_SPCR_TABLE if ACPI
+	select ACPI_PPTT if ACPI
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 90265ab4437a..c92a0c937551 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
 obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
 obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
+obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
 
 # processor has its own "processor." module_param namespace
 processor-y			:= processor_driver.o
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 5a6f80fce0d6..74b855a669ea 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -7,3 +7,6 @@ config ACPI_IORT
 
 config ACPI_GTDT
 	bool
+
+config ACPI_PPTT
+	bool
\ No newline at end of file
-- 
2.13.5

* [PATCH v3 3/7] drivers: base: cacheinfo: arm64: Add support for ACPI based firmware tables
  2017-10-12 19:48 ` Jeremy Linton
@ 2017-10-12 19:48   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-12 19:48 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Jeremy Linton

The /sys cache entries should support ACPI/PPTT generated cache
topology information. Let's detect ACPI systems and call
an arch-specific cache_setup_acpi() routine to update the
hardware-probed cache topology.

For arm64, if ACPI is enabled, determine the max number of cache
levels and populate them using a PPTT table if one is available.
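The maximum level count comes from acpi_find_last_cache_level(), which walks from the CPU's PPTT node up to the root and, at each node, adds the depth of the deepest cache chain that node owns. The sketch below models that arithmetic with hypothetical in-memory types (ex_cache, ex_node) in place of table offsets, and assumes acpi_pptt_walk_cache() returns the starting level plus the length of the chain it followed, which is how acpi_find_cache_level() appears to use its return value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: each cache points at its next level, and each
 * processor node lists the caches it owns (NULL-terminated).
 */
struct ex_cache {
	const struct ex_cache *next_level;
};

struct ex_node {
	const struct ex_node *parent;
	const struct ex_cache **resources;
};

/* Length of one chain, e.g. L1d -> L2 counts as two levels. */
static int ex_chain_len(const struct ex_cache *c)
{
	int n = 0;

	for (; c; c = c->next_level)
		n++;
	return n;
}

/* Add the deepest chain owned by each node from the CPU up to the root. */
int ex_count_levels(const struct ex_node *node)
{
	int total = 0;

	for (; node; node = node->parent) {
		int deepest = 0;

		for (const struct ex_cache **r = node->resources; r && *r; r++)
			if (ex_chain_len(*r) > deepest)
				deepest = ex_chain_len(*r);
		total += deepest;
	}
	return total;
}
```

For a CPU node owning L1d -> L2 and L1i -> L2, below a cluster node owning an L3, this counts two levels at the CPU and one at the cluster, three in total.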

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/cacheinfo.c | 23 ++++++++++++++++++-----
 drivers/acpi/pptt.c           |  1 +
 drivers/base/cacheinfo.c      | 17 +++++++++++------
 include/linux/cacheinfo.h     | 11 +++++++++--
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 380f2e2fbed5..2e2cf0d312ba 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -17,6 +17,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/acpi.h>
 #include <linux/cacheinfo.h>
 #include <linux/of.h>
 
@@ -44,9 +45,17 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 	this_leaf->type = type;
 }
 
+#ifndef CONFIG_ACPI
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	/* ACPI kernels should be built with PPTT support */
+	return 0;
+}
+#endif
+
 static int __init_cache_level(unsigned int cpu)
 {
-	unsigned int ctype, level, leaves, of_level;
+	unsigned int ctype, level, leaves, fw_level;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 
 	for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
@@ -59,15 +68,19 @@ static int __init_cache_level(unsigned int cpu)
 		leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
 	}
 
-	of_level = of_find_last_cache_level(cpu);
-	if (level < of_level) {
+	if (acpi_disabled)
+		fw_level = of_find_last_cache_level(cpu);
+	else
+		fw_level = acpi_find_last_cache_level(cpu);
+
+	if (level < fw_level) {
 		/*
 		 * some external caches not specified in CLIDR_EL1
 		 * the information may be available in the device tree
 		 * only unified external caches are considered here
 		 */
-		leaves += (of_level - level);
-		level = of_level;
+		leaves += (fw_level - level);
+		level = fw_level;
 	}
 
 	this_cpu_ci->num_levels = level;
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index c86715fed4a7..b5c6de37e328 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -355,6 +355,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
 				    struct acpi_pptt_cache *found_cache,
 				    struct acpi_pptt_processor *cpu_node)
 {
+	this_leaf->firmware_node = cpu_node;
 	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
 		this_leaf->size = found_cache->size;
 	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index eb3af2739537..8eca279e50d1 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -86,7 +86,7 @@ static int cache_setup_of_node(unsigned int cpu)
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
 {
-	return sib_leaf->of_node == this_leaf->of_node;
+	return sib_leaf->firmware_node == this_leaf->firmware_node;
 }
 
 /* OF properties to query for a given cache type */
@@ -215,6 +215,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 }
 #endif
 
+int __weak cache_setup_acpi(unsigned int cpu)
+{
+	return -ENOTSUPP;
+}
+
 static int cache_shared_cpu_map_setup(unsigned int cpu)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
@@ -225,11 +230,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
 	if (this_cpu_ci->cpu_map_populated)
 		return 0;
 
-	if (of_have_populated_dt())
+	if (!acpi_disabled)
+		ret = cache_setup_acpi(cpu);
+	else if (of_have_populated_dt())
 		ret = cache_setup_of_node(cpu);
-	else if (!acpi_disabled)
-		/* No cache property/hierarchy support yet in ACPI */
-		ret = -ENOTSUPP;
+
 	if (ret)
 		return ret;
 
@@ -286,7 +291,7 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 
 static void cache_override_properties(unsigned int cpu)
 {
-	if (of_have_populated_dt())
+	if (acpi_disabled && of_have_populated_dt())
 		return cache_of_override_properties(cpu);
 }
 
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 6a524bf6a06d..d1e9b8e01981 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -36,6 +36,9 @@ enum cache_type {
  * @of_node: if devicetree is used, this represents either the cpu node in
  *	case there's no explicit cache node or the cache node itself in the
  *	device tree
+ * @firmware_node: Shared with of_node. When not using DT, this may contain
+ *	pointers to other firmware-based values, in particular unique
+ *	ACPI/PPTT values.
  * @disable_sysfs: indicates whether this node is visible to the user via
  *	sysfs or not
  * @priv: pointer to any private data structure specific to particular
@@ -64,8 +67,10 @@ struct cacheinfo {
 #define CACHE_ALLOCATE_POLICY_MASK	\
 	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
 #define CACHE_ID		BIT(4)
-
-	struct device_node *of_node;
+	union {
+		struct device_node *of_node;
+		void *firmware_node;
+	};
 	bool disable_sysfs;
 	void *priv;
 };
@@ -98,6 +103,8 @@ int func(unsigned int cpu)					\
 struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
+int cache_setup_acpi(unsigned int cpu);
+int acpi_find_last_cache_level(unsigned int cpu);
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
-- 
2.13.5
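In __init_cache_level() the firmware can only add levels beyond what CLIDR_EL1 reports, and each extra level is counted as one unified leaf. The union also means cache_leaves_are_shared() compares a single pointer whether it holds a DT node or, with this patch, the PPTT processor node at which update_cache_properties() found the cache. A hypothetical sketch of both pieces follows; ex_ctype stands in for the per-level cache type field, and none of the ex_* names exist in the kernel.

```c
#include <assert.h>

/* Hypothetical stand-ins for the per-level CLIDR_EL1 cache types. */
enum ex_ctype { EX_NOCACHE, EX_INST, EX_DATA, EX_SEPARATE, EX_UNIFIED };

/*
 * Count the levels/leaves the CPU reports, then let the firmware (DT or
 * PPTT) extend the level count; each extra level is assumed to be a
 * single unified leaf, as the comment in __init_cache_level() says.
 */
void ex_init_cache_level(const enum ex_ctype *ctype, unsigned int max_level,
			 unsigned int fw_level, unsigned int *levels,
			 unsigned int *leaves)
{
	unsigned int level = 0, n = 0;

	while (level < max_level && ctype[level] != EX_NOCACHE) {
		n += (ctype[level] == EX_SEPARATE) ? 2 : 1;
		level++;
	}
	if (level < fw_level) {
		n += fw_level - level;
		level = fw_level;
	}
	*levels = level;
	*leaves = n;
}

/* With of_node and firmware_node in a union, one pointer compare covers both. */
struct ex_leaf {
	const void *firmware_node;
};

int ex_leaves_shared(const struct ex_leaf *a, const struct ex_leaf *b)
{
	return a->firmware_node == b->firmware_node;
}
```

With two CPU-reported levels (split L1, unified L2) and firmware reporting three, the sketch ends up with three levels and four leaves.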

* [PATCH v3 4/7] Topology: Add cluster on die macros and arm64 decoding
  2017-10-12 19:48 ` Jeremy Linton
@ 2017-10-12 19:48   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-12 19:48 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Jeremy Linton

Many modern machines have cluster on die (COD) non-uniformity
as well as the traditional multi-socket architectures. Reusing
the multi-socket or NUMA on die concepts for these (as arm64 does)
breaks down when presented with actual multi-socket/COD machines.
Similar problems are also visible on some x86 machines, so it
seems appropriate to start abstracting and making these topologies
visible.

To start, a topology_cod_id() macro is added which defaults to returning
the same information as topology_physical_package_id(). Moving forward
we can start to split out the differences.

For arm64, a package_id field is added to struct cpu_topology.
Initially this will be equal to the cluster_id as well.
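The default works through the usual #ifndef override pattern: an architecture header may define topology_cod_id() itself, and include/linux/topology.h only supplies a fallback when it has not. The sketch below shows an architecture that defines only topology_physical_package_id(), so topology_cod_id() falls back to it; ex_cpu_topology and ex_topology are hypothetical stand-ins for the arm64 cpu_topology array.

```c
#include <assert.h>

/* Hypothetical per-CPU data, shaped like arm64's struct cpu_topology. */
struct ex_cpu_topology {
	int core_id;
	int cluster_id;
	int package_id;
};

struct ex_cpu_topology ex_topology[4];

/* An architecture header that only knows about packages... */
#define topology_physical_package_id(cpu)	(ex_topology[cpu].package_id)

/* ...followed by the generic fallbacks, as in include/linux/topology.h. */
#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu)	((void)(cpu), -1)
#endif
#ifndef topology_cod_id				/* cluster on die */
#define topology_cod_id(cpu)			topology_physical_package_id(cpu)
#endif
```

arm64 in this patch defines topology_cod_id() as the cluster_id, so the fallback is not used there.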

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/include/asm/topology.h | 4 +++-
 arch/arm64/kernel/topology.c      | 8 ++++++--
 include/linux/topology.h          | 3 +++
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 8b57339823e9..bd7517960d39 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -7,13 +7,15 @@ struct cpu_topology {
 	int thread_id;
 	int core_id;
 	int cluster_id;
+	int package_id;
 	cpumask_t thread_sibling;
 	cpumask_t core_sibling;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
 
-#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
+#define topology_physical_package_id(cpu)	(cpu_topology[cpu].package_id)
+#define topology_cod_id(cpu)		(cpu_topology[cpu].cluster_id)
 #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..9147e5b6326d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -67,6 +67,8 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			leaf = false;
 			cpu = get_cpu_for_node(t);
 			if (cpu >= 0) {
+				/* maintain DT cluster == package behavior */
+				cpu_topology[cpu].package_id = cluster_id;
 				cpu_topology[cpu].cluster_id = cluster_id;
 				cpu_topology[cpu].core_id = core_id;
 				cpu_topology[cpu].thread_id = i;
@@ -88,7 +90,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			       core);
 			return -EINVAL;
 		}
-
+		cpu_topology[cpu].package_id = cluster_id;
 		cpu_topology[cpu].cluster_id = cluster_id;
 		cpu_topology[cpu].core_id = core_id;
 	} else if (leaf) {
@@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
 	for_each_possible_cpu(cpu) {
 		cpu_topo = &cpu_topology[cpu];
 
-		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
+		if (cpuid_topo->package_id != cpu_topo->package_id)
 			continue;
 
 		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
@@ -273,6 +275,7 @@ void store_cpu_topology(unsigned int cpuid)
 					 MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 |
 					 MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16;
 	}
+	cpuid_topo->package_id = cpuid_topo->cluster_id;
 
 	pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
 		 cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
@@ -292,6 +295,7 @@ static void __init reset_cpu_topology(void)
 		cpu_topo->thread_id = -1;
 		cpu_topo->core_id = 0;
 		cpu_topo->cluster_id = -1;
+		cpu_topo->package_id = -1;
 
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
diff --git a/include/linux/topology.h b/include/linux/topology.h
index cb0775e1ee4b..4660749a7303 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
+#ifndef topology_cod_id				/* cluster on die */
+#define topology_cod_id(cpu)			topology_physical_package_id(cpu)
+#endif
 #ifndef topology_core_id
 #define topology_core_id(cpu)			((void)(cpu), 0)
 #endif
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v3 5/7] arm64: Fixup users of topology_physical_package_id
  2017-10-12 19:48 ` Jeremy Linton
@ 2017-10-12 19:48   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-12 19:48 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Jeremy Linton

There are a few arm64-specific users (cpufreq, psci, etc.) which really
want the cluster rather than the topology_physical_package_id(). Let's
convert those users to topology_cod_id(). That way, when we start
differentiating the socket from the cluster, they will continue to behave correctly.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/cpufreq/arm_big_little.c | 2 +-
 drivers/firmware/psci_checker.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index 17504129fd77..6ee69b3820de 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -72,7 +72,7 @@ static struct mutex cluster_lock[MAX_CLUSTERS];
 
 static inline int raw_cpu_to_cluster(int cpu)
 {
-	return topology_physical_package_id(cpu);
+	return topology_cod_id(cpu);
 }
 
 static inline int cpu_to_cluster(int cpu)
diff --git a/drivers/firmware/psci_checker.c b/drivers/firmware/psci_checker.c
index 6523ce962865..a9465f5d344a 100644
--- a/drivers/firmware/psci_checker.c
+++ b/drivers/firmware/psci_checker.c
@@ -202,7 +202,7 @@ static int hotplug_tests(void)
 	 */
 	for (i = 0; i < nb_cluster; ++i) {
 		int cluster_id =
-			topology_physical_package_id(cpumask_any(clusters[i]));
+			topology_cod_id(cpumask_any(clusters[i]));
 		ssize_t len = cpumap_print_to_pagebuf(true, page_buf,
 						      clusters[i]);
 		/* Remove trailing newline. */
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-12 19:48 ` Jeremy Linton
@ 2017-10-12 19:48   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-12 19:48 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Jeremy Linton

Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread_id, core_id and
cluster_id by assuming that certain levels of the PPTT tree correspond
to those concepts. The package_id is flagged in the tree and can be
found by passing an arbitrarily large level to setup_acpi_cpu_topology(),
which terminates its search when it finds an ACPI node flagged
as the physical package. If the tree doesn't contain enough
levels to represent all of thread/core/cod/package, then the package
id will be used for the missing levels.

Since server/ACPI machines are more likely to be multi-socket and NUMA,
this patch also changes the default clusters=sockets behavior to
sockets=sockets for ACPI machines. DT machines continue to
represent sockets as clusters. For ACPI machines, this results in a
more normalized view of the topology. Cluster-level scheduling decisions
are still made, because the scheduler's "MC" (multi-core) domain level
has knowledge of cache-sharing domains.

This code is loosely based on a combination of code from:
Xiongfeng Wang <wangxiongfeng2@huawei.com>
John Garry <john.garry@huawei.com>
Jeffrey Hugo <jhugo@codeaurora.org>

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/topology.c | 54 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/topology.h     |  1 +
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 9147e5b6326d..42f3e7f28b2b 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -11,6 +11,7 @@
  * for more details.
  */
 
+#include <linux/acpi.h>
 #include <linux/arch_topology.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -22,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
+#include <linux/smp.h>
 #include <linux/string.h>
 
 #include <asm/cpu.h>
@@ -304,6 +306,54 @@ static void __init reset_cpu_topology(void)
 	}
 }
 
+#ifdef CONFIG_ACPI
+/*
+ * Propagate the topology information of the processor_topology_node tree to the
+ * cpu_topology array.
+ */
+static int __init parse_acpi_topology(void)
+{
+	u64 is_threaded;
+	int cpu;
+	int topology_id;
+	/* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
+	const int max_topo = 0xFF;
+
+	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
+
+	for_each_possible_cpu(cpu) {
+		topology_id = setup_acpi_cpu_topology(cpu, 0);
+		if (topology_id < 0)
+			return topology_id;
+
+		if (is_threaded) {
+			cpu_topology[cpu].thread_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id   = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 2);
+			cpu_topology[cpu].cluster_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
+			cpu_topology[cpu].package_id = topology_id;
+		} else {
+			cpu_topology[cpu].thread_id  = -1;
+			cpu_topology[cpu].core_id    = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].cluster_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
+			cpu_topology[cpu].package_id = topology_id;
+		}
+	}
+	return 0;
+}
+
+#else
+static int __init parse_acpi_topology(void)
+{
+	/*ACPI kernels should be built with PPTT support*/
+	return -EINVAL;
+}
+#endif
+
 void __init init_cpu_topology(void)
 {
 	reset_cpu_topology();
@@ -312,6 +362,8 @@ void __init init_cpu_topology(void)
 	 * Discard anything that was parsed if we hit an error so we
 	 * don't use partial information.
 	 */
-	if (of_have_populated_dt() && parse_dt_topology())
+	if ((!acpi_disabled) && parse_acpi_topology())
+		reset_cpu_topology();
+	else if (of_have_populated_dt() && parse_dt_topology())
 		reset_cpu_topology();
 }
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4660749a7303..cbf2fb13bf92 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -43,6 +43,7 @@
 		if (nr_cpus_node(node))
 
 int arch_update_cpu_topology(void);
+int setup_acpi_cpu_topology(unsigned int cpu, int level);
 
 /* Conform to ACPI 2.0 SLIT distance definitions */
 #define LOCAL_DISTANCE		10
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v3 7/7] ACPI: Add PPTT to injectable table list
  2017-10-12 19:48 ` Jeremy Linton
@ 2017-10-12 19:48   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-12 19:48 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Jeremy Linton

Add ACPI_SIG_PPTT to the table_sigs list so that a PPTT supplied
in the initrd can override the system topology.

Suggested-by: Geoffrey Blake <geoffrey.blake@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/tables.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 80ce2a7d224b..6d254450115b 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -456,7 +456,8 @@ static const char * const table_sigs[] = {
 	ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
 	ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
 	ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
+	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, ACPI_SIG_PPTT,
+	NULL };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 2/7] ACPI: Enable PPTT support on ARM64
  2017-10-12 19:48   ` Jeremy Linton
  (?)
@ 2017-10-13  9:53     ` Hanjun Guo
  -1 siblings, 0 replies; 104+ messages in thread
From: Hanjun Guo @ 2017-10-13  9:53 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi Jeremy,

On 2017/10/13 3:48, Jeremy Linton wrote:
> Now that we have a PPTT parser, in preparation for its use
> on arm64, lets build it.
>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/Kconfig         | 1 +
>  drivers/acpi/Makefile      | 1 +
>  drivers/acpi/arm64/Kconfig | 3 +++
>  3 files changed, 5 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6a56d4..68c9d1289735 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -7,6 +7,7 @@ config ARM64
>  	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
>  	select ACPI_MCFG if ACPI
>  	select ACPI_SPCR_TABLE if ACPI
> +	select ACPI_PPTT if ACPI
>  	select ARCH_CLOCKSOURCE_DATA
>  	select ARCH_HAS_DEBUG_VIRTUAL
>  	select ARCH_HAS_DEVMEM_IS_ALLOWED
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 90265ab4437a..c92a0c937551 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
>  obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
>  obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
>  obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
> +obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
>  
>  # processor has its own "processor." module_param namespace
>  processor-y			:= processor_driver.o
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index 5a6f80fce0d6..74b855a669ea 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -7,3 +7,6 @@ config ACPI_IORT
>  
>  config ACPI_GTDT
>  	bool
> +
> +config ACPI_PPTT
> +	bool

Can this be located in drivers/acpi/Kconfig? Then other
platforms can select ACPI_PPTT if they want.

Thanks
Hanjun

> \ No newline at end of file

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-12 19:48   ` Jeremy Linton
@ 2017-10-13  9:56     ` Julien Thierry
  -1 siblings, 0 replies; 104+ messages in thread
From: Julien Thierry @ 2017-10-13  9:56 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi Jeremy,

Please see some suggestions below.

On 12/10/17 20:48, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in a tree-like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> core's cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
> 
> Further, report peers in the topology using setup_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique IDs can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 485 insertions(+)
>   create mode 100644 drivers/acpi/pptt.c
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..c86715fed4a7
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,1 +1,485 @@
> +/*
> + * Copyright (C) 2017, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (!pptt_ref)
> +		return NULL;

Seeing the usage of pptt_ref to retrieve the subtable, would the 
following be a more accurate check?

	if (pptt_ref < sizeof(struct acpi_table_header))
		return NULL;

> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
> +		      sizeof(u32) * resource);
> +

I think this can be simplified as:

	ref = *((u32 *)(node + 1) + resource);
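
A standalone sketch to check that the two forms select the same entry. It assumes struct pptt_processor is a simplified stand-in with the same 20-byte fixed layout as struct acpi_pptt_processor. Both function names are hypothetical.

```c
#include <stdint.h>

/* Simplified stand-in for struct acpi_pptt_processor: a fixed part
 * followed directly by number_of_priv_resources u32 offsets. */
struct pptt_processor {
	uint8_t type;
	uint8_t length;
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;
	uint32_t acpi_processor_id;
	uint32_t number_of_priv_resources;
	/* u32 private resource references follow here */
};

/* Byte-offset form, as written in the patch. */
static uint32_t resource_ref_bytes(const struct pptt_processor *node, int i)
{
	return *(const uint32_t *)((const uint8_t *)node +
				   sizeof(struct pptt_processor) +
				   sizeof(uint32_t) * i);
}

/* Pointer-arithmetic form from the review: node + 1 points just past the
 * fixed part, and adding i to a u32 pointer advances by i * 4 bytes. */
static uint32_t resource_ref_ptr(const struct pptt_processor *node, int i)
{
	return *((const uint32_t *)(node + 1) + i);
}
```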

> +	return fetch_pptt_subtable(table_hdr, ref);
> +}
> +
> +/*
> + * given a pptt resource, verify that it is a cache node, then walk
> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;
> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> +			if (*found != NULL)
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
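
The cache walk in acpi_pptt_walk_cache() can be sketched outside the kernel as below. This is an illustration under simplifying assumptions. struct cache_node and walk_cache() are hypothetical, and the valid/type fields stand in for the flag and attribute tests. A plain next pointer replaces the table-relative next_level_of_cache offset.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified cache node; 'next' replaces the next_level_of_cache offset
 * that the patch resolves with fetch_pptt_cache(). */
struct cache_node {
	int valid;	/* stands in for ACPI_PPTT_CACHE_TYPE_VALID */
	int type;	/* stands in for attributes & ACPI_PPTT_MASK_CACHE_TYPE */
	const struct cache_node *next;
};

/* Walk one cache chain starting at depth 'local_level', counting one level
 * per node. Record the node matching (level, type) in *found, but keep
 * walking so a duplicate match can be reported. Returns the deepest level. */
static int walk_cache(const struct cache_node *cache, int local_level,
		      const struct cache_node **found, int level, int type)
{
	while (cache) {
		local_level++;
		if (local_level == level && cache->valid && cache->type == type) {
			if (*found)
				fprintf(stderr, "duplicate cache level/type\n");
			*found = cache;
		}
		cache = cache->next;
	}
	return local_level;
}
```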
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from the processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * we are looking for the max depth. Since its potentially
> +		 * possible for a given node to have resources with differing
> +		 * depths verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
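
The sketch below shows how acpi_find_cache_level() and acpi_process_node() combine to count cache levels, using a simplified in-memory tree. The types and function names are hypothetical. Each private resource is modelled as a cache chain, and parent pointers replace table offsets.

```c
#include <stddef.h>

struct cache_node {
	const struct cache_node *next;	/* next level of cache */
};

struct proc_node {
	const struct cache_node **res;	/* private resources (cache chains) */
	int nres;
	const struct proc_node *parent;	/* NULL at the root */
};

/* Depth reached by one chain that begins at depth 'start'. */
static int chain_depth(const struct cache_node *c, int start)
{
	for (; c; c = c->next)
		start++;
	return start;
}

/* Count-only version of the patch's logic: at each node every private
 * chain starts at the running total, the running total becomes the
 * deepest chain seen, and then the walk moves to the parent. */
static int count_cache_levels(const struct proc_node *node)
{
	int total = 0;

	for (; node; node = node->parent) {
		int deepest = total;

		for (int i = 0; i < node->nres; i++) {
			int d = chain_depth(node->res[i], total);

			if (d > deepest)
				deepest = d;
		}
		total = deepest;
	}
	return total;
}
```

For a core with separate L1i/L1d chains under a cluster holding an L2, this yields 2. The split L1 caches count as one level because only the deepest chain at each node is kept.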
> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;

Can cpu_node be defined inside the loop? It isn't used outside.

> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {

	while ((unsigned long) (entry + 1) < table_end) {

> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}
> +	return 1;
> +}
> +
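
A self-contained sketch of the leaf test. It uses the (entry + 1) loop bound suggested above and declares cpu inside the loop. The structs are simplified stand-ins for the ACPICA types, and is_leaf() is a hypothetical name. The zero-length guard is borrowed from acpi_find_processor_node(); this function in the patch does not have it.

```c
#include <stdint.h>

struct table_header { char signature[4]; uint32_t length; uint8_t rest[28]; };
struct subtable_header { uint8_t type; uint8_t length; };

struct pptt_processor {
	struct subtable_header header;
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;	/* byte offset of the parent node, 0 if none */
	uint32_t acpi_processor_id;
	uint32_t number_of_priv_resources;
};

#define PPTT_TYPE_PROCESSOR 0	/* stand-in for ACPI_PPTT_TYPE_PROCESSOR */

/* A processor node is a leaf if no processor node names it as its parent. */
static int is_leaf(const struct table_header *hdr,
		   const struct pptt_processor *node)
{
	uintptr_t table_end = (uintptr_t)hdr + hdr->length;
	uint32_t node_off = (uint32_t)((const uint8_t *)node - (const uint8_t *)hdr);
	const struct subtable_header *entry = (const struct subtable_header *)
		((const uint8_t *)hdr + sizeof(*hdr));

	/* continue while a whole subtable header still fits before the end */
	while ((uintptr_t)(entry + 1) < table_end) {
		const struct pptt_processor *cpu = (const struct pptt_processor *)entry;

		if (entry->type == PPTT_TYPE_PROCESSOR && cpu->parent == node_off)
			return 0;
		if (entry->length == 0)	/* malformed table: stop, don't spin */
			break;
		entry = (const struct subtable_header *)
			((const uint8_t *)entry + entry->length);
	}
	return 1;
}
```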
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));

Can I suggest having two inline functions for this and the above function?

static inline unsigned long acpi_get_table_end(const struct acpi_table_header *);

static inline struct acpi_subtable_header *acpi_get_first_entry(const struct acpi_table_header *);

(Feel free to adapt the names of course)
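
One possible shape for the two helpers proposed above, shown with simplified stand-in types so it compiles on its own. The bodies are an assumption based on the expressions in the patch. In particular, get_first_entry() skips only the common header. That matches the patch's sizeof(struct acpi_table_pptt) only if that struct adds no fields of its own.

```c
#include <stdint.h>

struct table_header { char signature[4]; uint32_t length; uint8_t rest[28]; };
struct subtable_header { uint8_t type; uint8_t length; };

/* One past the last byte of the table, as an integer address. */
static inline unsigned long get_table_end(const struct table_header *hdr)
{
	return (unsigned long)hdr + hdr->length;
}

/* The first subtable is assumed to follow the fixed header directly. */
static inline struct subtable_header *get_first_entry(const struct table_header *hdr)
{
	return (struct subtable_header *)((uint8_t *)hdr + sizeof(*hdr));
}
```

With these helpers, the setup in acpi_pptt_leaf_node() and acpi_find_processor_node() shrinks to two one-line assignments.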

> +
> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {

Same as above -> (unsigned long) (entry + 1).


> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");
> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}
> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);


I also think it would be nicer to have an inline function for this:

static struct acpi_subtable_header *acpi_get_next_entry(const struct acpi_subtable_header *);
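
A possible body for the proposed next-entry helper, again with a simplified stand-in type. Returning NULL for a zero-length entry is an added assumption; the prototype above does not say how that case is handled. It folds the patch's "Invalid zero length subtable" check into the helper, so a loop using it cannot spin on a malformed table. count_entries() is a hypothetical caller for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct subtable_header { uint8_t type; uint8_t length; };

/* Advance by the entry's own length; NULL for a zero-length entry. */
static struct subtable_header *get_next_entry(const struct subtable_header *entry)
{
	if (entry->length == 0)
		return NULL;
	return (struct subtable_header *)((uint8_t *)entry + entry->length);
}

/* Count entries from 'entry' up to 'end', using the (entry + 1) bound. */
static int count_entries(struct subtable_header *entry, unsigned long end)
{
	int n = 0;

	while (entry && (unsigned long)(entry + 1) < end) {
		n++;
		entry = get_next_entry(entry);
	}
	return n;
}
```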


> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,
> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> +		pr_debug("level %d\n", level);
> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
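
The upward walk in acpi_find_processor_package_id() reduces to the loop below. This sketch uses a hypothetical proc_node with a direct parent pointer. PHYSICAL_PACKAGE is a local stand-in for ACPI_PPTT_PHYSICAL_PACKAGE.

```c
#include <stddef.h>

#define PHYSICAL_PACKAGE 0x1	/* stand-in for ACPI_PPTT_PHYSICAL_PACKAGE */

struct proc_node {
	unsigned int flags;
	const struct proc_node *parent;	/* NULL at the root */
};

/* Walk up at most 'level' parents, stopping early at a node flagged as
 * the physical package or at the root. */
static const struct proc_node *find_package(const struct proc_node *cpu,
					    int level)
{
	while (cpu && level && !(cpu->flags & PHYSICAL_PACKAGE)) {
		if (!cpu->parent)
			break;
		cpu = cpu->parent;
		level--;
	}
	return cpu;
}
```

A level of 0 returns the starting node unchanged. A large level stops at the first node carrying the package flag.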
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +
> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
> +
> +static u8 acpi_cache_type(enum cache_type type)
> +{
> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_6_2_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_6_2_CACHE_TYPE_INSTR;
> +	default:
> +		pr_debug("Unknown cache type, assume unified\n");
> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
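
The mapping from the kernel's enum cache_type to the ACPI 6.2 attribute bits, and the later comparison against those bits, can be exercised standalone as below. The enum is a local stand-in with illustrative values. SKETCH_MASK_CACHE_TYPE is assumed to match ACPI_PPTT_MASK_CACHE_TYPE (bits 3:2). The ACPI_6_2_* values are copied from the patch. As in the patch, the default label falls through to the unified case.

```c
#include <stdint.h>

/* Local stand-in for the kernel's enum cache_type; values illustrative. */
enum cache_type {
	CACHE_TYPE_NOCACHE = 0,
	CACHE_TYPE_INST = 1,
	CACHE_TYPE_DATA = 2,
	CACHE_TYPE_UNIFIED = 4,
};

#define ACPI_6_2_CACHE_TYPE_DATA	(0x0)
#define ACPI_6_2_CACHE_TYPE_INSTR	(1 << 2)
#define ACPI_6_2_CACHE_TYPE_UNIFIED	(1 << 3)
#define SKETCH_MASK_CACHE_TYPE		(3 << 2)	/* assumed mask */

static uint8_t acpi_cache_type_sketch(enum cache_type type)
{
	switch (type) {
	case CACHE_TYPE_DATA:
		return ACPI_6_2_CACHE_TYPE_DATA;
	case CACHE_TYPE_INST:
		return ACPI_6_2_CACHE_TYPE_INSTR;
	default:
		/* unknown types are treated as unified: fall through */
	case CACHE_TYPE_UNIFIED:
		return ACPI_6_2_CACHE_TYPE_UNIFIED;
	}
}

/* Compare only the cache-type bits of a PPTT cache node's attributes. */
static int attributes_match(uint8_t attributes, enum cache_type type)
{
	return (attributes & SKETCH_MASK_CACHE_TYPE) == acpi_cache_type_sketch(type);
}
```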
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level,
> +	struct acpi_pptt_processor **node)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (!cpu_node)
> +		return NULL;
> +
> +	do {
> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while ((cpu_node) && (!found));

Why not combine the do...while loop and the previous check in a simple
while loop? The same condition should work as-is for a while loop.
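
The single while loop suggested above could look like this sketch. struct cache, struct proc_node and find_cache_node() are hypothetical simplifications. The per-node cache field stands in for the result of acpi_find_cache_level().

```c
#include <stddef.h>

struct cache { int level; };

struct proc_node {
	const struct cache *cache;	/* matching cache at this node, or NULL */
	const struct proc_node *parent;	/* NULL at the root */
};

static const struct cache *find_cache_node(const struct proc_node *cpu_node,
					   const struct proc_node **node)
{
	const struct cache *found = NULL;

	/* a NULL starting node skips the body, replacing the early return */
	while (cpu_node && !found) {
		found = cpu_node->cache;
		*node = cpu_node;
		cpu_node = cpu_node->parent;
	}
	return found;
}
```

Because the loop body never runs when cpu_node is NULL, *node is left untouched in that case, just as with the early return.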

Cheers,

-- 
Julien Thierry

* Re: [PATCH v3 0/7] Support PPTT for ARM64
  2017-10-12 19:48 ` Jeremy Linton
  (?)
@ 2017-10-13 11:08   ` John Garry
  -1 siblings, 0 replies; 104+ messages in thread
From: John Garry @ 2017-10-13 11:08 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Linuxarm

On 12/10/2017 20:48, Jeremy Linton wrote:
> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
> used to describe the processor and cache topology. Ideally it is
> used to extend/override information provided by the hardware, but
> right now ARM64 is entirely dependent on firmware provided tables.
>
> This patch parses the table for the cache topology and CPU topology.
> For the latter we also add an additional topology_cod_id() macro,
> and a package_id for arm64. Initially the physical id will match
> the cluster id, but we update users of the cluster to utilize
> the new macro. When we enable ACPI/PPTT for arm64 we map the socket
> to the physical id as the remainder of the kernel expects.
>

Hi Jeremy,

Can you put this series on a public branch for convenience of review and 
test?

Also, what is your idea for supporting the Type 2 ID structure?

Cheers,
John

> For example on juno:
> [root@mammon-juno-rh topology]# lstopo-no-graphics
>   Package L#0
>     L2 L#0 (1024KB)
>       L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>       L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>       L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>       L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>     L2 L#1 (2048KB)
>       L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
>       L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
>   HostBridge L#0
>     PCIBridge
>       PCIBridge
>         PCIBridge
>           PCI 1095:3132
>             Block(Disk) L#0 "sda"
>         PCIBridge
>           PCI 1002:68f9
>             GPU L#1 "renderD128"
>             GPU L#2 "card0"
>             GPU L#3 "controlD64"
>         PCIBridge
>           PCI 11ab:4380
>             Net L#4 "enp8s0"
>
> v2->v3:
>
> Remove valid bit check on leaf nodes. Now simply being a leaf node
>   is sufficient to verify the processor id against the ACPI
>   processor ids (gotten from MADT).
>
> Use the ACPI processor ID for the "level 0" ID. This makes the /sys
>   visible core/thread IDs more human readable if the firmware uses
>   small consecutive values for processor IDs.
>
> Added PPTT to the list of injectable ACPI tables.
>
> Fix bug which kept the code from using the processor node as intended
>   in v2, caused by misuse of git rebase/fixup.
>
> v1->v2:
>
> The parser keys off the acpi_pptt_processor node to determine
>   unique caches rather than the acpi_pptt_cache referenced by the
>   processor node. This allows PPTT tables which "share" cache nodes
>   across CPU nodes despite not being a shared cache.
>
> Normalize the socket, cluster and thread mapping so that they match
>   Linux's traditional mapping for the physical ID and thread ID.
>   Adding explicit scheduler knowledge of clusters (rather than just
>   their cache sharing attributes) is a subject for a future patch.
>
> Jeremy Linton (7):
>   ACPI/PPTT: Add Processor Properties Topology Table parsing
>   ACPI: Enable PPTT support on ARM64
>   drivers: base: cacheinfo: arm64: Add support for ACPI based firmware
>     tables
>   Topology: Add cluster on die macros and arm64 decoding
>   arm64: Fixup users of topology_physical_package_id
>   arm64: topology: Enable ACPI/PPTT based CPU topology.
>   ACPI: Add PPTT to injectable table list
>
>  arch/arm64/Kconfig                |   1 +
>  arch/arm64/include/asm/topology.h |   4 +-
>  arch/arm64/kernel/cacheinfo.c     |  23 +-
>  arch/arm64/kernel/topology.c      |  62 ++++-
>  drivers/acpi/Makefile             |   1 +
>  drivers/acpi/arm64/Kconfig        |   3 +
>  drivers/acpi/pptt.c               | 486 ++++++++++++++++++++++++++++++++++++++
>  drivers/acpi/tables.c             |   3 +-
>  drivers/base/cacheinfo.c          |  17 +-
>  drivers/cpufreq/arm_big_little.c  |   2 +-
>  drivers/firmware/psci_checker.c   |   2 +-
>  include/linux/cacheinfo.h         |  11 +-
>  include/linux/topology.h          |   4 +
>  13 files changed, 599 insertions(+), 20 deletions(-)
>  create mode 100644 drivers/acpi/pptt.c
>

* Re: [PATCH v3 0/7] Support PPTT for ARM64
@ 2017-10-13 11:08   ` John Garry
  0 siblings, 0 replies; 104+ messages in thread
From: John Garry @ 2017-10-13 11:08 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Linuxarm

On 12/10/2017 20:48, Jeremy Linton wrote:
> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
> used to describe the processor and cache topology. Ideally it is
> used to extend/override information provided by the hardware, but
> right now ARM64 is entirely dependent on firmware provided tables.
>
> This patch parses the table for the cache topology and CPU topology.
> For the latter we also add an additional topology_cod_id() macro,
> and a package_id for arm64. Initially the physical id will match
> the cluster id, but we update users of the cluster to utilize
> the new macro. When we enable ACPI/PPTT for arm64 we map the socket
> to the physical id as the remainder of the kernel expects.
>

Hi Jeremy,

Can you put this series on a public branch for convenience of review and 
test?

Also, what is your idea for supporting Type 2 ID structure?

Cheers,
John

> For example on juno:
> [root@mammon-juno-rh topology]# lstopo-no-graphics
>   Package L#0
>     L2 L#0 (1024KB)
>       L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>       L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>       L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>       L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>     L2 L#1 (2048KB)
>       L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
>       L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
>   HostBridge L#0
>     PCIBridge
>       PCIBridge
>         PCIBridge
>           PCI 1095:3132
>             Block(Disk) L#0 "sda"
>         PCIBridge
>           PCI 1002:68f9
>             GPU L#1 "renderD128"
>             GPU L#2 "card0"
>             GPU L#3 "controlD64"
>         PCIBridge
>           PCI 11ab:4380
>             Net L#4 "enp8s0"
>
> v2->v3:
>
> Remove valid bit check on leaf nodes. Now simply being a leaf node
>   is sufficient to verify the processor id against the ACPI
>   processor ids (gotten from MADT).
>
> Use the acpi processor for the "level 0" Id. This makes the /sys
>   visible core/thread ids more human readable if the firmware uses
>   small consecutive values for processor ids.
>
> Added PPTT to the list of injectable ACPI tables.
>
> Fix bug which kept the code from using the processor node as intended
>   in v2, caused by misuse of git rebase/fixup.
>
> v1->v2:
>
> The parser keys off the acpi_pptt_processor node to determine
>   unique cache's rather than the acpi_pptt_cache referenced by the
>   processor node. This allows PPTT tables which "share" cache nodes
>   across cpu nodes despite not being a shared cache.
>
> Normalize the socket, cluster and thread mapping so that they match
>   linux's traditional mapping for the physical id, and thread id.
>   Adding explicit scheduler knowledge of clusters (rather than just
>   their cache sharing attributes) is a subject for a future patch.
>
> Jeremy Linton (7):
>   ACPI/PPTT: Add Processor Properties Topology Table parsing
>   ACPI: Enable PPTT support on ARM64
>   drivers: base: cacheinfo: arm64: Add support for ACPI based firmware
>     tables
>   Topology: Add cluster on die macros and arm64 decoding
>   arm64: Fixup users of topology_physical_package_id
>   arm64: topology: Enable ACPI/PPTT based CPU topology.
>   ACPI: Add PPTT to injectable table list
>
>  arch/arm64/Kconfig                |   1 +
>  arch/arm64/include/asm/topology.h |   4 +-
>  arch/arm64/kernel/cacheinfo.c     |  23 +-
>  arch/arm64/kernel/topology.c      |  62 ++++-
>  drivers/acpi/Makefile             |   1 +
>  drivers/acpi/arm64/Kconfig        |   3 +
>  drivers/acpi/pptt.c               | 486 ++++++++++++++++++++++++++++++++++++++
>  drivers/acpi/tables.c             |   3 +-
>  drivers/base/cacheinfo.c          |  17 +-
>  drivers/cpufreq/arm_big_little.c  |   2 +-
>  drivers/firmware/psci_checker.c   |   2 +-
>  include/linux/cacheinfo.h         |  11 +-
>  include/linux/topology.h          |   4 +
>  13 files changed, 599 insertions(+), 20 deletions(-)
>  create mode 100644 drivers/acpi/pptt.c
>
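The ID scheme in the v2->v3 notes (the ACPI processor ID at level 0, a per-node unique value above it) can be illustrated with a small userspace sketch. It assumes, as topology_setup_acpi_cpu() in patch 1/7 does, that a node's byte offset within the PPTT serves as its ID above level 0. The flattened node array (struct topo_node) and NODE_FLAG_PHYSICAL_PACKAGE are hypothetical stand-ins for the real table layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ACPI_PPTT_PHYSICAL_PACKAGE. */
#define NODE_FLAG_PHYSICAL_PACKAGE 0x1u

/* Hypothetical flattened processor node: 'offset' is the node's byte
 * offset in the PPTT and 'parent' the parent's offset (0 == no parent). */
struct topo_node {
	uint32_t offset;
	uint32_t parent;
	uint32_t flags;
	uint32_t acpi_processor_id;
};

static const struct topo_node *find_by_offset(const struct topo_node *nodes,
					      size_t n, uint32_t offset)
{
	if (offset == 0)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (nodes[i].offset == offset)
			return &nodes[i];
	return NULL;
}

/*
 * Walk up at most 'level' parents, stopping early at a physical package.
 * Level 0 reports the leaf's ACPI processor ID; higher levels report the
 * reached node's table offset, which is unique per node.
 */
static int topology_id(const struct topo_node *nodes, size_t n,
		       const struct topo_node *cpu, int level)
{
	const struct topo_node *node = cpu;
	int remaining = level;

	while (remaining && !(node->flags & NODE_FLAG_PHYSICAL_PACKAGE)) {
		const struct topo_node *parent =
			find_by_offset(nodes, n, node->parent);

		if (!parent)
			break;
		node = parent;
		remaining--;
	}
	if (level == 0)
		return (int)node->acpi_processor_id;
	return (int)node->offset;
}
```

With a package at offset 36, a cluster at 56 and a CPU with ACPI ID 10 below it, levels 0, 1 and 2 yield 10, 56 and 36. Asking for more levels stops at the package.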

* [PATCH v3 0/7] Support PPTT for ARM64
@ 2017-10-13 11:08   ` John Garry
  0 siblings, 0 replies; 104+ messages in thread
From: John Garry @ 2017-10-13 11:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/10/2017 20:48, Jeremy Linton wrote:
> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
> used to describe the processor and cache topology. Ideally it is
> used to extend/override information provided by the hardware, but
> right now ARM64 is entirely dependent on firmware provided tables.
>
> This patch parses the table for the cache topology and CPU topology.
> For the latter we also add an additional topology_cod_id() macro,
> and a package_id for arm64. Initially the physical id will match
> the cluster id, but we update users of the cluster to utilize
> the new macro. When we enable ACPI/PPTT for arm64 we map the socket
> to the physical id as the remainder of the kernel expects.
>

Hi Jeremy,

Can you put this series on a public branch for the convenience of review and
testing?

Also, what are your plans for supporting the Type 2 ID structure?

Cheers,
John

> For example on juno:
> [root@mammon-juno-rh topology]# lstopo-no-graphics
>   Package L#0
>     L2 L#0 (1024KB)
>       L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>       L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>       L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>       L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>     L2 L#1 (2048KB)
>       L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
>       L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
>   HostBridge L#0
>     PCIBridge
>       PCIBridge
>         PCIBridge
>           PCI 1095:3132
>             Block(Disk) L#0 "sda"
>         PCIBridge
>           PCI 1002:68f9
>             GPU L#1 "renderD128"
>             GPU L#2 "card0"
>             GPU L#3 "controlD64"
>         PCIBridge
>           PCI 11ab:4380
>             Net L#4 "enp8s0"
>
> v2->v3:
>
> Remove valid bit check on leaf nodes. Now simply being a leaf node
>   is sufficient to verify the processor id against the ACPI
>   processor ids (gotten from MADT).
>
> Use the ACPI processor ID for the "level 0" ID. This makes the /sys
>   visible core/thread ids more human readable if the firmware uses
>   small consecutive values for processor ids.
>
> Added PPTT to the list of injectable ACPI tables.
>
> Fix bug which kept the code from using the processor node as intended
>   in v2, caused by misuse of git rebase/fixup.
>
> v1->v2:
>
> The parser keys off the acpi_pptt_processor node to determine
>   unique caches rather than the acpi_pptt_cache referenced by the
>   processor node. This allows PPTT tables which "share" cache nodes
>   across cpu nodes despite not being a shared cache.
>
> Normalize the socket, cluster and thread mapping so that they match
>   Linux's traditional mapping for the physical id and thread id.
>   Adding explicit scheduler knowledge of clusters (rather than just
>   their cache sharing attributes) is a subject for a future patch.
>
> Jeremy Linton (7):
>   ACPI/PPTT: Add Processor Properties Topology Table parsing
>   ACPI: Enable PPTT support on ARM64
>   drivers: base: cacheinfo: arm64: Add support for ACPI based firmware
>     tables
>   Topology: Add cluster on die macros and arm64 decoding
>   arm64: Fixup users of topology_physical_package_id
>   arm64: topology: Enable ACPI/PPTT based CPU topology.
>   ACPI: Add PPTT to injectable table list
>
>  arch/arm64/Kconfig                |   1 +
>  arch/arm64/include/asm/topology.h |   4 +-
>  arch/arm64/kernel/cacheinfo.c     |  23 +-
>  arch/arm64/kernel/topology.c      |  62 ++++-
>  drivers/acpi/Makefile             |   1 +
>  drivers/acpi/arm64/Kconfig        |   3 +
>  drivers/acpi/pptt.c               | 486 ++++++++++++++++++++++++++++++++++++++
>  drivers/acpi/tables.c             |   3 +-
>  drivers/base/cacheinfo.c          |  17 +-
>  drivers/cpufreq/arm_big_little.c  |   2 +-
>  drivers/firmware/psci_checker.c   |   2 +-
>  include/linux/cacheinfo.h         |  11 +-
>  include/linux/topology.h          |   4 +
>  13 files changed, 599 insertions(+), 20 deletions(-)
>  create mode 100644 drivers/acpi/pptt.c
>

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-12 19:48   ` Jeremy Linton
@ 2017-10-13 14:23     ` tn
  -1 siblings, 0 replies; 104+ messages in thread
From: tn @ 2017-10-13 14:23 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi Jeremy,

On 12.10.2017 21:48, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in tree like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
> 
> Further, report peers in the topology using setup_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique id's can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 485 insertions(+)
>   create mode 100644 drivers/acpi/pptt.c
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..c86715fed4a7
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,1 +1,485 @@
> +/*
> + * Copyright (C) 2017, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (!pptt_ref)
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);

You can use ACPI_ADD_PTR() here.
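For illustration, ACPICA's ACPI_ADD_PTR(type, base, offset) adds a byte offset to a pointer and casts the result. Below is a userspace approximation of fetch_pptt_subtable() written that way. ADD_PTR is a local stand-in for the ACPICA macro and struct subtable_header is a simplified two-byte header. Both are assumptions made to keep the example self-contained:

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for ACPICA's ACPI_ADD_PTR(): byte-offset a pointer, then cast. */
#define ADD_PTR(t, a, b) ((t *)((uint8_t *)(a) + (size_t)(b)))

/* Simplified stand-in for struct acpi_subtable_header. */
struct subtable_header {
	uint8_t type;
	uint8_t length;
};

/* Return the subtable at byte offset 'ref', or NULL if it does not fit. */
static struct subtable_header *fetch_subtable(void *table, uint32_t table_len,
					      uint32_t ref)
{
	struct subtable_header *entry;

	if (!ref)	/* offset 0 is the table header, never a subtable */
		return NULL;
	/* Widen before adding so a corrupt reference cannot wrap. */
	if ((uint64_t)ref + sizeof(*entry) > table_len)
		return NULL;
	entry = ADD_PTR(struct subtable_header, table, ref);
	if ((uint64_t)ref + entry->length > table_len)
		return NULL;
	return entry;
}
```

Widening to 64 bits is a defensive choice so that a reference close to UINT32_MAX cannot wrap around either length check.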

> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
> +		      sizeof(u32) * resource);

ACPI_ADD_PTR()
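The private resource references form a u32 array directly after the fixed part of the processor node, so entry i sits sizeof(node) + 4 * i bytes from the start of the node. Here is a sketch of that lookup using a local ADD_PTR stand-in. struct proc_node is a hypothetical simplified layout, not struct acpi_pptt_processor. The pointer is formed as a byte pointer and read with memcpy, so the read does not depend on the buffer's alignment:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for ACPICA's ACPI_ADD_PTR(). */
#define ADD_PTR(t, a, b) ((t *)((uint8_t *)(a) + (size_t)(b)))

/* Hypothetical fixed part of a processor node; the u32 resource
 * references follow it directly in memory. */
struct proc_node {
	uint8_t type;
	uint8_t length;
	uint16_t reserved;
	uint32_t number_of_priv_resources;
};

/* Return resource reference 'i', or 0 (no subtable) if out of range. */
static uint32_t get_resource_ref(struct proc_node *node, uint32_t i)
{
	uint32_t ref;

	if (i >= node->number_of_priv_resources)
		return 0;
	memcpy(&ref, ADD_PTR(uint8_t, node,
			     sizeof(*node) + sizeof(uint32_t) * i),
	       sizeof(ref));
	return ref;
}
```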

> +
> +	return fetch_pptt_subtable(table_hdr, ref);
> +}
> +
> +/*
> + * given a pptt resource, verify that it is a cache node, then walk
> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;
> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> +			if (*found != NULL)
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from the processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * we are looking for the max depth. Since its potentially
> +		 * possible for a given node to have resources with differing
> +		 * depths verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
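Taken together, acpi_find_cache_level() and acpi_process_node() sum, from the leaf node up to the root, the length of the deepest cache chain found among each processor node's resources. Below is a simplified sketch of that counting. It assumes a hypothetical index-based model (struct cache_node, struct cpu_node) instead of PPTT offsets, and it does not guard against a next-level cycle in a malformed table:

```c
#include <stddef.h>

/* Hypothetical model: 'next' is the index of the next-level cache,
 * or -1 at the end of the chain. */
struct cache_node {
	int next;
};

struct cpu_node {
	int parent;		/* index of the parent node, or -1 at the root */
	int nres;		/* number of private resources */
	const int *res;		/* indices into the cache array */
};

/* Length of the cache chain starting at cache index c. */
static int chain_depth(const struct cache_node *caches, int c)
{
	int depth = 0;

	for (; c >= 0; c = caches[c].next)
		depth++;
	return depth;
}

/* Sum, from the leaf up to the root, of the deepest chain at each node. */
static int count_cache_levels(const struct cpu_node *nodes,
			      const struct cache_node *caches, int leaf)
{
	int total = 0;

	for (int n = leaf; n >= 0; n = nodes[n].parent) {
		int deepest = 0;

		for (int r = 0; r < nodes[n].nres; r++) {
			int d = chain_depth(caches, nodes[n].res[r]);

			if (d > deepest)
				deepest = d;
		}
		total += deepest;
	}
	return total;
}
```

For a layout with private L1d/L1i caches on the CPU node and a shared L2 on its cluster node, this yields 1 + 1 = 2 levels.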
> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));

ACPI_ADD_PTR()
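acpi_pptt_leaf_node() treats a processor node as a leaf when no other processor node names it as its parent. The sketch below applies that test to a hypothetical flattened list of (offset, parent) pairs. Note that each call rescans every node, so classifying all nodes this way is quadratic in the number of processor nodes:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of the PPTT processor nodes: each node's
 * own byte offset and its parent's offset (0 == no parent). */
struct proc_ref {
	uint32_t offset;
	uint32_t parent;
};

/* A node is a leaf when no other processor node names it as its parent. */
static bool is_leaf(const struct proc_ref *nodes, size_t n, uint32_t offset)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].parent == offset)
			return false;
	return true;
}
```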

> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));

ACPI_ADD_PTR()

> +
> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");
> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}

For better validation of the table contents, this check could be done at
the beginning of the loop, like this:

	if (WARN_TAINT(entry->length == 0, TAINT_FIRMWARE_WORKAROUND,
		       "Invalid zero length subtable, bad PPTT table!\n"))
		break;
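Below is a userspace sketch of the suggested ordering, with validation at the top of the loop so that no entry is used before it is checked. A plain flag stands in for WARN_TAINT(). struct subtable_header and TYPE_PROCESSOR (0, the PPTT processor hierarchy node type) are local stand-ins for the ACPICA definitions. The sketch also rejects an entry whose length runs past the end of the table, an extra check that goes beyond the suggestion above:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct acpi_subtable_header. */
struct subtable_header {
	uint8_t type;
	uint8_t length;
};

#define TYPE_PROCESSOR 0	/* PPTT processor hierarchy node */

/* Count processor subtables in [start, end), stopping at a malformed entry.
 * '*bad' is set where the kernel would WARN_TAINT(). */
static int count_processors(const uint8_t *start, const uint8_t *end, int *bad)
{
	const uint8_t *p = start;
	int count = 0;

	*bad = 0;
	/* Compare remaining bytes instead of forming a pointer past 'end'. */
	while ((size_t)(end - p) >= sizeof(struct subtable_header)) {
		const struct subtable_header *entry = (const void *)p;

		/* Validate first: a zero length would spin forever and an
		 * oversized one would step outside the table. */
		if (entry->length == 0 || entry->length > (size_t)(end - p)) {
			*bad = 1;
			break;
		}
		if (entry->type == TYPE_PROCESSOR)
			count++;
		p += entry->length;
	}
	return count;
}
```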


> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);

ACPI_ADD_PTR()

> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,
> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> +		pr_debug("level %d\n", level);
> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +

Based on the ACPI 6.2 spec:

> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)

Bits 3:2: Cache type
0x0 - Data
0x1 - Instruction
0x2 or 0x3 - Unified

> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)

Bits 1:0: Allocation type
0x0 - Read allocate
0x1 - Write allocate
0x2 or 0x3 - Read and write allocate

BTW, why are these not part of the ACPICA code (actbl1.h header) with
ACPI_PPTT prefixes?
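To illustrate the bit layout quoted above, here is a sketch that decodes bits 3:2 so that both 0x2 and 0x3 map to unified. The macro and enum names are illustrative, not ACPICA's. Assuming ACPI_PPTT_MASK_CACHE_TYPE covers bits 3:2, an equality test against the single (1<<3) constant, as in acpi_pptt_walk_cache(), would not match a unified cache encoded as 0x3:

```c
#include <stdint.h>

/* Cache type field, bits 3:2 of the PPTT cache 'attributes' byte.
 * Names are illustrative, not ACPICA's. */
#define CACHE_ATTR_TYPE_SHIFT	2
#define CACHE_ATTR_TYPE_MASK	(0x3u << CACHE_ATTR_TYPE_SHIFT)

enum pptt_cache_kind { PPTT_DATA, PPTT_INSTRUCTION, PPTT_UNIFIED };

/* Decode bits 3:2; both 0x2 and 0x3 mean unified. */
static enum pptt_cache_kind decode_cache_kind(uint8_t attributes)
{
	switch ((attributes & CACHE_ATTR_TYPE_MASK) >> CACHE_ATTR_TYPE_SHIFT) {
	case 0x0:
		return PPTT_DATA;
	case 0x1:
		return PPTT_INSTRUCTION;
	default:	/* 0x2 or 0x3: a 2-bit field has no other values */
		return PPTT_UNIFIED;
	}
}
```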

> +
> +static u8 acpi_cache_type(enum cache_type type)
> +{
> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_6_2_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_6_2_CACHE_TYPE_INSTR;
> +	default:
> +		pr_debug("Unknown cache type, assume unified\n");
> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level,
> +	struct acpi_pptt_processor **node)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (!cpu_node)
> +		return NULL;
> +
> +	do {
> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);

Please wrap this line at 80 characters maximum.

> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while ((cpu_node) && (!found));
> +
> +	return found;
> +}
> +
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
> +
> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +	} else {
> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
> +
> +	return number_of_levels;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
> +{
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->size;
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_6_2_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_6_2_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache policy %d\n",
> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
> +		}

The 'default' case can never happen; please remove the dead code.
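Because the write policy is the single bit 4 (0 = write-back, 1 = write-through), the masked value can only be 0 or 1<<4, and the two cases are exhaustive. Here is a sketch of the decode without a default branch. The names are illustrative stand-ins, not the kernel's:

```c
#include <stdint.h>

/* Bit 4 of the cache 'attributes' byte: 0 = write-back, 1 = write-through. */
#define CACHE_ATTR_WRITE_POLICY	(1u << 4)

enum write_policy { WRITE_BACK, WRITE_THROUGH };

/* A single-bit field has only two values, so no default case is needed. */
static enum write_policy pptt_write_policy(uint8_t attributes)
{
	return (attributes & CACHE_ATTR_WRITE_POLICY) ? WRITE_THROUGH
						      : WRITE_BACK;
}
```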

> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache allocation policy %d\n",
> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
> +		}

Same here, once you fix the bit definitions.
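With the bits 1:0 encoding quoted earlier (0x2 and 0x3 both meaning read and write allocate), every value of the two-bit field is covered, so the error branch goes away. A sketch follows. ALLOC_READ and ALLOC_WRITE are illustrative stand-ins for the kernel's CACHE_READ_ALLOCATE and CACHE_WRITE_ALLOCATE:

```c
#include <stdint.h>

/* Allocation type field, bits 1:0 of the cache 'attributes' byte. */
#define CACHE_ATTR_ALLOC_MASK	0x3u

/* Illustrative flags; the kernel would use CACHE_READ_ALLOCATE and
 * CACHE_WRITE_ALLOCATE from <linux/cacheinfo.h>. */
#define ALLOC_READ	(1u << 0)
#define ALLOC_WRITE	(1u << 1)

/* 0x0 read, 0x1 write, 0x2 or 0x3 read+write: all 2-bit values covered. */
static unsigned int pptt_alloc_flags(uint8_t attributes)
{
	switch (attributes & CACHE_ATTR_ALLOC_MASK) {
	case 0x0:
		return ALLOC_READ;
	case 0x1:
		return ALLOC_WRITE;
	default:	/* 0x2 or 0x3 */
		return ALLOC_READ | ALLOC_WRITE;
	}
}
```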

> +}
> +
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +	struct acpi_pptt_processor *cpu_node = NULL;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level,
> +						   &cpu_node);
> +		pr_debug("found = %p %p\n", found_cache, cpu_node);
> +		if (found_cache)
> +			update_cache_properties(this_leaf,
> +						found_cache,
> +						cpu_node);
> +
> +		index++;
> +	}
> +}
> +
> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
> +				    unsigned int cpu, int level)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +
> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +	if (cpu_node) {
> +		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
> +		/* Only the first level has a guaranteed id */
> +		if (level == 0)
> +			return cpu_node->acpi_processor_id;
> +		return (int)((u8 *)cpu_node - (u8 *)table);
> +	}
> +	pr_err_once("PPTT table found, but unable to locate core for %d\n",
> +		    cpu);
> +	return -ENOENT;
> +}
> +
> +/*
> + * simply assign a ACPI cache entry to each known CPU cache entry
> + * determining which entries are shared is done later.
> + */
> +int cache_setup_acpi(unsigned int cpu)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_setup_acpi_cpu(table, cpu);
> +	acpi_put_table(table);
> +
> +	return status;
> +}
> +
> +/*
> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
> + * This ID can then be used to group peers.
> + */
> +int setup_acpi_cpu_topology(unsigned int cpu, int level)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +	int retval;
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +	retval = topology_setup_acpi_cpu(table, cpu, level);
> +	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
> +		 cpu, level, retval);
> +	acpi_put_table(table);
> +
> +	return retval;
> +}
> 

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-13 14:23     ` tn
  0 siblings, 0 replies; 104+ messages in thread
From: tn @ 2017-10-13 14:23 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jeremy,

On 12.10.2017 21:48, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in tree like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
> 
> Further, report peers in the topology using setup_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique id's can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 485 insertions(+)
>   create mode 100644 drivers/acpi/pptt.c
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..c86715fed4a7
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,1 +1,485 @@
> +/*
> + * Copyright (C) 2017, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (!pptt_ref)
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);

You can use ACPI_ADD_PTR() here.

> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
> +		      sizeof(u32) * resource);

ACPI_ADD_PTR()

> +
> +	return fetch_pptt_subtable(table_hdr, ref);
> +}
> +
> +/*
> + * given a pptt resource, verify that it is a cache node, then walk
> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;
> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> +			if (*found != NULL)
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from the processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * we are looking for the max depth. Since its potentially
> +		 * possible for a given node to have resources with differing
> +		 * depths verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));

ACPI_ADD_PTR()

> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));

ACPI_ADD_PTR()

> +
> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");
> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}

For better validation of the table contents, this check could be done at the
beginning of the loop, like this:

if (WARN_TAINT(entry->length == 0, TAINT_FIRMWARE_WORKAROUND,
	       "Invalid zero length subtable, bad PPTT table!\n"))
	break;

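A minimal user-space sketch of the suggested loop shape follows. It assumes a hypothetical two-byte subtable header and a plain byte buffer in place of the real table. The zero-length and overrun checks run before the entry is used, so a malformed table cannot stall the walk. In the kernel, the early return would be the WARN_TAINT() shown above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct acpi_subtable_header. */
struct subtable_header {
	uint8_t type;
	uint8_t length;
};

/*
 * Count the subtables in table[hdr_size .. table_len). Returns -1 if an
 * entry has zero length (which would otherwise loop forever) or runs past
 * the end of the table.
 */
static int count_subtables(const uint8_t *table, size_t table_len,
			   size_t hdr_size)
{
	size_t off = hdr_size;
	int count = 0;

	while (off + sizeof(struct subtable_header) <= table_len) {
		const struct subtable_header *entry =
			(const struct subtable_header *)(table + off);

		/* Validate first, before the entry is used or the cursor moves. */
		if (entry->length == 0)
			return -1;	/* kernel: WARN_TAINT(...) and break */
		if (off + entry->length > table_len)
			return -1;

		count++;
		off += entry->length;
	}
	return count;
}
```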

> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);

ACPI_ADD_PTR()

> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,
> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> +		pr_debug("level %d\n", level);
> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +

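For reference, the counting done by acpi_process_node() and acpi_find_cache_level() amounts to the following. At each node, from the leaf up to the root (as the code currently walks), take the deepest private cache chain and add its length to the running total. The sketch below models this with a hypothetical array-based tree. `struct sketch_node`, parent indices, and precomputed chain lengths are assumptions, not PPTT structures.

```c
#include <stddef.h>

#define SKETCH_MAX_CHAINS 4

/*
 * Hypothetical model of a processor hierarchy: each node has a parent index
 * (-1 for the root) and, per private resource, the length of the cache chain
 * reachable through next_level_of_cache.
 */
struct sketch_node {
	int parent;
	int n_chains;
	int chain_len[SKETCH_MAX_CHAINS];
};

/* Sum, over the leaf and all of its ancestors, the deepest chain at each node. */
static int count_cache_levels(const struct sketch_node *nodes, int leaf)
{
	int total = 0;

	for (int n = leaf; n >= 0; n = nodes[n].parent) {
		int deepest = 0;

		for (int i = 0; i < nodes[n].n_chains; i++)
			if (nodes[n].chain_len[i] > deepest)
				deepest = nodes[n].chain_len[i];
		total += deepest;
	}
	return total;
}
```
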
Based on ACPI spec 6.2:

> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)

Bits 3:2: Cache type:
0x0 Data
0x1 Instruction
0x2 or 0x3 Indicate a unified cache
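
To make this concrete: assuming ACPI_PPTT_MASK_CACHE_TYPE covers bits 3:2, the patch compares the masked value against a single encoding ((1<<3)), so a unified cache reported as 0x3 (0xC once shifted) would never match. The following hypothetical decoder treats both 0x2 and 0x3 as unified. The names are illustrative and are not kernel API.

```c
#include <stdint.h>

/* Hypothetical decoded types, named after the kernel's enum cache_type. */
enum sketch_cache_type { SKETCH_DATA, SKETCH_INST, SKETCH_UNIFIED };

#define SKETCH_CACHE_TYPE_SHIFT	2
#define SKETCH_CACHE_TYPE_MASK	(0x3u << SKETCH_CACHE_TYPE_SHIFT) /* bits 3:2 */

static enum sketch_cache_type decode_cache_type(uint8_t attributes)
{
	switch ((attributes & SKETCH_CACHE_TYPE_MASK) >> SKETCH_CACHE_TYPE_SHIFT) {
	case 0x0:
		return SKETCH_DATA;
	case 0x1:
		return SKETCH_INST;
	default:	/* 0x2 and 0x3 both indicate a unified cache */
		return SKETCH_UNIFIED;
	}
}
```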

> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)

Bits 1:0: Allocation type
0x0 - Read allocate
0x1 - Write allocate
0x2 or 0x3 - Read and Write allocate

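The same applies to bits 1:0. Here is a hypothetical decoder that handles all four encodings, so 0x3 is not reported as an unknown policy. The flag values are local to the sketch; the kernel would use CACHE_READ_ALLOCATE and CACHE_WRITE_ALLOCATE from <linux/cacheinfo.h>.

```c
#include <stdint.h>

/* Hypothetical flag values, standing in for the kernel's cacheinfo bits. */
#define SKETCH_READ_ALLOCATE	0x1u
#define SKETCH_WRITE_ALLOCATE	0x2u

#define SKETCH_ALLOC_MASK	0x3u	/* bits 1:0 */

static unsigned int decode_allocation(uint8_t attributes)
{
	switch (attributes & SKETCH_ALLOC_MASK) {
	case 0x0:
		return SKETCH_READ_ALLOCATE;
	case 0x1:
		return SKETCH_WRITE_ALLOCATE;
	default:	/* 0x2 and 0x3: read and write allocate */
		return SKETCH_READ_ALLOCATE | SKETCH_WRITE_ALLOCATE;
	}
}
```
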
BTW, why are these not part of the ACPICA code (the actbl1.h header), with
ACPI_PPTT prefixes?

> +
> +static u8 acpi_cache_type(enum cache_type type)
> +{
> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_6_2_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_6_2_CACHE_TYPE_INSTR;
> +	default:
> +		pr_debug("Unknown cache type, assume unified\n");
> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level,
> +	struct acpi_pptt_processor **node)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (!cpu_node)
> +		return NULL;
> +
> +	do {
> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);

Please wrap this line to at most 80 characters.

> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while ((cpu_node) && (!found));
> +
> +	return found;
> +}
> +
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
> +
> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +	} else {
> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
> +
> +	return number_of_levels;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
> +{
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->size;
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_6_2_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_6_2_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache policy %d\n",
> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
> +		}

The 'default' case can never happen; please remove the dead code.

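Assuming ACPI_PPTT_MASK_WRITE_POLICY selects only bit 4, the masked value can only be 0 or 0x10, so the two existing cases already cover every input. A hypothetical one-line helper makes this explicit:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WRITE_POLICY_MASK 0x10u	/* bit 4: 0 = write-back, 1 = write-through */

/* A one-bit field has exactly two values, so no "unknown policy" branch exists. */
static bool is_write_through(uint8_t attributes)
{
	return (attributes & SKETCH_WRITE_POLICY_MASK) != 0;
}
```
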
> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache allocation policy %d\n",
> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
> +		}

Same here, once the bit definitions are fixed.

> +}
> +
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +	struct acpi_pptt_processor *cpu_node = NULL;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level,
> +						   &cpu_node);
> +		pr_debug("found = %p %p\n", found_cache, cpu_node);
> +		if (found_cache)
> +			update_cache_properties(this_leaf,
> +						found_cache,
> +						cpu_node);
> +
> +		index++;
> +	}
> +}
> +
> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
> +				    unsigned int cpu, int level)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +
> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +	if (cpu_node) {
> +		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
> +		/* Only the first level has a guaranteed id */
> +		if (level == 0)
> +			return cpu_node->acpi_processor_id;
> +		return (int)((u8 *)cpu_node - (u8 *)table);
> +	}
> +	pr_err_once("PPTT table found, but unable to locate core for %d\n",
> +		    cpu);
> +	return -ENOENT;
> +}
> +
> +/*
> + * simply assign a ACPI cache entry to each known CPU cache entry
> + * determining which entries are shared is done later.
> + */
> +int cache_setup_acpi(unsigned int cpu)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_setup_acpi_cpu(table, cpu);
> +	acpi_put_table(table);
> +
> +	return status;
> +}
> +
> +/*
> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
> + * This ID can then be used to group peers.
> + */
> +int setup_acpi_cpu_topology(unsigned int cpu, int level)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +	int retval;
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +	retval = topology_setup_acpi_cpu(table, cpu, level);
> +	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
> +		 cpu, level, retval);
> +	acpi_put_table(table);
> +
> +	return retval;
> +}
> 

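If I read topology_setup_acpi_cpu() correctly, level 0 returns the firmware acpi_processor_id. Any higher level returns the byte offset of the ancestor node within the table, which is unique per node and therefore shared by every CPU beneath it. The following hypothetical array-based model illustrates that rule. The struct and indices are assumptions, not the PPTT layout.

```c
/* Hypothetical model of a processor node. */
struct sketch_proc {
	int parent;		/* index of parent node, -1 for the root */
	unsigned int offset;	/* byte offset of the node inside the table */
	unsigned int acpi_id;	/* acpi_processor_id */
	int is_package;		/* physical package flag set */
};

static int sketch_topology_id(const struct sketch_proc *nodes, int leaf,
			      int level)
{
	int n = leaf;
	int remaining = level;

	/* Climb at most 'level' parents, stopping early at a physical package. */
	while (remaining && !nodes[n].is_package && nodes[n].parent >= 0) {
		n = nodes[n].parent;
		remaining--;
	}
	/* Only level 0 has a guaranteed firmware id; elsewhere use the offset. */
	if (level == 0)
		return (int)nodes[n].acpi_id;
	return (int)nodes[n].offset;
}
```
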
Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 2/7] ACPI: Enable PPTT support on ARM64
  2017-10-13  9:53     ` Hanjun Guo
@ 2017-10-13 17:51       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-13 17:51 UTC (permalink / raw)
  To: Hanjun Guo, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi,

On 10/13/2017 04:53 AM, Hanjun Guo wrote:
> Hi Jeremy,
> 
> On 2017/10/13 3:48, Jeremy Linton wrote:
>> Now that we have a PPTT parser, in preparation for its use
>> on arm64, lets build it.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   arch/arm64/Kconfig         | 1 +
>>   drivers/acpi/Makefile      | 1 +
>>   drivers/acpi/arm64/Kconfig | 3 +++
>>   3 files changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 0df64a6a56d4..68c9d1289735 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -7,6 +7,7 @@ config ARM64
>>   	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
>>   	select ACPI_MCFG if ACPI
>>   	select ACPI_SPCR_TABLE if ACPI
>> +	select ACPI_PPTT if ACPI
>>   	select ARCH_CLOCKSOURCE_DATA
>>   	select ARCH_HAS_DEBUG_VIRTUAL
>>   	select ARCH_HAS_DEVMEM_IS_ALLOWED
>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
>> index 90265ab4437a..c92a0c937551 100644
>> --- a/drivers/acpi/Makefile
>> +++ b/drivers/acpi/Makefile
>> @@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
>>   obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
>>   obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
>>   obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
>> +obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
>>   
>>   # processor has its own "processor." module_param namespace
>>   processor-y			:= processor_driver.o
>> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
>> index 5a6f80fce0d6..74b855a669ea 100644
>> --- a/drivers/acpi/arm64/Kconfig
>> +++ b/drivers/acpi/arm64/Kconfig
>> @@ -7,3 +7,6 @@ config ACPI_IORT
>>   
>>   config ACPI_GTDT
>>   	bool
>> +
>> +config ACPI_PPTT
>> +	bool
> 
> Can this be located in drivers/acpi/Kconfig? Then other
> platforms can select ACPI_PPTT if they want.

It can be, but I've been resisting doing that because, without any
callers, it will do little but bloat the code of anyone who dares to
enable it.

So my assumption is that when the code to enable PPTT on x86 shows up,
the config option can be moved as well.

How about I meet you halfway and put it in acpi/Kconfig, but wrapped
in the arm64-exclusive section?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 0/7] Support PPTT for ARM64
  2017-10-13 11:08   ` John Garry
@ 2017-10-13 19:34     ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-13 19:34 UTC (permalink / raw)
  To: John Garry, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, Linuxarm

Hi,

On 10/13/2017 06:08 AM, John Garry wrote:
> On 12/10/2017 20:48, Jeremy Linton wrote:
>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
>> used to describe the processor and cache topology. Ideally it is
>> used to extend/override information provided by the hardware, but
>> right now ARM64 is entirely dependent on firmware provided tables.
>>
>> This patch parses the table for the cache topology and CPU topology.
>> For the latter we also add an additional topology_cod_id() macro,
>> and a package_id for arm64. Initially the physical id will match
>> the cluster id, but we update users of the cluster to utilize
>> the new macro. When we enable ACPI/PPTT for arm64 we map the socket
>> to the physical id as the remainder of the kernel expects.
>>
> 
> Hi Jeremy,
> 
> Can you put this series on a public branch for convenience of review and 
> test?

Let me see what I can do.

> 
> Also, what is your idea for supporting Type 2 ID structure?

I don't have any plans; as you can see, the current patches ignore the ID
nodes. It should be fairly easy to mine the information from the tables,
but which parts are necessary, or where to use them, isn't clear to me.

Suggestions welcome.
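
Here is a rough sketch of how that information could be mined. It assumes the ACPI 6.2 ID structure layout: type 2, followed by a one-byte length, a two-byte reserved field, and then a four-byte VENDOR_ID at offset 4. Under that layout, something like the following could locate the first ID node. The helper name and constants are hypothetical, and the offsets should be checked against the spec before anyone relies on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PPTT_TYPE_ID		2	/* ID structure type (ACPI 6.2) */
#define SKETCH_PPTT_ID_VENDOR_OFFSET	4	/* after type, length, reserved */

/*
 * Scan the subtables following a hdr_size-byte table header and copy the
 * VENDOR_ID of the first ID structure into *vendor. Returns 0 on success,
 * -1 if none is found or the table is malformed.
 */
static int find_first_pptt_vendor_id(const uint8_t *table, size_t table_len,
				     size_t hdr_size, uint32_t *vendor)
{
	size_t off = hdr_size;

	/* Each subtable starts with a one-byte type and a one-byte length. */
	while (off + 2 <= table_len) {
		uint8_t type = table[off];
		uint8_t len = table[off + 1];

		if (len == 0 || off + len > table_len)
			return -1;
		if (type == SKETCH_PPTT_TYPE_ID &&
		    (size_t)len >= SKETCH_PPTT_ID_VENDOR_OFFSET + sizeof(*vendor)) {
			/* memcpy avoids an unaligned load from the table */
			memcpy(vendor, table + off + SKETCH_PPTT_ID_VENDOR_OFFSET,
			       sizeof(*vendor));
			return 0;
		}
		off += len;
	}
	return -1;
}
```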


> 
> Cheers,
> John
> 
>> For example on juno:
>> [root@mammon-juno-rh topology]# lstopo-no-graphics
>>   Package L#0
>>     L2 L#0 (1024KB)
>>       L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>       L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>       L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>       L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>     L2 L#1 (2048KB)
>>       L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
>>       L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
>>   HostBridge L#0
>>     PCIBridge
>>       PCIBridge
>>         PCIBridge
>>           PCI 1095:3132
>>             Block(Disk) L#0 "sda"
>>         PCIBridge
>>           PCI 1002:68f9
>>             GPU L#1 "renderD128"
>>             GPU L#2 "card0"
>>             GPU L#3 "controlD64"
>>         PCIBridge
>>           PCI 11ab:4380
>>             Net L#4 "enp8s0"
>>
>> v2->v3:
>>
>> Remove valid bit check on leaf nodes. Now simply being a leaf node
>>   is sufficient to verify the processor id against the ACPI
>>   processor ids (gotten from MADT).
>>
>> Use the acpi processor for the "level 0" Id. This makes the /sys
>>   visible core/thread ids more human readable if the firmware uses
>>   small consecutive values for processor ids.
>>
>> Added PPTT to the list of injectable ACPI tables.
>>
>> Fix bug which kept the code from using the processor node as intended
>>   in v2, caused by misuse of git rebase/fixup.
>>
>> v1->v2:
>>
>> The parser keys off the acpi_pptt_processor node to determine
>>   unique cache's rather than the acpi_pptt_cache referenced by the
>>   processor node. This allows PPTT tables which "share" cache nodes
>>   across cpu nodes despite not being a shared cache.
>>
>> Normalize the socket, cluster and thread mapping so that they match
>>   linux's traditional mapping for the physical id, and thread id.
>>   Adding explicit scheduler knowledge of clusters (rather than just
>>   their cache sharing attributes) is a subject for a future patch.
>>
>> Jeremy Linton (7):
>>   ACPI/PPTT: Add Processor Properties Topology Table parsing
>>   ACPI: Enable PPTT support on ARM64
>>   drivers: base: cacheinfo: arm64: Add support for ACPI based firmware
>>     tables
>>   Topology: Add cluster on die macros and arm64 decoding
>>   arm64: Fixup users of topology_physical_package_id
>>   arm64: topology: Enable ACPI/PPTT based CPU topology.
>>   ACPI: Add PPTT to injectable table list
>>
>>  arch/arm64/Kconfig                |   1 +
>>  arch/arm64/include/asm/topology.h |   4 +-
>>  arch/arm64/kernel/cacheinfo.c     |  23 +-
>>  arch/arm64/kernel/topology.c      |  62 ++++-
>>  drivers/acpi/Makefile             |   1 +
>>  drivers/acpi/arm64/Kconfig        |   3 +
>>  drivers/acpi/pptt.c               | 486 ++++++++++++++++++++++++++++++++++++++
>>  drivers/acpi/tables.c             |   3 +-
>>  drivers/base/cacheinfo.c          |  17 +-
>>  drivers/cpufreq/arm_big_little.c  |   2 +-
>>  drivers/firmware/psci_checker.c   |   2 +-
>>  include/linux/cacheinfo.h         |  11 +-
>>  include/linux/topology.h          |   4 +
>>  13 files changed, 599 insertions(+), 20 deletions(-)
>>  create mode 100644 drivers/acpi/pptt.c
>>
> 
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-13 14:23     ` tn
@ 2017-10-13 19:58       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-13 19:58 UTC (permalink / raw)
  To: tn, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi,

On 10/13/2017 09:23 AM, tn wrote:
> Hi Jeremy,
> 
> On 12.10.2017 21:48, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in tree like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> cores cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> Further, report peers in the topology using setup_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique id's can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 485 insertions(+)
>>   create mode 100644 drivers/acpi/pptt.c
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> new file mode 100644
>> index 000000000000..c86715fed4a7
>> --- /dev/null
>> +++ b/drivers/acpi/pptt.c
>> @@ -0,1 +1,485 @@
>> +/*
>> + * Copyright (C) 2017, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>> + * which is optionally used to describe the processor and cache topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + */
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +
>> +    /* there isn't a subtable at reference 0 */
>> +    if (!pptt_ref)
>> +        return NULL;
>> +
>> +    if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>> +        return NULL;
>> +
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
> 
> You can use ACPI_ADD_PTR() here.

Hmmm, that is a useful macro.


> 
>> +
>> +    if (pptt_ref + entry->length > table_hdr->length)
>> +        return NULL;
>> +
>> +    return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *node, int resource)
>> +{
>> +    u32 ref;
>> +
>> +    if (resource >= node->number_of_priv_resources)
>> +        return NULL;
>> +
>> +    ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>> +              sizeof(u32) * resource);
> 
> ACPI_ADD_PTR()
> 
>> +
>> +    return fetch_pptt_subtable(table_hdr, ref);
>> +}
>> +
>> +/*
>> + * given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +                int local_level,
>> +                struct acpi_subtable_header *res,
>> +                struct acpi_pptt_cache **found,
>> +                int level, int type)
>> +{
>> +    struct acpi_pptt_cache *cache;
>> +
>> +    if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +        return 0;
>> +
>> +    cache = (struct acpi_pptt_cache *) res;
>> +    while (cache) {
>> +        local_level++;
>> +
>> +        if ((local_level == level) &&
>> +            (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +            ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>> +            if (*found != NULL)
>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>> +
>> +            pr_debug("Found cache @ level %d\n", level);
>> +            *found = cache;
>> +            /*
>> +             * continue looking at this node's resource list
>> +             * to verify that we don't find a duplicate
>> +             * cache node.
>> +             */
>> +        }
>> +        cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +    }
>> +    return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, and then
>> + * for each cache node, count how many levels exist below (logically above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu_node,
>> +    int *starting_level, int level, int type)
>> +{
>> +    struct acpi_subtable_header *res;
>> +    int number_of_levels = *starting_level;
>> +    int resource = 0;
>> +    struct acpi_pptt_cache *ret = NULL;
>> +    int local_level;
>> +
>> +    /* walk down from the processor node */
>> +    while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>> +        resource++;
>> +
>> +        local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +                           res, &ret, level, type);
>> +        /*
>> +         * we are looking for the max depth. Since its potentially
>> +         * possible for a given node to have resources with differing
>> +         * depths verify that the depth we have found is the largest.
>> +         */
>> +        if (number_of_levels < local_level)
>> +            number_of_levels = local_level;
>> +    }
>> +    if (number_of_levels > *starting_level)
>> +        *starting_level = number_of_levels;
>> +
>> +    return ret;
>> +}
>> +
>> +/*
>> + * given a processor node containing a processing unit, walk into it and count
>> + * how many levels exist solely for it, and then walk up each level until we hit
>> + * the root node (ignore the package level because it may be possible to have
>> + * caches that exist across packages). Count the number of cache levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +                 struct acpi_pptt_processor *cpu_node)
>> +{
>> +    int total_levels = 0;
>> +
>> +    do {
>> +        acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +        cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +    } while (cpu_node);
>> +
>> +    return total_levels;
>> +}
>> +
>> +/* determine if the given node is a leaf node */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +                   struct acpi_pptt_processor *node)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    u32 node_entry;
>> +    struct acpi_pptt_processor *cpu_node;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
> 
> ACPI_ADD_PTR()
> 
>> +
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            (cpu_node->parent == node_entry))
>> +            return 0;
>> +        entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
>> +    }
>> +    return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +    struct acpi_table_header *table_hdr,
>> +    u32 acpi_cpu_id)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    struct acpi_pptt_processor *cpu_node;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
> 
> ACPI_ADD_PTR()
> 
>> +
>> +    /* find the processor structure associated with this cpuid */
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>> +            pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +                 acpi_cpu_id, cpu_node->acpi_processor_id);
>> +            if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +                /* found the correct entry */
>> +                pr_debug("match found!\n");
>> +                return (struct acpi_pptt_processor *)entry;
>> +            }
>> +        }
>> +
>> +        if (entry->length == 0) {
>> +            pr_err("Invalid zero length subtable\n");
>> +            break;
>> +        }
> 
> For better validation of the table contents, this check could be done
> at the beginning of the loop, like this:
> 
> if (WARN_TAINT(entry->length == 0, TAINT_FIRMWARE_WORKAROUND,
>         "Invalid zero length subtable, bad PPTT table!\n"))
>              break;
> 
> 
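A userspace sketch of the suggested ordering, in which each subtable's length is checked before it is used to advance the cursor. WARN_TAINT() is kernel-only, so this version prints to stderr and returns -1 instead. The `sketch_` names are hypothetical, and `ACPI_ADD_PTR` is a local stand-in for the ACPICA macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for ACPICA's ACPI_ADD_PTR(type, base, byte_offset). */
#define ACPI_ADD_PTR(t, a, b) ((t *)(uintptr_t)((uint8_t *)(a) + (size_t)(b)))

struct sketch_subtable_header {
	uint8_t type;
	uint8_t length;
};

/*
 * Count subtables of 'type' in the len bytes starting at 'start'.
 * Returns -1 on a malformed entry instead of looping forever.
 */
static int sketch_count_subtables(void *start, size_t len, uint8_t type)
{
	struct sketch_subtable_header *entry = start;
	uintptr_t end = (uintptr_t)start + len;
	int count = 0;

	/* the subtable header itself must fit inside the table */
	while ((uintptr_t)entry + sizeof(*entry) <= end) {
		/* validate first: a zero length would never advance 'entry' */
		if (entry->length == 0) {
			fprintf(stderr, "Invalid zero length subtable, bad PPTT table!\n");
			return -1;
		}
		/* reject a subtable that claims to extend past the table */
		if ((uintptr_t)entry + entry->length > end) {
			fprintf(stderr, "Subtable overruns the table\n");
			return -1;
		}
		if (entry->type == type)
			count++;
		entry = ACPI_ADD_PTR(struct sketch_subtable_header, entry,
				     entry->length);
	}
	return count;
}
```

Checking at the top of the loop rejects a zero-length entry before any other field of it is trusted.
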
>> +        entry = (struct acpi_subtable_header *)
>> +            ((u8 *)entry + entry->length);
> 
> ACPI_ADD_PTR()
> 
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +/*
>> + * Given a acpi_pptt_processor node, walk up until we identify the
>> + * package that the node is associated with or we run out of levels
>> + * to request.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu,
>> +    int level)
>> +{
>> +    struct acpi_pptt_processor *prev_node;
>> +
>> +    while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
>> +        pr_debug("level %d\n", level);
>> +        prev_node = fetch_pptt_node(table_hdr, cpu->parent);
>> +        if (prev_node == NULL)
>> +            break;
>> +        cpu = prev_node;
>> +        level--;
>> +    }
>> +    return cpu;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
>> +{
>> +    int number_of_levels = 0;
>> +    struct acpi_pptt_processor *cpu;
>> +
>> +    cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +    if (cpu)
>> +        number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +    return number_of_levels;
>> +}
>> +
> 
> Based on ACPI spec 6.2:
> 
>> +#define ACPI_6_2_CACHE_TYPE_DATA              (0x0)
>> +#define ACPI_6_2_CACHE_TYPE_INSTR              (1<<2)
>> +#define ACPI_6_2_CACHE_TYPE_UNIFIED              (1<<3)
> 
> Bits 3:2: Cache type:
> 0x0 Data
> 0x1 Instruction
> 0x2 or 0x3 Indicate a unified cache

Originally I was trying to do something more clever than the switch 
(given the less than optimal bit definitions), but the result wasn't as 
clear as the switch, so I just plugged that in but forgot about the 3rd 
case.
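
A sketch of a type comparison that accepts both encodings the spec allows for a unified cache (0x2 and 0x3 in bits 3:2). The `SKETCH_*` names are hypothetical and may not match whatever ACPICA ends up defining.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for PPTT cache attributes bits 3:2 (cache type). */
#define SKETCH_MASK_CACHE_TYPE        (0x3u << 2)
#define SKETCH_CACHE_TYPE_DATA        (0x0u << 2)
#define SKETCH_CACHE_TYPE_INSTR       (0x1u << 2)
#define SKETCH_CACHE_TYPE_UNIFIED     (0x2u << 2)
#define SKETCH_CACHE_TYPE_UNIFIED_ALT (0x3u << 2)

/* Does the cache described by 'attributes' have the wanted type? */
static bool sketch_cache_type_matches(uint8_t attributes, uint8_t wanted)
{
	uint8_t type = attributes & SKETCH_MASK_CACHE_TYPE;

	/* both 0x2 and 0x3 in bits 3:2 describe a unified cache */
	if (wanted == SKETCH_CACHE_TYPE_UNIFIED)
		return type == SKETCH_CACHE_TYPE_UNIFIED ||
		       type == SKETCH_CACHE_TYPE_UNIFIED_ALT;
	return type == wanted;
}
```

With this shape, acpi_cache_type() could keep returning one canonical value per type. Only the comparison in the cache walk would then need to know about the alternate encoding.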

> 
>> +#define ACPI_6_2_CACHE_POLICY_WB              (0x0)
>> +#define ACPI_6_2_CACHE_POLICY_WT              (1<<4)
>> +#define ACPI_6_2_CACHE_READ_ALLOCATE              (0x0)
>> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE              (0x01)
>> +#define ACPI_6_2_CACHE_RW_ALLOCATE              (0x02)
> 
> Bits 1:0: Allocation type
> 0x0 - Read allocate
> 0x1 - Write allocate
> 0x2 or 0x03 indicate Read and Write allocate
> 
> BTW, why are these not part of the ACPICA code (actbl1.h header), with
> ACPI_PPTT prefixes?

Well, I guess they probably should be; the only question is how one goes
about defining the duplicates.

AKA:

#define ACPI_PPTT_CACHE_RW_ALLOCATE              (0x02)
#define ACPI_PPTT_CACHE_RW_ALLOCATE_ALT          (0x03)
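
One way the duplicate encodings could be consumed is a fall-through case. Every value the 2-bit mask can produce is then handled, and the `default` branch disappears. The `SKETCH_*` constants (including an `_ALT` variant like the one above) and the attribute bits are hypothetical stand-ins, not the ACPICA or cacheinfo definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for PPTT cache attributes bits 1:0 (allocation type). */
#define SKETCH_MASK_ALLOCATION_TYPE  0x3u
#define SKETCH_CACHE_READ_ALLOCATE   0x0u
#define SKETCH_CACHE_WRITE_ALLOCATE  0x1u
#define SKETCH_CACHE_RW_ALLOCATE     0x2u
#define SKETCH_CACHE_RW_ALLOCATE_ALT 0x3u

/* Hypothetical stand-ins for cacheinfo's allocation attribute flags. */
#define SKETCH_ATTR_READ_ALLOCATE  (1u << 0)
#define SKETCH_ATTR_WRITE_ALLOCATE (1u << 1)

static unsigned int sketch_decode_allocation(uint8_t attributes)
{
	switch (attributes & SKETCH_MASK_ALLOCATION_TYPE) {
	case SKETCH_CACHE_READ_ALLOCATE:
		return SKETCH_ATTR_READ_ALLOCATE;
	case SKETCH_CACHE_WRITE_ALLOCATE:
		return SKETCH_ATTR_WRITE_ALLOCATE;
	case SKETCH_CACHE_RW_ALLOCATE:
	case SKETCH_CACHE_RW_ALLOCATE_ALT:
		return SKETCH_ATTR_READ_ALLOCATE | SKETCH_ATTR_WRITE_ALLOCATE;
	}
	/* not reached: a 2-bit field has exactly the four values above */
	return 0;
}
```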

> 
>> +
>> +static u8 acpi_cache_type(enum cache_type type)
>> +{
>> +    switch (type) {
>> +    case CACHE_TYPE_DATA:
>> +        pr_debug("Looking for data cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_DATA;
>> +    case CACHE_TYPE_INST:
>> +        pr_debug("Looking for instruction cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_INSTR;
>> +    default:
>> +        pr_debug("Unknown cache type, assume unified\n");
>> +    case CACHE_TYPE_UNIFIED:
>> +        pr_debug("Looking for unified cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_UNIFIED;
>> +    }
>> +}
>> +
>> +/* find the ACPI node describing the cache type/level for the given CPU */
>> +static struct acpi_pptt_cache *acpi_find_cache_node(
>> +    struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>> +    enum cache_type type, unsigned int level,
>> +    struct acpi_pptt_processor **node)
>> +{
>> +    int total_levels = 0;
>> +    struct acpi_pptt_cache *found = NULL;
>> +    struct acpi_pptt_processor *cpu_node;
>> +    u8 acpi_type = acpi_cache_type(type);
>> +
>> +    pr_debug("Looking for CPU %d's level %d cache type %d\n",
>> +         acpi_cpu_id, level, acpi_type);
>> +
>> +    cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +    if (!cpu_node)
>> +        return NULL;
>> +
>> +    do {
>> +        found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> 
> Please keep this line within 80 characters.

ok,

> 
>> +        *node = cpu_node;
>> +        cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +    } while ((cpu_node) && (!found));
>> +
>> +    return found;
>> +}
>> +
>> +int acpi_find_last_cache_level(unsigned int cpu)
>> +{
>> +    u32 acpi_cpu_id;
>> +    struct acpi_table_header *table;
>> +    int number_of_levels = 0;
>> +    acpi_status status;
>> +
>> +    pr_debug("Cache Setup find last level cpu=%d\n", cpu);
>> +
>> +    acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +    status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +    if (ACPI_FAILURE(status)) {
>> +        pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +    } else {
>> +        number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
>> +        acpi_put_table(table);
>> +    }
>> +    pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
>> +
>> +    return number_of_levels;
>> +}
>> +
>> +/*
>> + * The ACPI spec implies that the fields in the cache structures are used to
>> + * extend and correct the information probed from the hardware. In the case
>> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
>> + */
>> +static void update_cache_properties(struct cacheinfo *this_leaf,
>> +                    struct acpi_pptt_cache *found_cache,
>> +                    struct acpi_pptt_processor *cpu_node)
>> +{
>> +    if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>> +        this_leaf->size = found_cache->size;
>> +    if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
>> +        this_leaf->coherency_line_size = found_cache->line_size;
>> +    if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
>> +        this_leaf->number_of_sets = found_cache->number_of_sets;
>> +    if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
>> +        this_leaf->ways_of_associativity = found_cache->associativity;
>> +    if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
>> +        switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>> +        case ACPI_6_2_CACHE_POLICY_WT:
>> +            this_leaf->attributes = CACHE_WRITE_THROUGH;
>> +            break;
>> +        case ACPI_6_2_CACHE_POLICY_WB:
>> +            this_leaf->attributes = CACHE_WRITE_BACK;
>> +            break;
>> +        default:
>> +            pr_err("Unknown ACPI cache policy %d\n",
>> +                  found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
>> +        }
> 
> The 'default' case can never happen; please remove the dead code.

Ok,
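
The `default` is dead because the write-policy mask is a single bit (bit 4). The masked value is therefore either 0 (write-back) or 1<<4 (write-through). This is a minimal sketch with hypothetical names, assuming those two encodings from the defines above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for PPTT cache attributes bit 4 (write policy). */
#define SKETCH_MASK_WRITE_POLICY (1u << 4)

/* Hypothetical stand-ins for cacheinfo's write-policy attribute flags. */
#define SKETCH_ATTR_WRITE_THROUGH (1u << 0)
#define SKETCH_ATTR_WRITE_BACK    (1u << 1)

static unsigned int sketch_decode_write_policy(uint8_t attributes)
{
	/* one bit, two outcomes: nothing is left for a default case */
	return (attributes & SKETCH_MASK_WRITE_POLICY) ?
		SKETCH_ATTR_WRITE_THROUGH : SKETCH_ATTR_WRITE_BACK;
}
```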

> 
>> +    if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
>> +        switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>> +        case ACPI_6_2_CACHE_READ_ALLOCATE:
>> +            this_leaf->attributes |= CACHE_READ_ALLOCATE;
>> +            break;
>> +        case ACPI_6_2_CACHE_WRITE_ALLOCATE:
>> +            this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
>> +            break;
>> +        case ACPI_6_2_CACHE_RW_ALLOCATE:
>> +            this_leaf->attributes |=
>> +                CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
>> +            break;
>> +        default:
>> +            pr_err("Unknown ACPI cache allocation policy %d\n",
>> +               found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
>> +        }
> 
> Same here, once you fix the bit definitions.

Sure,

> 
>> +}
>> +
>> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>> +                 unsigned int cpu)
>> +{
>> +    struct acpi_pptt_cache *found_cache;
>> +    struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> +    u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +    struct cacheinfo *this_leaf;
>> +    unsigned int index = 0;
>> +    struct acpi_pptt_processor *cpu_node = NULL;
>> +
>> +    while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
>> +        this_leaf = this_cpu_ci->info_list + index;
>> +        found_cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +                           this_leaf->type,
>> +                           this_leaf->level,
>> +                           &cpu_node);
>> +        pr_debug("found = %p %p\n", found_cache, cpu_node);
>> +        if (found_cache)
>> +            update_cache_properties(this_leaf,
>> +                        found_cache,
>> +                        cpu_node);
>> +
>> +        index++;
>> +    }
>> +}
>> +
>> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
>> +                    unsigned int cpu, int level)
>> +{
>> +    struct acpi_pptt_processor *cpu_node;
>> +    u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +
>> +    cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +    if (cpu_node) {
>> +        cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
>> +        /* Only the first level has a guaranteed id */
>> +        if (level == 0)
>> +            return cpu_node->acpi_processor_id;
>> +        return (int)((u8 *)cpu_node - (u8 *)table);
>> +    }
>> +    pr_err_once("PPTT table found, but unable to locate core for %d\n",
>> +            cpu);
>> +    return -ENOENT;
>> +}
>> +
>> +/*
>> + * simply assign a ACPI cache entry to each known CPU cache entry
>> + * determining which entries are shared is done later.
>> + */
>> +int cache_setup_acpi(unsigned int cpu)
>> +{
>> +    struct acpi_table_header *table;
>> +    acpi_status status;
>> +
>> +    pr_debug("Cache Setup ACPI cpu %d\n", cpu);
>> +
>> +    status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +    if (ACPI_FAILURE(status)) {
>> +        pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +        return -ENOENT;
>> +    }
>> +
>> +    cache_setup_acpi_cpu(table, cpu);
>> +    acpi_put_table(table);
>> +
>> +    return status;
>> +}
>> +
>> +/*
>> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
>> + * This ID can then be used to group peers.
>> + */
>> +int setup_acpi_cpu_topology(unsigned int cpu, int level)
>> +{
>> +    struct acpi_table_header *table;
>> +    acpi_status status;
>> +    int retval;
>> +
>> +    status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +    if (ACPI_FAILURE(status)) {
>> +        pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
>> +        return -ENOENT;
>> +    }
>> +    retval = topology_setup_acpi_cpu(table, cpu, level);
>> +    pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
>> +         cpu, level, retval);
>> +    acpi_put_table(table);
>> +
>> +    return retval;
>> +}
>>
> 
> Thanks,
> Tomasz
> 
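
The level counting done by acpi_find_cache_level()/acpi_process_node() in the patch can be modelled in isolation. Each private cache chain is walked starting from the depth already found below. The deepest chain at a node becomes the new starting depth, and the walk repeats at each parent. This sketch uses hypothetical in-memory structures (`sketch_cache`, `sketch_node`) with plain pointers instead of PPTT table offsets. It models only cache resources and shows only the counting logic, not table parsing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model: plain pointers instead of table offsets. */
struct sketch_cache {
	const struct sketch_cache *next_level; /* NULL at the end of a chain */
};

struct sketch_node {
	const struct sketch_node *parent;      /* NULL at the root */
	const struct sketch_cache **res;       /* private cache resources */
	int nres;
};

/* Count the caches in one chain, continuing from 'start'. */
static int sketch_walk_cache(const struct sketch_cache *c, int start)
{
	int level = start;

	for (; c; c = c->next_level)
		level++;
	return level;
}

/* Total cache levels seen from a leaf node up to the root. */
static int sketch_count_cache_levels(const struct sketch_node *node)
{
	int total = 0;

	do {
		int deepest = total;

		/* every chain at this node starts from the same depth */
		for (int i = 0; i < node->nres; i++) {
			int depth = sketch_walk_cache(node->res[i], total);

			if (depth > deepest)
				deepest = depth;
		}
		total = deepest;
		node = node->parent;
	} while (node);

	return total;
}
```

For example, a core node with separate L1 instruction and data caches, whose parent cluster node holds an L2, yields a count of 2 in this model.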

Thanks for taking the time to look at this.



* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-13 19:58       ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-13 19:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 10/13/2017 09:23 AM, tn wrote:
> Hi Jeremy,
> 
> On 12.10.2017 21:48, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in a tree-like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> core's cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> Further, report peers in the topology using setup_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique IDs can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>  drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 485 insertions(+)
>>  create mode 100644 drivers/acpi/pptt.c
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> new file mode 100644
>> index 000000000000..c86715fed4a7
>> --- /dev/null
>> +++ b/drivers/acpi/pptt.c
>> @@ -0,1 +1,485 @@
>> +/*
>> + * Copyright (C) 2017, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>> + * which is optionally used to describe the processor and cache topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + */
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +
>> +    /* there isn't a subtable at reference 0 */
>> +    if (!pptt_ref)
>> +        return NULL;
>> +
>> +    if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>> +        return NULL;
>> +
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
> 
> You can use ACPI_ADD_PTR() here.

Hmmm, that is a useful macro.
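
A sketch of fetch_pptt_subtable() using ACPI_ADD_PTR() for the offset-to-pointer conversion, keeping the same two bounds checks as the patch. The table length is passed in directly instead of being read from a `struct acpi_table_header`. `ACPI_ADD_PTR` is a local stand-in for the ACPICA macro, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for ACPICA's ACPI_ADD_PTR(type, base, byte_offset). */
#define ACPI_ADD_PTR(t, a, b) ((t *)(uintptr_t)((uint8_t *)(a) + (size_t)(b)))

struct sketch_subtable_header {
	uint8_t type;
	uint8_t length;
};

/* Resolve a PPTT-style byte offset to a subtable fully inside the table. */
static struct sketch_subtable_header *
sketch_fetch_subtable(void *table, uint32_t table_len, uint32_t ref)
{
	struct sketch_subtable_header *entry;

	/* offset 0 is the table header, so it never names a subtable */
	if (!ref)
		return NULL;

	/* the subtable header must fit before it can be read */
	if ((uint64_t)ref + sizeof(*entry) > table_len)
		return NULL;

	entry = ACPI_ADD_PTR(struct sketch_subtable_header, table, ref);

	/* ...and so must the whole subtable it describes */
	if ((uint64_t)ref + entry->length > table_len)
		return NULL;

	return entry;
}
```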


> 
>> +
>> +    if (pptt_ref + entry->length > table_hdr->length)
>> +        return NULL;
>> +
>> +    return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *node, int resource)
>> +{
>> +    u32 ref;
>> +
>> +    if (resource >= node->number_of_priv_resources)
>> +        return NULL;
>> +
>> +    ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>> +              sizeof(u32) * resource);
> 
> ACPI_ADD_PTR()
> 
>> +
>> +    return fetch_pptt_subtable(table_hdr, ref);
>> +}
>> +
>> +/*
>> + * given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +                int local_level,
>> +                struct acpi_subtable_header *res,
>> +                struct acpi_pptt_cache **found,
>> +                int level, int type)
>> +{
>> +    struct acpi_pptt_cache *cache;
>> +
>> +    if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +        return 0;
>> +
>> +    cache = (struct acpi_pptt_cache *) res;
>> +    while (cache) {
>> +        local_level++;
>> +
>> +        if ((local_level == level) &&
>> +            (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +            ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>> +            if (*found != NULL)
>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>> +
>> +            pr_debug("Found cache @ level %d\n", level);
>> +            *found = cache;
>> +            /*
>> +             * continue looking at this node's resource list
>> +             * to verify that we don't find a duplicate
>> +             * cache node.
>> +             */
>> +        }
>> +        cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +    }
>> +    return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, and then
>> + * for each cache node, count how many levels exist below (logically above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu_node,
>> +    int *starting_level, int level, int type)
>> +{
>> +    struct acpi_subtable_header *res;
>> +    int number_of_levels = *starting_level;
>> +    int resource = 0;
>> +    struct acpi_pptt_cache *ret = NULL;
>> +    int local_level;
>> +
>> +    /* walk down from the processor node */
>> +    while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>> +        resource++;
>> +
>> +        local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +                           res, &ret, level, type);
>> +        /*
>> +         * we are looking for the max depth. Since its potentially
>> +         * possible for a given node to have resources with differing
>> +         * depths verify that the depth we have found is the largest.
>> +         */
>> +        if (number_of_levels < local_level)
>> +            number_of_levels = local_level;
>> +    }
>> +    if (number_of_levels > *starting_level)
>> +        *starting_level = number_of_levels;
>> +
>> +    return ret;
>> +}
>> +
>> +/*
>> + * given a processor node containing a processing unit, walk into it and count
>> + * how many levels exist solely for it, and then walk up each level until we hit
>> + * the root node (ignore the package level because it may be possible to have
>> + * caches that exist across packages). Count the number of cache levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +                 struct acpi_pptt_processor *cpu_node)
>> +{
>> +    int total_levels = 0;
>> +
>> +    do {
>> +        acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +        cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +    } while (cpu_node);
>> +
>> +    return total_levels;
>> +}
>> +
>> +/* determine if the given node is a leaf node */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +                   struct acpi_pptt_processor *node)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    u32 node_entry;
>> +    struct acpi_pptt_processor *cpu_node;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
> 
> ACPI_ADD_PTR()
> 
>> +
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            (cpu_node->parent == node_entry))
>> +            return 0;
>> +        entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
>> +    }
>> +    return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +    struct acpi_table_header *table_hdr,
>> +    u32 acpi_cpu_id)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    struct acpi_pptt_processor *cpu_node;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
> 
> ACPI_ADD_PTR()
> 
>> +
>> +    /* find the processor structure associated with this cpuid */
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>> +            pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +                 acpi_cpu_id, cpu_node->acpi_processor_id);
>> +            if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +                /* found the correct entry */
>> +                pr_debug("match found!\n");
>> +                return (struct acpi_pptt_processor *)entry;
>> +            }
>> +        }
>> +
>> +        if (entry->length == 0) {
>> +            pr_err("Invalid zero length subtable\n");
>> +            break;
>> +        }
> 
> For better validation of the table contents, this check could be done
> at the beginning of the loop, like this:
> 
> if (WARN_TAINT(entry->length == 0, TAINT_FIRMWARE_WORKAROUND,
>         "Invalid zero length subtable, bad PPTT table!\n"))
>              break;
> 
> 
>> +        entry = (struct acpi_subtable_header *)
>> +            ((u8 *)entry + entry->length);
> 
> ACPI_ADD_PTR()
> 
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +/*
>> + * Given a acpi_pptt_processor node, walk up until we identify the
>> + * package that the node is associated with or we run out of levels
>> + * to request.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu,
>> +    int level)
>> +{
>> +    struct acpi_pptt_processor *prev_node;
>> +
>> +    while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
>> +        pr_debug("level %d\n", level);
>> +        prev_node = fetch_pptt_node(table_hdr, cpu->parent);
>> +        if (prev_node == NULL)
>> +            break;
>> +        cpu = prev_node;
>> +        level--;
>> +    }
>> +    return cpu;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
>> +{
>> +    int number_of_levels = 0;
>> +    struct acpi_pptt_processor *cpu;
>> +
>> +    cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +    if (cpu)
>> +        number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +    return number_of_levels;
>> +}
>> +
> 
> Based on ACPI spec 6.2:
> 
>> +#define ACPI_6_2_CACHE_TYPE_DATA              (0x0)
>> +#define ACPI_6_2_CACHE_TYPE_INSTR              (1<<2)
>> +#define ACPI_6_2_CACHE_TYPE_UNIFIED              (1<<3)
> 
> Bits 3:2: Cache type:
> 0x0 Data
> 0x1 Instruction
> 0x2 or 0x3 Indicate a unified cache

Originally I was trying to do something more clever than the switch 
(given the less than optimal bit definitions), but the result wasn't as 
clear as the switch, so I just plugged that in but forgot about the 3rd 
case.

> 
>> +#define ACPI_6_2_CACHE_POLICY_WB              (0x0)
>> +#define ACPI_6_2_CACHE_POLICY_WT              (1<<4)
>> +#define ACPI_6_2_CACHE_READ_ALLOCATE              (0x0)
>> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE              (0x01)
>> +#define ACPI_6_2_CACHE_RW_ALLOCATE              (0x02)
> 
> Bits 1:0: Allocation type
> 0x0 - Read allocate
> 0x1 - Write allocate
> 0x2 or 0x03 indicate Read and Write allocate
> 
> BTW, why are these not part of the ACPICA code (actbl1.h header), with
> ACPI_PPTT prefixes?

Well, I guess they probably should be; the only question is how one goes
about defining the duplicates.

AKA:

#define ACPI_PPTT_CACHE_RW_ALLOCATE              (0x02)
#define ACPI_PPTT_CACHE_RW_ALLOCATE_ALT          (0x03)

> 
>> +
>> +static u8 acpi_cache_type(enum cache_type type)
>> +{
>> +    switch (type) {
>> +    case CACHE_TYPE_DATA:
>> +        pr_debug("Looking for data cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_DATA;
>> +    case CACHE_TYPE_INST:
>> +        pr_debug("Looking for instruction cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_INSTR;
>> +    default:
>> +        pr_debug("Unknown cache type, assume unified\n");
>> +    case CACHE_TYPE_UNIFIED:
>> +        pr_debug("Looking for unified cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_UNIFIED;
>> +    }
>> +}
>> +
>> +/* find the ACPI node describing the cache type/level for the given CPU */
>> +static struct acpi_pptt_cache *acpi_find_cache_node(
>> +    struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>> +    enum cache_type type, unsigned int level,
>> +    struct acpi_pptt_processor **node)
>> +{
>> +    int total_levels = 0;
>> +    struct acpi_pptt_cache *found = NULL;
>> +    struct acpi_pptt_processor *cpu_node;
>> +    u8 acpi_type = acpi_cache_type(type);
>> +
>> +    pr_debug("Looking for CPU %d's level %d cache type %d\n",
>> +         acpi_cpu_id, level, acpi_type);
>> +
>> +    cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +    if (!cpu_node)
>> +        return NULL;
>> +
>> +    do {
>> +        found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> 
> Please keep this line within 80 characters.

ok,

> 
>> +        *node = cpu_node;
>> +        cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +    } while ((cpu_node) && (!found));
>> +
>> +    return found;
>> +}
>> +
>> +int acpi_find_last_cache_level(unsigned int cpu)
>> +{
>> +    u32 acpi_cpu_id;
>> +    struct acpi_table_header *table;
>> +    int number_of_levels = 0;
>> +    acpi_status status;
>> +
>> +    pr_debug("Cache Setup find last level cpu=%d\n", cpu);
>> +
>> +    acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +    status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +    if (ACPI_FAILURE(status)) {
>> +        pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +    } else {
>> +        number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
>> +        acpi_put_table(table);
>> +    }
>> +    pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
>> +
>> +    return number_of_levels;
>> +}
>> +
>> +/*
>> + * The ACPI spec implies that the fields in the cache structures are used to
>> + * extend and correct the information probed from the hardware. In the case
>> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
>> + */
>> +static void update_cache_properties(struct cacheinfo *this_leaf,
>> +                    struct acpi_pptt_cache *found_cache,
>> +                    struct acpi_pptt_processor *cpu_node)
>> +{
>> +    if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>> +        this_leaf->size = found_cache->size;
>> +    if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
>> +        this_leaf->coherency_line_size = found_cache->line_size;
>> +    if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
>> +        this_leaf->number_of_sets = found_cache->number_of_sets;
>> +    if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
>> +        this_leaf->ways_of_associativity = found_cache->associativity;
>> +    if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
>> +        switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>> +        case ACPI_6_2_CACHE_POLICY_WT:
>> +            this_leaf->attributes = CACHE_WRITE_THROUGH;
>> +            break;
>> +        case ACPI_6_2_CACHE_POLICY_WB:
>> +            this_leaf->attributes = CACHE_WRITE_BACK;
>> +            break;
>> +        default:
>> +            pr_err("Unknown ACPI cache policy %d\n",
>> +                  found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
>> +        }
> 
> The 'default' case can never happen, please remove dead code.

Ok,

> 
>> +    if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
>> +        switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>> +        case ACPI_6_2_CACHE_READ_ALLOCATE:
>> +            this_leaf->attributes |= CACHE_READ_ALLOCATE;
>> +            break;
>> +        case ACPI_6_2_CACHE_WRITE_ALLOCATE:
>> +            this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
>> +            break;
>> +        case ACPI_6_2_CACHE_RW_ALLOCATE:
>> +            this_leaf->attributes |=
>> +                CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
>> +            break;
>> +        default:
>> +            pr_err("Unknown ACPI cache allocation policy %d\n",
>> +               found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
>> +        }
> 
> Same here if you fix bits definitions.

Sure,
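For reference, both comments above follow from the field widths. A minimal sketch, assuming the ACPI 6.2 layout of the cache structure's Attributes byte (bits 1:0 allocation type, bits 3:2 cache type, bit 4 write policy, with 0x2 and 0x3 both meaning read/write allocate): the masked write policy can only be WB or WT, and once both read/write encodings are defined every 2-bit allocation value is covered. The SKETCH_*/SK_* names and sketch_*() functions are hypothetical stand-ins, not the ACPICA or cacheinfo definitions.

```c
#include <stdint.h>

/*
 * Hypothetical constants, assumed from the ACPI 6.2 PPTT cache "Attributes"
 * byte: bits 1:0 allocation type, bits 3:2 cache type, bit 4 write policy.
 */
#define SKETCH_MASK_ALLOCATION_TYPE 0x03u
#define SKETCH_MASK_WRITE_POLICY    0x10u

#define SKETCH_ALLOC_READ   0x0u
#define SKETCH_ALLOC_WRITE  0x1u
/* 0x2 and 0x3 both encode read + write allocate. */

/* Hypothetical stand-ins for the kernel's cacheinfo attribute flags. */
#define SK_WRITE_THROUGH  (1u << 0)
#define SK_WRITE_BACK     (1u << 1)
#define SK_READ_ALLOCATE  (1u << 2)
#define SK_WRITE_ALLOCATE (1u << 3)

/* The write policy is a single bit: there is no third value to report. */
unsigned int sketch_write_policy(uint8_t attributes)
{
	return (attributes & SKETCH_MASK_WRITE_POLICY) ?
		SK_WRITE_THROUGH : SK_WRITE_BACK;
}

/* All four 2-bit encodings map to a result, so no error branch is needed. */
unsigned int sketch_allocation(uint8_t attributes)
{
	switch (attributes & SKETCH_MASK_ALLOCATION_TYPE) {
	case SKETCH_ALLOC_READ:
		return SK_READ_ALLOCATE;
	case SKETCH_ALLOC_WRITE:
		return SK_WRITE_ALLOCATE;
	default: /* 0x2 or 0x3 */
		return SK_READ_ALLOCATE | SK_WRITE_ALLOCATE;
	}
}
```

Under that assumption, an attributes value of 0x03 decodes to read + write allocate rather than hitting the pr_err() path.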

> 
>> +}
>> +
>> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>> +                 unsigned int cpu)
>> +{
>> +    struct acpi_pptt_cache *found_cache;
>> +    struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> +    u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +    struct cacheinfo *this_leaf;
>> +    unsigned int index = 0;
>> +    struct acpi_pptt_processor *cpu_node = NULL;
>> +
>> +    while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
>> +        this_leaf = this_cpu_ci->info_list + index;
>> +        found_cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +                           this_leaf->type,
>> +                           this_leaf->level,
>> +                           &cpu_node);
>> +        pr_debug("found = %p %p\n", found_cache, cpu_node);
>> +        if (found_cache)
>> +            update_cache_properties(this_leaf,
>> +                        found_cache,
>> +                        cpu_node);
>> +
>> +        index++;
>> +    }
>> +}
>> +
>> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
>> +                    unsigned int cpu, int level)
>> +{
>> +    struct acpi_pptt_processor *cpu_node;
>> +    u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
>> +
>> +    cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +    if (cpu_node) {
>> +        cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
>> +        /* Only the first level has a guaranteed id */
>> +        if (level == 0)
>> +            return cpu_node->acpi_processor_id;
>> +        return (int)((u8 *)cpu_node - (u8 *)table);
>> +    }
>> +    pr_err_once("PPTT table found, but unable to locate core for %d\n",
>> +            cpu);
>> +    return -ENOENT;
>> +}
>> +
>> +/*
>> + * Simply assign an ACPI cache entry to each known CPU cache entry;
>> + * determining which entries are shared is done later.
>> + */
>> +int cache_setup_acpi(unsigned int cpu)
>> +{
>> +    struct acpi_table_header *table;
>> +    acpi_status status;
>> +
>> +    pr_debug("Cache Setup ACPI cpu %d\n", cpu);
>> +
>> +    status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +    if (ACPI_FAILURE(status)) {
>> +        pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +        return -ENOENT;
>> +    }
>> +
>> +    cache_setup_acpi_cpu(table, cpu);
>> +    acpi_put_table(table);
>> +
>> +    return status;
>> +}
>> +
>> +/*
>> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
>> + * This ID can then be used to group peers.
>> + */
>> +int setup_acpi_cpu_topology(unsigned int cpu, int level)
>> +{
>> +    struct acpi_table_header *table;
>> +    acpi_status status;
>> +    int retval;
>> +
>> +    status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +    if (ACPI_FAILURE(status)) {
>> +        pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
>> +        return -ENOENT;
>> +    }
>> +    retval = topology_setup_acpi_cpu(table, cpu, level);
>> +    pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
>> +         cpu, level, retval);
>> +    acpi_put_table(table);
>> +
>> +    return retval;
>> +}
>>
> 
> Thanks,
> Tomasz
> 

Thanks for taking the time to look at this.


* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-13  9:56     ` Julien Thierry
@ 2017-10-13 22:41       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-13 22:41 UTC (permalink / raw)
  To: Julien Thierry, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi,


Thanks for spending the time to take a look at this.


On 10/13/2017 04:56 AM, Julien Thierry wrote:
> Hi Jeremy,
> 
> Please see below some suggestions.
> 
> On 12/10/17 20:48, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in a tree-like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> core's cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch-specific
>> code.
>>
>> Further, report peers in the topology using setup_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique IDs can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 485 insertions(+)
>>   create mode 100644 drivers/acpi/pptt.c
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> new file mode 100644
>> index 000000000000..c86715fed4a7
>> --- /dev/null
>> +++ b/drivers/acpi/pptt.c
>> @@ -0,1 +1,485 @@
>> +/*
>> + * Copyright (C) 2017, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>> + * which is optionally used to describe the processor and cache topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + */
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +
>> +    /* there isn't a subtable at reference 0 */
>> +    if (!pptt_ref)
>> +        return NULL;
> 
> Seeing the usage of pptt_ref to retrieve the subtable, would the 
> following be a more accurate check?
> 
>      if (pptt_ref < sizeof(struct acpi_table_header))
>          return NULL;

Yes, that makes it better match the comment, and I guess tightens up the 
sanity checking. The original intention was just to catch null 
references that were encoded as parent/etc fields.

> 
>> +
>> +    if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>> +        return NULL;
>> +
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>> +
>> +    if (pptt_ref + entry->length > table_hdr->length)
>> +        return NULL;
>> +
>> +    return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *node, int resource)
>> +{
>> +    u32 ref;
>> +
>> +    if (resource >= node->number_of_priv_resources)
>> +        return NULL;
>> +
>> +    ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>> +              sizeof(u32) * resource);
>> +
> 
> I think this can be simplified as:
> 
>      ref = *((u32 *)(node + 1) + resource);

I think Tomasz had a better suggestion with regard to ACPI_ADD_PTR()
for avoiding the explicit pointer math, although it may not be that
clean either because it doesn't fit 1:1 with the macro at the moment.
Maybe I'm doing it wrong...
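The three ways of locating private resource i discussed here compute the same address. A minimal sketch, assuming a hypothetical stand-in for struct acpi_pptt_processor whose size matches its 20-byte on-table layout (all fields are naturally aligned, so there is no padding), and a local SKETCH_ADD_PTR() macro that only imitates the byte-offset-then-cast idea behind ACPI_ADD_PTR(); it is not the actypes.h definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct acpi_pptt_processor; the private
 * resource references (u32 each) follow it directly in the table. */
struct sketch_pptt_processor {
	uint8_t type;
	uint8_t length;
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;
	uint32_t acpi_processor_id;
	uint32_t number_of_priv_resources;
};

/* Local imitation of "add a byte offset, then cast" (not ACPICA's macro). */
#define SKETCH_ADD_PTR(type, base, offset) \
	((type *)((uint8_t *)(base) + (size_t)(offset)))

/* 1: explicit byte arithmetic, as in the patch */
uint32_t *sketch_ref_bytes(struct sketch_pptt_processor *node, unsigned int i)
{
	return (uint32_t *)((uint8_t *)node + sizeof(*node) +
			    sizeof(uint32_t) * i);
}

/* 2: Julien's form: step past the structure, then index as u32 */
uint32_t *sketch_ref_typed(struct sketch_pptt_processor *node, unsigned int i)
{
	return (uint32_t *)(node + 1) + i;
}

/* 3: macro-based form in the spirit of ACPI_ADD_PTR() */
uint32_t *sketch_ref_add_ptr(struct sketch_pptt_processor *node, unsigned int i)
{
	return SKETCH_ADD_PTR(uint32_t, node,
			      sizeof(*node) + sizeof(uint32_t) * i);
}
```

Form 2 is only equivalent while sizeof(*node) equals the on-table structure size; the byte-offset forms do not depend on that.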

> 
>> +    return fetch_pptt_subtable(table_hdr, ref);
>> +}
>> +
>> +/*
>> + * given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +                int local_level,
>> +                struct acpi_subtable_header *res,
>> +                struct acpi_pptt_cache **found,
>> +                int level, int type)
>> +{
>> +    struct acpi_pptt_cache *cache;
>> +
>> +    if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +        return 0;
>> +
>> +    cache = (struct acpi_pptt_cache *) res;
>> +    while (cache) {
>> +        local_level++;
>> +
>> +        if ((local_level == level) &&
>> +            (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +            ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>> +            if (*found != NULL)
>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>> +
>> +            pr_debug("Found cache @ level %d\n", level);
>> +            *found = cache;
>> +            /*
>> +             * continue looking at this node's resource list
>> +             * to verify that we don't find a duplicate
>> +             * cache node.
>> +             */
>> +        }
>> +        cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +    }
>> +    return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, and then
>> + * for each cache node, count how many levels exist below (logically above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu_node,
>> +    int *starting_level, int level, int type)
>> +{
>> +    struct acpi_subtable_header *res;
>> +    int number_of_levels = *starting_level;
>> +    int resource = 0;
>> +    struct acpi_pptt_cache *ret = NULL;
>> +    int local_level;
>> +
>> +    /* walk down from the processor node */
>> +    while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>> +        resource++;
>> +
>> +        local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +                           res, &ret, level, type);
>> +        /*
>> +         * we are looking for the max depth. Since it's potentially
>> +         * possible for a given node to have resources with differing
>> +         * depths verify that the depth we have found is the largest.
>> +         */
>> +        if (number_of_levels < local_level)
>> +            number_of_levels = local_level;
>> +    }
>> +    if (number_of_levels > *starting_level)
>> +        *starting_level = number_of_levels;
>> +
>> +    return ret;
>> +}
>> +
>> +/*
>> + * given a processor node containing a processing unit, walk into it and count
>> + * how many levels exist solely for it, and then walk up each level until we hit
>> + * the root node (ignore the package level because it may be possible to have
>> + * caches that exist across packages). Count the number of cache levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +                 struct acpi_pptt_processor *cpu_node)
>> +{
>> +    int total_levels = 0;
>> +
>> +    do {
>> +        acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +        cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +    } while (cpu_node);
>> +
>> +    return total_levels;
>> +}
>> +
>> +/* determine if the given node is a leaf node */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +                   struct acpi_pptt_processor *node)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    u32 node_entry;
>> +    struct acpi_pptt_processor *cpu_node;
> 
> Can cpu_node be defined inside the loop? It isn't used outside.

Yes, but I'm not sure that is the style of the ACPI code. If you look at
scan.c, acpi_ipmi.c, and maybe others, they seem to follow the "all
definitions at the top of the block" form despite having a few loops
with variables that are only used in the block.

> 
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
>> +
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> 
>      while ((unsigned long) (entry + 1) < table_end) {
> 
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            (cpu_node->parent == node_entry))
>> +            return 0;
>> +        entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
>> +    }
>> +    return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +    struct acpi_table_header *table_hdr,
>> +    u32 acpi_cpu_id)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    struct acpi_pptt_processor *cpu_node;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
> 
> Can I suggest having two inline functions for this and the above function?
> 
> static inline unsigned long acpi_get_table_end(const struct acpi_table_header *);

That seems like a bit of overkill for a single addition; let me think about this one.

> 
> static inline struct acpi_subtable_header *acpi_get_first_entry(const struct acpi_table_header *);

This one and the one below are really just degenerate cases of
fetch_pptt_subtable().

> 
> (Feel free to adapt the names of course)
> 
>> +
>> +    /* find the processor structure associated with this cpuid */
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> 
> Same as above -> (unsigned long) (entry + 1).
> 
> 
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>> +            pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +                 acpi_cpu_id, cpu_node->acpi_processor_id);
>> +            if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +                /* found the correct entry */
>> +                pr_debug("match found!\n");
>> +                return (struct acpi_pptt_processor *)entry;
>> +            }
>> +        }
>> +
>> +        if (entry->length == 0) {
>> +            pr_err("Invalid zero length subtable\n");
>> +            break;
>> +        }
>> +        entry = (struct acpi_subtable_header *)
>> +            ((u8 *)entry + entry->length);
> 
> 
> I also think it would be nicer to have an inline function for this:
> 
> static struct acpi_subtable_header *acpi_get_next_entry(const struct acpi_subtable_header *);

In both cases that is just a degenerate case of fetch_pptt_subtable().
Now that the macro in actypes.h has been pointed out, I think most of
this manipulation will just get buried behind those macros.
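Whether the helpers end up as inline functions or get buried behind the actypes.h macros, the walk they support looks roughly like this. A minimal sketch with hypothetical names (sketch_table_end(), sketch_first_entry(), sketch_next_entry(), sketch_count_type()) and simplified stand-in structures; it uses Julien's `(entry + 1) < end` condition and adds the zero-length guard that acpi_find_processor_node() has but acpi_pptt_leaf_node() does not.

```c
#include <stdint.h>

/* Simplified stand-ins for the ACPICA structures (hypothetical names). */
struct sketch_table_header {          /* 36 bytes, like acpi_table_header */
	char signature[4];
	uint32_t length;
	uint8_t revision;
	uint8_t checksum;
	char oem_id[6];
	char oem_table_id[8];
	uint32_t oem_revision;
	char asl_compiler_id[4];
	uint32_t asl_compiler_revision;
};

struct sketch_subtable_header {
	uint8_t type;
	uint8_t length;
};

static inline uintptr_t sketch_table_end(const struct sketch_table_header *t)
{
	return (uintptr_t)t + t->length;
}

/* First subtable sits right after the header, mirroring the patch's use of
 * sizeof(struct acpi_table_pptt). */
static inline const struct sketch_subtable_header *
sketch_first_entry(const struct sketch_table_header *t)
{
	return (const void *)((const uint8_t *)t + sizeof(*t));
}

static inline const struct sketch_subtable_header *
sketch_next_entry(const struct sketch_subtable_header *e)
{
	return (const void *)((const uint8_t *)e + e->length);
}

/* Count subtables of one type; returns -1 on a zero-length subtable. */
int sketch_count_type(const struct sketch_table_header *t, uint8_t type)
{
	const struct sketch_subtable_header *entry = sketch_first_entry(t);
	uintptr_t end = sketch_table_end(t);
	int count = 0;

	/* (entry + 1) is the address just past this subtable's fixed header. */
	while ((uintptr_t)(entry + 1) < end) {
		if (entry->type == type)
			count++;
		/* A zero length would never advance the walk. */
		if (entry->length == 0)
			return -1;
		entry = sketch_next_entry(entry);
	}
	return count;
}
```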


> 
> 
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +/*
>> + * Given an acpi_pptt_processor node, walk up until we identify the
>> + * package that the node is associated with or we run out of levels
>> + * to request.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu,
>> +    int level)
>> +{
>> +    struct acpi_pptt_processor *prev_node;
>> +
>> +    while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
>> +        pr_debug("level %d\n", level);
>> +        prev_node = fetch_pptt_node(table_hdr, cpu->parent);
>> +        if (prev_node == NULL)
>> +            break;
>> +        cpu = prev_node;
>> +        level--;
>> +    }
>> +    return cpu;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
>> +{
>> +    int number_of_levels = 0;
>> +    struct acpi_pptt_processor *cpu;
>> +
>> +    cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +    if (cpu)
>> +        number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +    return number_of_levels;
>> +}
>> +
>> +#define ACPI_6_2_CACHE_TYPE_DATA              (0x0)
>> +#define ACPI_6_2_CACHE_TYPE_INSTR              (1<<2)
>> +#define ACPI_6_2_CACHE_TYPE_UNIFIED              (1<<3)
>> +#define ACPI_6_2_CACHE_POLICY_WB              (0x0)
>> +#define ACPI_6_2_CACHE_POLICY_WT              (1<<4)
>> +#define ACPI_6_2_CACHE_READ_ALLOCATE              (0x0)
>> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE              (0x01)
>> +#define ACPI_6_2_CACHE_RW_ALLOCATE              (0x02)
>> +
>> +static u8 acpi_cache_type(enum cache_type type)
>> +{
>> +    switch (type) {
>> +    case CACHE_TYPE_DATA:
>> +        pr_debug("Looking for data cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_DATA;
>> +    case CACHE_TYPE_INST:
>> +        pr_debug("Looking for instruction cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_INSTR;
>> +    default:
>> +        pr_debug("Unknown cache type, assume unified\n");
>> +    case CACHE_TYPE_UNIFIED:
>> +        pr_debug("Looking for unified cache\n");
>> +        return ACPI_6_2_CACHE_TYPE_UNIFIED;
>> +    }
>> +}
>> +
>> +/* find the ACPI node describing the cache type/level for the given CPU */
>> +static struct acpi_pptt_cache *acpi_find_cache_node(
>> +    struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>> +    enum cache_type type, unsigned int level,
>> +    struct acpi_pptt_processor **node)
>> +{
>> +    int total_levels = 0;
>> +    struct acpi_pptt_cache *found = NULL;
>> +    struct acpi_pptt_processor *cpu_node;
>> +    u8 acpi_type = acpi_cache_type(type);
>> +
>> +    pr_debug("Looking for CPU %d's level %d cache type %d\n",
>> +         acpi_cpu_id, level, acpi_type);
>> +
>> +    cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +    if (!cpu_node)
>> +        return NULL;
>> +
>> +    do {
>> +        found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
>> +        *node = cpu_node;
>> +        cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +    } while ((cpu_node) && (!found));
> 
> Why not combine the do...while loop and the previous check in a simple
> while loop? The same condition should work as such for a while loop.

Ok, sure...
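Because found starts out NULL, folding the initial !cpu_node check into the loop condition gives the same behavior. A minimal sketch on a hypothetical miniature hierarchy, where array indices stand in for table offsets (0 meaning "no node", like the PPTT's 0 parent reference) and each node carries at most one cache level; sketch_node and sketch_find_cache_node() are illustrative names only.

```c
/* Hypothetical miniature of the PPTT processor hierarchy. */
struct sketch_node {
	unsigned int parent;  /* index of the parent node, 0 = none */
	int cache_level;      /* cache level described at this node, 0 = none */
};

/*
 * Walk from a leaf toward the root until a node describing 'level' is
 * found. Returns that node's index, or 0 if no node matches. Like the
 * patch, *last_visited is updated on every iteration, matched or not.
 */
unsigned int sketch_find_cache_node(const struct sketch_node *nodes,
				    unsigned int leaf, int level,
				    unsigned int *last_visited)
{
	unsigned int cpu_node = leaf;  /* 0 plays the role of a NULL pointer */
	unsigned int found = 0;

	/* The separate "if (!cpu_node) return NULL;" is now part of this test. */
	while (cpu_node && !found) {
		if (nodes[cpu_node].cache_level == level)
			found = cpu_node;
		*last_visited = cpu_node;
		cpu_node = nodes[cpu_node].parent;
	}
	return found;
}
```

With a leaf index of 0 the loop body never runs and the function returns 0, which is exactly what the removed early return did.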

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-13 22:41       ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-13 22:41 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,


Thanks for spending the time to take a look at this.


On 10/13/2017 04:56 AM, Julien Thierry wrote:
> Hi Jeremy,
> 
> Please see below some suggestions.
> 
> On 12/10/17 20:48, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in tree like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> cores cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> Further, report peers in the topology using setup_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique id's can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>> ? drivers/acpi/pptt.c | 485 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ? 1 file changed, 485 insertions(+)
>> ? create mode 100644 drivers/acpi/pptt.c
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> new file mode 100644
>> index 000000000000..c86715fed4a7
>> --- /dev/null
>> +++ b/drivers/acpi/pptt.c
>> @@ -0,1 +1,485 @@
>> +/*
>> + * Copyright (C) 2017, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.? See the GNU General Public 
>> License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology 
>> Table (PPTT)
>> + * which is optionally used to describe the processor and cache 
>> topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + */
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +??? struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +??? struct acpi_subtable_header *entry;
>> +
>> +??? /* there isn't a subtable at reference 0 */
>> +??? if (!pptt_ref)
>> +??????? return NULL;
> 
> Seeing the usage of pptt_ref to retrieve the subtable, would the 
> following be a more accurate check?
> 
>  ????if (pptt_ref < sizeof(struct acpi_table_header))
>  ??????? return NULL;

Yes, that makes it better match the comment, and I guess tightens up the 
sanity checking. The original intention was just to catch null 
references that were encoded as parent/etc fields.

> 
>> +
>> +??? if (pptt_ref + sizeof(struct acpi_subtable_header) > 
>> table_hdr->length)
>> +??????? return NULL;
>> +
>> +??? entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>> +
>> +??? if (pptt_ref + entry->length > table_hdr->length)
>> +??????? return NULL;
>> +
>> +??? return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +??? struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +??? return (struct acpi_pptt_processor 
>> *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +??? struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +??? return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, 
>> pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +??? struct acpi_table_header *table_hdr,
>> +??? struct acpi_pptt_processor *node, int resource)
>> +{
>> +??? u32 ref;
>> +
>> +??? if (resource >= node->number_of_priv_resources)
>> +??????? return NULL;
>> +
>> +??? ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>> +????????????? sizeof(u32) * resource);
>> +
> 
> I think this can be simplified as:
> 
>  ????ref = *((u32 *)(node + 1) + resource);

I think Thomasz had a better suggestion with regard to ACPI_ADD_PTR() 
for avoiding the explicit pointer math, although it may not be that 
clean either because it doesn't fit 1:1 with the macro at the moment, 
maybe i'm doing it wrong...

> 
>> +??? return fetch_pptt_subtable(table_hdr, ref);
>> +}
>> +
>> +/*
>> + * given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +??????????????? int local_level,
>> +??????????????? struct acpi_subtable_header *res,
>> +??????????????? struct acpi_pptt_cache **found,
>> +??????????????? int level, int type)
>> +{
>> +??? struct acpi_pptt_cache *cache;
>> +
>> +??? if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +??????? return 0;
>> +
>> +??? cache = (struct acpi_pptt_cache *) res;
>> +??? while (cache) {
>> +??????? local_level++;
>> +
>> +??????? if ((local_level == level) &&
>> +??????????? (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +??????????? ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>> +??????????? if (*found != NULL)
>> +??????????????? pr_err("Found duplicate cache level/type unable to 
>> determine uniqueness\n");
>> +
>> +??????????? pr_debug("Found cache @ level %d\n", level);
>> +??????????? *found = cache;
>> +??????????? /*
>> +???????????? * continue looking at this node's resource list
>> +???????????? * to verify that we don't find a duplicate
>> +???????????? * cache node.
>> +???????????? */
>> +??????? }
>> +??????? cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +??? }
>> +??? return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, 
>> and then
>> + * for each cache node, count how many levels exist below (logically 
>> above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +??? struct acpi_table_header *table_hdr,
>> +??? struct acpi_pptt_processor *cpu_node,
>> +??? int *starting_level, int level, int type)
>> +{
>> +??? struct acpi_subtable_header *res;
>> +??? int number_of_levels = *starting_level;
>> +??? int resource = 0;
>> +??? struct acpi_pptt_cache *ret = NULL;
>> +??? int local_level;
>> +
>> +??? /* walk down from the processor node */
>> +??? while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, 
>> resource))) {
>> +??????? resource++;
>> +
>> +??????? local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +?????????????????????????? res, &ret, level, type);
>> +??????? /*
>> +???????? * we are looking for the max depth. Since its potentially
>> +???????? * possible for a given node to have resources with differing
>> +???????? * depths verify that the depth we have found is the largest.
>> +???????? */
>> +??????? if (number_of_levels < local_level)
>> +??????????? number_of_levels = local_level;
>> +??? }
>> +??? if (number_of_levels > *starting_level)
>> +??????? *starting_level = number_of_levels;
>> +
>> +??? return ret;
>> +}
>> +
>> +/*
>> + * given a processor node containing a processing unit, walk into it 
>> and count
>> + * how many levels exist solely for it, and then walk up each level 
>> until we hit
>> + * the root node (ignore the package level because it may be possible 
>> to have
>> + * caches that exist across packages). Count the number of cache 
>> levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +???????????????? struct acpi_pptt_processor *cpu_node)
>> +{
>> +??? int total_levels = 0;
>> +
>> +??? do {
>> +??????? acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +??????? cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +??? } while (cpu_node);
>> +
>> +??? return total_levels;
>> +}
>> +
>> +/* determine if the given node is a leaf node */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +?????????????????? struct acpi_pptt_processor *node)
>> +{
>> +??? struct acpi_subtable_header *entry;
>> +??? unsigned long table_end;
>> +??? u32 node_entry;
>> +??? struct acpi_pptt_processor *cpu_node;
> 
> Can cpu_node be defined inside the loop? It isn't used outside.

Yes, but i'm not sure that is the style of the acpi code, if you look at 
scan.c, acpi_ipmi.c maybe others, they seem to be following the "all 
definitions at the top of the block" form despite having a few loops 
with variables that are only used in the block.

> 
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +						sizeof(struct acpi_table_pptt));
>> +
>> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> 
> 	while ((unsigned long) (entry + 1) < table_end) {
> 
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    (cpu_node->parent == node_entry))
>> +			return 0;
>> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
>> +	}
>> +	return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +	struct acpi_table_header *table_hdr,
>> +	u32 acpi_cpu_id)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +						sizeof(struct acpi_table_pptt));
> 
> Can I suggest having two inline functions for this and the above function?
> 
> static inline unsigned long acpi_get_table_end(const struct acpi_table_header *);

Which seems like a bit of overkill for a single addition; let me think about this one.

> 
> static inline struct acpi_subtable_header *acpi_get_first_entry(const struct acpi_table_header *);

This one and the below are really just degenerate cases of 
fetch_pptt_subtable().

> 
> (Feel free to adapt the names of course)
> 
>> +
>> +	/* find the processor structure associated with this cpuid */
>> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> 
> Same as above -> (unsigned long) (entry + 1).
> 
> 
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
>> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +				/* found the correct entry */
>> +				pr_debug("match found!\n");
>> +				return (struct acpi_pptt_processor *)entry;
>> +			}
>> +		}
>> +
>> +		if (entry->length == 0) {
>> +			pr_err("Invalid zero length subtable\n");
>> +			break;
>> +		}
>> +		entry = (struct acpi_subtable_header *)
>> +			((u8 *)entry + entry->length);
> 
> 
> I also think it would be nicer to have an inline function for this:
> 
> static struct acpi_subtable_header *acpi_get_next_entry(const struct acpi_subtable_header *);

Which is just a degenerate case of fetch_pptt_subtable() in both cases.
Now that the macro in actypes.h has been pointed out, I think most of
this manipulation will just end up buried behind those macros.


> 
> 
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +/*
>> + * Given a acpi_pptt_processor node, walk up until we identify the
>> + * package that the node is associated with or we run out of levels
>> + * to request.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *cpu,
>> +	int level)
>> +{
>> +	struct acpi_pptt_processor *prev_node;
>> +
>> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
>> +		pr_debug("level %d\n", level);
>> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
>> +		if (prev_node == NULL)
>> +			break;
>> +		cpu = prev_node;
>> +		level--;
>> +	}
>> +	return cpu;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
>> +{
>> +	int number_of_levels = 0;
>> +	struct acpi_pptt_processor *cpu;
>> +
>> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (cpu)
>> +		number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
>> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
>> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
>> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
>> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
>> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
>> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
>> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
>> +
>> +static u8 acpi_cache_type(enum cache_type type)
>> +{
>> +	switch (type) {
>> +	case CACHE_TYPE_DATA:
>> +		pr_debug("Looking for data cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_DATA;
>> +	case CACHE_TYPE_INST:
>> +		pr_debug("Looking for instruction cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_INSTR;
>> +	default:
>> +		pr_debug("Unknown cache type, assume unified\n");
>> +	case CACHE_TYPE_UNIFIED:
>> +		pr_debug("Looking for unified cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
>> +	}
>> +}
>> +
>> +/* find the ACPI node describing the cache type/level for the given CPU */
>> +static struct acpi_pptt_cache *acpi_find_cache_node(
>> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>> +	enum cache_type type, unsigned int level,
>> +	struct acpi_pptt_processor **node)
>> +{
>> +	int total_levels = 0;
>> +	struct acpi_pptt_cache *found = NULL;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	u8 acpi_type = acpi_cache_type(type);
>> +
>> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
>> +		 acpi_cpu_id, level, acpi_type);
>> +
>> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (!cpu_node)
>> +		return NULL;
>> +
>> +	do {
>> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
>> +		*node = cpu_node;
>> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +	} while ((cpu_node) && (!found));
> 
> Why not combine the do...while loop and the previous check in a simple
> while loop? The same condition should work as such for a while loop.

Ok, sure...
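The suggested restructuring, sketched against a hypothetical flattened node array rather than the real PPTT (index 0 plays the role of a null reference, as a PPTT reference of 0 does in fetch_pptt_subtable(), and has_cache stands in for acpi_find_cache_level() returning a match). A single while loop also covers what the separate !cpu_node check handled: if the starting node is missing, the body never runs and nothing is reported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical processor node; parent == 0 means "no parent". */
struct node {
	unsigned int parent;
	int has_cache;	/* stands in for acpi_find_cache_level() finding a match */
};

/* Resolve a reference like fetch_pptt_node(): 0 or out of range gives NULL. */
static const struct node *fetch_node(const struct node *nodes, size_t n,
				     unsigned int ref)
{
	if (ref == 0 || ref >= n)
		return NULL;
	return &nodes[ref];
}

/*
 * Walk up from the node at 'start' until a match is found or the root has
 * been passed. One while loop replaces the NULL check plus do...while pair.
 * *where is left pointing at the last node examined (untouched if none).
 */
static int find_up(const struct node *nodes, size_t n, unsigned int start,
		   const struct node **where)
{
	const struct node *cur = fetch_node(nodes, n, start);
	int found = 0;

	assert(where != NULL);
	while (cur && !found) {
		found = cur->has_cache;
		*where = cur;
		cur = fetch_node(nodes, n, cur->parent);
	}
	return found;
}
```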

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-12 19:48   ` Jeremy Linton
@ 2017-10-16 14:24     ` John Garry
From: John Garry @ 2017-10-16 14:24 UTC
  To: Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel,
	Linuxarm

On 12/10/2017 20:48, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in a tree-like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
>
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores' cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
>
> Further, report peers in the topology using setup_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique IDs can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
>
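As a rough illustration of how such per-level IDs might be consumed (a minimal sketch, not code from this series): assume ids[] holds the value setup_acpi_cpu_topology() returned for each CPU at one level, with errors already filtered out. CPUs reporting the same ID are then peers at that level. peer_mask() and ids[] are hypothetical names, and the 64-CPU cap only keeps the sketch small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ids[c] is the ID reported for CPU c at one topology level. Returns a
 * bitmask of every CPU whose ID matches that of 'cpu' (including 'cpu'
 * itself). Limited to 64 CPUs to keep the sketch small.
 */
static uint64_t peer_mask(const int *ids, size_t ncpus, size_t cpu)
{
	uint64_t mask = 0;
	size_t i;

	assert(ncpus <= 64 && cpu < ncpus);
	for (i = 0; i < ncpus; i++)
		if (ids[i] == ids[cpu])
			mask |= UINT64_C(1) << i;
	return mask;
}
```

With six CPUs where the first four report one ID and the last two another, peer_mask() gives 0xf for CPU 0 and 0x30 for CPU 5.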

As already commented, there are many lines over 80 characters.

And so far I have only really looked at the CPU topology part.

> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 485 insertions(+)
>  create mode 100644 drivers/acpi/pptt.c
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..c86715fed4a7
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,1 +1,485 @@
> +/*
> + * Copyright (C) 2017, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (!pptt_ref)
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
> +		      sizeof(u32) * resource);
> +
> +	return fetch_pptt_subtable(table_hdr, ref);
> +}
> +
> +/*
> + * given a pptt resource, verify that it is a cache node, then walk

/s/given/Given/

> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;

please remove whitespace before res

> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> +			if (*found != NULL)
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from the processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * we are looking for the max depth. Since its potentially
> +		 * possible for a given node to have resources with differing
> +		 * depths verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +

This is my first time looking at your implementation of the PPTT
driver. Has it been mentioned before that it is rather inefficient to
re-parse the table for every cpu in the system? Actually, within the cpu
parsing loop we call acpi_pptt_leaf_node(), which does another table
parsing loop.

However, this approach is simple.

I do know the version from wangxiongfeng had a kmalloc per node, which 
would also be somewhat inefficient.

But I worry that this implementation does not scale with larger numbers 
of CPUs.
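One possible way to drop the nested scan, sketched under assumptions: the table is modelled as a hypothetical flattened array in which parent is an index, not the byte offset the PPTT actually uses. A single pass marks every node that some other node names as its parent, after which the leaf test is one array lookup instead of a full table walk. The per-CPU search in find_leaf() is still linear, and the flag array would need one allocation sized to the table rather than one per node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened processor node; index 0 is reserved for "none". */
struct proc_node {
	unsigned int parent;	/* index of the parent node; 0 = no parent */
	unsigned int acpi_id;	/* stands in for acpi_processor_id */
};

/* One O(N) pass instead of one full table scan per candidate node. */
static void mark_leaves(const struct proc_node *nodes, size_t n, bool *is_leaf)
{
	size_t i;

	assert(nodes != NULL && is_leaf != NULL);
	/* Start by assuming every real node is a leaf. */
	for (i = 0; i < n; i++)
		is_leaf[i] = (i != 0);
	/* Any node named as a parent has a child, so it cannot be a leaf. */
	for (i = 1; i < n; i++)
		if (nodes[i].parent != 0 && nodes[i].parent < n)
			is_leaf[nodes[i].parent] = false;
}

/* Find the leaf with a matching ID using the precomputed flags; -1 if none. */
static long find_leaf(const struct proc_node *nodes, size_t n,
		      const bool *is_leaf, unsigned int acpi_id)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (is_leaf[i] && nodes[i].acpi_id == acpi_id)
			return (long)i;
	return -1;
}
```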

> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");

Do we really need two debug messages for the case of checking for, and
finding, a valid match?

> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}
> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(

One would assume from the name that this function returns an integer
value, that being the index of the package for that cpu.

> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,

Do you think that "cpu_node" would be a better name?

> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> +		pr_debug("level %d\n", level);

That's not a very useful message, and I can imagine it creating a lot of prints.

> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +
> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
> +
> +static u8 acpi_cache_type(enum cache_type type)
> +{
> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_6_2_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_6_2_CACHE_TYPE_INSTR;
> +	default:
> +		pr_debug("Unknown cache type, assume unified\n");
> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level,
> +	struct acpi_pptt_processor **node)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (!cpu_node)
> +		return NULL;
> +
> +	do {
> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while ((cpu_node) && (!found));
> +
> +	return found;
> +}
> +
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);

These prints (and others) should probably be verbose level or removed.

> +
> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");

is this really an error?

> +	} else {
> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);

I think that this could be better worded

> +
> +	return number_of_levels;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
> +{
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->size;
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_6_2_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_6_2_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache policy %d\n",
> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
> +		}
> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache allocation policy %d\n",
> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
> +		}
> +}
> +
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +	struct acpi_pptt_processor *cpu_node = NULL;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {

cpu does not change, so, for efficiency, can you use 
this_cpu_ci->num_leaves?

> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level,
> +						   &cpu_node);
> +		pr_debug("found = %p %p\n", found_cache, cpu_node);

I am not sure how useful printing pointers is to the user, even if NULL.
These prints are too verbose.

> +		if (found_cache)
> +			update_cache_properties(this_leaf,
> +						found_cache,
> +						cpu_node);
> +
> +		index++;
> +	}
> +}
> +
> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
> +				    unsigned int cpu, int level)
> +{

It's not clear what is supposed to be returned from this function, if
anything, since its job is seemingly to "setup".

> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +
> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +	if (cpu_node) {
> +		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
> +		/* Only the first level has a guaranteed id */
> +		if (level == 0)
> +			return cpu_node->acpi_processor_id;
> +		return (int)((u8 *)cpu_node - (u8 *)table);

Sorry, but I just don't get this. As I understand, our intention is to 
find the core/cluster/package index in the system for a given cpu, right?

If so, how is the distance between the table base and cpu level's node 
the same as the system index? I would say that this value is unique, but 
are we expecting sequential system indexing per level?
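To make the distinction concrete: a node's byte offset into the table is unique, so it can tell groups of peers apart, but it is neither small nor sequential. If dense per-level indices were wanted, a post-processing step along these lines could map the opaque IDs to 0, 1, 2, ... in order of first appearance. densify_ids() is a hypothetical helper, not part of the patch, and its quadratic search is only meant for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * ids[] holds one opaque (unique but sparse) ID per CPU, such as a node's
 * byte offset into the table. dense[] receives 0, 1, 2, ... in order of
 * first appearance, so CPUs sharing an ID share a dense index.
 */
static size_t densify_ids(const int *ids, size_t n, int *dense)
{
	size_t i, j, next = 0;

	assert(ids != NULL && dense != NULL);
	for (i = 0; i < n; i++) {
		dense[i] = -1;
		/* Reuse the index of an earlier CPU with the same ID, if any. */
		for (j = 0; j < i; j++) {
			if (ids[j] == ids[i]) {
				dense[i] = dense[j];
				break;
			}
		}
		/* First time this ID is seen: hand out the next index. */
		if (dense[i] < 0)
			dense[i] = (int)next++;
	}
	return next;	/* number of distinct IDs */
}
```

For example, the made-up offsets {0x24, 0x24, 0x90, 0x90, 0x24} map to {0, 0, 1, 1, 0}, and the function returns 2.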

> +	}
> +	pr_err_once("PPTT table found, but unable to locate core for %d\n",

The code seems to intermix the terms "core" and "cpu"; is this intentional?

> +		    cpu);
> +	return -ENOENT;
> +}
> +
> +/*
> + * simply assign a ACPI cache entry to each known CPU cache entry
> + * determining which entries are shared is done later.
> + */
> +int cache_setup_acpi(unsigned int cpu)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);

Please don't leave whitespace after "cpu"

> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_setup_acpi_cpu(table, cpu);
> +	acpi_put_table(table);
> +
> +	return status;
> +}
> +
> +/*
> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
> + * This ID can then be used to group peers.
> + */
> +int setup_acpi_cpu_topology(unsigned int cpu, int level)

I think that you should add a function description to explain what cpu 
and level are.

This function does no setup either. I think that doing a setup would be 
doing something which has a persistent result. Instead, this function 
gets the cpu topology index for a certain hierarchy level.

> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +	int retval;
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");

As before, is this really an error?

> +		return -ENOENT;
> +	}
> +	retval = topology_setup_acpi_cpu(table, cpu, level);
> +	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
> +		 cpu, level, retval);
> +	acpi_put_table(table);
> +
> +	return retval;
> +}
>

Thanks,
John

> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +

This is the first time looking at your implementation of the PPTT 
driver. Has it been mentioned before that it is rather inefficient to 
re-parse the table for every cpu in the system? Actually, within the cpu
parsing loop we call acpi_pptt_leaf_node(), which does another table
parsing loop.

However, this is simple.

I do know the version from wangxiongfeng had a kmalloc per node, which 
would also be somewhat inefficient.

But I worry that this implementation does not scale with larger numbers 
of CPUs.

> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");

Do we really need to add 2 debug messages for the case of checking for
and finding a valid match?

> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}
> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(

One would assume from the name that this function returns an integer
value, namely an index for the package of that cpu.

> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,

Do you think that "cpu_node" would be a better name?

> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> +		pr_debug("level %d\n", level);

That's not a very useful message, and I can imagine it creates a lot of prints.

> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +
> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
> +
> +static u8 acpi_cache_type(enum cache_type type)
> +{
> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_6_2_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_6_2_CACHE_TYPE_INSTR;
> +	default:
> +		pr_debug("Unknown cache type, assume unified\n");
> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level,
> +	struct acpi_pptt_processor **node)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (!cpu_node)
> +		return NULL;
> +
> +	do {
> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while ((cpu_node) && (!found));
> +
> +	return found;
> +}
> +
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);

These prints (and others) should probably be made verbose-level or removed.

> +
> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");

is this really an error?

> +	} else {
> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);

I think that this could be better worded

> +
> +	return number_of_levels;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
> +{
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->size;
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_6_2_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_6_2_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache policy %d\n",
> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
> +		}
> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache allocation policy %d\n",
> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
> +		}
> +}
> +
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +	struct acpi_pptt_processor *cpu_node = NULL;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {

cpu does not change, so, for efficiency, can you use 
this_cpu_ci->num_leaves?

> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level,
> +						   &cpu_node);
> +		pr_debug("found = %p %p\n", found_cache, cpu_node);

I am not sure how useful printing pointers is to the user, even if NULL.
These prints are too verbose.

> +		if (found_cache)
> +			update_cache_properties(this_leaf,
> +						found_cache,
> +						cpu_node);
> +
> +		index++;
> +	}
> +}
> +
> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
> +				    unsigned int cpu, int level)
> +{

It's not clear what is supposed to be returned from this function, if
anything, since its job is seemingly to "setup".

> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +
> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +	if (cpu_node) {
> +		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
> +		/* Only the first level has a guaranteed id */
> +		if (level == 0)
> +			return cpu_node->acpi_processor_id;
> +		return (int)((u8 *)cpu_node - (u8 *)table);

Sorry, but I just don't get this. As I understand it, our intention is to
find the core/cluster/package index in the system for a given cpu, right?

If so, how is the offset of this level's node from the table base the
same as the system index? I accept that this value is unique, but
are we expecting sequential system indexing per level?

> +	}
> +	pr_err_once("PPTT table found, but unable to locate core for %d\n",

The code seems to intermix the terms "core" and "cpu"; is this intentional?

> +		    cpu);
> +	return -ENOENT;
> +}
> +
> +/*
> + * simply assign a ACPI cache entry to each known CPU cache entry
> + * determining which entries are shared is done later.
> + */
> +int cache_setup_acpi(unsigned int cpu)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);

Please don't leave whitespace after "cpu"

> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_setup_acpi_cpu(table, cpu);
> +	acpi_put_table(table);
> +
> +	return status;
> +}
> +
> +/*
> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
> + * This ID can then be used to group peers.
> + */
> +int setup_acpi_cpu_topology(unsigned int cpu, int level)

I think that you should add a function description to explain what cpu 
and level are.

This function does no setup either. I would expect a setup function
to do something which has a persistent result. Instead, this function
gets the cpu topology index for a certain hierarchy level.

> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +	int retval;
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");

As before, is this really an error?

> +		return -ENOENT;
> +	}
> +	retval = topology_setup_acpi_cpu(table, cpu, level);
> +	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
> +		 cpu, level, retval);
> +	acpi_put_table(table);
> +
> +	return retval;
> +}
>

Thanks,
John

* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-16 14:24     ` John Garry
  0 siblings, 0 replies; 104+ messages in thread
From: John Garry @ 2017-10-16 14:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/10/2017 20:48, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in tree like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
>
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
>
> Further, report peers in the topology using setup_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique id's can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
>

As already commented, there are many lines over 80 characters.

So far I have only really looked at the cpu topology part.

> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 485 insertions(+)
>  create mode 100644 drivers/acpi/pptt.c
>
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..c86715fed4a7
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,1 +1,485 @@
> +/*
> + * Copyright (C) 2017, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (!pptt_ref)
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
> +		      sizeof(u32) * resource);
> +
> +	return fetch_pptt_subtable(table_hdr, ref);
> +}
> +
> +/*
> + * given a pptt resource, verify that it is a cache node, then walk

/s/given/Given/

> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;

please remove whitespace before res

> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> +			if (*found != NULL)
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from the processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * we are looking for the max depth. Since its potentially
> +		 * possible for a given node to have resources with differing
> +		 * depths verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +

This is the first time looking at your implementation of the PPTT 
driver. Has it been mentioned before that it is rather inefficient to 
re-parse the table for every cpu in the system? Actually, within the cpu
parsing loop we call acpi_pptt_leaf_node(), which does another table
parsing loop.

However, this is simple.

I do know the version from wangxiongfeng had a kmalloc per node, which 
would also be somewhat inefficient.

But I worry that this implementation does not scale with larger numbers 
of CPUs.

> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");

Do we really need to add 2 debug messages for the case of checking for
and finding a valid match?

> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}
> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(

One would assume from the name that this function returns an integer
value, namely an index for the package of that cpu.

> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,

Do you think that "cpu_node" would be a better name?

> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> +		pr_debug("level %d\n", level);

That's not a very useful message, and I can imagine it creates a lot of prints.

> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +
> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
> +
> +static u8 acpi_cache_type(enum cache_type type)
> +{
> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_6_2_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_6_2_CACHE_TYPE_INSTR;
> +	default:
> +		pr_debug("Unknown cache type, assume unified\n");
> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level,
> +	struct acpi_pptt_processor **node)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (!cpu_node)
> +		return NULL;
> +
> +	do {
> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while ((cpu_node) && (!found));
> +
> +	return found;
> +}
> +
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);

These prints (and others) should probably be made verbose-level or removed.

> +
> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");

is this really an error?

> +	} else {
> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);

I think that this could be better worded

> +
> +	return number_of_levels;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
> +{
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->size;
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_6_2_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_6_2_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache policy %d\n",
> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
> +		}
> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache allocation policy %d\n",
> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
> +		}
> +}
> +
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +	struct acpi_pptt_processor *cpu_node = NULL;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {

cpu does not change, so, for efficiency, can you use 
this_cpu_ci->num_leaves?

> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level,
> +						   &cpu_node);
> +		pr_debug("found = %p %p\n", found_cache, cpu_node);

I am not sure how useful printing pointers is to the user, even if NULL.
These prints are too verbose.

> +		if (found_cache)
> +			update_cache_properties(this_leaf,
> +						found_cache,
> +						cpu_node);
> +
> +		index++;
> +	}
> +}
> +
> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
> +				    unsigned int cpu, int level)
> +{

It's not clear what is supposed to be returned from this function, if
anything, since its job is seemingly to "setup".

> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> +
> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +	if (cpu_node) {
> +		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
> +		/* Only the first level has a guaranteed id */
> +		if (level == 0)
> +			return cpu_node->acpi_processor_id;
> +		return (int)((u8 *)cpu_node - (u8 *)table);

Sorry, but I just don't get this. As I understand it, our intention is to
find the core/cluster/package index in the system for a given cpu, right?

If so, how is the distance between the table base and the cpu's node at
that level the same as the system index? I would say that this value is
unique, but are we expecting sequential system indexing per level?

> +	}
> +	pr_err_once("PPTT table found, but unable to locate core for %d\n",

the code seems to intermix the terms "core" and "cpu" - is this intentional?

> +		    cpu);
> +	return -ENOENT;
> +}
> +
> +/*
> + * simply assign a ACPI cache entry to each known CPU cache entry
> + * determining which entries are shared is done later.
> + */
> +int cache_setup_acpi(unsigned int cpu)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);

Please don't leave whitespace after "cpu"

> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_setup_acpi_cpu(table, cpu);
> +	acpi_put_table(table);
> +
> +	return status;
> +}
> +
> +/*
> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
> + * This ID can then be used to group peers.
> + */
> +int setup_acpi_cpu_topology(unsigned int cpu, int level)

I think that you should add a function description to explain what cpu
and level are.

This function does no setup either. I would expect a setup function to
do something with a persistent result; instead, this function only gets
the cpu topology index for a certain hierarchy level.

> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +	int retval;
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");

As before, is this really an error?

> +		return -ENOENT;
> +	}
> +	retval = topology_setup_acpi_cpu(table, cpu, level);
> +	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
> +		 cpu, level, retval);
> +	acpi_put_table(table);
> +
> +	return retval;
> +}
>

Thanks,
John


* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
From: Tomasz Nowicki @ 2017-10-17 13:25 UTC
  To: Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi Jeremy,

I did a second round of review and have some more comments; please see below:

On 12.10.2017 21:48, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in tree like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
> 
> Further, report peers in the topology using setup_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique id's can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 485 insertions(+)
>   create mode 100644 drivers/acpi/pptt.c
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..c86715fed4a7
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,1 +1,485 @@
> +/*
> + * Copyright (C) 2017, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (!pptt_ref)
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
> +		      sizeof(u32) * resource);
> +
> +	return fetch_pptt_subtable(table_hdr, ref);
> +}
> +
> +/*
> + * given a pptt resource, verify that it is a cache node, then walk
> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;
> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {

Attributes have to be shifted:

(cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2

> +			if (*found != NULL)
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from the processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * we are looking for the max depth. Since its potentially
> +		 * possible for a given node to have resources with differing
> +		 * depths verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");
> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}
> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,
> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> +		pr_debug("level %d\n", level);
> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)

The function name can be more descriptive. How about:

acpi_count_cache_level()?

> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}

It is hard to follow what acpi_find_cache_level() and
acpi_pptt_walk_cache() really do, because they try to do
too many things at the same time. IMO, it makes sense to split the
acpi_find_cache_level() logic into:
1. counting the cache levels (max depth)
2. finding the specific cache node

Also, it seems like we can merge acpi_parse_pptt() & acpi_process_node().

Here are my suggestions:


static struct acpi_pptt_cache *acpi_pptt_cache_type_level(
	struct acpi_table_header *table_hdr,
	struct acpi_subtable_header *res,
	int *local_level,
	int level, int type)
{
	struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;

	if (res->type != ACPI_PPTT_TYPE_CACHE)
		return NULL;

	while (cache) {
		if ((*local_level == level) &&
		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2 == type)) {

			pr_debug("Found cache @ level %d\n", level);
			return cache;
		}
		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
		(*local_level)++;
	}
	return NULL;
}

static struct acpi_pptt_cache *_acpi_find_cache_node(
	struct acpi_table_header *table_hdr,
	struct acpi_pptt_processor *cpu_node,
	int *local_level, int level, int type)
{
	struct acpi_subtable_header *res;
	struct acpi_pptt_cache *cache_tmp, *cache = NULL;
	int resource = 0;

	/* walk down from the processor node */
	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {

		cache_tmp = acpi_pptt_cache_type_level(table_hdr, res,
						       local_level, level, type);
		if (cache_tmp) {
			if (cache)
				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");

			cache = cache_tmp;
		}
		resource++;
	}
	return cache;
}

/* find the ACPI node describing the cache type/level for the given CPU */
static struct acpi_pptt_cache *acpi_find_cache_node(
	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
	enum cache_type type, unsigned int level,
	struct acpi_pptt_processor **node)
{
	int total_levels = 0;
	struct acpi_pptt_cache *found = NULL;
	struct acpi_pptt_processor *cpu_node;
	u8 acpi_type = acpi_cache_type(type);

	pr_debug("Looking for CPU %d's level %d cache type %d\n",
		 acpi_cpu_id, level, acpi_type);

	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
	if (!cpu_node)
		return NULL;

	do {
		found = _acpi_find_cache_node(table_hdr, cpu_node,
					      &total_levels, level, acpi_type);
		*node = cpu_node;
		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
	} while ((cpu_node) && (!found));

	return found;
}

static int acpi_pptt_cache_level(struct acpi_table_header *table_hdr,
				struct acpi_subtable_header *res)
{
	struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;
	int local_level = 1;

	if (res->type != ACPI_PPTT_TYPE_CACHE)
		return 0;

	while ((cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache)))
		local_level++;
	return local_level;
}

static int _acpi_count_cache_level(
	struct acpi_table_header *table_hdr,
	struct acpi_pptt_processor *cpu_node)
{
	struct acpi_subtable_header *res;
	int levels = 0, resource = 0, number_of_levels = 0;

	/* walk down from the processor node */
	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
		levels = acpi_pptt_cache_level(table_hdr, res);

		/*
		 * we are looking for the max depth. Since it's potentially
		 * possible for a given node to have resources with differing
		 * depths, verify that the depth we have found is the largest.
		 */
		if (levels > number_of_levels)
			number_of_levels = levels;

		resource++;
	}
	return number_of_levels;
}

static int acpi_count_cache_level(struct acpi_table_header *table_hdr,
				  u32 acpi_cpu_id)
{
	int total_levels = 0;
	struct acpi_pptt_processor *cpu_node;

	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
	while (cpu_node) {
		total_levels += _acpi_count_cache_level(table_hdr, cpu_node);
		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
	}

	return total_levels;
}


I did not compile the code, so I may have missed something.

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-17 13:25     ` Tomasz Nowicki
  0 siblings, 0 replies; 104+ messages in thread
From: Tomasz Nowicki @ 2017-10-17 13:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jeremy,

I did second round of review and have some more comments, please see below:

On 12.10.2017 21:48, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in tree like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
> 
> Further, report peers in the topology using setup_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique id's can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 485 insertions(+)
>   create mode 100644 drivers/acpi/pptt.c
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..c86715fed4a7
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,1 +1,485 @@
> +/*
> + * Copyright (C) 2017, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (!pptt_ref)
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
> +		      sizeof(u32) * resource);
> +
> +	return fetch_pptt_subtable(table_hdr, ref);
> +}
> +
> +/*
> + * given a pptt resource, verify that it is a cache node, then walk
> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;
> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {

Attributes have to be shifted:

(cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2

> +			if (*found != NULL)
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from the processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * we are looking for the max depth. Since its potentially
> +		 * possible for a given node to have resources with differing
> +		 * depths verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");
> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}
> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,
> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> +		pr_debug("level %d\n", level);
> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)

The function name can be more descriptive. How about:

acpi_count_cache_level() ?

> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}

It is hard to follow what acpi_find_cache_level() and 
acpi_pptt_walk_cache() really do. It is because they are trying to do 
too many things at the same time. IMO, splitting acpi_find_cache_level() 
logic to:
1. counting the cache levels (max depth)
2. finding the specific cache node
makes sense.

Also, seems like we can merge acpi_parse_pptt() & acpi_process_node().

Here are my suggestions:


static struct acpi_pptt_cache *acpi_pptt_cache_type_level(
	struct acpi_table_header *table_hdr,
	struct acpi_subtable_header *res,
	int *local_level,
	int level, int type)
{
	struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;

	if (res->type != ACPI_PPTT_TYPE_CACHE)
		return NULL;

	while (cache) {
		if ((*local_level == level) &&
		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2 == type)) {

			pr_debug("Found cache @ level %d\n", level);
			return cache;
		}
		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
		(*local_level)++;
	}
	return NULL;
}

static struct acpi_pptt_cache *_acpi_find_cache_node(
	struct acpi_table_header *table_hdr,
	struct acpi_pptt_processor *cpu_node,
	int *local_level, int level, int type)
{
	struct acpi_subtable_header *res;
	struct acpi_pptt_cache *cache_tmp, *cache = NULL;
	int resource = 0;

	/* walk down from the processor node */
	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {

		cache_tmp = acpi_pptt_cache_type_level(table_hdr, res,
						       local_level, level, type);
		if (cache_tmp) {
			if (cache)
				pr_err("Found duplicate cache level/type unable to determine 
uniqueness\n");

			cache = cache_tmp;
		}
		resource++;
	}
	return cache;
}

/* find the ACPI node describing the cache type/level for the given CPU */
static struct acpi_pptt_cache *acpi_find_cache_node(
	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
	enum cache_type type, unsigned int level,
	struct acpi_pptt_processor **node)
{
	int total_levels = 0;
	struct acpi_pptt_cache *found = NULL;
	struct acpi_pptt_processor *cpu_node;
	u8 acpi_type = acpi_cache_type(type);

	pr_debug("Looking for CPU %d's level %d cache type %d\n",
		 acpi_cpu_id, level, acpi_type);

	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
	if (!cpu_node)
		return NULL;

	do {
		found = _acpi_find_cache_node(table_hdr, cpu_node,
					      &total_levels, level, acpi_type);
		*node = cpu_node;
		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
	} while ((cpu_node) && (!found));

	return found;
}

static int acpi_pptt_cache_level(struct acpi_table_header *table_hdr,
				struct acpi_subtable_header *res)
{
	struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;
	int local_level = 1;

	if (res->type != ACPI_PPTT_TYPE_CACHE)
		return 0;

	while ((cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache)))
		local_level++;
	return local_level;
}

static int _acpi_count_cache_level(
	struct acpi_table_header *table_hdr,
	struct acpi_pptt_processor *cpu_node)
{
	struct acpi_subtable_header *res;
	int levels = 0, resource = 0, number_of_levels = 0;

	/* walk down from the processor node */
	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
		levels = acpi_pptt_cache_level(table_hdr, res);

		/*
		 * we are looking for the max depth. Since it's potentially
		 * possible for a given node to have resources with differing
		 * depths verify that the depth we have found is the largest.
		 */
		if (levels > number_of_levels)
			number_of_levels = levels;

		resource++;
	}
	return number_of_levels;
}

static int acpi_count_cache_level(struct acpi_table_header *table_hdr,
				  u32 acpi_cpu_id)
{
	int total_levels = 0;
	struct acpi_pptt_processor *cpu_node;

	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
	while (cpu_node) {
		total_levels += _acpi_count_cache_level(table_hdr, cpu_node);
		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
	}

	return total_levels;
}


I did not compile the code, so I may have missed something.

Thanks,
Tomasz


* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-17 13:25     ` Tomasz Nowicki
@ 2017-10-17 15:22       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-17 15:22 UTC (permalink / raw)
  To: Tomasz Nowicki, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi,

On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
> Hi Jeremy,
> 
> I did a second round of review and have some more comments; please see below:
> 
> On 12.10.2017 21:48, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in a tree-like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> core's cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> Further, report peers in the topology using setup_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique IDs can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 485 insertions(+)
>>   create mode 100644 drivers/acpi/pptt.c
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> new file mode 100644
>> index 000000000000..c86715fed4a7
>> --- /dev/null
>> +++ b/drivers/acpi/pptt.c
>> @@ -0,1 +1,485 @@
>> +/*
>> + * Copyright (C) 2017, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>> + * which is optionally used to describe the processor and cache topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + */
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +
>> +    /* there isn't a subtable at reference 0 */
>> +    if (!pptt_ref)
>> +        return NULL;
>> +
>> +    if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>> +        return NULL;
>> +
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>> +
>> +    if (pptt_ref + entry->length > table_hdr->length)
>> +        return NULL;
>> +
>> +    return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *node, int resource)
>> +{
>> +    u32 ref;
>> +
>> +    if (resource >= node->number_of_priv_resources)
>> +        return NULL;
>> +
>> +    ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>> +              sizeof(u32) * resource);
>> +
>> +    return fetch_pptt_subtable(table_hdr, ref);
>> +}
>> +
>> +/*
>> + * given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +                int local_level,
>> +                struct acpi_subtable_header *res,
>> +                struct acpi_pptt_cache **found,
>> +                int level, int type)
>> +{
>> +    struct acpi_pptt_cache *cache;
>> +
>> +    if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +        return 0;
>> +
>> +    cache = (struct acpi_pptt_cache *) res;
>> +    while (cache) {
>> +        local_level++;
>> +
>> +        if ((local_level == level) &&
>> +            (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +            ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> 
> Attributes have to be shifted:
> 
> (cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2

Hmmm, I'm not sure that is true. The top-level function in this call
chain converts the "Linux" cache-type constant to the ACPI version of
that constant. In that case the "type" argument is pre-shifted, so it
matches the result of just ANDing the attributes against the mask.
That is, unless I messed something up, which I don't see at the moment
(and the code has, of course, been tested with PPTTs from multiple
people at this point).
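To make the pre-shifted argument concrete, here is a minimal standalone
C sketch. The PPTT_* macros are local stand-ins, assumed to mirror the
ACPICA cache-attribute definitions (a two-bit type field in bits 3:2,
with the type constants already shifted into place); the two helper
functions are hypothetical and exist only for this comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-ins (assumption: same values as the ACPICA definitions).
 * The cache type lives in bits 3:2 and the constants are pre-shifted.
 */
#define PPTT_MASK_CACHE_TYPE     0x0Cu
#define PPTT_CACHE_TYPE_DATA     (0u << 2)
#define PPTT_CACHE_TYPE_INSTR    (1u << 2)
#define PPTT_CACHE_TYPE_UNIFIED  (2u << 2)

/* Patch convention: mask the field and compare to a pre-shifted constant. */
static bool type_matches_preshifted(uint8_t attributes, uint8_t acpi_type)
{
	return (attributes & PPTT_MASK_CACHE_TYPE) == acpi_type;
}

/* Suggested convention: shift the field down, compare to a raw 0..3 value. */
static bool type_matches_shifted(uint8_t attributes, uint8_t raw_type)
{
	return ((attributes & PPTT_MASK_CACHE_TYPE) >> 2) == raw_type;
}
```

Either convention works on its own; mixing them (shifting the field and
then comparing against a pre-shifted constant) is what breaks. Because
the data-cache constant is 0, a data-cache lookup matches under either
convention, so such a mismatch would only show up for instruction or
unified caches.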


> 
>> +            if (*found != NULL)
>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>> +
>> +            pr_debug("Found cache @ level %d\n", level);
>> +            *found = cache;
>> +            /*
>> +             * continue looking at this node's resource list
>> +             * to verify that we don't find a duplicate
>> +             * cache node.
>> +             */
>> +        }
>> +        cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +    }
>> +    return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, and then
>> + * for each cache node, count how many levels exist below (logically above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu_node,
>> +    int *starting_level, int level, int type)
>> +{
>> +    struct acpi_subtable_header *res;
>> +    int number_of_levels = *starting_level;
>> +    int resource = 0;
>> +    struct acpi_pptt_cache *ret = NULL;
>> +    int local_level;
>> +
>> +    /* walk down from the processor node */
>> +    while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>> +        resource++;
>> +
>> +        local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +                           res, &ret, level, type);
>> +        /*
>> +         * we are looking for the max depth. Since it's potentially
>> +         * possible for a given node to have resources with differing
>> +         * depths verify that the depth we have found is the largest.
>> +         */
>> +        if (number_of_levels < local_level)
>> +            number_of_levels = local_level;
>> +    }
>> +    if (number_of_levels > *starting_level)
>> +        *starting_level = number_of_levels;
>> +
>> +    return ret;
>> +}
>> +
>> +/*
>> + * given a processor node containing a processing unit, walk into it and count
>> + * how many levels exist solely for it, and then walk up each level until we hit
>> + * the root node (ignore the package level because it may be possible to have
>> + * caches that exist across packages). Count the number of cache levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +                 struct acpi_pptt_processor *cpu_node)
>> +{
>> +    int total_levels = 0;
>> +
>> +    do {
>> +        acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +        cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +    } while (cpu_node);
>> +
>> +    return total_levels;
>> +}
>> +
>> +/* determine if the given node is a leaf node */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +                   struct acpi_pptt_processor *node)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    u32 node_entry;
>> +    struct acpi_pptt_processor *cpu_node;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
>> +
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            (cpu_node->parent == node_entry))
>> +            return 0;
>> +        entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
>> +    }
>> +    return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +    struct acpi_table_header *table_hdr,
>> +    u32 acpi_cpu_id)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    struct acpi_pptt_processor *cpu_node;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
>> +
>> +    /* find the processor structure associated with this cpuid */
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>> +            pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +                 acpi_cpu_id, cpu_node->acpi_processor_id);
>> +            if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +                /* found the correct entry */
>> +                pr_debug("match found!\n");
>> +                return (struct acpi_pptt_processor *)entry;
>> +            }
>> +        }
>> +
>> +        if (entry->length == 0) {
>> +            pr_err("Invalid zero length subtable\n");
>> +            break;
>> +        }
>> +        entry = (struct acpi_subtable_header *)
>> +            ((u8 *)entry + entry->length);
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +/*
>> + * Given an acpi_pptt_processor node, walk up until we identify the
>> + * package that the node is associated with or we run out of levels
>> + * to request.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu,
>> +    int level)
>> +{
>> +    struct acpi_pptt_processor *prev_node;
>> +
>> +    while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
>> +        pr_debug("level %d\n", level);
>> +        prev_node = fetch_pptt_node(table_hdr, cpu->parent);
>> +        if (prev_node == NULL)
>> +            break;
>> +        cpu = prev_node;
>> +        level--;
>> +    }
>> +    return cpu;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> 
> The function name can be more descriptive. How about:
> 
> acpi_count_cache_level() ?

The naming has drifted a bit, so yes: that routine is only used by the
code that determines the number of cache levels for a given PE
(processing element).


> 
>> +{
>> +    int number_of_levels = 0;
>> +    struct acpi_pptt_processor *cpu;
>> +
>> +    cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +    if (cpu)
>> +        number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +    return number_of_levels;
>> +}
> 
> It is hard to follow what acpi_find_cache_level() and 
> acpi_pptt_walk_cache() really do. It is because they are trying to do 
> too many things at the same time. IMO, splitting the acpi_find_cache_level()
> logic into:
> 1. counting the cache levels (max depth)
> 2. finding the specific cache node
> makes sense.

I disagree. That routine is shared by the two code paths because its
functionality is 99% duplicated between them. The only difference is
whether it terminates the search at a given level or continues
searching until it runs out of nodes; the latter case is simply a
degenerate version of the former.
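A minimal sketch of the "degenerate case" argument, using a hypothetical
in-memory model: struct toy_cache, walk_chain() and walk_node() are
illustrative names, not part of the patch, and real PPTT nodes are
linked by byte offsets from the start of the table rather than by
pointers. One loop both reports the maximum cache depth and, when a
non-zero level is requested, records a matching node. Passing level 0
turns it into a pure depth count, which mirrors how the patch's
acpi_process_node() calls acpi_find_cache_level() with level and type
both 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PPTT cache node; the real table links
 * levels through byte offsets (next_level_of_cache), not pointers. */
struct toy_cache {
	unsigned int type;            /* pre-shifted ACPI cache type */
	const struct toy_cache *next; /* next (outer) cache level, or NULL */
};

/*
 * Walk one cache chain, numbering levels from start + 1, and return the
 * depth of the last cache in the chain. If a cache at depth 'level' has
 * the requested type, remember it in *found. Depths are always >= 1, so
 * level == 0 never matches and the walk degenerates into a depth count.
 */
static int walk_chain(const struct toy_cache *cache, int start,
		      int level, unsigned int type,
		      const struct toy_cache **found)
{
	int depth = start;

	for (; cache; cache = cache->next) {
		depth++;
		if (depth == level && cache->type == type)
			*found = cache;
	}
	return depth;
}

/* Apply walk_chain() to every private resource of one node and keep the
 * deepest result, since resources may have chains of differing depth. */
static int walk_node(const struct toy_cache *const *resources, size_t count,
		     int start, int level, unsigned int type,
		     const struct toy_cache **found)
{
	int max_depth = start;

	for (size_t i = 0; i < count; i++) {
		int depth = walk_chain(resources[i], start, level, type, found);

		if (depth > max_depth)
			max_depth = depth;
	}
	return max_depth;
}
```

For a node whose L1I and L1D both chain to a shared L2, asking for
level 1 with the pre-shifted instruction type (4) records the L1I node,
while level 0 records nothing and just reports a depth of 2.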


> 
> Also, it seems like we can merge acpi_parse_pptt() & acpi_process_node().

That is true, but I fail to see how any of this actually fixes
anything. There are a million ways to do this, including, as pointed
out, building another data structure to simplify parsing a table that
is less than ideal for runtime parsing (starting with the direction of
the relative pointers, and ending with having to "infer" information
that isn't directly flagged). I actually built a couple of other
versions of this, including a nice, cute version that is about 1/8 the
size of this one and really easy to understand, but of course it is
recursive...
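As an illustration of the "direction of the relative pointers" problem:
PPTT processor nodes reference only their parent, so the patch's
acpi_pptt_leaf_node() rescans the whole table for every candidate node.
A minimal sketch, assuming a hypothetical pre-built array in which each
node stores its parent's index (struct toy_node and mark_leaves() are
illustrative names, not kernel APIs), shows how one extra pass could
invert that relationship. This is not one of the alternative versions
mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of PPTT processor nodes: like the real
 * table, each node records only its parent (index, or -1 for a root). */
struct toy_node {
	int parent;
};

/*
 * Mark every node referenced as somebody's parent as a non-leaf; whatever
 * stays marked is a leaf. One O(n) pass replaces a full-table rescan per
 * node. Out-of-range parent indices are ignored rather than trusted.
 */
static void mark_leaves(const struct toy_node *nodes, size_t n, bool *is_leaf)
{
	for (size_t i = 0; i < n; i++)
		is_leaf[i] = true;

	for (size_t i = 0; i < n; i++) {
		int p = nodes[i].parent;

		if (p >= 0 && (size_t)p < n)
			is_leaf[p] = false;
	}
}
```

For a package (node 0) containing two clusters (nodes 1 and 2) with
CPUs 3 and 4 under the first cluster and CPU 5 under the second, only
nodes 3, 4 and 5 come out as leaves.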


> 
> Here are my suggestions:
> 
> 
> static struct acpi_pptt_cache *acpi_pptt_cache_type_level(
>      struct acpi_table_header *table_hdr,
>      struct acpi_subtable_header *res,
>      int *local_level,
>      int level, int type)
> {
>      struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;
> 
>      if (res->type != ACPI_PPTT_TYPE_CACHE)
>          return NULL;
> 
>      while (cache) {
>          if ((*local_level == level) &&
>              (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>              ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2 == type)) {
> 
>              pr_debug("Found cache @ level %d\n", level);
>              return cache;
>          }
>          cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>          (*local_level)++;
>      }
>      return NULL;
> }
> 
> static struct acpi_pptt_cache *_acpi_find_cache_node(
>      struct acpi_table_header *table_hdr,
>      struct acpi_pptt_processor *cpu_node,
>      int *local_level, int level, int type)
> {
>      struct acpi_subtable_header *res;
>      struct acpi_pptt_cache *cache_tmp, *cache = NULL;
>      int resource = 0;
> 
>      /* walk down from the processor node */
>      while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> 
>          cache_tmp = acpi_pptt_cache_type_level(table_hdr, res,
>                                 local_level, level, type);
>          if (cache_tmp) {
>              if (cache)
>                  pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> 
>              cache = cache_tmp;
>          }
>          resource++;
>      }
>      return cache;
> }
> 
> /* find the ACPI node describing the cache type/level for the given CPU */
> static struct acpi_pptt_cache *acpi_find_cache_node(
>      struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>      enum cache_type type, unsigned int level,
>      struct acpi_pptt_processor **node)
> {
>      int total_levels = 0;
>      struct acpi_pptt_cache *found = NULL;
>      struct acpi_pptt_processor *cpu_node;
>      u8 acpi_type = acpi_cache_type(type);
> 
>      pr_debug("Looking for CPU %d's level %d cache type %d\n",
>           acpi_cpu_id, level, acpi_type);
> 
>      cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>      if (!cpu_node)
>          return NULL;
> 
>      do {
>          found = _acpi_find_cache_node(table_hdr, cpu_node,
>                            &total_levels, level, acpi_type);
>          *node = cpu_node;
>          cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>      } while ((cpu_node) && (!found));
> 
>      return found;
> }
> 
> static int acpi_pptt_cache_level(struct acpi_table_header *table_hdr,
>                  struct acpi_subtable_header *res)
> {
>      struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;
>      int local_level = 1;
> 
>      if (res->type != ACPI_PPTT_TYPE_CACHE)
>          return 0;
> 
>      while ((cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache)))
>          local_level++;
>      return local_level;
> }
> 
> static int _acpi_count_cache_level(
>      struct acpi_table_header *table_hdr,
>      struct acpi_pptt_processor *cpu_node)
> {
>      struct acpi_subtable_header *res;
>      int levels = 0, resource = 0, number_of_levels = 0;
> 
>      /* walk down from the processor node */
>      while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>          levels = acpi_pptt_cache_level(table_hdr, res);
> 
>          /*
>           * we are looking for the max depth. Since it's potentially
>           * possible for a given node to have resources with differing
>           * depths verify that the depth we have found is the largest.
>           */
>          if (levels > number_of_levels)
>              number_of_levels = levels;
> 
>          resource++;
>      }
>      return number_of_levels;
> }
> 
> static int acpi_count_cache_level(struct acpi_table_header *table_hdr,
>                    u32 acpi_cpu_id)
> {
>      int total_levels = 0;
>      struct acpi_pptt_processor *cpu_node;
> 
>      cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>      while (cpu_node) {
>          total_levels += _acpi_count_cache_level(table_hdr, cpu_node);
>          cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>      }
> 
>      return total_levels;
> }
> 
> 
> I did not compile the code, so I may have missed something.
> 
> Thanks,
> Tomasz


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-17 15:22       ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-17 15:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
> Hi Jeremy,
> 
> I did second round of review and have some more comments, please see below:
> 
> On 12.10.2017 21:48, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in tree like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> cores cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> Further, report peers in the topology using setup_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique id's can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>> ? drivers/acpi/pptt.c | 485 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ? 1 file changed, 485 insertions(+)
>> ? create mode 100644 drivers/acpi/pptt.c
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> new file mode 100644
>> index 000000000000..c86715fed4a7
>> --- /dev/null
>> +++ b/drivers/acpi/pptt.c
>> @@ -0,1 +1,485 @@
>> +/*
>> + * Copyright (C) 2017, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.? See the GNU General Public 
>> License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology 
>> Table (PPTT)
>> + * which is optionally used to describe the processor and cache 
>> topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + */
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +??? struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +??? struct acpi_subtable_header *entry;
>> +
>> +??? /* there isn't a subtable at reference 0 */
>> +??? if (!pptt_ref)
>> +??????? return NULL;
>> +
>> +??? if (pptt_ref + sizeof(struct acpi_subtable_header) > 
>> table_hdr->length)
>> +??????? return NULL;
>> +
>> +??? entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>> +
>> +??? if (pptt_ref + entry->length > table_hdr->length)
>> +??????? return NULL;
>> +
>> +??? return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +??? struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +??? return (struct acpi_pptt_processor 
>> *)fetch_pptt_subtable(table_hdr, pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +??? struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +??? return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, 
>> pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +??? struct acpi_table_header *table_hdr,
>> +??? struct acpi_pptt_processor *node, int resource)
>> +{
>> +??? u32 ref;
>> +
>> +??? if (resource >= node->number_of_priv_resources)
>> +??????? return NULL;
>> +
>> +??? ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>> +????????????? sizeof(u32) * resource);
>> +
>> +??? return fetch_pptt_subtable(table_hdr, ref);
>> +}
>> +
>> +/*
>> + * given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +??????????????? int local_level,
>> +??????????????? struct acpi_subtable_header *res,
>> +??????????????? struct acpi_pptt_cache **found,
>> +??????????????? int level, int type)
>> +{
>> +??? struct acpi_pptt_cache *cache;
>> +
>> +??? if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +??????? return 0;
>> +
>> +??? cache = (struct acpi_pptt_cache *) res;
>> +??? while (cache) {
>> +??????? local_level++;
>> +
>> +??????? if ((local_level == level) &&
>> +??????????? (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +??????????? ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> 
> Attributes have to be shifted:
> 
> (cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2

Hmmm, I'm not sure that is true, the top level function in this routine 
convert the "linux" constant to the ACPI version of that constant. In 
that case the "type" field is pre-shifted, so that it matches the result 
of just anding against the field... That is unless I messed something 
up, which I don't see at the moment (and the code of course has been 
tested with PPTT's from multiple people at this point).


> 
>> +??????????? if (*found != NULL)
>> +??????????????? pr_err("Found duplicate cache level/type unable to 
>> determine uniqueness\n");
>> +
>> +??????????? pr_debug("Found cache @ level %d\n", level);
>> +??????????? *found = cache;
>> +??????????? /*
>> +???????????? * continue looking at this node's resource list
>> +???????????? * to verify that we don't find a duplicate
>> +???????????? * cache node.
>> +???????????? */
>> +??????? }
>> +??????? cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +??? }
>> +??? return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, 
>> and then
>> + * for each cache node, count how many levels exist below (logically 
>> above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +??? struct acpi_table_header *table_hdr,
>> +??? struct acpi_pptt_processor *cpu_node,
>> +??? int *starting_level, int level, int type)
>> +{
>> +??? struct acpi_subtable_header *res;
>> +??? int number_of_levels = *starting_level;
>> +??? int resource = 0;
>> +??? struct acpi_pptt_cache *ret = NULL;
>> +??? int local_level;
>> +
>> +??? /* walk down from the processor node */
>> +??? while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, 
>> resource))) {
>> +??????? resource++;
>> +
>> +??????? local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +?????????????????????????? res, &ret, level, type);
>> +??????? /*
>> +???????? * we are looking for the max depth. Since its potentially
>> +???????? * possible for a given node to have resources with differing
>> +???????? * depths verify that the depth we have found is the largest.
>> +???????? */
>> +??????? if (number_of_levels < local_level)
>> +??????????? number_of_levels = local_level;
>> +??? }
>> +??? if (number_of_levels > *starting_level)
>> +??????? *starting_level = number_of_levels;
>> +
>> +??? return ret;
>> +}
>> +
>> +/*
>> + * given a processor node containing a processing unit, walk into it 
>> and count
>> + * how many levels exist solely for it, and then walk up each level 
>> until we hit
>> + * the root node (ignore the package level because it may be possible 
>> to have
>> + * caches that exist across packages). Count the number of cache 
>> levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +???????????????? struct acpi_pptt_processor *cpu_node)
>> +{
>> +??? int total_levels = 0;
>> +
>> +??? do {
>> +??????? acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +??????? cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +??? } while (cpu_node);
>> +
>> +??? return total_levels;
>> +}
>> +
>> +/* determine if the given node is a leaf node */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +?????????????????? struct acpi_pptt_processor *node)
>> +{
>> +??? struct acpi_subtable_header *entry;
>> +??? unsigned long table_end;
>> +??? u32 node_entry;
>> +??? struct acpi_pptt_processor *cpu_node;
>> +
>> +??? table_end = (unsigned long)table_hdr + table_hdr->length;
>> +??? node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>> +??? entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +??????????????????????? sizeof(struct acpi_table_pptt));
>> +
>> +??? while (((unsigned long)entry) + sizeof(struct 
>> acpi_subtable_header) < table_end) {
>> +??????? cpu_node = (struct acpi_pptt_processor *)entry;
>> +??????? if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +??????????? (cpu_node->parent == node_entry))
>> +??????????? return 0;
>> +??????? entry = (struct acpi_subtable_header *)((u8 *)entry + 
>> entry->length);
>> +??? }
>> +??? return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +    struct acpi_table_header *table_hdr,
>> +    u32 acpi_cpu_id)
>> +{
>> +    struct acpi_subtable_header *entry;
>> +    unsigned long table_end;
>> +    struct acpi_pptt_processor *cpu_node;
>> +
>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +                        sizeof(struct acpi_table_pptt));
>> +
>> +    /* find the processor structure associated with this cpuid */
>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +            acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>> +            pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +                 acpi_cpu_id, cpu_node->acpi_processor_id);
>> +            if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +                /* found the correct entry */
>> +                pr_debug("match found!\n");
>> +                return (struct acpi_pptt_processor *)entry;
>> +            }
>> +        }
>> +
>> +        if (entry->length == 0) {
>> +            pr_err("Invalid zero length subtable\n");
>> +            break;
>> +        }
>> +        entry = (struct acpi_subtable_header *)
>> +            ((u8 *)entry + entry->length);
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +/*
>> + * Given a acpi_pptt_processor node, walk up until we identify the
>> + * package that the node is associated with or we run out of levels
>> + * to request.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>> +    struct acpi_table_header *table_hdr,
>> +    struct acpi_pptt_processor *cpu,
>> +    int level)
>> +{
>> +    struct acpi_pptt_processor *prev_node;
>> +
>> +    while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
>> +        pr_debug("level %d\n", level);
>> +        prev_node = fetch_pptt_node(table_hdr, cpu->parent);
>> +        if (prev_node == NULL)
>> +            break;
>> +        cpu = prev_node;
>> +        level--;
>> +    }
>> +    return cpu;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> 
> The function name can be more descriptive. How about:
> 
> acpi_count_cache_level() ?

The naming has drifted a bit, so yes, that routine is only used by the
portion that determines the number of cache levels for a given PE.


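Both loops quoted above treat the PPTT as a flat run of variable-length subtables and advance by each header's length field. Note that the leaf-node loop at the top of this excerpt (called as acpi_pptt_leaf_node()) has no zero-length guard, so a malformed table with a zero-length subtable would make it spin forever; acpi_find_processor_node() does check. A minimal user-space sketch of a guarded walk, assuming a hypothetical 2-byte (type, length) header loosely modelled on struct acpi_subtable_header rather than the real ACPICA layout:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified layout: each subtable starts with a 2-byte
 * header (type, length). 'length' covers the whole subtable, header
 * included. This is NOT the exact ACPICA structure.
 */
#define SUB_HDR_LEN 2u

/*
 * Return the byte offset of the first subtable of the given type,
 * starting at offset 'first', or -1 if none is found or the table
 * is malformed.
 */
static long find_subtable(const uint8_t *table, size_t table_len,
			  size_t first, uint8_t type)
{
	size_t off = first;

	while (off + SUB_HDR_LEN <= table_len) {
		uint8_t sub_type = table[off];
		uint8_t sub_len = table[off + 1];

		/* A zero (or too short) length would never advance 'off'. */
		if (sub_len < SUB_HDR_LEN)
			return -1;
		/* The whole subtable must lie inside the table. */
		if (off + sub_len > table_len)
			return -1;
		if (sub_type == type)
			return (long)off;
		off += sub_len;
	}
	return -1;
}
```

Rejecting any length shorter than the header, rather than only zero, also keeps the cursor from landing in the middle of a header.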
> 
>> +{
>> +    int number_of_levels = 0;
>> +    struct acpi_pptt_processor *cpu;
>> +
>> +    cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +    if (cpu)
>> +        number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +    return number_of_levels;
>> +}
> 
> It is hard to follow what acpi_find_cache_level() and
> acpi_pptt_walk_cache() really do. That is because they are trying to do
> too many things at the same time. IMO, splitting the acpi_find_cache_level()
> logic into:
> 1. counting the cache levels (max depth)
> 2. finding the specific cache node
> makes sense.

I disagree. That routine is shared by the two code paths because its
functionality is 99% duplicated between them; the only difference is
whether it terminates the search at a given level or continues
searching until it runs out of nodes. The latter case is simply a
degenerate version of the former.

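The "degenerate case" argument can be illustrated with a single walk that always returns the total depth and, as a side effect, records a match at a requested level; asking for a level that can never match (0 here) turns it into a pure depth count. This is a simplified sketch of the idea behind acpi_pptt_walk_cache(), using a hypothetical array-indexed cache descriptor in place of PPTT offsets and assuming the next-level chain is acyclic:

```c
/*
 * Hypothetical cache descriptor: 'next' is the index of the next level
 * of cache (-1 if none), standing in for next_level_of_cache.
 */
struct cache_desc {
	int next;
	int type;
};

/*
 * Walk the next-level chain from 'start', counting levels. If the cache
 * at depth 'level' has the requested type, record its index in *found.
 * The total depth is always returned, so calling with level == 0 (which
 * never matches) degenerates into a plain depth count.
 * Assumes the chain is acyclic.
 */
static int walk_cache_chain(const struct cache_desc *caches, int start,
			    int level, int type, int *found)
{
	int depth = 0;
	int i;

	for (i = start; i >= 0; i = caches[i].next) {
		depth++;
		if (depth == level && caches[i].type == type)
			*found = i;
	}
	return depth;
}
```

The trade-off discussed in the thread is visible here: one loop serves both callers, at the cost of a function whose return value and out-parameter mean different things depending on the arguments.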

> 
> Also, seems like we can merge acpi_parse_pptt() & acpi_process_node().

That is true, but I fail to see how any of this actually fixes
anything. There are a million ways to do this, including, as pointed out,
building another data structure to simplify parsing what is a
table that is less than ideal for runtime parsing (starting with the
direction of the relative pointers, and ending with having to "infer"
information that isn't directly flagged). I actually built a couple of
other versions of this, including a nice, cute version which is about 1/8
the size of this one and really easy to understand, but of course it is
recursive...

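On the direction of the relative pointers: in the quoted code each processor node carries a reference to its parent, not to its children. Walking up from a leaf to its package is therefore a simple loop, as in acpi_find_processor_package_id(), while asking whether a node has children requires scanning the whole table, as the leaf-node check does. A minimal sketch of the upward walk, with a hypothetical index-based node and a hypothetical flag bit standing in for ACPI_PPTT_PHYSICAL_PACKAGE:

```c
#include <stdint.h>

/* Hypothetical flag bit standing in for ACPI_PPTT_PHYSICAL_PACKAGE. */
#define NODE_PHYSICAL_PACKAGE 0x1u

/* Hypothetical processor node: children point up to their parent. */
struct proc_node {
	int parent;		/* index of the parent node, -1 for the root */
	uint32_t flags;
};

/*
 * Starting from 'node', follow parent links until a node flagged as a
 * physical package is reached, the root is reached, or 'level' steps
 * have been taken. Returns the index of the node where the walk stopped.
 */
static int find_package(const struct proc_node *nodes, int node, int level)
{
	while (level && !(nodes[node].flags & NODE_PHYSICAL_PACKAGE)) {
		int parent = nodes[node].parent;

		if (parent < 0)
			break;
		node = parent;
		level--;
	}
	return node;
}
```

Like the original, the walk stops at the root if no node is flagged as a package, so callers get the topmost node rather than an error.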

> 
> Here are my suggestions:
> 
> 
> static struct acpi_pptt_cache *acpi_pptt_cache_type_level(
>     struct acpi_table_header *table_hdr,
>     struct acpi_subtable_header *res,
>     int *local_level,
>     int level, int type)
> {
>     struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;
>
>     if (res->type != ACPI_PPTT_TYPE_CACHE)
>         return NULL;
>
>     while (cache) {
>         if ((*local_level == level) &&
>             (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>             ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2 == type)) {
>
>             pr_debug("Found cache @ level %d\n", level);
>             return cache;
>         }
>         cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>         (*local_level)++;
>     }
>     return NULL;
> }
> 
> static struct acpi_pptt_cache *_acpi_find_cache_node(
>     struct acpi_table_header *table_hdr,
>     struct acpi_pptt_processor *cpu_node,
>     int *local_level, int level, int type)
> {
>     struct acpi_subtable_header *res;
>     struct acpi_pptt_cache *cache_tmp, *cache = NULL;
>     int resource = 0;
>
>     /* walk down from the processor node */
>     while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>
>         cache_tmp = acpi_pptt_cache_type_level(table_hdr, res,
>                                local_level, level, type);
>         if (cache_tmp) {
>             if (cache)
>                 pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>
>             cache = cache_tmp;
>         }
>         resource++;
>     }
>     return cache;
> }
> 
> /* find the ACPI node describing the cache type/level for the given CPU */
> static struct acpi_pptt_cache *acpi_find_cache_node(
>     struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>     enum cache_type type, unsigned int level,
>     struct acpi_pptt_processor **node)
> {
>     int total_levels = 0;
>     struct acpi_pptt_cache *found = NULL;
>     struct acpi_pptt_processor *cpu_node;
>     u8 acpi_type = acpi_cache_type(type);
>
>     pr_debug("Looking for CPU %d's level %d cache type %d\n",
>          acpi_cpu_id, level, acpi_type);
>
>     cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>     if (!cpu_node)
>         return NULL;
>
>     do {
>         found = _acpi_find_cache_node(table_hdr, cpu_node,
>                           &total_levels, level, acpi_type);
>         *node = cpu_node;
>         cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>     } while ((cpu_node) && (!found));
>
>     return found;
> }
> 
> static int acpi_pptt_cache_level(struct acpi_table_header *table_hdr,
>                 struct acpi_subtable_header *res)
> {
>     struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;
>     int local_level = 1;
>
>     if (res->type != ACPI_PPTT_TYPE_CACHE)
>         return 0;
>
>     while ((cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache)))
>         local_level++;
>     return local_level;
> }
> 
> static int _acpi_count_cache_level(
>     struct acpi_table_header *table_hdr,
>     struct acpi_pptt_processor *cpu_node)
> {
>     struct acpi_subtable_header *res;
>     int levels = 0, resource = 0, number_of_levels = 0;
>
>     /* walk down from the processor node */
>     while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>         levels = acpi_pptt_cache_level(table_hdr, res);
>
>         /*
>          * we are looking for the max depth. Since its potentially
>          * possible for a given node to have resources with differing
>          * depths verify that the depth we have found is the largest.
>          */
>         if (levels > number_of_levels)
>             number_of_levels = levels;
>
>         resource++;
>     }
>     return number_of_levels;
> }
> 
> static int acpi_count_cache_level(struct acpi_table_header *table_hdr,
>                   u32 acpi_cpu_id)
> {
>     int total_levels = 0;
>     struct acpi_pptt_processor *cpu_node;
>
>     cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>     while (cpu_node) {
>         total_levels += _acpi_count_cache_level(table_hdr, cpu_node);
>         cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>     }
>
>     return total_levels;
> }
> 
> 
> I did not compile the code, so I may have missed something.
> 
> Thanks,
> Tomasz


* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-17 15:22       ` Jeremy Linton
  (?)
@ 2017-10-18  1:10         ` Xiongfeng Wang
  -1 siblings, 0 replies; 104+ messages in thread
From: Xiongfeng Wang @ 2017-10-18  1:10 UTC (permalink / raw)
  To: Jeremy Linton, Tomasz Nowicki, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, linux-arm-kernel

Hi Jeremy,


On 2017/10/17 23:22, Jeremy Linton wrote:
> Hi,
>
> On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
>> Hi Jeremy,
>>
>> I did a second round of review and have some more comments; please see below:
>>
>> On 12.10.2017 21:48, Jeremy Linton wrote:
> I disagree. That routine is shared by the two code paths because its functionality is 99% duplicated between them; the only difference is whether it terminates the search at a given level or continues searching until it runs out of nodes. The latter case is simply a degenerate version of the former.
>
>
>>
>> Also, seems like we can merge acpi_parse_pptt() & acpi_process_node().
>
> That is true, but I fail to see how any of this actually fixes anything. There are a million ways to do this, including, as pointed out, building another data structure to simplify parsing what is a table that is less than ideal for runtime parsing (starting with the direction of the relative pointers, and ending with having to "infer" information that isn't directly flagged). I actually built a couple of other versions of this, including a nice, cute version which is about 1/8 the size of this one and really easy to understand, but of course it is recursive...
>
>
Maybe you can take a look at my version below. It is similar to what you describe above and may be of some help.
https://github.com/fenghusthu/acpi_pptt

Thanks
Xiongfeng Wang
>>
>> Here are my suggestions:
>>
>>
>> static struct acpi_pptt_cache *acpi_pptt_cache_type_level(
>>      struct acpi_table_header *table_hdr,
>>      struct acpi_subtable_header *res,
>>      int *local_level,
>>      int level, int type)
>> {
>>      struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;
>>
>>      if (res->type != ACPI_PPTT_TYPE_CACHE)
>>          return NULL;
>>
>>      while (cache) {
>>          if ((*local_level == level) &&
>>              (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>>              ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2 == type)) {
>>
>>              pr_debug("Found cache @ level %d\n", level);
>>              return cache;
>>          }
>>          cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>>          (*local_level)++;
>>      }
>>      return NULL;
>> }
>>
>> static struct acpi_pptt_cache *_acpi_find_cache_node(
>>      struct acpi_table_header *table_hdr,
>>      struct acpi_pptt_processor *cpu_node,
>>      int *local_level, int level, int type)
>> {
>>      struct acpi_subtable_header *res;
>>      struct acpi_pptt_cache *cache_tmp, *cache = NULL;
>>      int resource = 0;
>>
>>      /* walk down from the processor node */
>>      while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>>
>>          cache_tmp = acpi_pptt_cache_type_level(table_hdr, res,
>>                                 local_level, level, type);
>>          if (cache_tmp) {
>>              if (cache)
>>                  pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>>
>>              cache = cache_tmp;
>>          }
>>          resource++;
>>      }
>>      return cache;
>> }
>>
>> /* find the ACPI node describing the cache type/level for the given CPU */
>> static struct acpi_pptt_cache *acpi_find_cache_node(
>>      struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>>      enum cache_type type, unsigned int level,
>>      struct acpi_pptt_processor **node)
>> {
>>      int total_levels = 0;
>>      struct acpi_pptt_cache *found = NULL;
>>      struct acpi_pptt_processor *cpu_node;
>>      u8 acpi_type = acpi_cache_type(type);
>>
>>      pr_debug("Looking for CPU %d's level %d cache type %d\n",
>>           acpi_cpu_id, level, acpi_type);
>>
>>      cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>>      if (!cpu_node)
>>          return NULL;
>>
>>      do {
>>          found = _acpi_find_cache_node(table_hdr, cpu_node,
>>                            &total_levels, level, acpi_type);
>>          *node = cpu_node;
>>          cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>>      } while ((cpu_node) && (!found));
>>
>>      return found;
>> }
>>
>> static int acpi_pptt_cache_level(struct acpi_table_header *table_hdr,
>>                  struct acpi_subtable_header *res)
>> {
>>      struct acpi_pptt_cache *cache = (struct acpi_pptt_cache *) res;
>>      int local_level = 1;
>>
>>      if (res->type != ACPI_PPTT_TYPE_CACHE)
>>          return 0;
>>
>>      while ((cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache)))
>>          local_level++;
>>      return local_level;
>> }
>>
>> static int _acpi_count_cache_level(
>>      struct acpi_table_header *table_hdr,
>>      struct acpi_pptt_processor *cpu_node)
>> {
>>      struct acpi_subtable_header *res;
>>      int levels = 0, resource = 0, number_of_levels = 0;
>>
>>      /* walk down from the processor node */
>>      while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>>          levels = acpi_pptt_cache_level(table_hdr, res);
>>
>>          /*
>>           * we are looking for the max depth. Since its potentially
>>           * possible for a given node to have resources with differing
>>           * depths verify that the depth we have found is the largest.
>>           */
>>          if (levels > number_of_levels)
>>              number_of_levels = levels;
>>
>>          resource++;
>>      }
>>      return number_of_levels;
>> }
>>
>> static int acpi_count_cache_level(struct acpi_table_header *table_hdr,
>>                    u32 acpi_cpu_id)
>> {
>>      int total_levels = 0;
>>      struct acpi_pptt_processor *cpu_node;
>>
>>      cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>>      while (cpu_node) {
>>          total_levels += _acpi_count_cache_level(table_hdr, cpu_node);
>>          cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>>      }
>>
>>      return total_levels;
>> }
>>
>>
>> I did not compile the code, so I may have missed something.
>>
>> Thanks,
>> Tomasz
>
>
> .
>


* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-17 15:22       ` Jeremy Linton
@ 2017-10-18  5:39         ` Tomasz Nowicki
  -1 siblings, 0 replies; 104+ messages in thread
From: Tomasz Nowicki @ 2017-10-18  5:39 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, catalin.marinas, gregkh, jhugo, rjw, linux-pm,
	will.deacon, linux-kernel, ahs3, viresh.kumar, hanjun.guo,
	sudeep.holla, austinwc, wangxiongfeng2, linux-arm-kernel

Hi,

On 17.10.2017 17:22, Jeremy Linton wrote:
> Hi,
> 
> On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
>> Hi Jeremy,
>>
>> I did a second round of review and have some more comments; please see
>> below:
>>
>> On 12.10.2017 21:48, Jeremy Linton wrote:
>>> ACPI 6.2 adds a new table, which describes how processing units
>>> are related to each other in a tree-like fashion. Caches are
>>> also sprinkled throughout the tree and describe the properties
>>> of the caches in relation to other caches and processing units.
>>>
>>> Add the code to parse the cache hierarchy and report the total
>>> number of levels of cache for a given core using
>>> acpi_find_last_cache_level() as well as fill out the individual
>>> core's cache information with cache_setup_acpi() once the
>>> cpu_cacheinfo structure has been populated by the arch-specific
>>> code.
>>>
>>> Further, report peers in the topology using setup_acpi_cpu_topology()
>>> to report a unique ID for each processing unit at a given level
>>> in the tree. These unique IDs can then be used to match related
>>> processing units which exist as threads, COD (clusters
>>> on die), within a given package, etc.
>>>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 485 insertions(+)
>>>   create mode 100644 drivers/acpi/pptt.c
>>>
>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>> new file mode 100644
>>> index 000000000000..c86715fed4a7
>>> --- /dev/null
>>> +++ b/drivers/acpi/pptt.c
>>> @@ -0,1 +1,485 @@
>>> +/*
>>> + * Copyright (C) 2017, ARM
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>> + * more details.
>>> + *
>>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>>> + * which is optionally used to describe the processor and cache topology.
>>> + * Due to the relative pointers used throughout the table, this doesn't
>>> + * leverage the existing subtable parsing in the kernel.
>>> + */
>>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>>> +
>>> +#include <linux/acpi.h>
>>> +#include <linux/cacheinfo.h>
>>> +#include <acpi/processor.h>
>>> +
>>> +/*
>>> + * Given the PPTT table, find and verify that the subtable entry
>>> + * is located within the table
>>> + */
>>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>> +{
>>> +    struct acpi_subtable_header *entry;
>>> +
>>> +    /* there isn't a subtable at reference 0 */
>>> +    if (!pptt_ref)
>>> +        return NULL;
>>> +
>>> +    if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>>> +        return NULL;
>>> +
>>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>>> +
>>> +    if (pptt_ref + entry->length > table_hdr->length)
>>> +        return NULL;
>>> +
>>> +    return entry;
>>> +}
>>> +
>>> +static struct acpi_pptt_processor *fetch_pptt_node(
>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>> +{
>>> +    return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>> +}
>>> +
>>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>> +{
>>> +    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>> +}
>>> +
>>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>>> +    struct acpi_table_header *table_hdr,
>>> +    struct acpi_pptt_processor *node, int resource)
>>> +{
>>> +    u32 ref;
>>> +
>>> +    if (resource >= node->number_of_priv_resources)
>>> +        return NULL;
>>> +
>>> +    ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>>> +              sizeof(u32) * resource);
>>> +
>>> +    return fetch_pptt_subtable(table_hdr, ref);
>>> +}
>>> +
>>> +/*
>>> + * given a pptt resource, verify that it is a cache node, then walk
>>> + * down each level of caches, counting how many levels are found
>>> + * as well as checking the cache type (icache, dcache, unified). If a
>>> + * level & type match, then we set found, and continue the search.
>>> + * Once the entire cache branch has been walked return its max
>>> + * depth.
>>> + */
>>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>>> +                int local_level,
>>> +                struct acpi_subtable_header *res,
>>> +                struct acpi_pptt_cache **found,
>>> +                int level, int type)
>>> +{
>>> +    struct acpi_pptt_cache *cache;
>>> +
>>> +    if (res->type != ACPI_PPTT_TYPE_CACHE)
>>> +        return 0;
>>> +
>>> +    cache = (struct acpi_pptt_cache *) res;
>>> +    while (cache) {
>>> +        local_level++;
>>> +
>>> +        if ((local_level == level) &&
>>> +            (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>>> +            ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>>
>> Attributes have to be shifted:
>>
>> (cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2
> 
> Hmmm, I'm not sure that is true: the top-level function in this routine
> converts the "linux" constant to the ACPI version of that constant. In
> that case the "type" field is pre-shifted, so that it matches the result
> of just ANDing against the field... That is, unless I messed something
> up, which I don't see at the moment (and the code has of course been
> tested with PPTTs from multiple people at this point).

For ThunderX2 I got lots of errors in dmesg:
Found duplicate cache level/type unable to determine uniqueness

So I fixed the "type" macro definitions (without shifting) and shifted the
attributes here, which fixes the issue. As you said, it can be pre-shifted as well.
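A small standalone sketch of the two consistent ways to do this comparison, assuming (per the ACPI 6.2 PPTT cache structure) that bits 3:2 of the Attributes field encode the cache type. The macro and helper names below are hypothetical and are not taken from the patch or from ACPICA:

```c
#include <stdint.h>

/*
 * Hypothetical macros modelled on the ACPI 6.2 PPTT cache Attributes
 * field, where bits 3:2 encode the cache type.
 */
#define PPTT_CACHE_TYPE_SHIFT	2
#define PPTT_MASK_CACHE_TYPE	(0x3u << PPTT_CACHE_TYPE_SHIFT)

/* Unshifted type codes (0x3 is also "unified"; omitted here). */
#define PPTT_TYPE_DATA		0x0u
#define PPTT_TYPE_INSTR		0x1u
#define PPTT_TYPE_UNIFIED	0x2u

/* Convention A: shift the masked field down, compare to unshifted codes. */
static int type_matches_shifted(uint8_t attributes, unsigned int type)
{
	return ((attributes & PPTT_MASK_CACHE_TYPE) >> PPTT_CACHE_TYPE_SHIFT) == type;
}

/* Convention B: keep the field in place, compare to pre-shifted codes. */
static int type_matches_preshifted(uint8_t attributes, unsigned int type)
{
	return (attributes & PPTT_MASK_CACHE_TYPE) ==
	       (type << PPTT_CACHE_TYPE_SHIFT);
}

/* Mixed conventions: masked but unshifted field vs unshifted code. */
static int type_matches_mixed(uint8_t attributes, unsigned int type)
{
	return (attributes & PPTT_MASK_CACHE_TYPE) == type;
}
```

With attributes 0x07 (type code 1 in bits 3:2, low bits set), the first two helpers return 1 and the mixed one returns 0. A data cache (type code 0) matches under all three, so only the instruction and unified cases expose a mismatched convention.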

> 
> 
>>
>>> +            if (*found != NULL)
>>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>>> +
>>> +            pr_debug("Found cache @ level %d\n", level);
>>> +            *found = cache;
>>> +            /*
>>> +             * continue looking at this node's resource list
>>> +             * to verify that we don't find a duplicate
>>> +             * cache node.
>>> +             */
>>> +        }
>>> +        cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>>> +    }
>>> +    return local_level;
>>> +}
>>> +
>>> +/*
>>> + * Given a CPU node look for cache levels that exist at this level, and then
>>> + * for each cache node, count how many levels exist below (logically above) it.
>>> + * If a level and type are specified, and we find that level/type, abort
>>> + * processing and return the acpi_pptt_cache structure.
>>> + */
>>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>>> +    struct acpi_table_header *table_hdr,
>>> +    struct acpi_pptt_processor *cpu_node,
>>> +    int *starting_level, int level, int type)
>>> +{
>>> +    struct acpi_subtable_header *res;
>>> +    int number_of_levels = *starting_level;
>>> +    int resource = 0;
>>> +    struct acpi_pptt_cache *ret = NULL;
>>> +    int local_level;
>>> +
>>> +    /* walk down from the processor node */
>>> +    while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>>> +        resource++;
>>> +
>>> +        local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>>> +                           res, &ret, level, type);
>>> +        /*
>>> +         * we are looking for the max depth. Since its potentially
>>> +         * possible for a given node to have resources with differing
>>> +         * depths verify that the depth we have found is the largest.
>>> +         */
>>> +        if (number_of_levels < local_level)
>>> +            number_of_levels = local_level;
>>> +    }
>>> +    if (number_of_levels > *starting_level)
>>> +        *starting_level = number_of_levels;
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +/*
>>> + * given a processor node containing a processing unit, walk into it and count
>>> + * how many levels exist solely for it, and then walk up each level until we hit
>>> + * the root node (ignore the package level because it may be possible to have
>>> + * caches that exist across packages). Count the number of cache levels that
>>> + * exist at each level on the way up.
>>> + */
>>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>>> +                 struct acpi_pptt_processor *cpu_node)
>>> +{
>>> +    int total_levels = 0;
>>> +
>>> +    do {
>>> +        acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>>> +        cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>>> +    } while (cpu_node);
>>> +
>>> +    return total_levels;
>>> +}
>>> +
>>> +/* determine if the given node is a leaf node */
>>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>>> +                   struct acpi_pptt_processor *node)
>>> +{
>>> +    struct acpi_subtable_header *entry;
>>> +    unsigned long table_end;
>>> +    u32 node_entry;
>>> +    struct acpi_pptt_processor *cpu_node;
>>> +
>>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>>> +    node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>>> +                        sizeof(struct acpi_table_pptt));
>>> +
>>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>>> +            (cpu_node->parent == node_entry))
>>> +            return 0;
>>> +        entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
>>> +    }
>>> +    return 1;
>>> +}
>>> +
>>> +/*
>>> + * Find the subtable entry describing the provided processor
>>> + */
>>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>>> +    struct acpi_table_header *table_hdr,
>>> +    u32 acpi_cpu_id)
>>> +{
>>> +    struct acpi_subtable_header *entry;
>>> +    unsigned long table_end;
>>> +    struct acpi_pptt_processor *cpu_node;
>>> +
>>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>>> +                        sizeof(struct acpi_table_pptt));
>>> +
>>> +    /* find the processor structure associated with this cpuid */
>>> +    while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>>> +
>>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>>> +            acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>>> +            pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>>> +                 acpi_cpu_id, cpu_node->acpi_processor_id);
>>> +            if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>>> +                /* found the correct entry */
>>> +                pr_debug("match found!\n");
>>> +                return (struct acpi_pptt_processor *)entry;
>>> +            }
>>> +        }
>>> +
>>> +        if (entry->length == 0) {
>>> +            pr_err("Invalid zero length subtable\n");
>>> +            break;
>>> +        }
>>> +        entry = (struct acpi_subtable_header *)
>>> +            ((u8 *)entry + entry->length);
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +/*
>>> + * Given a acpi_pptt_processor node, walk up until we identify the
>>> + * package that the node is associated with or we run out of levels
>>> + * to request.
>>> + */
>>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>>> +    struct acpi_table_header *table_hdr,
>>> +    struct acpi_pptt_processor *cpu,
>>> +    int level)
>>> +{
>>> +    struct acpi_pptt_processor *prev_node;
>>> +
>>> +    while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
>>> +        pr_debug("level %d\n", level);
>>> +        prev_node = fetch_pptt_node(table_hdr, cpu->parent);
>>> +        if (prev_node == NULL)
>>> +            break;
>>> +        cpu = prev_node;
>>> +        level--;
>>> +    }
>>> +    return cpu;
>>> +}
>>> +
>>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
>>
>> The function name can be more descriptive. How about:
>>
>> acpi_count_cache_level() ?
> 
> The naming has drifted a bit, so yes: that routine is only used by the
> portion that determines the number of cache levels for a given PE.
> 
> 
>>
>>> +{
>>> +    int number_of_levels = 0;
>>> +    struct acpi_pptt_processor *cpu;
>>> +
>>> +    cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>>> +    if (cpu)
>>> +        number_of_levels = acpi_process_node(table_hdr, cpu);
>>> +
>>> +    return number_of_levels;
>>> +}
>>
>> It is hard to follow what acpi_find_cache_level() and
>> acpi_pptt_walk_cache() really do, because they are trying to do
>> too many things at the same time. IMO, splitting the
>> acpi_find_cache_level() logic into:
>> 1. counting the cache levels (max depth)
>> 2. finding the specific cache node
>> makes sense.
> 
> I disagree; that routine is shared by the two code paths because its
> functionality is 99% duplicated between the two. The difference being 
> whether it terminates the search at a given level, or continues 
> searching until it runs out of nodes. The latter case is simply a 
> degenerate version of the first.

Mostly it is about the trade-off between code simplicity and redundancy; I
personally prefer the former. It is not the critical issue though.
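For illustration only, the split suggested above might look like the following on a hypothetical, simplified model where array indices stand in for PPTT offsets (none of these names exist in the patch). One trade-off is visible immediately: the find helper returns on the first match, so the duplicate check that the shared walker performs would have to be added back separately.

```c
#include <stddef.h>

/*
 * Hypothetical simplified model of PPTT cache nodes: array indices stand
 * in for the table's relative offsets, and -1 terminates a chain.
 */
struct cache_node {
	int type;		/* 0 = data, 1 = instruction, 2 = unified */
	int next_level;		/* index of the next (outer) cache level, or -1 */
};

/* Job 1: how many cache levels hang off this private resource? */
static int cache_chain_depth(const struct cache_node *nodes, int start)
{
	int depth = 0;

	for (int i = start; i >= 0; i = nodes[i].next_level)
		depth++;
	return depth;
}

/*
 * Job 2: return the node at @level with @type, or NULL. @base_level is
 * the number of levels already counted below this processor node.
 */
static const struct cache_node *cache_chain_find(const struct cache_node *nodes,
						 int start, int base_level,
						 int level, int type)
{
	int cur = base_level;

	for (int i = start; i >= 0; i = nodes[i].next_level) {
		if (++cur == level && nodes[i].type == type)
			return &nodes[i];
	}
	return NULL;
}
```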

> 
> 
>>
>> Also, it seems like we can merge acpi_parse_pptt() & acpi_process_node().
> 
> That is true, but I fail to see how any of this actually fixes
> anything. There are a million ways to do this, including, as pointed out,
> building another data structure to simplify parsing what is a
> table that is less than ideal for runtime parsing (starting with the
> direction of the relative pointers, and ending with having to "infer"
> information that isn't directly flagged). I actually built a couple of
> other versions of this, including a nice cute version which is about 1/8
> the size of this and really easy to understand but of course is
> recursive...

I believe this will improve code readability. Obviously, you can 
disagree with my suggestions.
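Along the same lines, a rough sketch of acpi_parse_pptt() and acpi_process_node() merged into a single counting function, again on a hypothetical simplified model (the leaf lookup is reduced to passing an index, and count_cache_levels() is an invented name echoing the suggested acpi_count_cache_level()). It keeps the counting rule of the quoted code: at each node on the way up, the total grows by the deepest private cache chain found there.

```c
/*
 * Hypothetical simplified model: array indices replace PPTT offsets,
 * -1 marks "no parent" / "end of chain".
 */
#define MAX_PRIV 4

struct cache_node {
	int type;		/* 0 = data, 1 = instruction, 2 = unified */
	int next_level;		/* index of the next (outer) cache level, or -1 */
};

struct proc_node {
	int parent;		/* index of the parent processor node, or -1 */
	int nr_priv;		/* number of private resources */
	int priv[MAX_PRIV];	/* cache indices listed as private resources */
};

static int chain_depth(const struct cache_node *caches, int start)
{
	int depth = 0;

	for (int i = start; i >= 0; i = caches[i].next_level)
		depth++;
	return depth;
}

/*
 * One function instead of acpi_parse_pptt() + acpi_process_node():
 * starting at the leaf, add the deepest private cache chain found at
 * each node on the way up to the root.
 */
static int count_cache_levels(const struct proc_node *procs,
			      const struct cache_node *caches, int leaf)
{
	int total = 0;

	for (int n = leaf; n >= 0; n = procs[n].parent) {
		int deepest = 0;

		for (int r = 0; r < procs[n].nr_priv; r++) {
			int d = chain_depth(caches, procs[n].priv[r]);

			if (d > deepest)
				deepest = d;
		}
		total += deepest;
	}
	return total;
}
```

For a core whose L1i and L1d both chain to one L2, under a package that lists a private L3, this counts 2 levels at the core and 1 at the package, for a total of 3.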

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-18  5:39         ` Tomasz Nowicki
@ 2017-10-18 10:24           ` Tomasz Nowicki
  -1 siblings, 0 replies; 104+ messages in thread
From: Tomasz Nowicki @ 2017-10-18 10:24 UTC (permalink / raw)
  To: Tomasz Nowicki, Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, austinwc, linux-pm, jhugo, gregkh,
	sudeep.holla, rjw, linux-kernel, will.deacon, wangxiongfeng2,
	viresh.kumar, hanjun.guo, catalin.marinas, ahs3,
	linux-arm-kernel

On 18.10.2017 07:39, Tomasz Nowicki wrote:
> Hi,
> 
> On 17.10.2017 17:22, Jeremy Linton wrote:
>> Hi,
>>
>> On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
>>> Hi Jeremy,
>>>
>>> I did a second round of review and have some more comments, please see
>>> below:
>>>
>>> On 12.10.2017 21:48, Jeremy Linton wrote:
>>>> ACPI 6.2 adds a new table, which describes how processing units
>>>> are related to each other in tree like fashion. Caches are
>>>> also sprinkled throughout the tree and describe the properties
>>>> of the caches in relation to other caches and processing units.
>>>>
>>>> Add the code to parse the cache hierarchy and report the total
>>>> number of levels of cache for a given core using
>>>> acpi_find_last_cache_level() as well as fill out the individual
>>>> cores cache information with cache_setup_acpi() once the
>>>> cpu_cacheinfo structure has been populated by the arch specific
>>>> code.
>>>>
>>>> Further, report peers in the topology using setup_acpi_cpu_topology()
>>>> to report a unique ID for each processing unit at a given level
>>>> in the tree. These unique id's can then be used to match related
>>>> processing units which exist as threads, COD (clusters
>>>> on die), within a given package, etc.
>>>>
>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>> ---
>>>>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 485 insertions(+)
>>>>   create mode 100644 drivers/acpi/pptt.c
>>>>
>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>> new file mode 100644
>>>> index 000000000000..c86715fed4a7
>>>> --- /dev/null
>>>> +++ b/drivers/acpi/pptt.c
>>>> @@ -0,1 +1,485 @@
>>>> +/*
>>>> + * Copyright (C) 2017, ARM
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify it
>>>> + * under the terms and conditions of the GNU General Public License,
>>>> + * version 2, as published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>>> + * more details.
>>>> + *
>>>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>>>> + * which is optionally used to describe the processor and cache topology.
>>>> + * Due to the relative pointers used throughout the table, this doesn't
>>>> + * leverage the existing subtable parsing in the kernel.
>>>> + */
>>>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>>>> +
>>>> +#include <linux/acpi.h>
>>>> +#include <linux/cacheinfo.h>
>>>> +#include <acpi/processor.h>
>>>> +
>>>> +/*
>>>> + * Given the PPTT table, find and verify that the subtable entry
>>>> + * is located within the table
>>>> + */
>>>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>> +{
>>>> +    struct acpi_subtable_header *entry;
>>>> +
>>>> +    /* there isn't a subtable at reference 0 */
>>>> +    if (!pptt_ref)
>>>> +        return NULL;
>>>> +
>>>> +    if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>>>> +        return NULL;
>>>> +
>>>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>>>> +
>>>> +    if (pptt_ref + entry->length > table_hdr->length)
>>>> +        return NULL;
>>>> +
>>>> +    return entry;
>>>> +}
>>>> +
>>>> +static struct acpi_pptt_processor *fetch_pptt_node(
>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>> +{
>>>> +    return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>>> +}
>>>> +
>>>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>> +{
>>>> +    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>>> +}
>>>> +
>>>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>>>> +    struct acpi_table_header *table_hdr,
>>>> +    struct acpi_pptt_processor *node, int resource)
>>>> +{
>>>> +    u32 ref;
>>>> +
>>>> +    if (resource >= node->number_of_priv_resources)
>>>> +        return NULL;
>>>> +
>>>> +    ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>>>> +              sizeof(u32) * resource);
>>>> +
>>>> +    return fetch_pptt_subtable(table_hdr, ref);
>>>> +}
>>>> +
>>>> +/*
>>>> + * given a pptt resource, verify that it is a cache node, then walk
>>>> + * down each level of caches, counting how many levels are found
>>>> + * as well as checking the cache type (icache, dcache, unified). If a
>>>> + * level & type match, then we set found, and continue the search.
>>>> + * Once the entire cache branch has been walked return its max
>>>> + * depth.
>>>> + */
>>>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>>>> +                int local_level,
>>>> +                struct acpi_subtable_header *res,
>>>> +                struct acpi_pptt_cache **found,
>>>> +                int level, int type)
>>>> +{
>>>> +    struct acpi_pptt_cache *cache;
>>>> +
>>>> +    if (res->type != ACPI_PPTT_TYPE_CACHE)
>>>> +        return 0;
>>>> +
>>>> +    cache = (struct acpi_pptt_cache *) res;
>>>> +    while (cache) {
>>>> +        local_level++;
>>>> +
>>>> +        if ((local_level == level) &&
>>>> +            (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>>>> +            ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>>>
>>> Attributes have to be shifted:
>>>
>>> (cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2
>>
>> Hmmm, I'm not sure that is true: the top-level function in this
>> routine converts the "linux" constant to the ACPI version of that
>> constant. In that case the "type" field is pre-shifted, so that it
>> matches the result of just ANDing against the field... That is, unless
>> I messed something up, which I don't see at the moment (and the code
>> has of course been tested with PPTTs from multiple people at this
>> point).
> 
> For ThunderX2 I got lots of errors in dmesg:
> Found duplicate cache level/type unable to determine uniqueness
> 
> So I fixed the "type" macro definitions (without shifting) and shifted it
> here instead, which fixes the issue. As you said, it can be pre-shifted as well.
> 
>>
>>
>>>
>>>> +            if (*found != NULL)
>>>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");

Actually I still see these error messages in my dmesg. This is due to the
following ThunderX2 per-core L1 and L2 cache hierarchy:

Core
  ------------------
|                  |
| L1i -----        |
|         |        |
|          ----L2  |
|         |        |
| L1d -----        |
|                  |
  ------------------

In this case we have two paths which lead to the L2 cache and hit the above
case. Is it really an error case?
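The diagram can be reproduced with a small model. The sketch below is hypothetical: the `cache_node` struct, `walk_branch()` and the pre-shifted type values (0x0 data, 0x4 instruction, 0x8 unified) are assumptions for illustration, not the patch's code, and the real table links levels through u32 offsets rather than pointers. It walks the L1i and L1d branches separately and shows that both land on the same L2 node, so comparing node identity, rather than only checking whether `*found` is already set, is one way to separate this benign case from two genuinely different nodes matching the same level and type.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of a PPTT cache node. The real table
 * links levels through a u32 byte offset (next_level_of_cache); here it
 * is a plain pointer, and the "type valid" flag check is omitted.
 */
struct cache_node {
	int type;                      /* pre-shifted type, e.g. 0x8 = unified */
	struct cache_node *next_level; /* NULL at the last level */
};

/*
 * Walk one cache branch, numbering levels from 1. When the node at
 * 'level' has type 'type', record it in *found. A duplicate is counted
 * only when a *different* node matches; reaching the same node again
 * through another branch (e.g. L1i->L2 and L1d->L2) is not a conflict.
 * Returns the depth of the branch.
 */
static int walk_branch(struct cache_node *cache, int level, int type,
		       struct cache_node **found, int *conflicts)
{
	int local_level = 0;

	assert(found != NULL && conflicts != NULL);
	while (cache) {
		local_level++;
		if (local_level == level && cache->type == type) {
			if (*found != NULL && *found != cache)
				(*conflicts)++;
			*found = cache;
		}
		cache = cache->next_level;
	}
	return local_level;
}
```

Walking the L1i branch and then the L1d branch for level 2 / unified leaves `found` pointing at the shared L2 with no conflict recorded, while a second, distinct unified L2 reached at the same level would still be counted as a conflict.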

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 2/7] ACPI: Enable PPTT support on ARM64
  2017-10-12 19:48   ` Jeremy Linton
@ 2017-10-18 16:47     ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-18 16:47 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On Thu, Oct 12, 2017 at 02:48:51PM -0500, Jeremy Linton wrote:
> Now that we have a PPTT parser, in preparation for its use
> on arm64, let's build it.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/Kconfig         | 1 +
>  drivers/acpi/Makefile      | 1 +
>  drivers/acpi/arm64/Kconfig | 3 +++
>  3 files changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6a56d4..68c9d1289735 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -7,6 +7,7 @@ config ARM64
>  	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
>  	select ACPI_MCFG if ACPI
>  	select ACPI_SPCR_TABLE if ACPI
> +	select ACPI_PPTT if ACPI
>  	select ARCH_CLOCKSOURCE_DATA
>  	select ARCH_HAS_DEBUG_VIRTUAL
>  	select ARCH_HAS_DEVMEM_IS_ALLOWED
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 90265ab4437a..c92a0c937551 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
>  obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
>  obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
>  obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
> +obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
>  
>  # processor has its own "processor." module_param namespace
>  processor-y			:= processor_driver.o
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index 5a6f80fce0d6..74b855a669ea 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -7,3 +7,6 @@ config ACPI_IORT
>  
>  config ACPI_GTDT
>  	bool
> +
> +config ACPI_PPTT
> +	bool
> \ No newline at end of file

I do not understand the logic. Why should we have a Kconfig option
in drivers/acpi/arm64 for code in drivers/acpi ?

AFAIK PPTT is not an ACPI ARM64 specific binding.

Lorenzo

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-18 10:24           ` Tomasz Nowicki
@ 2017-10-18 17:30             ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-18 17:30 UTC (permalink / raw)
  To: Tomasz Nowicki, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, austinwc, linux-pm, jhugo, gregkh,
	sudeep.holla, rjw, linux-kernel, will.deacon, wangxiongfeng2,
	viresh.kumar, hanjun.guo, catalin.marinas, ahs3,
	linux-arm-kernel

On 10/18/2017 05:24 AM, Tomasz Nowicki wrote:
> On 18.10.2017 07:39, Tomasz Nowicki wrote:
>> Hi,
>>
>> On 17.10.2017 17:22, Jeremy Linton wrote:
>>> Hi,
>>>
>>> On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
>>>> Hi Jeremy,
>>>>
>>>> I did second round of review and have some more comments, please see 
>>>> below:
>>>>
>>>> On 12.10.2017 21:48, Jeremy Linton wrote:
>>>>> ACPI 6.2 adds a new table, which describes how processing units
>>>>> are related to each other in a tree-like fashion. Caches are
>>>>> also sprinkled throughout the tree and describe the properties
>>>>> of the caches in relation to other caches and processing units.
>>>>>
>>>>> Add the code to parse the cache hierarchy and report the total
>>>>> number of levels of cache for a given core using
>>>>> acpi_find_last_cache_level() as well as fill out the individual
>>>>> cores' cache information with cache_setup_acpi() once the
>>>>> cpu_cacheinfo structure has been populated by the arch specific
>>>>> code.
>>>>>
>>>>> Further, report peers in the topology using setup_acpi_cpu_topology()
>>>>> to report a unique ID for each processing unit at a given level
>>>>> in the tree. These unique IDs can then be used to match related
>>>>> processing units which exist as threads, COD (clusters
>>>>> on die), within a given package, etc.
>>>>>
>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>>> ---
>>>>>   drivers/acpi/pptt.c | 485 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>   1 file changed, 485 insertions(+)
>>>>>   create mode 100644 drivers/acpi/pptt.c
>>>>>
>>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>>> new file mode 100644
>>>>> index 000000000000..c86715fed4a7
>>>>> --- /dev/null
>>>>> +++ b/drivers/acpi/pptt.c
>>>>> @@ -0,1 +1,485 @@
>>>>> +/*
>>>>> + * Copyright (C) 2017, ARM
>>>>> + *
>>>>> + * This program is free software; you can redistribute it and/or modify it
>>>>> + * under the terms and conditions of the GNU General Public License,
>>>>> + * version 2, as published by the Free Software Foundation.
>>>>> + *
>>>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>>>> + * more details.
>>>>> + *
>>>>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>>>>> + * which is optionally used to describe the processor and cache topology.
>>>>> + * Due to the relative pointers used throughout the table, this doesn't
>>>>> + * leverage the existing subtable parsing in the kernel.
>>>>> + */
>>>>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>>>>> +
>>>>> +#include <linux/acpi.h>
>>>>> +#include <linux/cacheinfo.h>
>>>>> +#include <acpi/processor.h>
>>>>> +
>>>>> +/*
>>>>> + * Given the PPTT table, find and verify that the subtable entry
>>>>> + * is located within the table
>>>>> + */
>>>>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>>> +{
>>>>> +    struct acpi_subtable_header *entry;
>>>>> +
>>>>> +    /* there isn't a subtable at reference 0 */
>>>>> +    if (!pptt_ref)
>>>>> +        return NULL;
>>>>> +
>>>>> +    if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>>>>> +        return NULL;
>>>>> +
>>>>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>>>>> +
>>>>> +    if (pptt_ref + entry->length > table_hdr->length)
>>>>> +        return NULL;
>>>>> +
>>>>> +    return entry;
>>>>> +}
>>>>> +
>>>>> +static struct acpi_pptt_processor *fetch_pptt_node(
>>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>>> +{
>>>>> +    return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>>>> +}
>>>>> +
>>>>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>>> +{
>>>>> +    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>>>> +}
>>>>> +
>>>>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>>>>> +    struct acpi_table_header *table_hdr,
>>>>> +    struct acpi_pptt_processor *node, int resource)
>>>>> +{
>>>>> +    u32 ref;
>>>>> +
>>>>> +    if (resource >= node->number_of_priv_resources)
>>>>> +        return NULL;
>>>>> +
>>>>> +    ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>>>>> +              sizeof(u32) * resource);
>>>>> +
>>>>> +    return fetch_pptt_subtable(table_hdr, ref);
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * given a pptt resource, verify that it is a cache node, then walk
>>>>> + * down each level of caches, counting how many levels are found
>>>>> + * as well as checking the cache type (icache, dcache, unified). If a
>>>>> + * level & type match, then we set found, and continue the search.
>>>>> + * Once the entire cache branch has been walked return its max
>>>>> + * depth.
>>>>> + */
>>>>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>>>>> +                int local_level,
>>>>> +                struct acpi_subtable_header *res,
>>>>> +                struct acpi_pptt_cache **found,
>>>>> +                int level, int type)
>>>>> +{
>>>>> +    struct acpi_pptt_cache *cache;
>>>>> +
>>>>> +    if (res->type != ACPI_PPTT_TYPE_CACHE)
>>>>> +        return 0;
>>>>> +
>>>>> +    cache = (struct acpi_pptt_cache *) res;
>>>>> +    while (cache) {
>>>>> +        local_level++;
>>>>> +
>>>>> +        if ((local_level == level) &&
>>>>> +            (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>>>>> +            ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>>>>
>>>> Attributes have to be shifted:
>>>>
>>>> (cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2
>>>
>>> Hmmm, I'm not sure that is true, the top level function in this
>>> routine converts the "linux" constant to the ACPI version of that
>>> constant. In that case the "type" field is pre-shifted, so that it
>>> matches the result of just ANDing against the field... That is unless
>>> I messed something up, which I don't see at the moment (and the code
>>> of course has been tested with PPTTs from multiple people at this
>>> point).
>>
>> For ThunderX2 I got lots of errors in dmesg:
>> Found duplicate cache level/type unable to determine uniqueness
>>
>> So I fixed the "type" macro definitions (without shifting) and shifted it
>> here instead, which fixes the issue. As you said, it can be pre-shifted as well.

Ah, yeah, right... If you removed the shift per your original comment, then
it breaks this. And yes, the type definitions for the cache type aren't
wrong in this version, because the unified state has the 3rd bit set for
both the 0x3 and 0x2 values, and they're only used to convert from the Linux
type to the ACPI type (and not back, because we don't mess with whatever
the original "detection" was). I'm not really planning on changing that
because I don't think it helps "readability" (and it converts a compile
time constant to a runtime shift).
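The pre-shifted approach can be illustrated with a short sketch. It assumes the ACPI 6.2 encoding of the cache-type field (attributes bits 3:2: 0 = data, 1 = instruction, 2 or 3 = unified); the `PPTT_*` macro names and `pptt_type_matches()` are hypothetical stand-ins, not the definitions used by this patch. Because the constants are defined already shifted, the match is a plain mask-and-compare with no runtime shift. Mixing conventions, i.e. shifting the table value down while the constants stay pre-shifted, makes the instruction and unified comparisons fail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical constants for the PPTT cache-type field (attributes
 * bits 3:2), defined already shifted into place.
 */
#define PPTT_MASK_CACHE_TYPE        0x0C
#define PPTT_CACHE_TYPE_DATA        (0x0 << 2)
#define PPTT_CACHE_TYPE_INSTR       (0x1 << 2)
#define PPTT_CACHE_TYPE_UNIFIED     (0x2 << 2)
#define PPTT_CACHE_TYPE_UNIFIED_ALT (0x3 << 2)

/* Compare without any runtime shift: both sides are pre-shifted. */
static int pptt_type_matches(uint8_t attributes, int type)
{
	int table_type = attributes & PPTT_MASK_CACHE_TYPE;

	/* 'type' must already be one of the pre-shifted constants. */
	assert((type & ~PPTT_MASK_CACHE_TYPE) == 0);

	/* Both 0x2 and 0x3 encode "unified", so accept either. */
	if (type == PPTT_CACHE_TYPE_UNIFIED &&
	    table_type == PPTT_CACHE_TYPE_UNIFIED_ALT)
		return 1;
	return table_type == type;
}
```

For example, an attributes byte of 0x07 (instruction cache, allocation bits set) matches `PPTT_CACHE_TYPE_INSTR`, and both 0x08 and 0x0C match `PPTT_CACHE_TYPE_UNIFIED`. The explicit handling of the alternate unified encoding is a choice made for this sketch.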

>>
>>>
>>>
>>>>
>>>>> +            if (*found != NULL)
>>>>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> 
> Actually I still see these error messages in my dmesg. This is due to the
> following ThunderX2 per-core L1 and L2 cache hierarchy:
> 
> Core
>   ------------------
> |                  |
> | L1i -----        |
> |         |        |
> |          ----L2  |
> |         |        |
> | L1d -----        |
> |                  |
>   ------------------
> 
> In this case we have two paths which lead to the L2 cache and hit the above
> case. Is it really an error case?

No, but it's not deterministic unless we mark the node, which doesn't
solve the problem of a table constructed like

L1i->L2 (unified)
L1d->L2 (unified)

or various other structures which aren't disallowed by the spec and have
non-deterministic real-world meanings, any more than constructing the
table like:

L1i
L1d->L2 (unified)

which I tend to prefer because with a structure like that it can be
deterministic (and in a way it actually represents the non-coherent
behavior of (most?) ARM64 cores' i-caches, as could be argued of the first
example if the allocation policies are varied between the L2 nodes).

The really ugly bits here happen if you add another layer:

L1i->L2i-L3
L1d------^

which is why I made that an error message, not to mention that, since
the levels aren't tagged, the numbering and meaning aren't clear.

(The L1i in the above example might be better called an L0i to avoid
throwing off the rest of the hierarchy numbering, and also so it could be
ignored.)
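The numbering problem in the L1i->L2i-L3 / L1d->L3 layout can be made concrete. PPTT cache nodes carry no explicit level field, so a parser can only infer a level from the depth at which it reaches a node. The hypothetical sketch below (the `cnode` struct and `inferred_level()` are illustrative names, not from the patch, and pointers stand in for the table's u32 offsets) models that layout and reports the depth of the shared L3 from each starting L1 cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache node; 'name' is only a label for debugging. */
struct cnode {
	const char *name;
	struct cnode *next_level; /* NULL at the last level */
};

/*
 * Return the 1-based depth at which 'target' is reached when walking
 * the branch that starts at 'start', or 0 if the branch never reaches
 * it. Without a level field in the table, this depth is the only
 * "level" a parser can assign to the node.
 */
static int inferred_level(const struct cnode *start, const struct cnode *target)
{
	int depth = 0;

	assert(target != NULL);
	for (const struct cnode *c = start; c != NULL; c = c->next_level) {
		depth++;
		if (c == target)
			return depth;
	}
	return 0;
}
```

With `l1i -> l2i -> l3` and `l1d -> l3`, the shared node is found at depth 3 from the L1i branch but at depth 2 from the L1d branch, so the same cache would be reported as two different levels depending on the path taken.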

Summary:

I'm not at all happy with this specification's attempt to leave out
pieces of information which would make parsing more deterministic. In
this case I'm happy to demote the message level, but not to remove it
entirely, although I do think the obvious case you list shouldn't be the
default one.

Lastly:

I'm assuming the final result is that the table is actually being parsed 
correctly despite the ugly message?



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 2/7] ACPI: Enable PPTT support on ARM64
  2017-10-18 16:47     ` Lorenzo Pieralisi
@ 2017-10-18 17:38       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-18 17:38 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On 10/18/2017 11:47 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 12, 2017 at 02:48:51PM -0500, Jeremy Linton wrote:
>> Now that we have a PPTT parser, in preparation for its use
>> on arm64, lets build it.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   arch/arm64/Kconfig         | 1 +
>>   drivers/acpi/Makefile      | 1 +
>>   drivers/acpi/arm64/Kconfig | 3 +++
>>   3 files changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 0df64a6a56d4..68c9d1289735 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -7,6 +7,7 @@ config ARM64
>>   	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
>>   	select ACPI_MCFG if ACPI
>>   	select ACPI_SPCR_TABLE if ACPI
>> +	select ACPI_PPTT if ACPI
>>   	select ARCH_CLOCKSOURCE_DATA
>>   	select ARCH_HAS_DEBUG_VIRTUAL
>>   	select ARCH_HAS_DEVMEM_IS_ALLOWED
>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
>> index 90265ab4437a..c92a0c937551 100644
>> --- a/drivers/acpi/Makefile
>> +++ b/drivers/acpi/Makefile
>> @@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
>>   obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
>>   obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
>>   obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
>> +obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
>>   
>>   # processor has its own "processor." module_param namespace
>>   processor-y			:= processor_driver.o
>> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
>> index 5a6f80fce0d6..74b855a669ea 100644
>> --- a/drivers/acpi/arm64/Kconfig
>> +++ b/drivers/acpi/arm64/Kconfig
>> @@ -7,3 +7,6 @@ config ACPI_IORT
>>   
>>   config ACPI_GTDT
>>   	bool
>> +
>> +config ACPI_PPTT
>> +	bool
>> \ No newline at end of file
> 
> I do not understand the logic. Why should we have a Kconfig option
> in drivers/acpi/arm64 for code in drivers/acpi ?
> 
> AFAIK PPTT is not an ACPI ARM64 specific binding.

Weird, huh? Originally I had the whole shebang in arm64 because the x86
(or whatever) bindings haven't been written. My assumption is that once
that part had been provided it could be moved.

The config is sort of an artifact and "easier" to move than the whole
file. But, as Hanjun has also been complaining about it, I've agreed to
move it to the "correct" location but keep it in the arm64 wrapper. Of
course I think that is a bit strange too, but whatever...

Once the arm64 side of things is all wrapped up (and I can come up for
some air) I'm willing to help with bindings on other architectures if
anyone is truly interested. But I view that whole exercise as more of a
"bug" fixing one than providing any real benefit at this point.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-18 17:30             ` Jeremy Linton
@ 2017-10-19  5:18               ` Tomasz Nowicki
  -1 siblings, 0 replies; 104+ messages in thread
From: Tomasz Nowicki @ 2017-10-19  5:18 UTC (permalink / raw)
  To: Jeremy Linton, Tomasz Nowicki, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, austinwc, linux-pm, jhugo, gregkh,
	sudeep.holla, rjw, linux-kernel, will.deacon, wangxiongfeng2,
	viresh.kumar, hanjun.guo, catalin.marinas, ahs3,
	linux-arm-kernel

On 18.10.2017 19:30, Jeremy Linton wrote:
> On 10/18/2017 05:24 AM, Tomasz Nowicki wrote:
>> On 18.10.2017 07:39, Tomasz Nowicki wrote:
>>> Hi,
>>>
>>> On 17.10.2017 17:22, Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
>>>>> Hi Jeremy,
>>>>>
>>>>> I did second round of review and have some more comments, please 
>>>>> see below:
>>>>>
>>>>> On 12.10.2017 21:48, Jeremy Linton wrote:
>>>>>> ACPI 6.2 adds a new table, which describes how processing units
>>>>>> are related to each other in tree like fashion. Caches are
>>>>>> also sprinkled throughout the tree and describe the properties
>>>>>> of the caches in relation to other caches and processing units.
>>>>>>
>>>>>> Add the code to parse the cache hierarchy and report the total
>>>>>> number of levels of cache for a given core using
>>>>>> acpi_find_last_cache_level() as well as fill out the individual
>>>>>> cores cache information with cache_setup_acpi() once the
>>>>>> cpu_cacheinfo structure has been populated by the arch specific
>>>>>> code.
>>>>>>
>>>>>> Further, report peers in the topology using setup_acpi_cpu_topology()
>>>>>> to report a unique ID for each processing unit at a given level
>>>>>> in the tree. These unique id's can then be used to match related
>>>>>> processing units which exist as threads, COD (clusters
>>>>>> on die), within a given package, etc.
>>>>>>
>>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>>>> ---
>>>>>>   drivers/acpi/pptt.c | 485 
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>   1 file changed, 485 insertions(+)
>>>>>>   create mode 100644 drivers/acpi/pptt.c
>>>>>>
>>>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..c86715fed4a7
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/acpi/pptt.c
>>>>>> @@ -0,1 +1,485 @@
>>>>>> +/*
>>>>>> + * Copyright (C) 2017, ARM
>>>>>> + *
>>>>>> + * This program is free software; you can redistribute it and/or modify it
>>>>>> + * under the terms and conditions of the GNU General Public License,
>>>>>> + * version 2, as published by the Free Software Foundation.
>>>>>> + *
>>>>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>>>>> + * more details.
>>>>>> + *
>>>>>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>>>>>> + * which is optionally used to describe the processor and cache topology.
>>>>>> + * Due to the relative pointers used throughout the table, this doesn't
>>>>>> + * leverage the existing subtable parsing in the kernel.
>>>>>> + */
>>>>>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>>>>>> +
>>>>>> +#include <linux/acpi.h>
>>>>>> +#include <linux/cacheinfo.h>
>>>>>> +#include <acpi/processor.h>
>>>>>> +
>>>>>> +/*
>>>>>> + * Given the PPTT table, find and verify that the subtable entry
>>>>>> + * is located within the table
>>>>>> + */
>>>>>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>>>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>>>> +{
>>>>>> +    struct acpi_subtable_header *entry;
>>>>>> +
>>>>>> +    /* there isn't a subtable at reference 0 */
>>>>>> +    if (!pptt_ref)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
>>>>>> +
>>>>>> +    if (pptt_ref + entry->length > table_hdr->length)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    return entry;
>>>>>> +}
>>>>>> +
>>>>>> +static struct acpi_pptt_processor *fetch_pptt_node(
>>>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>>>> +{
>>>>>> +    return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>>>>> +}
>>>>>> +
>>>>>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>>>>>> +    struct acpi_table_header *table_hdr, u32 pptt_ref)
>>>>>> +{
>>>>>> +    return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
>>>>>> +}
>>>>>> +
>>>>>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>>>>>> +    struct acpi_table_header *table_hdr,
>>>>>> +    struct acpi_pptt_processor *node, int resource)
>>>>>> +{
>>>>>> +    u32 ref;
>>>>>> +
>>>>>> +    if (resource >= node->number_of_priv_resources)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
>>>>>> +              sizeof(u32) * resource);
>>>>>> +
>>>>>> +    return fetch_pptt_subtable(table_hdr, ref);
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * given a pptt resource, verify that it is a cache node, then walk
>>>>>> + * down each level of caches, counting how many levels are found
>>>>>> + * as well as checking the cache type (icache, dcache, unified). If a
>>>>>> + * level & type match, then we set found, and continue the search.
>>>>>> + * Once the entire cache branch has been walked return its max
>>>>>> + * depth.
>>>>>> + */
>>>>>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>>>>>> +                int local_level,
>>>>>> +                struct acpi_subtable_header *res,
>>>>>> +                struct acpi_pptt_cache **found,
>>>>>> +                int level, int type)
>>>>>> +{
>>>>>> +    struct acpi_pptt_cache *cache;
>>>>>> +
>>>>>> +    if (res->type != ACPI_PPTT_TYPE_CACHE)
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    cache = (struct acpi_pptt_cache *) res;
>>>>>> +    while (cache) {
>>>>>> +        local_level++;
>>>>>> +
>>>>>> +        if ((local_level == level) &&
>>>>>> +            (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>>>>>> +            ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>>>>>
>>>>> Attributes have to be shifted:
>>>>>
>>>>> (cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) >> 2
>>>>
>>>> Hmmm, I'm not sure that is true; the top-level function in this
>>>> routine converts the "linux" constant to the ACPI version of that
>>>> constant. In that case the "type" field is pre-shifted, so that it
>>>> matches the result of just ANDing against the field... That is
>>>> unless I messed something up, which I don't see at the moment (and
>>>> the code of course has been tested with PPTTs from multiple people
>>>> at this point).
>>>
>>> For ThunderX2 I got lots of errors in dmesg:
>>> Found duplicate cache level/type unable to determine uniqueness
>>>
>>> So I fixed "type" macros definitions (without shifting) and shift it 
>>> here which fixes the issue. As you said, it can be pre-shifted as well.
> 
> Ah, yeah, right... If you removed the shift per your original comment then
> it breaks this. Yes, and the type definitions for cache type aren't
> wrong in this version, because the unified state has the 3rd bit set for
> both the 0x3 and 0x2 values and it's only used to convert from the Linux
> type to the ACPI type (and not back, because we don't mess with whatever
> the original "detection" was). I'm not really planning on changing that
> because I don't think it helps "readability" (and it converts a compile
> time constant to a runtime shift).
> 
>>>
>>>>
>>>>
>>>>>
>>>>>> +            if (*found != NULL)
>>>>>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>>
>> Actually I still see these error messages in my dmesg. This is because
>> of the following ThunderX2 per-core L1 and L2 cache hierarchy:
>>
>> Core
>>   ------------------
>> |                  |
>> | L1i -----        |
>> |         |        |
>> |          ----L2  |
>> |         |        |
>> | L1d -----        |
>> |                  |
>>   ------------------
>>
>> In this case we have two paths which lead to the L2 cache and hit the
>> above case. Is it really an error case?
> 
> No, but it's not deterministic unless we mark the node, and that doesn't
> solve the problem of a table constructed with two separate L2 nodes, like
> 
> L1i->L2 (unified)
> L1d->L2 (unified)
> 
> or various other structures which aren't disallowed by the spec and have
> non-deterministic real-world meanings, any more than constructing the
> table like:
> 
> L1i
> L1d->L2 (unified)
> 
> which I tend to prefer, because with a structure like that it can be
> deterministic (and in a way it actually represents the non-coherent
> behavior of (most?) ARM64 cores' i-caches, as could be argued for the
> first example if the allocation policies differ between the L2 nodes).
> 
> The really ugly bits here happen if you add another layer:
> 
> L1i->L2i-L3
> L1d------^
> 
> which is why I made that an error message, not to mention the fact that,
> since the levels aren't tagged, the numbering and meaning aren't clear.
> 
> (The L1i in the above example might be better called an L0i, to avoid
> throwing off the rest of the hierarchy numbering and so that it could be
> ignored.)
> 
> Summary:
> 
> I'm not at all happy with this specification's attempt to leave out
> pieces of information which would make parsing more deterministic. In
> this case I'm happy to demote the message level, but not remove it
> entirely. That said, I do think the obvious case you list shouldn't be
> the default one.
> 
> Lastly:
> 
> I'm assuming the final result is that the table is actually being parsed 
> correctly despite the ugly message?

Indeed, the ThunderX2 PPTT table is parsed such that the topology shown
by lstopo and lscpu is correct.

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 2/7] ACPI: Enable PPTT support on ARM64
  2017-10-18 17:38       ` Jeremy Linton
@ 2017-10-19  9:12         ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-19  9:12 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On Wed, Oct 18, 2017 at 12:38:46PM -0500, Jeremy Linton wrote:
> On 10/18/2017 11:47 AM, Lorenzo Pieralisi wrote:
> >On Thu, Oct 12, 2017 at 02:48:51PM -0500, Jeremy Linton wrote:
> >>Now that we have a PPTT parser, in preparation for its use
> >>on arm64, lets build it.
> >>
> >>Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> >>---
> >>  arch/arm64/Kconfig         | 1 +
> >>  drivers/acpi/Makefile      | 1 +
> >>  drivers/acpi/arm64/Kconfig | 3 +++
> >>  3 files changed, 5 insertions(+)
> >>
> >>diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >>index 0df64a6a56d4..68c9d1289735 100644
> >>--- a/arch/arm64/Kconfig
> >>+++ b/arch/arm64/Kconfig
> >>@@ -7,6 +7,7 @@ config ARM64
> >>  	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
> >>  	select ACPI_MCFG if ACPI
> >>  	select ACPI_SPCR_TABLE if ACPI
> >>+	select ACPI_PPTT if ACPI
> >>  	select ARCH_CLOCKSOURCE_DATA
> >>  	select ARCH_HAS_DEBUG_VIRTUAL
> >>  	select ARCH_HAS_DEVMEM_IS_ALLOWED
> >>diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> >>index 90265ab4437a..c92a0c937551 100644
> >>--- a/drivers/acpi/Makefile
> >>+++ b/drivers/acpi/Makefile
> >>@@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
> >>  obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
> >>  obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
> >>  obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
> >>+obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
> >>  # processor has its own "processor." module_param namespace
> >>  processor-y			:= processor_driver.o
> >>diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> >>index 5a6f80fce0d6..74b855a669ea 100644
> >>--- a/drivers/acpi/arm64/Kconfig
> >>+++ b/drivers/acpi/arm64/Kconfig
> >>@@ -7,3 +7,6 @@ config ACPI_IORT
> >>  config ACPI_GTDT
> >>  	bool
> >>+
> >>+config ACPI_PPTT
> >>+	bool
> >>\ No newline at end of file
> >
> >I do not understand the logic. Why should we have a Kconfig option
> >in drivers/acpi/arm64 for code in drivers/acpi ?
> >
> >AFAIK PPTT is not an ACPI ARM64 specific binding.
> 
> Weird, huh? Originally I had the whole shebang in arm64 because the
> x86 (or whatever) bindings haven't been written. My assumption is
> that once that part had been provided it could be moved.

Which part ? I asked because AFAICS the bindings are completely
generic (and are meant to be so).

> The config is sort of an artifact and "easier" to move than the
> whole file. But, as Hanjun has also been complaining about it I've
> agreed to move it to the "correct" location but keep it in the arm64
> wrapper. Of course I think that is a bit strange too, but
> whatever...

I do not want to cavil but either you have Kconfig and code in
drivers/acpi or drivers/acpi/arm64 - I would not understand a
mix of the two.

To reiterate the point, PPTT is not an ARM64 specific binding so
IMO it does not belong in drivers/acpi/arm64.

> Once the arm64 side of things is all wrapped up (and I can come up
> for some air) I'm willing to help with bindings on other architectures
> if anyone is truly interested. But I view that whole exercise as
> more of a "bug" fixing one than providing any real benefit at this
> point.

Please define "bindings on other architectures" because I do not
understand what you mean.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-12 19:48   ` Jeremy Linton
@ 2017-10-19 10:22     ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-19 10:22 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On Thu, Oct 12, 2017 at 02:48:50PM -0500, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in tree like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
> 
> Further, report peers in the topology using setup_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique id's can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.

I think this patch should be split into (1) topology and (2) cache; it is
doing too much, which makes it hard to review.

[...]

> +/* determine if the given node is a leaf node */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> +	}

A leaf node is a node with a valid acpi_id corresponding to an MADT
entry, right? By the way, is this function really needed?
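
To make the question concrete, here is a minimal user-space sketch of a flag-based test. It assumes the ACPI 6.2 layout, in which bit 1 of a processor node's flags means "ACPI Processor ID valid". The struct and macro names (`struct pptt_proc`, `PPTT_ACPI_ID_VALID`) are hypothetical stand-ins and do not come from the kernel's actbl2.h. acpi_pptt_leaf_node() rescans the whole table for children of the node; this check is O(1).

```c
#include <stdint.h>

/* Hypothetical names; bit position follows the ACPI 6.2 processor flags. */
#define PPTT_ACPI_ID_VALID    (1u << 1)
#define PPTT_TYPE_PROCESSOR   0u

/* Hypothetical mirror of a PPTT processor hierarchy node (20 bytes). */
struct pptt_proc {
	uint8_t  type;              /* 0 = processor hierarchy node */
	uint8_t  length;            /* 20 + 4 * num_private */
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;            /* offset of parent from table start, 0 = none */
	uint32_t acpi_processor_id; /* meaningful only if PPTT_ACPI_ID_VALID */
	uint32_t num_private;       /* private resource references omitted */
};

/* O(1): is the ACPI Processor ID field of this node meaningful? */
static int pptt_proc_id_valid(const struct pptt_proc *node)
{
	return (node->flags & PPTT_ACPI_ID_VALID) != 0;
}

/* Does @node describe the CPU whose MADT UID is @uid? No table rescan. */
static int pptt_proc_matches(const struct pptt_proc *node, uint32_t uid)
{
	return node->type == PPTT_TYPE_PROCESSOR &&
	       pptt_proc_id_valid(node) &&
	       node->acpi_processor_id == uid;
}
```

One caveat: firmware may also set a valid ID on a non-leaf node, for example one that maps to a processor container. In that case the flag alone could match that node, and a leaf test might still be needed to break the tie.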

> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> +						sizeof(struct acpi_table_pptt));
> +
> +	/* find the processor structure associated with this cpuid */
> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {

Is the leaf node check necessary? Or do you just need to check the
ACPI Processor ID valid flag (as discussed offline)?

> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> +				 acpi_cpu_id, cpu_node->acpi_processor_id);

Side note: I'd question (some of) these pr_debug() messages.

> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> +				/* found the correct entry */
> +				pr_debug("match found!\n");

Like this one for instance.

> +				return (struct acpi_pptt_processor *)entry;
> +			}
> +		}
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}

This should be moved to the beginning of the loop.
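
Here is a sketch of the lookup loop with the length checks done first, so a malformed entry is never used to advance the cursor. It works on a raw byte buffer that starts with the 36-byte standard ACPI table header. It copies nodes out with memcpy() to avoid alignment assumptions, and it matches on the valid-ID flag instead of a leaf scan. `pptt_find_cpu()` and the other names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PPTT_TABLE_HDR_SIZE 36u        /* standard ACPI table header */
#define PPTT_TYPE_PROCESSOR 0u
#define PPTT_ACPI_ID_VALID  (1u << 1)  /* ACPI 6.2 processor flag bit 1 */

/* Hypothetical mirror of a PPTT processor hierarchy node (20 bytes). */
struct pptt_proc {
	uint8_t  type;
	uint8_t  length;
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;
	uint32_t acpi_processor_id;
	uint32_t num_private;
};

/*
 * Return the byte offset of the processor node whose valid ACPI ID equals
 * @uid, or 0 if there is none. Each subtable's length is validated before
 * it is used to advance, so a zero-length or truncated entry ends the walk.
 */
static size_t pptt_find_cpu(const uint8_t *table, size_t table_len, uint32_t uid)
{
	size_t off = PPTT_TABLE_HDR_SIZE;

	while (off + 2 <= table_len) {      /* room for type + length bytes */
		uint8_t type = table[off];
		uint8_t len = table[off + 1];

		if (len == 0 || off + len > table_len)
			break;                      /* malformed: stop, don't spin */

		if (type == PPTT_TYPE_PROCESSOR &&
		    (size_t)len >= sizeof(struct pptt_proc)) {
			struct pptt_proc node;

			memcpy(&node, table + off, sizeof(node));
			if ((node.flags & PPTT_ACPI_ID_VALID) &&
			    node.acpi_processor_id == uid)
				return off;
		}
		off += len;
	}
	return 0;
}
```

Returning 0 for "not found" is safe here because offset 0 is the table header and never a node. This matches the spec's use of 0 as "no parent".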

> +		entry = (struct acpi_subtable_header *)
> +			((u8 *)entry + entry->length);
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Given a acpi_pptt_processor node, walk up until we identify the
> + * package that the node is associated with or we run out of levels
> + * to request.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu,
> +	int level)
> +{
> +	struct acpi_pptt_processor *prev_node;
> +
> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {

I really do not understand what ACPI_PPTT_PHYSICAL_PACKAGE means and,
more importantly, how it is actually used in this code.

This function is used to get a topology ID (that is just a number for
a given topology level) for a given level, starting from a given leaf
node.

Why do we care at all about ACPI_PPTT_PHYSICAL_PACKAGE?
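
Here is a sketch of the alternative this question implies. The walk climbs exactly `level` parent links from the leaf and stops early only at the root. It does not test ACPI_PPTT_PHYSICAL_PACKAGE, so the level alone decides where it stops. Level 0 returns the leaf itself, which also covers the level-0 case without special handling. The helper names and the byte-buffer representation are assumptions of the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PPTT_TABLE_HDR_SIZE 36u   /* standard ACPI table header */

/* Hypothetical mirror of a PPTT processor hierarchy node (20 bytes). */
struct pptt_proc {
	uint8_t  type;
	uint8_t  length;
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;              /* offset from table start, 0 = root */
	uint32_t acpi_processor_id;
	uint32_t num_private;
};

/* Copy the node at byte offset @off into @out; return 0 if it does not fit. */
static int pptt_node_at(const uint8_t *table, size_t len, uint32_t off,
			struct pptt_proc *out)
{
	if (off < PPTT_TABLE_HDR_SIZE || off + sizeof(*out) > len)
		return 0;
	memcpy(out, table + off, sizeof(*out));
	return 1;
}

/*
 * Climb @level parent links from the node at @leaf_off and return the
 * offset reached. Stops early at the root or at an unreadable node.
 * Bounded by @level, so a parent cycle cannot loop forever.
 */
static uint32_t pptt_ancestor_at_level(const uint8_t *table, size_t len,
				       uint32_t leaf_off, int level)
{
	uint32_t off = leaf_off;
	struct pptt_proc node;

	while (level-- > 0 && pptt_node_at(table, len, off, &node) && node.parent)
		off = node.parent;
	return off;
}
```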

> +		pr_debug("level %d\n", level);
> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> +		if (prev_node == NULL)
> +			break;
> +		cpu = prev_node;
> +		level--;
> +	}
> +	return cpu;
> +}
> +
> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +
> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
> +
> +static u8 acpi_cache_type(enum cache_type type)
> +{
> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_6_2_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_6_2_CACHE_TYPE_INSTR;
> +	default:
> +		pr_debug("Unknown cache type, assume unified\n");
> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level,
> +	struct acpi_pptt_processor **node)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (!cpu_node)
> +		return NULL;
> +
> +	do {
> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while ((cpu_node) && (!found));
> +
> +	return found;
> +}
> +
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
> +
> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;

This would break !ARM64: acpi_cpu_get_madt_gicc() only exists on arm64.

> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +	} else {
> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
> +
> +	return number_of_levels;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
> +{
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> +		this_leaf->size = found_cache->size;
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_6_2_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_6_2_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache policy %d\n",
> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
> +		}
> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
> +			break;
> +		default:
> +			pr_err("Unknown ACPI cache allocation policy %d\n",
> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
> +		}
> +}
> +
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;

Ditto.

> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +	struct acpi_pptt_processor *cpu_node = NULL;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level,
> +						   &cpu_node);
> +		pr_debug("found = %p %p\n", found_cache, cpu_node);
> +		if (found_cache)
> +			update_cache_properties(this_leaf,
> +						found_cache,
> +						cpu_node);
> +
> +		index++;
> +	}
> +}
> +
> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
> +				    unsigned int cpu, int level)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;

Ditto.

> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +	if (cpu_node) {
> +		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);

If level is 0, there is nothing to do here.

> +		/* Only the first level has a guaranteed id */
> +		if (level == 0)
> +			return cpu_node->acpi_processor_id;
> +		return (int)((u8 *)cpu_node - (u8 *)table);

Please explain to me the rationale behind this. To me, acpi_processor_id
is as good as the cpu_node offset in the table to describe the topology
ID at a given level, so why special-case level 0?

On top of that, with this ID scheme, we would end up with
thread/core/cluster IDs potentially being non-sequential values
(depending on the PPTT table layout), which should not be a problem, but
we had better check how people are using them.
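
Consumers may turn out to expect small sequential values. One option, which is an assumption and not something the patch does, is to keep the node offset as the token at every level (including level 0) and then compact the tokens into dense IDs. Each node's offset is unique within the table, so equal offsets mean the same node and therefore the same group. The following is a minimal sketch of the compaction step; the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Map sparse per-CPU tokens (e.g. PPTT node offsets) to dense IDs
 * 0, 1, 2, ... in order of first appearance. Two CPUs get the same ID
 * if and only if they had the same token, so grouping is preserved.
 * Quadratic in ncpus, which is acceptable for a sketch.
 * Returns the number of distinct IDs assigned.
 */
static size_t compact_topology_ids(const uint32_t *token, size_t ncpus, int *id)
{
	size_t next = 0;

	for (size_t i = 0; i < ncpus; i++) {
		id[i] = -1;
		for (size_t j = 0; j < i; j++) {
			if (token[j] == token[i]) {
				id[i] = id[j];   /* reuse the earlier CPU's ID */
				break;
			}
		}
		if (id[i] < 0)
			id[i] = (int)next++; /* first CPU seen with this token */
	}
	return next;
}
```

Take a Juno-like layout with four CPUs under one cluster node and two under another, using made-up offsets 0x38 and 0x120. The tokens {0x38, 0x38, 0x38, 0x38, 0x120, 0x120} compact to IDs {0, 0, 0, 0, 1, 1}.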

> +	}
> +	pr_err_once("PPTT table found, but unable to locate core for %d\n",
> +		    cpu);
> +	return -ENOENT;
> +}
> +
> +/*
> + * simply assign a ACPI cache entry to each known CPU cache entry
> + * determining which entries are shared is done later.

Please add a kernel-doc style comment for this external interface.

> + */
> +int cache_setup_acpi(unsigned int cpu)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_setup_acpi_cpu(table, cpu);
> +	acpi_put_table(table);
> +
> +	return status;
> +}
> +
> +/*
> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
> + * This ID can then be used to group peers.

Ditto.

> + */
> +int setup_acpi_cpu_topology(unsigned int cpu, int level)
> +{
> +	struct acpi_table_header *table;
> +	acpi_status status;
> +	int retval;
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
> +		return -ENOENT;
> +	}
> +	retval = topology_setup_acpi_cpu(table, cpu, level);
> +	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
> +		 cpu, level, retval);
> +	acpi_put_table(table);
> +
> +	return retval;

This value is just a token, with no HW meaning whatsoever, and that's
where I question the use of the ACPI_PPTT_PHYSICAL_PACKAGE flag in
retrieving it: you are not looking for a package ID (which has no meaning
whatsoever anyway, and I wonder why it was added to the spec at all); you
are looking for an ID at a given level.
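
Because the returned value is only a token, what matters is whether two CPUs have equal tokens at the same level. The sketch below turns per-CPU tokens into sibling masks, in the spirit of the arm64 thread_sibling/core_sibling masks. It assumes at most 64 CPUs so that a uint64_t suffices, and the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * For each CPU, set bit j in mask[i] if CPU j shares CPU i's token at one
 * topology level. Requires ncpus <= 64. The token values themselves carry
 * no meaning here; only equality between them is used.
 */
static void build_sibling_masks(const uint32_t *token, size_t ncpus,
				uint64_t *mask)
{
	for (size_t i = 0; i < ncpus; i++) {
		mask[i] = 0;
		for (size_t j = 0; j < ncpus; j++)
			if (token[j] == token[i])
				mask[i] |= UINT64_C(1) << j;
	}
}
```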

I will comment on the cache code separately; it deserves to be in a
separate patch to simplify the review. I have avoided repeating review
comments that were already reported.

Lorenzo

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-19  5:18               ` Tomasz Nowicki
  (?)
@ 2017-10-19 10:25                 ` John Garry
  -1 siblings, 0 replies; 104+ messages in thread
From: John Garry @ 2017-10-19 10:25 UTC (permalink / raw)
  To: Tomasz Nowicki, Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, austinwc, linux-pm, jhugo, gregkh,
	sudeep.holla, rjw, linux-kernel, will.deacon, wangxiongfeng2,
	viresh.kumar, hanjun.guo, catalin.marinas, ahs3,
	linux-arm-kernel

On 19/10/2017 06:18, Tomasz Nowicki wrote:
>>
>> Summary:
>>
>> I'm not at all happy with this specification's attempt to leave out
>> pieces of information which make parsing things more deterministic. In
>> this case I'm happy to demote the message level, but not remove it
>> entirely but I do think the obvious case you list shouldn't be the
>> default one.
>>
>> Lastly:
>>
>> I'm assuming the final result is that the table is actually being
>> parsed correctly despite the ugly message?
>
> Indeed, the ThunderX2 PPTT table is being parsed so that topology shown
> in lstopo and lscpu is correct.

Hi Tomasz,

Can you share the lscpu output? Does it have cluster info? I did not
think that lscpu had a concept of clustering.

I would say that a per-CPU cluster index sysfs entry needs to be added to
drivers/base/arch_topology.c (and other appropriate code under
GENERIC_ARCH_TOPOLOGY) to support this.
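
For reference, user-space tools read these values from files under /sys/devices/system/cpu/cpuN/topology/. Below is a minimal sketch of such a reader. `core_id` and `physical_package_id` are existing attributes. A cluster attribute would only appear once a change like the one proposed above lands, so any name used for it here is hypothetical.

```c
#include <stdio.h>

/*
 * Read /sys/devices/system/cpu/cpu<cpu>/topology/<attr> as an integer.
 * Returns -1 if the file is missing or does not start with a number,
 * e.g. for an attribute the running kernel does not export.
 */
static long read_topology_attr(unsigned int cpu, const char *attr)
{
	char path[128];
	long val;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%u/topology/%s", cpu, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

In this sketch, -1 serves only as the error sentinel.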

Thanks,
John


>
> Thanks,
> Tomasz

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-19  5:18               ` Tomasz Nowicki
@ 2017-10-19 14:24                 ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-19 14:24 UTC (permalink / raw)
  To: Tomasz Nowicki, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, austinwc, linux-pm, jhugo, gregkh,
	sudeep.holla, rjw, linux-kernel, will.deacon, wangxiongfeng2,
	viresh.kumar, hanjun.guo, catalin.marinas, ahs3,
	linux-arm-kernel

Hi,


On 10/19/2017 12:18 AM, Tomasz Nowicki wrote:
> On 18.10.2017 19:30, Jeremy Linton wrote:
>> On 10/18/2017 05:24 AM, Tomasz Nowicki wrote:
>>> On 18.10.2017 07:39, Tomasz Nowicki wrote:
>>>> On 17.10.2017 17:22, Jeremy Linton wrote:
>>>>> On 10/17/2017 08:25 AM, Tomasz Nowicki wrote:
>>>>>> On 12.10.2017 21:48, Jeremy Linton wrote:
(trimming)
>>>>>>> +            if (*found != NULL)
>>>>>>> +                pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>>>
>>> Actually I still see these error messages in my dmesg. It is because of
>>> the following ThunderX2 per-core L1 and L2 cache hierarchy:
>>>
>>> Core
>>>   ------------------
>>> |                  |
>>> | L1i -----        |
>>> |         |        |
>>> |          ----L2  |
>>> |         |        |
>>> | L1d -----        |
>>> |                  |
>>>   ------------------
>>>
>>> In this case there are two paths which lead to the L2 cache and hit the
>>> case above. Is it really an error case?
>>
>> No, but it's not deterministic unless we mark the node, which doesn't
>> solve the problem of a table constructed like
>>
>> L1i->L2 (unified)
>> L1d->L2 (unified)
>>
>> or various other structures which aren't disallowed by the spec and
>> have non-deterministic real world meanings, any more than constructing
>> the table like:
>>
>> L1i
>> L1d->L2 (unified)
>>
>> which I tend to prefer because with a structuring like that it can be 
>> deterministic (and in a way actually represents the non-coherent 
>> behavior of (most?) ARM64 core's i-caches, as could be argued the 
>> first example if the allocation policies are varied between the L2 
>> nodes).
>>
>> The really ugly bits here happen if you add another layer:
>>
>> L1i->L2i->L3
>> L1d------^
>>
>> which is why I made that an error message, not including the fact that 
>> since the levels aren't tagged the numbering and meaning isn't clear.
>>
>> (the L1i in the above example might be better called an L0i to avoid
>> throwing off the rest of the hierarchy numbering, and also so it could be
>> ignored).
>>
>> Summary:
>>
>> I'm not at all happy with this specification's attempt to leave out 
>> pieces of information which make parsing things more deterministic. In 
>> this case I'm happy to demote the message level, but not remove it 
>> entirely but I do think the obvious case you list shouldn't be the 
>> default one.
>>
>> Lastly:
>>
>> I'm assuming the final result is that the table is actually being 
>> parsed correctly despite the ugly message?
> 
> Indeed, the ThunderX2 PPTT table is being parsed so that topology shown 
> in lstopo and lscpu is correct.

Great.

Also, I think this is a better change:

      if ((*found != NULL) && (*found != cache))
          pr_err("Found duplicate cache level/type unable to determine uniqueness\n");

That way, if the same node is found again for the given level/type, the
message is simply suppressed. It will of course still trigger in cases like:

L1d->L2
L1i->L2

or other odd cases.
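A minimal user-space model of the proposed check, to show why it stays quiet for the ThunderX2 layout (L1i and L1d both pointing at one L2 node) but still fires when the walk reaches two distinct L2 nodes. The sketch_* types and find_cache() are hypothetical and far simpler than the real acpi_pptt_cache entries and acpi_find_cache_level(); only the `found != -1 && found != node` comparison is meant to mirror the `(*found != NULL) && (*found != cache)` test above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for PPTT cache nodes. */
enum sketch_cache_type { SKETCH_DATA, SKETCH_INSTR, SKETCH_UNIFIED };

struct sketch_cache {
	enum sketch_cache_type type;
	int next_level;	/* index of the next-level cache node, or -1 */
};

/*
 * Follow every private-resource chain of one processor and return the index
 * of the cache node of 'type' found at 'level' (or -1). *dups counts how often
 * a *different* node matched, i.e. where the kernel would still pr_err().
 */
static int find_cache(const struct sketch_cache *caches,
		      const int *resources, size_t nres,
		      int level, enum sketch_cache_type type, int *dups)
{
	int found = -1;

	*dups = 0;
	for (size_t r = 0; r < nres; r++) {
		int node = resources[r];

		for (int cur = 1; node >= 0 && cur <= level; cur++) {
			if (cur == level && caches[node].type == type) {
				/* Reaching the same node via another path is not a duplicate. */
				if (found != -1 && found != node)
					(*dups)++;
				found = node;
			}
			node = caches[node].next_level;
		}
	}
	return found;
}
```

For the ThunderX2 shape, `{ {SKETCH_INSTR, 2}, {SKETCH_DATA, 2}, {SKETCH_UNIFIED, -1} }` with resources `{0, 1}` finds node 2 at level 2 with no duplicate; giving the L1d its own L2 node makes the second match count as a duplicate.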

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 3/7] drivers: base: cacheinfo: arm64: Add support for ACPI based firmware tables
  2017-10-12 19:48   ` Jeremy Linton
@ 2017-10-19 15:20     ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-19 15:20 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On Thu, Oct 12, 2017 at 02:48:52PM -0500, Jeremy Linton wrote:
> The /sys cache entries should support ACPI/PPTT generated cache
> topology information. Let's detect ACPI systems and call
> an arch specific cache_setup_acpi() routine to update the hardware
> probed cache topology.
> 
> For arm64, if ACPI is enabled, determine the max number of cache
> levels and populate them using a PPTT table if one is available.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/kernel/cacheinfo.c | 23 ++++++++++++++++++-----
>  drivers/acpi/pptt.c           |  1 +
>  drivers/base/cacheinfo.c      | 17 +++++++++++------
>  include/linux/cacheinfo.h     | 11 +++++++++--
>  4 files changed, 39 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
> index 380f2e2fbed5..2e2cf0d312ba 100644
> --- a/arch/arm64/kernel/cacheinfo.c
> +++ b/arch/arm64/kernel/cacheinfo.c
> @@ -17,6 +17,7 @@
>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
>  
> +#include <linux/acpi.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/of.h>
>  
> @@ -44,9 +45,17 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>  	this_leaf->type = type;
>  }
>  
> +#ifndef CONFIG_ACPI
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	/*ACPI kernels should be built with PPTT support*/
> +	return 0;
> +}
> +#endif
> +
>  static int __init_cache_level(unsigned int cpu)
>  {
> -	unsigned int ctype, level, leaves, of_level;
> +	unsigned int ctype, level, leaves, fw_level;
>  	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>  
>  	for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
> @@ -59,15 +68,19 @@ static int __init_cache_level(unsigned int cpu)
>  		leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
>  	}
>  
> -	of_level = of_find_last_cache_level(cpu);
> -	if (level < of_level) {
> +	if (acpi_disabled)
> +		fw_level = of_find_last_cache_level(cpu);
> +	else
> +		fw_level = acpi_find_last_cache_level(cpu);
> +
> +	if (level < fw_level) {
>  		/*
>  		 * some external caches not specified in CLIDR_EL1
>  		 * the information may be available in the device tree
>  		 * only unified external caches are considered here
>  		 */
> -		leaves += (of_level - level);
> -		level = of_level;
> +		leaves += (fw_level - level);
> +		level = fw_level;
>  	}
>  
>  	this_cpu_ci->num_levels = level;
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index c86715fed4a7..b5c6de37e328 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -355,6 +355,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>  				    struct acpi_pptt_cache *found_cache,
>  				    struct acpi_pptt_processor *cpu_node)
>  {
> +	this_leaf->firmware_node = cpu_node;

This pointer comes from an ACPI static table mapping that happens
to be permanent, so it should be safe given that the mapping is carried
out at device_initcall (i.e. cache_setup_acpi()), but it is not that
obvious. More on that below.

>  	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>  		this_leaf->size = found_cache->size;
>  	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index eb3af2739537..8eca279e50d1 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -86,7 +86,7 @@ static int cache_setup_of_node(unsigned int cpu)
>  static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>  					   struct cacheinfo *sib_leaf)
>  {
> -	return sib_leaf->of_node == this_leaf->of_node;
> +	return sib_leaf->firmware_node == this_leaf->firmware_node;
>  }
>  
>  /* OF properties to query for a given cache type */
> @@ -215,6 +215,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>  }
>  #endif
>  
> +int __weak cache_setup_acpi(unsigned int cpu)
> +{
> +	return -ENOTSUPP;
> +}
> +
>  static int cache_shared_cpu_map_setup(unsigned int cpu)
>  {
>  	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> @@ -225,11 +230,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
>  	if (this_cpu_ci->cpu_map_populated)
>  		return 0;
>  
> -	if (of_have_populated_dt())
> +	if (!acpi_disabled)
> +		ret = cache_setup_acpi(cpu);
> +	else if (of_have_populated_dt())
>  		ret = cache_setup_of_node(cpu);
> -	else if (!acpi_disabled)
> -		/* No cache property/hierarchy support yet in ACPI */
> -		ret = -ENOTSUPP;
> +
>  	if (ret)
>  		return ret;
>  
> @@ -286,7 +291,7 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
>  
>  static void cache_override_properties(unsigned int cpu)
>  {
> -	if (of_have_populated_dt())
> +	if (acpi_disabled && of_have_populated_dt())
>  		return cache_of_override_properties(cpu);
>  }
>  
> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> index 6a524bf6a06d..d1e9b8e01981 100644
> --- a/include/linux/cacheinfo.h
> +++ b/include/linux/cacheinfo.h
> @@ -36,6 +36,9 @@ enum cache_type {
>   * @of_node: if devicetree is used, this represents either the cpu node in
>   *	case there's no explicit cache node or the cache node itself in the
>   *	device tree
> + * @firmware_node: Shared with of_node. When not using DT, this may contain
> + *	pointers to other firmware based values. Particularly ACPI/PPTT
> + *	unique values.
>   * @disable_sysfs: indicates whether this node is visible to the user via
>   *	sysfs or not
>   * @priv: pointer to any private data structure specific to particular
> @@ -64,8 +67,10 @@ struct cacheinfo {
>  #define CACHE_ALLOCATE_POLICY_MASK	\
>  	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
>  #define CACHE_ID		BIT(4)
> -
> -	struct device_node *of_node;
> +	union {
> +		struct device_node *of_node;
> +		void *firmware_node;
> +	};

How about turning of_node into a struct fwnode_handle* (and allocating
one for each PPTT cache node, acpi_static_fwnode, to have the
ACPI counterpart)?

It will cause more churn, given that the pointer is only used to
carry out a pointer comparison, but it's also a bit more elegant.
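A rough sketch of what the suggestion buys: cache_leaves_are_shared() only needs object identity, so one handle per firmware cache object (a DT node, or a per-PPTT-cache-node handle on ACPI) is enough. The sketch_fwnode type below is a hypothetical stand-in for struct fwnode_handle, not the kernel definition, and the NULL guard is an addition of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fwnode_handle; only its address matters here. */
struct sketch_fwnode {
	const char *name;
};

struct sketch_cacheinfo {
	unsigned int level;
	struct sketch_fwnode *fwnode;	/* DT cache node or per-PPTT-cache handle */
};

/* Two leaves are shared iff they refer to the same firmware object. */
static bool sketch_leaves_are_shared(const struct sketch_cacheinfo *this_leaf,
				     const struct sketch_cacheinfo *sib_leaf)
{
	return this_leaf->fwnode != NULL && this_leaf->fwnode == sib_leaf->fwnode;
}
```

Whether the handle comes from DT or from a static ACPI allocation, the comparison itself stays a single pointer test.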

Lorenzo

>  	bool disable_sysfs;
>  	void *priv;
>  };
> @@ -98,6 +103,8 @@ int func(unsigned int cpu)					\
>  struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
>  int init_cache_level(unsigned int cpu);
>  int populate_cache_leaves(unsigned int cpu);
> +int cache_setup_acpi(unsigned int cpu);
> +int acpi_find_last_cache_level(unsigned int cpu);
>  
>  const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
>  
> -- 
> 2.13.5
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-19 10:22     ` Lorenzo Pieralisi
@ 2017-10-19 15:43       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-19 15:43 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On 10/19/2017 05:22 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 12, 2017 at 02:48:50PM -0500, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in a tree-like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> cores' cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> Further, report peers in the topology using setup_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique id's can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
> 
> I think this patch should be split ((1) topology (2) cache), it is doing
> too much which makes it hard to review.

If you look at the RFC, it only did cache parsing; the topology changes
were added for v1. The cache bits are the ugly parts because they walk
up/down both the node tree and the cache trees attached to the nodes
during the walk. Once that was in place, adding the CPU topology was
trivial. But trying to understand the CPU topology without first
understanding the weird stuff done for the cache topology might not be
the right way to approach this code.

> 
> [...]
> 
>> +/* determine if the given node is a leaf node */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +			       struct acpi_pptt_processor *node)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	u32 node_entry;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +						sizeof(struct acpi_table_pptt));
>> +
>> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    (cpu_node->parent == node_entry))
>> +			return 0;
>> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
>> +	}
> 
> A leaf node is a node with a valid acpi_id corresponding to an MADT
> entry, right ? By the way, is this function really needed ?

Yes. The only way to determine whether a node is a leaf is to see if
there are any references to it elsewhere in the table, because the nodes
point towards the root of the tree (rather than the other way).

This piece was the primary change for v1->v2.

> 
>> +	return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +	struct acpi_table_header *table_hdr,
>> +	u32 acpi_cpu_id)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +						sizeof(struct acpi_table_pptt));
>> +
>> +	/* find the processor structure associated with this cpuid */
>> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> 
> Is the leaf node check necessary ? Or you just need to check the
> ACPI Processor ID valid flag (as discussed offline) ?

The valid flag doesn't mean anything for the leaf nodes, so this is the
only correct way of determining whether the node _might_ have a valid
MADT/ACPI ID. That said, the acpi_cpu_id should actually be checked as
part of the if statement, ahead of the leaf node check, because doing it
this way makes the parse n^2 instead of 2n. Of course, in my mind,
checking the ID before we know it might be valid is backwards from the
"logical" way to do it.
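To make the leaf test and the suggested reordering concrete, here is a sketch over a hypothetical flattened table where each processor node stores its parent's array index (-1 for the root) instead of the byte offset the real PPTT uses. Because links only point rootward, a leaf is a node nobody names as parent; matching the cheap ACPI ID first means the O(n) leaf scan only runs for ID matches, roughly 2n work for a unique ID instead of n^2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened processor hierarchy (not the ACPICA layout). */
struct sketch_proc_node {
	int parent;		/* index of parent node, -1 for the root */
	unsigned int acpi_id;	/* may be meaningless on non-leaf nodes */
};

/* Parent links point towards the root, so a leaf is never referenced as a parent. */
static bool sketch_is_leaf(const struct sketch_proc_node *nodes, size_t n,
			   size_t idx)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].parent == (int)idx)
			return false;
	return true;
}

/* Compare the ID first; only run the full-table leaf scan on candidates. */
static int sketch_find_cpu(const struct sketch_proc_node *nodes, size_t n,
			   unsigned int acpi_id, unsigned int *leaf_checks)
{
	*leaf_checks = 0;
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].acpi_id != acpi_id)
			continue;
		(*leaf_checks)++;
		if (sketch_is_leaf(nodes, n, i))
			return (int)i;
	}
	return -1;
}
```

With a package (ID 0), a cluster (ID 0) and two CPUs (IDs 0 and 1), looking up ID 0 still has to reject the two non-leaf nodes that happen to carry the same ID, which is why the leaf test cannot simply be dropped.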

> 
>> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> 
> Side note: I'd question (some of) these pr_debug() messages
> 
>> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +				/* found the correct entry */
>> +				pr_debug("match found!\n");
> 
> Like this one for instance.

This one is a bit redundant, but I come from the school of wanting to
be able to debug a remote machine. Large blocks of silent code are a
nightmare, particularly if you have a sysadmin-level user driving the
keyboard, etc.

> 
>> +				return (struct acpi_pptt_processor *)entry;
>> +			}
>> +		}
>> +
>> +		if (entry->length == 0) {
>> +			pr_err("Invalid zero length subtable\n");
>> +			break;
>> +		}
> 
> This should be moved at the beginning of the loop.

Yeah, the intention was to verify the next entry, but good point: if it's
0, the current one is probably invalid.
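A sketch of the reordered loop, validating the length before the entry is used. It walks a raw byte buffer of subtables that each start with a one-byte type and a one-byte length, matching the layout of ACPICA's struct acpi_subtable_header; sketch_count_type() itself is hypothetical and returns -1 where the kernel code would pr_err() and stop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count subtables of 'type' in buf[0..len). Each subtable begins with
 * a u8 type and a u8 length (which includes those two bytes).
 * Returns -1 if an entry's length is too small or runs past the end.
 */
static int sketch_count_type(const uint8_t *buf, size_t len, uint8_t type)
{
	size_t off = 0;
	int count = 0;

	while (off + 2 <= len) {
		uint8_t entry_type = buf[off];
		uint8_t entry_len = buf[off + 1];

		/* Check first: a zero length would otherwise loop forever. */
		if (entry_len < 2 || off + entry_len > len)
			return -1;
		if (entry_type == type)
			count++;
		off += entry_len;
	}
	return count;
}
```

Rejecting an entry that overruns the table end is the same idea applied to the other bound.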

> 
>> +		entry = (struct acpi_subtable_header *)
>> +			((u8 *)entry + entry->length);
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +/*
>> + * Given a acpi_pptt_processor node, walk up until we identify the
>> + * package that the node is associated with or we run out of levels
>> + * to request.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *cpu,
>> +	int level)
>> +{
>> +	struct acpi_pptt_processor *prev_node;
>> +
>> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> 
> I really do not understand what ACPI_PPTT_PHYSICAL_PACKAGE means and
> more importantly, how it is actually used in this code.

?

Physical package maps to the package_id, which is generally defined to 
mean the "socket" and is used to terminate the cpu topology side of the 
parse.

> 
> This function is used to get a topology id (that is just a number for
> a given topology level) for a given level starting from a given leaf
> node.

This flag is the one decent part of the spec, because it's the only level
that is actually guaranteed to mean anything. Because the spec requires
the shareability of cache nodes to be described with general processor
nodes, the number of nodes within a given leg of the tree is mostly
meaningless: people sprinkle caches around the system, potentially even
above the "socket" level.

> Why do we care at all about ACPI_PPTT_PHYSICAL_PACKAGE ?

Because, it gives us a hard mapping to core siblings.
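A sketch of that walk, assuming a simplified table where each node carries its parent's index and a flags word. SKETCH_PHYSICAL_PACKAGE stands in for ACPI_PPTT_PHYSICAL_PACKAGE (bit 0 of the processor node flags in ACPI 6.2); the rest is hypothetical. It shows why the flag matters: walking up from two CPUs in different clusters stops at the same package node, even if the firmware places another node above the socket.

```c
#include <assert.h>

#define SKETCH_PHYSICAL_PACKAGE 0x1u	/* stand-in for ACPI_PPTT_PHYSICAL_PACKAGE */

struct sketch_node {
	int parent;		/* index of parent node, -1 for the root */
	unsigned int flags;
};

/*
 * Walk rootward from 'idx' until a node flagged as a physical package,
 * the root, or 'level' steps have been taken. Returns the node index,
 * which can serve as a package identifier.
 */
static int sketch_find_package(const struct sketch_node *nodes, int idx,
			       int level)
{
	while (level && !(nodes[idx].flags & SKETCH_PHYSICAL_PACKAGE)) {
		if (nodes[idx].parent < 0)
			break;
		idx = nodes[idx].parent;
		level--;
	}
	return idx;
}
```

With cpu0 under one cluster and cpu1 under another, both resolve to the same package node, so they would be reported as core siblings, while a node placed above the socket is never reached.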

> 
>> +		pr_debug("level %d\n", level);
>> +		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
>> +		if (prev_node == NULL)
>> +			break;
>> +		cpu = prev_node;
>> +		level--;
>> +	}
>> +	return cpu;
>> +}
>> +
>> +static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
>> +{
>> +	int number_of_levels = 0;
>> +	struct acpi_pptt_processor *cpu;
>> +
>> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (cpu)
>> +		number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
>> +#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
>> +#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
>> +#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
>> +#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
>> +#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
>> +#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
>> +#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
>> +
>> +static u8 acpi_cache_type(enum cache_type type)
>> +{
>> +	switch (type) {
>> +	case CACHE_TYPE_DATA:
>> +		pr_debug("Looking for data cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_DATA;
>> +	case CACHE_TYPE_INST:
>> +		pr_debug("Looking for instruction cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_INSTR;
>> +	default:
>> +		pr_debug("Unknown cache type, assume unified\n");
>> +	case CACHE_TYPE_UNIFIED:
>> +		pr_debug("Looking for unified cache\n");
>> +		return ACPI_6_2_CACHE_TYPE_UNIFIED;
>> +	}
>> +}
>> +
>> +/* find the ACPI node describing the cache type/level for the given CPU */
>> +static struct acpi_pptt_cache *acpi_find_cache_node(
>> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>> +	enum cache_type type, unsigned int level,
>> +	struct acpi_pptt_processor **node)
>> +{
>> +	int total_levels = 0;
>> +	struct acpi_pptt_cache *found = NULL;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	u8 acpi_type = acpi_cache_type(type);
>> +
>> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
>> +		 acpi_cpu_id, level, acpi_type);
>> +
>> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (!cpu_node)
>> +		return NULL;
>> +
>> +	do {
>> +		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
>> +		*node = cpu_node;
>> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +	} while ((cpu_node) && (!found));
>> +
>> +	return found;
>> +}
>> +
>> +int acpi_find_last_cache_level(unsigned int cpu)
>> +{
>> +	u32 acpi_cpu_id;
>> +	struct acpi_table_header *table;
>> +	int number_of_levels = 0;
>> +	acpi_status status;
>> +
>> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
>> +
>> +	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> 
> This would break !ARM64.

> 
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");

Yup, and in a way this does too... Without writing the binding code for
another arch, it isn't clear at the moment where that line should be
drawn. That's part of the reason I put this in the arm64 directory.


>> +	} else {
>> +		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
>> +		acpi_put_table(table);
>> +	}
>> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +/*
>> + * The ACPI spec implies that the fields in the cache structures are used to
>> + * extend and correct the information probed from the hardware. In the case
>> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
>> + */
>> +static void update_cache_properties(struct cacheinfo *this_leaf,
>> +				    struct acpi_pptt_cache *found_cache,
>> +				    struct acpi_pptt_processor *cpu_node)
>> +{
>> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>> +		this_leaf->size = found_cache->size;
>> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
>> +		this_leaf->coherency_line_size = found_cache->line_size;
>> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
>> +		this_leaf->number_of_sets = found_cache->number_of_sets;
>> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
>> +		this_leaf->ways_of_associativity = found_cache->associativity;
>> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
>> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>> +		case ACPI_6_2_CACHE_POLICY_WT:
>> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
>> +			break;
>> +		case ACPI_6_2_CACHE_POLICY_WB:
>> +			this_leaf->attributes = CACHE_WRITE_BACK;
>> +			break;
>> +		default:
>> +			pr_err("Unknown ACPI cache policy %d\n",
>> +			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
>> +		}
>> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
>> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>> +		case ACPI_6_2_CACHE_READ_ALLOCATE:
>> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
>> +			break;
>> +		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
>> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
>> +			break;
>> +		case ACPI_6_2_CACHE_RW_ALLOCATE:
>> +			this_leaf->attributes |=
>> +				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
>> +			break;
>> +		default:
>> +			pr_err("Unknown ACPI cache allocation policy %d\n",
>> +			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
>> +		}
>> +}
>> +
>> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>> +				 unsigned int cpu)
>> +{
>> +	struct acpi_pptt_cache *found_cache;
>> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> 
> Ditto.
> 
>> +	struct cacheinfo *this_leaf;
>> +	unsigned int index = 0;
>> +	struct acpi_pptt_processor *cpu_node = NULL;
>> +
>> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
>> +		this_leaf = this_cpu_ci->info_list + index;
>> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						   this_leaf->type,
>> +						   this_leaf->level,
>> +						   &cpu_node);
>> +		pr_debug("found = %p %p\n", found_cache, cpu_node);
>> +		if (found_cache)
>> +			update_cache_properties(this_leaf,
>> +						found_cache,
>> +						cpu_node);
>> +
>> +		index++;
>> +	}
>> +}
>> +
>> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
>> +				    unsigned int cpu, int level)
>> +{
>> +	struct acpi_pptt_processor *cpu_node;
>> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> 
> Ditto.
> 
>> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +	if (cpu_node) {
>> +		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
> 
> If level is 0 there is nothing to do here.
> 
>> +		/* Only the first level has a guaranteed id */
>> +		if (level == 0)
>> +			return cpu_node->acpi_processor_id;
>> +		return (int)((u8 *)cpu_node - (u8 *)table);
> 
> Please explain to me the rationale behind this. To me acpi_processor_id
> is as good as the cpu_node offset in the table to describe the topology
> id at a given level, why special case level 0.

Level 0 is the only level guaranteed to have something set in the
acpi_processor_id field. It's possible that values exist in nodes above
this one, but to rely on them they must _all_ be flagged and have
matching container ids, and nothing in the spec requires that. That
means we need a guaranteed way to generate ids. This was added between
v2->v3 after the discussion about making the ids a little nicer for the
user.
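A minimal sketch of that ID scheme, assuming a simplified model (hypothetical `struct topo_node` array with array-index parents and a hypothetical `TOPO_PHYSICAL_PACKAGE` bit): level 0 returns the firmware-provided processor ID, while any higher level returns the byte offset of the ancestor node from the start of the table, mirroring the `(u8 *)cpu_node - (u8 *)table` expression in the patch. It also skips the walk entirely at level 0, as suggested above.

```c
#include <assert.h>

/* Hypothetical flag bit for this model only. */
#define TOPO_PHYSICAL_PACKAGE 0x1u

struct topo_node {
	int parent;                /* array index of the parent, -1 for the root */
	unsigned int flags;
	unsigned int processor_id; /* only guaranteed meaningful on leaf nodes */
};

/* Climb at most 'level' parents, stopping early at a package or the root. */
static const struct topo_node *walk_up(const struct topo_node *table,
				       const struct topo_node *node, int level)
{
	while (level > 0 && !(node->flags & TOPO_PHYSICAL_PACKAGE) &&
	       node->parent >= 0) {
		node = &table[node->parent];
		level--;
	}
	return node;
}

/*
 * Level 0 reports the firmware-provided processor ID. Higher levels report
 * the ancestor's byte offset within the table: unique and stable, even when
 * firmware leaves the container IDs unset.
 */
static int topology_id(const struct topo_node *table, int leaf, int level)
{
	const struct topo_node *node;

	assert(level >= 0);
	if (level == 0)
		return (int)table[leaf].processor_id;
	node = walk_up(table, &table[leaf], level);
	return (int)((const char *)node - (const char *)table);
}
```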


> 
> On top of that, with this ID scheme, we would end up with
> thread/core/cluster id potentially being non-sequential values
> (depending on the PPTT table layout) which should not be a problem but
> we'd better check how people are using them.

The thread (or core, depending on which is the level 0 node) will have
firmware-provided ids; everything else gets somewhat random-looking but
consistent ids (the node's offset within the table). I commented earlier
in this series that "normalizing" them is totally doable, although at
the moment only the physical_id is really user visible, and that should
probably be normalized outside of this module, in the arm64 topology
parser, if we want to actually do it. I'm not sure it's worth the
effort, at least not as part of the general PPTT changes.
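If normalization were done in the arm64 topology parser, one possible approach is to renumber the sparse but consistent IDs into dense values in order of first appearance across CPUs. This is only a sketch; `normalize_ids()` is hypothetical and not part of this series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: map sparse-but-consistent IDs (such as table
 * offsets) to dense 0..k-1 values, in order of first appearance.
 * Returns k, the number of distinct IDs. O(n^2), fine for small n.
 */
static size_t normalize_ids(const int *raw, int *dense, size_t n)
{
	size_t i, j, next = 0;

	assert(raw != NULL && dense != NULL);
	for (i = 0; i < n; i++) {
		dense[i] = -1;
		/* Reuse the dense ID of an earlier CPU with the same raw ID. */
		for (j = 0; j < i; j++) {
			if (raw[j] == raw[i]) {
				dense[i] = dense[j];
				break;
			}
		}
		if (dense[i] < 0)
			dense[i] = (int)next++;
	}
	return next;
}
```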


> 
>> +	}
>> +	pr_err_once("PPTT table found, but unable to locate core for %d\n",
>> +		    cpu);
>> +	return -ENOENT;
>> +}
>> +
>> +/*
>> + * simply assign a ACPI cache entry to each known CPU cache entry
>> + * determining which entries are shared is done later.
> 
> Add a kerneldoc style comment for an external interface.

That is a good point.

> 
>> + */
>> +int cache_setup_acpi(unsigned int cpu)
>> +{
>> +	struct acpi_table_header *table;
>> +	acpi_status status;
>> +
>> +	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
>> +
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +		return -ENOENT;
>> +	}
>> +
>> +	cache_setup_acpi_cpu(table, cpu);
>> +	acpi_put_table(table);
>> +
>> +	return status;
>> +}
>> +
>> +/*
>> + * Determine a topology unique ID for each thread/core/cluster/socket/etc.
>> + * This ID can then be used to group peers.
> 
> Ditto.
> 
>> + */
>> +int setup_acpi_cpu_topology(unsigned int cpu, int level)
>> +{
>> +	struct acpi_table_header *table;
>> +	acpi_status status;
>> +	int retval;
>> +
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
>> +		return -ENOENT;
>> +	}
>> +	retval = topology_setup_acpi_cpu(table, cpu, level);
>> +	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
>> +		 cpu, level, retval);
>> +	acpi_put_table(table);
>> +
>> +	return retval;
> 
> This value is just a token - with no HW meaning whatsoever and that's
> where I question the ACPI_PPTT_PHYSICAL_PACKAGE flag usage in retrieving
> it, you are not looking for a packageid (which has no meaning whatsoever
> anyway and I wonder why it was added to the specs at all) you are
> looking for an id at a given level.

If you look at the next patch in the series, to get the top level I pass
an arbitrarily large value as the "level", so the walk should terminate
on the PHYSICAL_PACKAGE node rather than on any intermediate node.
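A small sketch of that calling convention, assuming a simplified model (hypothetical `struct topo_node` with array-index parents and a hypothetical `TOPO_PHYSICAL_PACKAGE` bit). Because the walk also stops on the package flag, passing `INT_MAX` as the level lands on the package even when extra cache-only nodes, or a node above the socket, are present.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical flag bit for this model only. */
#define TOPO_PHYSICAL_PACKAGE 0x1u

struct topo_node {
	int parent;          /* array index of the parent, -1 for the root */
	unsigned int flags;
};

/*
 * Climb at most 'level' parents from 'idx'. The walk also stops on a node
 * flagged as a physical package, or on the root if nothing is flagged.
 */
static int walk_up(const struct topo_node *nodes, int idx, int level)
{
	assert(idx >= 0);
	while (level > 0 && !(nodes[idx].flags & TOPO_PHYSICAL_PACKAGE) &&
	       nodes[idx].parent >= 0) {
		idx = nodes[idx].parent;
		level--;
	}
	return idx;
}

/* Package lookup: an arbitrarily large level, so only the flag stops it. */
static int package_of(const struct topo_node *nodes, int leaf)
{
	return walk_up(nodes, leaf, INT_MAX);
}
```

With a small level the walk stops on an intermediate node; with `INT_MAX` the number of intermediate nodes no longer matters.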


> 
> I will comment on the cache code separately - which deserves to
> be in a separate patch to simplify the review, I avoided repeating
> already reported review comments.
> 
> Lorenzo
> 


* [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-10-19 15:43       ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-19 15:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/19/2017 05:22 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 12, 2017 at 02:48:50PM -0500, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in tree like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> cores cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> Further, report peers in the topology using setup_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique ids can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
> 
> I think this patch should be split ((1) topology (2) cache), it is doing
> too much which makes it hard to review.

If you look at the RFC, it only did cache parsing; the topology changes
were added for v1. The cache bits are the ugly part, because they walk
up and down both the node tree and the cache trees attached to the nodes
during the walk. Once that was in place, adding the cpu topology was
trivial. But trying to understand the cpu topology without first
understanding the weird stuff done for the cache topology might not be
the right way to approach this code.
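A rough sketch of why the cache side is the awkward part. It assumes a heavily simplified model: hypothetical `struct proc_node` and `struct cache_node` arrays, at most two private cache chains per processor node, and no cache attributes. In this model, as in the walk described here, levels are not stored in the table, so a lookup climbs the processor tree and, at each node, walks down every attached cache chain, numbering levels from just above the deepest level already seen below. It follows the general shape described, not the exact code in the patch.

```c
#include <assert.h>

enum cache_kind { CACHE_DATA, CACHE_INST, CACHE_UNIFIED };

/* Hypothetical model: caches chain upward via next_level (-1 ends it). */
struct cache_node {
	enum cache_kind kind;
	int next_level;
};

/* Processor node with up to two private cache chains (-1 means unused). */
struct proc_node {
	int parent;		/* -1 for the root */
	int resources[2];
};

/* Return the index of the cache of 'kind' at 'level' for 'leaf', or -1. */
static int find_cache(const struct proc_node *procs,
		      const struct cache_node *caches, int leaf,
		      int level, enum cache_kind kind)
{
	int base = 0;	/* deepest cache level seen lower in the tree */

	assert(level >= 1);
	for (int p = leaf; p >= 0; p = procs[p].parent) {
		int deepest = base;

		/* Walk down each cache chain hanging off this node. */
		for (int r = 0; r < 2; r++) {
			int lvl = base;

			for (int c = procs[p].resources[r]; c >= 0;
			     c = caches[c].next_level) {
				lvl++;
				if (lvl == level && caches[c].kind == kind)
					return c;
			}
			if (lvl > deepest)
				deepest = lvl;
		}
		/* Caches on the next node up start above this depth. */
		base = deepest;
	}
	return -1;
}
```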

> 
> [...]
> 
>> +/* determine if the given node is a leaf node */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +			       struct acpi_pptt_processor *node)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	u32 node_entry;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +						sizeof(struct acpi_table_pptt));
>> +
>> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    (cpu_node->parent == node_entry))
>> +			return 0;
>> +		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
>> +	}
> 
> A leaf node is a node with a valid acpi_id corresponding to an MADT
> entry, right ? By the way, is this function really needed ?

Yes. Because the nodes point towards the root of the tree (rather than
the other way around), the only way to determine whether a node is a
leaf is to check whether any other node in the table references it.

This piece was the primary change for v1->v2.
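A minimal sketch of that leaf test, assuming a simplified model where each hypothetical `struct topo_node` stores its parent as an array index (the real table stores byte offsets). Each call is a full scan of the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified processor node: parent index, -1 for the root. */
struct topo_node {
	int parent;
};

/*
 * Nodes only point towards the root, so a node is a leaf exactly when no
 * other node in the table names it as its parent.
 */
static bool is_leaf(const struct topo_node *nodes, size_t count, int idx)
{
	assert(idx >= 0 && (size_t)idx < count);
	for (size_t i = 0; i < count; i++)
		if (nodes[i].parent == idx)
			return false;
	return true;
}
```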

> 
>> +	return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +	struct acpi_table_header *table_hdr,
>> +	u32 acpi_cpu_id)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
>> +						sizeof(struct acpi_table_pptt));
>> +
>> +	/* find the processor structure associated with this cpuid */
>> +	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> 
> Is the leaf node check necessary ? Or you just need to check the
> ACPI Processor ID valid flag (as discussed offline) ?

The valid flag doesn't mean anything for the leaf nodes, so this is the
only correct way of determining whether a node _might_ have a valid
MADT/ACPI ID. That said, the acpi_cpu_id comparison should really be
part of the if statement, with the leaf node check moved below it,
because doing it this way makes the parse n^2 instead of 2n. Of course,
in my mind, checking the id before we know it might be valid is
backwards from the "logical" way to do it.
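The reordering described above might look roughly like this sketch (hypothetical `find_processor()` over a simplified array model, not the patch's `acpi_find_processor_node()`). The cheap ID comparison runs first and the full leaf scan only runs on a match. The test table includes a container node whose processor_id happens to collide with a real CPU, which is why the leaf check is still needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct topo_node {
	int parent;                /* array index of the parent, -1 for root */
	unsigned int processor_id; /* not guaranteed meaningful off the leaves */
};

static bool is_leaf(const struct topo_node *nodes, size_t count, size_t idx)
{
	for (size_t i = 0; i < count; i++)
		if (nodes[i].parent == (int)idx)
			return false;
	return true;
}

/*
 * Compare the ID first and only run the O(n) leaf scan on a match. When a
 * single node carries the ID, this is about 2n node visits instead of n^2.
 */
static int find_processor(const struct topo_node *nodes, size_t count,
			  unsigned int acpi_id)
{
	assert(nodes != NULL);
	for (size_t i = 0; i < count; i++) {
		if (nodes[i].processor_id == acpi_id &&
		    is_leaf(nodes, count, i))
			return (int)i;
	}
	return -1;
}
```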

> 
>> +			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
>> +				 acpi_cpu_id, cpu_node->acpi_processor_id);
> 
> Side note: I'd question (some of) these pr_debug() messages
> 
>> +			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
>> +				/* found the correct entry */
>> +				pr_debug("match found!\n");
> 
> Like this one for instance.

This one is a bit redundant, but I come from the school of wanting to
be able to debug a remote machine. Large blocks of silent code are a
nightmare, particularly if you have a sysadmin-level user driving the
keyboard, etc.

> 
>> +				return (struct acpi_pptt_processor *)entry;
>> +			}
>> +		}
>> +
>> +		if (entry->length == 0) {
>> +			pr_err("Invalid zero length subtable\n");
>> +			break;
>> +		}
> 
> This should be moved at the beginning of the loop.

Yah, the intention was to verify the next entry, but good point: if its
length is 0, the current entry is probably invalid.
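A sketch of the loop shape being suggested, assuming a plain byte buffer whose subtables begin with a one-byte type and a one-byte length, as ACPI subtable headers do (`count_subtables()` is hypothetical). The length is validated at the top of the loop, before the entry is used, and an entry that would run past the end of the table is rejected as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type byte plus length byte at the start of each subtable. */
#define SUBTABLE_HDR_LEN 2u

/*
 * Hypothetical helper: count the subtables of 'type' in buf[0..len).
 * A zero (or too short) length can never stall the walk, and an entry
 * overrunning the table is refused. Returns -1 on a malformed table.
 */
static int count_subtables(const uint8_t *buf, size_t len, uint8_t type)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL);
	while (off + SUBTABLE_HDR_LEN <= len) {
		uint8_t entry_type = buf[off];
		uint8_t entry_len = buf[off + 1];

		/* Validate the current entry before using it. */
		if (entry_len < SUBTABLE_HDR_LEN || off + entry_len > len)
			return -1;
		if (entry_type == type)
			count++;
		off += entry_len;
	}
	return count;
}
```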

> 
>> +		entry = (struct acpi_subtable_header *)
>> +			((u8 *)entry + entry->length);
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +/*
>> + * Given a acpi_pptt_processor node, walk up until we identify the
>> + * package that the node is associated with or we run out of levels
>> + * to request.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_package_id(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *cpu,
>> +	int level)
>> +{
>> +	struct acpi_pptt_processor *prev_node;
>> +
>> +	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> 
> I really do not understand what ACPI_PPTT_PHYSICAL_PACKAGE means and
> more importantly, how it is actually used in this code.

?

Physical package maps to the package_id, which is generally defined to 
mean the "socket" and is used to terminate the cpu topology side of the 
parse.

> 
> This function is used to get a topology id (that is just a number for
> a given topology level) for a given level starting from a given leaf
> node.

This flag is the one decent part of the spec, because it's the only
level that is actually guaranteed to mean anything. Because of the
requirement that the shareability of cache nodes is described with
general processor nodes, the number of nodes within a given leg of the
tree is mostly meaningless: people sprinkle caches around the system,
including potentially above the "socket" level.


* Re: [PATCH v3 3/7] drivers: base: cacheinfo: arm64: Add support for ACPI based firmware tables
  2017-10-19 15:20     ` Lorenzo Pieralisi
@ 2017-10-19 15:52       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-19 15:52 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

Hi,


On 10/19/2017 10:20 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 12, 2017 at 02:48:52PM -0500, Jeremy Linton wrote:
>> The /sys cache entries should support ACPI/PPTT generated cache
>> topology information. Let's detect ACPI systems and call
>> an arch specific cache_setup_acpi() routine to update the hardware
>> probed cache topology.
>>
>> For arm64, if ACPI is enabled, determine the max number of cache
>> levels and populate them using a PPTT table if one is available.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   arch/arm64/kernel/cacheinfo.c | 23 ++++++++++++++++++-----
>>   drivers/acpi/pptt.c           |  1 +
>>   drivers/base/cacheinfo.c      | 17 +++++++++++------
>>   include/linux/cacheinfo.h     | 11 +++++++++--
>>   4 files changed, 39 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
>> index 380f2e2fbed5..2e2cf0d312ba 100644
>> --- a/arch/arm64/kernel/cacheinfo.c
>> +++ b/arch/arm64/kernel/cacheinfo.c
>> @@ -17,6 +17,7 @@
>>    * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>    */
>>   
>> +#include <linux/acpi.h>
>>   #include <linux/cacheinfo.h>
>>   #include <linux/of.h>
>>   
>> @@ -44,9 +45,17 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>   	this_leaf->type = type;
>>   }
>>   
>> +#ifndef CONFIG_ACPI
>> +int acpi_find_last_cache_level(unsigned int cpu)
>> +{
>> +	/*ACPI kernels should be built with PPTT support*/
>> +	return 0;
>> +}
>> +#endif
>> +
>>   static int __init_cache_level(unsigned int cpu)
>>   {
>> -	unsigned int ctype, level, leaves, of_level;
>> +	unsigned int ctype, level, leaves, fw_level;
>>   	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>>   
>>   	for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
>> @@ -59,15 +68,19 @@ static int __init_cache_level(unsigned int cpu)
>>   		leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
>>   	}
>>   
>> -	of_level = of_find_last_cache_level(cpu);
>> -	if (level < of_level) {
>> +	if (acpi_disabled)
>> +		fw_level = of_find_last_cache_level(cpu);
>> +	else
>> +		fw_level = acpi_find_last_cache_level(cpu);
>> +
>> +	if (level < fw_level) {
>>   		/*
>>   		 * some external caches not specified in CLIDR_EL1
>>   		 * the information may be available in the device tree
>>   		 * only unified external caches are considered here
>>   		 */
>> -		leaves += (of_level - level);
>> -		level = of_level;
>> +		leaves += (fw_level - level);
>> +		level = fw_level;
>>   	}
>>   
>>   	this_cpu_ci->num_levels = level;
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index c86715fed4a7..b5c6de37e328 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -355,6 +355,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>>   				    struct acpi_pptt_cache *found_cache,
>>   				    struct acpi_pptt_processor *cpu_node)
>>   {
>> +	this_leaf->firmware_node = cpu_node;
> 
> This pointer comes from an ACPI static table mapping that happens
> to be permanent so it should be safe given that the mapping is carried
> out at device_initcall (ie cache_setup_acpi()) but it is not that
> obvious. More to it below.
> 
>>   	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
>>   		this_leaf->size = found_cache->size;
>>   	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>> index eb3af2739537..8eca279e50d1 100644
>> --- a/drivers/base/cacheinfo.c
>> +++ b/drivers/base/cacheinfo.c
>> @@ -86,7 +86,7 @@ static int cache_setup_of_node(unsigned int cpu)
>>   static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>>   					   struct cacheinfo *sib_leaf)
>>   {
>> -	return sib_leaf->of_node == this_leaf->of_node;
>> +	return sib_leaf->firmware_node == this_leaf->firmware_node;
>>   }
>>   
>>   /* OF properties to query for a given cache type */
>> @@ -215,6 +215,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>>   }
>>   #endif
>>   
>> +int __weak cache_setup_acpi(unsigned int cpu)
>> +{
>> +	return -ENOTSUPP;
>> +}
>> +
>>   static int cache_shared_cpu_map_setup(unsigned int cpu)
>>   {
>>   	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> @@ -225,11 +230,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
>>   	if (this_cpu_ci->cpu_map_populated)
>>   		return 0;
>>   
>> -	if (of_have_populated_dt())
>> +	if (!acpi_disabled)
>> +		ret = cache_setup_acpi(cpu);
>> +	else if (of_have_populated_dt())
>>   		ret = cache_setup_of_node(cpu);
>> -	else if (!acpi_disabled)
>> -		/* No cache property/hierarchy support yet in ACPI */
>> -		ret = -ENOTSUPP;
>> +
>>   	if (ret)
>>   		return ret;
>>   
>> @@ -286,7 +291,7 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
>>   
>>   static void cache_override_properties(unsigned int cpu)
>>   {
>> -	if (of_have_populated_dt())
>> +	if (acpi_disabled && of_have_populated_dt())
>>   		return cache_of_override_properties(cpu);
>>   }
>>   
>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>> index 6a524bf6a06d..d1e9b8e01981 100644
>> --- a/include/linux/cacheinfo.h
>> +++ b/include/linux/cacheinfo.h
>> @@ -36,6 +36,9 @@ enum cache_type {
>>    * @of_node: if devicetree is used, this represents either the cpu node in
>>    *	case there's no explicit cache node or the cache node itself in the
>>    *	device tree
>> + * @firmware_node: Shared with of_node. When not using DT, this may contain
>> + *	pointers to other firmware based values. Particularly ACPI/PPTT
>> + *	unique values.
>>    * @disable_sysfs: indicates whether this node is visible to the user via
>>    *	sysfs or not
>>    * @priv: pointer to any private data structure specific to particular
>> @@ -64,8 +67,10 @@ struct cacheinfo {
>>   #define CACHE_ALLOCATE_POLICY_MASK	\
>>   	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
>>   #define CACHE_ID		BIT(4)
>> -
>> -	struct device_node *of_node;
>> +	union {
>> +		struct device_node *of_node;
>> +		void *firmware_node;
>> +	};
> 
> How about turning of_node into a struct fwnode_handle* (and allocating
> one for each PPTT cache node, an acpi_static_fwnode, to have the
> ACPI counterpart)?

As pointed out internally (with a patch), I converted this whole module
into fwnode, but the ACPI side is still a mess because it doesn't map
cleanly to what ends up being a relative ACPI pointer, mostly because
fwnode is really designed more for parsing DT and the ACPI DSDT/etc. So
you end up with fwnode routines that simply don't have mappings to PPTT,
which IMHO is going to create far more confusion than any assumed
cleanliness that might result.

> 
> It will cause more churn, given that the pointer is only used to
> carry out a pointer comparison, but it's also a bit more elegant.

Except for the huge API surface that takes a fwnode, which won't work.
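The only use of the new field in this patch is the pointer comparison in cache_leaves_are_shared(), so its type matters less than which address gets stored. A minimal user-space model of that check follows; struct cacheinfo_model and struct fake_pptt_node are hypothetical, trimmed-down stand-ins rather than the kernel's definitions, and the assumption that the stored address is the PPTT node at which the cache was found is taken from the update_cache_properties() hunk above.

```c
#include <stdbool.h>

struct device_node;	/* opaque here, as it is for users of the DT case */

/* Hypothetical stand-in for a PPTT node: only its address matters. */
struct fake_pptt_node {
	int id;
};

/* Trimmed-down model of struct cacheinfo: just the level and the union. */
struct cacheinfo_model {
	unsigned int level;
	union {
		struct device_node *of_node;	/* DT: cpu or cache node */
		void *firmware_node;		/* ACPI: PPTT node the cache was found at */
	};
};

/* Same test as the patched cache_leaves_are_shared(): pointer identity only. */
static bool cache_leaves_are_shared(const struct cacheinfo_model *this_leaf,
				    const struct cacheinfo_model *sib_leaf)
{
	return sib_leaf->firmware_node == this_leaf->firmware_node;
}
```

Two leaves handed the same node address compare as shared, and leaves pointing at different nodes do not. Nothing is dereferenced, which is why the comparison ultimately relies on the lifetime of the static table mapping raised earlier in this thread.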

> 
> Lorenzo
> 
>>   	bool disable_sysfs;
>>   	void *priv;
>>   };
>> @@ -98,6 +103,8 @@ int func(unsigned int cpu)					\
>>   struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
>>   int init_cache_level(unsigned int cpu);
>>   int populate_cache_leaves(unsigned int cpu);
>> +int cache_setup_acpi(unsigned int cpu);
>> +int acpi_find_last_cache_level(unsigned int cpu);
>>   
>>   const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
>>   
>> -- 
>> 2.13.5
>>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-12 19:48   ` Jeremy Linton
@ 2017-10-19 15:56     ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-19 15:56 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
> Propagate the topology information from the PPTT tree to the
> cpu_topology array. We can get the thread id, core_id and
> cluster_id by assuming certain levels of the PPTT tree correspond
> to those concepts. The package_id is flagged in the tree and can be
> found by passing an arbitrary large level to setup_acpi_cpu_topology()
> which terminates its search when it finds an ACPI node flagged
> as the physical package. If the tree doesn't contain enough
> levels to represent all of thread/core/cod/package then the package
> id will be used for the missing levels.
> 
> Since server/ACPI machines are more likely to be multisocket and NUMA,

I think this stuff is vague enough already, so to start with I would drop
patches 4 and 5 and stop assuming what machines are more likely to ship
with ACPI than DT.

I am just saying, for the umpteenth time, that these levels have no
architectural meaning _whatsoever_; level is a hierarchy concept
with no architectural meaning attached.

The only consistent thing PPTT is bringing about is the hierarchy
levels/grouping (and _possibly_ what a package boundary is); let's
stick to that for the time being.

> this patch also modifies the default clusters=sockets behavior
> for ACPI machines to sockets=sockets. DT machines continue to
> represent sockets as clusters. For ACPI machines, this results in a
> more normalized view of the topology. Cluster level scheduler decisions
> are still being made due to the "MC" level in the scheduler which has
> knowledge of cache sharing domains.
> 
> This code is loosely based on a combination of code from:
> Xiongfeng Wang <wangxiongfeng2@huawei.com>
> John Garry <john.garry@huawei.com>
> Jeffrey Hugo <jhugo@codeaurora.org>
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/kernel/topology.c | 54 +++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/topology.h     |  1 +
>  2 files changed, 54 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 9147e5b6326d..42f3e7f28b2b 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -11,6 +11,7 @@
>   * for more details.
>   */
>  
> +#include <linux/acpi.h>
>  #include <linux/arch_topology.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> @@ -22,6 +23,7 @@
>  #include <linux/sched.h>
>  #include <linux/sched/topology.h>
>  #include <linux/slab.h>
> +#include <linux/smp.h>
>  #include <linux/string.h>
>  
>  #include <asm/cpu.h>
> @@ -304,6 +306,54 @@ static void __init reset_cpu_topology(void)
>  	}
>  }
>  
> +#ifdef CONFIG_ACPI
> +/*
> + * Propagate the topology information of the processor_topology_node tree to the
> + * cpu_topology array.
> + */
> +static int __init parse_acpi_topology(void)
> +{
> +	u64 is_threaded;
> +	int cpu;
> +	int topology_id;
> +	/* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
> +	const int max_topo = 0xFF;
> +
> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> +
> +	for_each_possible_cpu(cpu) {
> +		topology_id = setup_acpi_cpu_topology(cpu, 0);
> +		if (topology_id < 0)
> +			return topology_id;
> +
> +		if (is_threaded) {
> +			cpu_topology[cpu].thread_id = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, 1);

Nit: you can move setup_acpi_cpu_topology() to include/linux/acpi.h,
provide an empty inline function for the !ACPI case and remove the
ACPI ifdeffery around this function.
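The arrangement suggested in the nit above is the usual header pattern of a real prototype under CONFIG_ACPI and a static inline stub otherwise. A sketch follows; the -EINVAL returned by the stub is an assumption chosen to match the !CONFIG_ACPI parse_acpi_topology() in this patch, not something the review specifies.

```c
#include <errno.h>

/*
 * Sketch for include/linux/acpi.h: callers such as parse_acpi_topology()
 * can then be compiled unconditionally, without their own #ifdef.
 */
#ifdef CONFIG_ACPI
int setup_acpi_cpu_topology(unsigned int cpu, int level);
#else
static inline int setup_acpi_cpu_topology(unsigned int cpu, int level)
{
	/* No PPTT parsing without ACPI: report failure to the caller. */
	return -EINVAL;
}
#endif
```

With the stub in place, a !CONFIG_ACPI build gets a negative value from the first call, and parse_acpi_topology() returns before touching cpu_topology.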

> +			cpu_topology[cpu].core_id   = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
> +			cpu_topology[cpu].cluster_id = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);

If you want a package id (that's just a package tag to group cores), you
should not pass a large level just because you know how
setup_acpi_cpu_topology() works; you should add an API that allows you to
retrieve the package id (so that you can use the ACPI_PPTT_PHYSICAL_PACKAGE
flag consistently, whatever it represents).
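A dedicated package lookup of the kind suggested above could walk up from the CPU's node and stop at the first ancestor carrying the physical-package flag. The sketch below is a user-space model only: the model_ names, the node layout and the use of an integer id (for example a table offset) are hypothetical, and MODEL_PHYSICAL_PACKAGE is assumed to mirror bit 0 of the ACPI 6.2 processor-structure flags.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed to mirror the ACPI 6.2 PPTT "physical package" flag (bit 0). */
#define MODEL_PHYSICAL_PACKAGE	(1u << 0)

/* Hypothetical in-memory model of a PPTT processor hierarchy node. */
struct model_pptt_node {
	const struct model_pptt_node *parent;	/* NULL at the root */
	unsigned int flags;
	int id;
};

/*
 * Hypothetical dedicated lookup: return the id of the first node, starting
 * at the CPU's own node, that is flagged as a physical package, instead of
 * asking a level-based walk for an arbitrarily deep level.
 */
static int model_find_package_id(const struct model_pptt_node *leaf)
{
	const struct model_pptt_node *node;

	for (node = leaf; node; node = node->parent)
		if (node->flags & MODEL_PHYSICAL_PACKAGE)
			return node->id;

	return -ENOENT;	/* nothing flagged: the caller picks a policy */
}
```

Returning -ENOENT rather than silently settling on some other node leaves the policy for firmware that flags no package to the caller.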

Lorenzo

> +			cpu_topology[cpu].package_id = topology_id;
> +		} else {
> +			cpu_topology[cpu].thread_id  = -1;
> +			cpu_topology[cpu].core_id    = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
> +			cpu_topology[cpu].cluster_id = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
> +			cpu_topology[cpu].package_id = topology_id;
> +		}
> +	}
> +	return 0;
> +}
> +
> +#else
> +static int __init parse_acpi_topology(void)
> +{
> +	/*ACPI kernels should be built with PPTT support*/
> +	return -EINVAL;
> +}
> +#endif
> +
>  void __init init_cpu_topology(void)
>  {
>  	reset_cpu_topology();
> @@ -312,6 +362,8 @@ void __init init_cpu_topology(void)
>  	 * Discard anything that was parsed if we hit an error so we
>  	 * don't use partial information.
>  	 */
> -	if (of_have_populated_dt() && parse_dt_topology())
> +	if ((!acpi_disabled) && parse_acpi_topology())
> +		reset_cpu_topology();
> +	else if (of_have_populated_dt() && parse_dt_topology())
>  		reset_cpu_topology();
>  }
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 4660749a7303..cbf2fb13bf92 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -43,6 +43,7 @@
>  		if (nr_cpus_node(node))
>  
>  int arch_update_cpu_topology(void);
> +int setup_acpi_cpu_topology(unsigned int cpu, int level);
>  
>  /* Conform to ACPI 2.0 SLIT distance definitions */
>  #define LOCAL_DISTANCE		10
> -- 
> 2.13.5
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-19 15:56     ` Lorenzo Pieralisi
@ 2017-10-19 16:13       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-19 16:13 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
>> Propagate the topology information from the PPTT tree to the
>> cpu_topology array. We can get the thread id, core_id and
>> cluster_id by assuming certain levels of the PPTT tree correspond
>> to those concepts. The package_id is flagged in the tree and can be
>> found by passing an arbitrary large level to setup_acpi_cpu_topology()
>> which terminates its search when it finds an ACPI node flagged
>> as the physical package. If the tree doesn't contain enough
>> levels to represent all of thread/core/cod/package then the package
>> id will be used for the missing levels.
>>
>> Since server/ACPI machines are more likely to be multisocket and NUMA,
> 
> I think this stuff is vague enough already, so to start with I would drop
> patches 4 and 5 and stop assuming what machines are more likely to ship
> with ACPI than DT.
> 
> I am just saying, for the umpteenth time, that these levels have no
> architectural meaning _whatsoever_; level is a hierarchy concept
> with no architectural meaning attached.

?

Did anyone say anything about that? No, I think the only thing being
guaranteed here is that the kernel's physical_id maps to an ACPI-defined
socket, which seems to be the mindset of pretty much the entire !arm64
community, meaning they are optimizing their software and the kernel with
that concept in mind.

Are you denying the existence of non-uniformity between threads running 
on different physical sockets?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-19 15:56     ` Lorenzo Pieralisi
@ 2017-10-19 16:54       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-19 16:54 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

Hi,

I missed the rest of the comment below.


On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
>> Propagate the topology information from the PPTT tree to the
>> cpu_topology array. We can get the thread id, core_id and
>> cluster_id by assuming certain levels of the PPTT tree correspond
>> to those concepts. The package_id is flagged in the tree and can be
>> found by passing an arbitrary large level to setup_acpi_cpu_topology()
>> which terminates its search when it finds an ACPI node flagged
>> as the physical package. If the tree doesn't contain enough
>> levels to represent all of thread/core/cod/package then the package
>> id will be used for the missing levels.
>>
>> Since server/ACPI machines are more likely to be multisocket and NUMA,
> 
> I think this stuff is vague enough already, so to start with I would drop
> patches 4 and 5 and stop assuming what machines are more likely to ship
> with ACPI than DT.
> 
> I am just saying, for the umpteenth time, that these levels have no
> architectural meaning _whatsoever_; level is a hierarchy concept
> with no architectural meaning attached.
> 
> The only consistent thing PPTT is bringing about is the hierarchy
> levels/grouping (and _possibly_ what a package boundary is); let's
> stick to that for the time being.
> 
>> this patch also modifies the default clusters=sockets behavior
>> for ACPI machines to sockets=sockets. DT machines continue to
>> represent sockets as clusters. For ACPI machines, this results in a
>> more normalized view of the topology. Cluster level scheduler decisions
>> are still being made due to the "MC" level in the scheduler which has
>> knowledge of cache sharing domains.
>>
>> This code is loosely based on a combination of code from:
>> Xiongfeng Wang <wangxiongfeng2@huawei.com>
>> John Garry <john.garry@huawei.com>
>> Jeffrey Hugo <jhugo@codeaurora.org>
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   arch/arm64/kernel/topology.c | 54 +++++++++++++++++++++++++++++++++++++++++++-
>>   include/linux/topology.h     |  1 +
>>   2 files changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 9147e5b6326d..42f3e7f28b2b 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -11,6 +11,7 @@
>>    * for more details.
>>    */
>>   
>> +#include <linux/acpi.h>
>>   #include <linux/arch_topology.h>
>>   #include <linux/cpu.h>
>>   #include <linux/cpumask.h>
>> @@ -22,6 +23,7 @@
>>   #include <linux/sched.h>
>>   #include <linux/sched/topology.h>
>>   #include <linux/slab.h>
>> +#include <linux/smp.h>
>>   #include <linux/string.h>
>>   
>>   #include <asm/cpu.h>
>> @@ -304,6 +306,54 @@ static void __init reset_cpu_topology(void)
>>   	}
>>   }
>>   
>> +#ifdef CONFIG_ACPI
>> +/*
>> + * Propagate the topology information of the processor_topology_node tree to the
>> + * cpu_topology array.
>> + */
>> +static int __init parse_acpi_topology(void)
>> +{
>> +	u64 is_threaded;
>> +	int cpu;
>> +	int topology_id;
>> +	/* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
>> +	const int max_topo = 0xFF;
>> +
>> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		topology_id = setup_acpi_cpu_topology(cpu, 0);
>> +		if (topology_id < 0)
>> +			return topology_id;
>> +
>> +		if (is_threaded) {
>> +			cpu_topology[cpu].thread_id = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
> 
> Nit: you can move setup_acpi_cpu_topology() to include/linux/acpi.h,
> provide an empty inline function for the !ACPI case and remove the
> ACPI ifdeffery around this function.

Yah sure..

> 
>> +			cpu_topology[cpu].core_id   = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
>> +			cpu_topology[cpu].cluster_id = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
> 
> If you want a package id (that's just a package tag to group cores), you
> should not pass a large level just because you know how
> setup_acpi_cpu_topology() works; you should add an API that allows you to
> retrieve the package id (so that you can use the ACPI_PPTT_PHYSICAL_PACKAGE
> flag consistently, whatever it represents).

I don't think the spec requires the use of PHYSICAL_PACKAGE... Am I
misreading it? If it doesn't, we need to "pick" a node level to represent
the physical package when no node is flagged as one...
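If no node is flagged, one possible policy is to treat the topmost node of the CPU's hierarchy as the package, which is presumably what the large-level walk described in the commit message falls back to. The sketch below is a hypothetical user-space model (the model_ names and node layout are invented for illustration, and MODEL_PHYSICAL_PACKAGE is assumed to mirror bit 0 of the ACPI 6.2 processor-structure flags); it illustrates the choice, not what the spec requires.

```c
#include <stddef.h>

/* Assumed to mirror the ACPI 6.2 PPTT "physical package" flag (bit 0). */
#define MODEL_PHYSICAL_PACKAGE	(1u << 0)

/* Hypothetical in-memory model of a PPTT processor hierarchy node. */
struct model_pptt_node {
	const struct model_pptt_node *parent;	/* NULL at the root */
	unsigned int flags;
	int id;
};

/*
 * Hypothetical fallback policy: climb from the CPU's (non-NULL) node until
 * a node flagged as a physical package is found; if firmware flags none,
 * the walk ends at the root, which is then treated as the package.
 */
static int model_package_id_or_root(const struct model_pptt_node *leaf)
{
	const struct model_pptt_node *node = leaf;

	while (node->parent && !(node->flags & MODEL_PHYSICAL_PACKAGE))
		node = node->parent;

	return node->id;
}
```

The trade-off is that an unflagged table still yields a stable grouping, but whether the root really corresponds to a socket is then an assumption about the firmware rather than something the table states.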



> 
> Lorenzo
> 
>> +			cpu_topology[cpu].package_id = topology_id;
>> +		} else {
>> +			cpu_topology[cpu].thread_id  = -1;
>> +			cpu_topology[cpu].core_id    = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
>> +			cpu_topology[cpu].cluster_id = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
>> +			cpu_topology[cpu].package_id = topology_id;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +#else
>> +static int __init parse_acpi_topology(void)
>> +{
>> +	/*ACPI kernels should be built with PPTT support*/
>> +	return -EINVAL;
>> +}
>> +#endif
>> +
>>   void __init init_cpu_topology(void)
>>   {
>>   	reset_cpu_topology();
>> @@ -312,6 +362,8 @@ void __init init_cpu_topology(void)
>>   	 * Discard anything that was parsed if we hit an error so we
>>   	 * don't use partial information.
>>   	 */
>> -	if (of_have_populated_dt() && parse_dt_topology())
>> +	if ((!acpi_disabled) && parse_acpi_topology())
>> +		reset_cpu_topology();
>> +	else if (of_have_populated_dt() && parse_dt_topology())
>>   		reset_cpu_topology();
>>   }
>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>> index 4660749a7303..cbf2fb13bf92 100644
>> --- a/include/linux/topology.h
>> +++ b/include/linux/topology.h
>> @@ -43,6 +43,7 @@
>>   		if (nr_cpus_node(node))
>>   
>>   int arch_update_cpu_topology(void);
>> +int setup_acpi_cpu_topology(unsigned int cpu, int level);
>>   
>>   /* Conform to ACPI 2.0 SLIT distance definitions */
>>   #define LOCAL_DISTANCE		10
>> -- 
>> 2.13.5
>>


* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-19 16:13       ` Jeremy Linton
@ 2017-10-20  9:14         ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-20  9:14 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On Thu, Oct 19, 2017 at 11:13:27AM -0500, Jeremy Linton wrote:
> On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
> >On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
> >>Propagate the topology information from the PPTT tree to the
> >>cpu_topology array. We can get the thread id, core_id and
> >>cluster_id by assuming certain levels of the PPTT tree correspond
> >>to those concepts. The package_id is flagged in the tree and can be
> >>found by passing an arbitrary large level to setup_acpi_cpu_topology()
> >>which terminates its search when it finds an ACPI node flagged
> >>as the physical package. If the tree doesn't contain enough
> >>levels to represent all of thread/core/cod/package then the package
> >>id will be used for the missing levels.
> >>
> >>Since server/ACPI machines are more likely to be multisocket and NUMA,
> >
> >I think this stuff is vague enough already so to start with I would drop
> >patch 4 and 5 and stop assuming what machines are more likely to ship
> >with ACPI than DT.
> >
> >I am just saying, for the umpteenth time, that these levels have no
> >architectural meaning _whatsoever_, level is a hierarchy concept
> >with no architectural meaning attached.
> 
> ?
> 
> Did anyone say anything about that? No, I think the only thing being
> guaranteed here is that the kernel's physical_id maps to an ACPI
> defined socket, which seems to be the mindset of pretty much the
> entire !arm64 community, meaning they are optimizing their software
> and the kernel with that concept in mind.
> 
> Are you denying the existence of non-uniformity between threads
> running on different physical sockets?

No, I have not explained my POV clearly, apologies.

AFAIK, the kernel currently deals with 2 (3 - if SMT) topology layers.

1) thread
2) core
3) package

What I wanted to say is that, to simplify this series, you do not need
to introduce the COD topology level, since it is just another arbitrary
topology level (i.e. there is no way you can pinpoint which level
corresponds to COD with PPTT, or DT for the sake of this discussion)
that would not be used in the kernel (apart from the big.LITTLE cpufreq
driver and the PSCI checker, whose usage of topology_physical_package_id()
is questionable anyway).

PPTT allows you to define which level corresponds to a package; use
it to initialize the package topology level (which, in the ARM internal
variables, we call cluster) and be done with it.

I do not think that adding another topology level improves anything as
far as ACPI topology detection is concerned; you are not able to use it
in the scheduler or from userspace to group CPUs anyway.
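Once the package level comes from PPTT, the grouping itself only needs id equality. A minimal sketch, assuming the per-CPU ids are already filled in: for each CPU it builds a bitmask of the CPUs sharing its package_id, roughly the role core_sibling plays in the arm64 cpu_topology array. struct cpu_topo_sketch and build_package_siblings() are hypothetical names, and the uint32_t mask limits the sketch to 32 CPUs.

```c
#include <stdint.h>

/* Per-CPU ids as a topology parser might fill them; -1 means "no SMT level". */
struct cpu_topo_sketch {
	int thread_id;
	int core_id;
	int package_id;
};

/*
 * For every CPU, set bit N in siblings[cpu] if CPU N reports the same
 * package_id. Only equality matters, so the ids may be arbitrary values
 * such as PPTT table offsets. nr_cpus must not exceed 32.
 */
static void build_package_siblings(const struct cpu_topo_sketch *topo,
				   unsigned int nr_cpus, uint32_t *siblings)
{
	unsigned int cpu, other;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		siblings[cpu] = 0;
		for (other = 0; other < nr_cpus; other++)
			if (topo[other].package_id == topo[cpu].package_id)
				siblings[cpu] |= UINT32_C(1) << other;
	}
}
```

For four CPUs with package ids {100, 100, 260, 260}, CPUs 0 and 1 get mask 0x3 and CPUs 2 and 3 get mask 0xC.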

Does this answer your question?

Thanks,
Lorenzo


* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-19 16:54       ` Jeremy Linton
@ 2017-10-20  9:22         ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-20  9:22 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On Thu, Oct 19, 2017 at 11:54:22AM -0500, Jeremy Linton wrote:

[...]

> >>+			cpu_topology[cpu].core_id   = topology_id;
> >>+			topology_id = setup_acpi_cpu_topology(cpu, 2);
> >>+			cpu_topology[cpu].cluster_id = topology_id;
> >>+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
> >
> >If you want a package id (that's just a package tag to group cores), you
> >should not use a large level because you know how setup_acpi_cpu_topology()
> >works; you should add an API that allows you to retrieve the package id
> >(so that you can use the ACPI_PPTT_PHYSICAL_PACKAGE flag consistently,
> >whatever it represents).
> 
> I don't think the spec requires the use of PHYSICAL_PACKAGE... Am I
> misreading it? If it doesn't, we need to "pick" a node level to
> represent the physical package if one doesn't exist...

The spec defines a means to detect whether a given PPTT node corresponds
to a package (I am refraining from stating again that, to me, it is not
clear-cut what a package is _architecturally_; I think you know my POV by
now), and that is what you need to use to retrieve a package id for a
given CPU, if I understand the aim of the physical package flag correctly.

Either that or that flag is completely useless.

Lorenzo

ACPI 6.2 - Table 5-151 (page 248)
Physical package
-----------------
Set to 1 if this node of the processor topology represents the boundary
of a physical package, whether socketed or surface mounted.  Set to 0 if
this instance of the processor topology does not represent the boundary
of a physical package.
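The flag description above suggests a package lookup that does not rely on a magic depth such as max_topo = 0xFF: walk parent links from the leaf, stop at the first node flagged as a physical package and, for the case raised above where firmware flags no node at all, fall back to the root of that CPU's tree. This is a simplified model rather than the patch's code: nodes sit in an array, parent is an array index instead of a table offset, and struct pptt_node_sketch and find_package_node() are hypothetical. The flag value follows bit 0 of the ACPI 6.2 processor hierarchy node flags.

```c
#include <stdint.h>

/* Bit 0 of the processor hierarchy node flags: "Physical package". */
#define SKETCH_PPTT_PHYSICAL_PACKAGE	(1u << 0)

/* Simplified processor node: parent is an index into the array, -1 at the root. */
struct pptt_node_sketch {
	int parent;
	uint32_t flags;
};

/*
 * Walk from a leaf towards the root and return the index of the first node
 * flagged as a physical package. If no node on the path carries the flag,
 * return the root, so every CPU still ends up with a package id.
 */
static int find_package_node(const struct pptt_node_sketch *nodes, int leaf)
{
	int node = leaf;

	while (!(nodes[node].flags & SKETCH_PPTT_PHYSICAL_PACKAGE) &&
	       nodes[node].parent >= 0)
		node = nodes[node].parent;
	return node;
}
```

The returned index (a table offset, in the real table) can then serve directly as the package id.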


* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-19 15:43       ` Jeremy Linton
@ 2017-10-20 10:15         ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-20 10:15 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

On Thu, Oct 19, 2017 at 10:43:46AM -0500, Jeremy Linton wrote:
> On 10/19/2017 05:22 AM, Lorenzo Pieralisi wrote:
> >On Thu, Oct 12, 2017 at 02:48:50PM -0500, Jeremy Linton wrote:
> >>ACPI 6.2 adds a new table, which describes how processing units
> >>are related to each other in a tree-like fashion. Caches are
> >>also sprinkled throughout the tree and describe the properties
> >>of the caches in relation to other caches and processing units.
> >>
> >>Add the code to parse the cache hierarchy and report the total
> >>number of levels of cache for a given core using
> >>acpi_find_last_cache_level() as well as fill out the individual
> >>core's cache information with cache_setup_acpi() once the
> >>cpu_cacheinfo structure has been populated by the arch specific
> >>code.
> >>
> >>Further, report peers in the topology using setup_acpi_cpu_topology()
> >>to report a unique ID for each processing unit at a given level
> >>in the tree. These unique IDs can then be used to match related
> >>processing units which exist as threads, COD (clusters
> >>on die), within a given package, etc.
> >
> >I think this patch should be split ((1) topology (2) cache), it is doing
> >too much which makes it hard to review.
> 
> If you look at the RFC, it only did cache parsing, the topology
> changes were added for v1. The cache bits are the ugly parts because
> they are walking up/down both the node tree, as well as the cache
> trees attached to the nodes during the walk. Once that was in
> place, the addition of the CPU topology was trivial. But trying to
> understand the cpu topology without first understanding the weird
> stuff done for the cache topology might not be the right way to
> approach this code.

Topology and cache bindings parsing seem decoupled to me:

cache_setup_acpi(cpu)
setup_acpi_cpu_topology(cpu, level)

I mentioned that because it can simplify review (and merging)
of this series.

> >
> >[...]
> >
> >>+/* determine if the given node is a leaf node */
> >>+static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> >>+			       struct acpi_pptt_processor *node)
> >>+{
> >>+	struct acpi_subtable_header *entry;
> >>+	unsigned long table_end;
> >>+	u32 node_entry;
> >>+	struct acpi_pptt_processor *cpu_node;
> >>+
> >>+	table_end = (unsigned long)table_hdr + table_hdr->length;
> >>+	node_entry = (u32)((u8 *)node - (u8 *)table_hdr);
> >>+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> >>+						sizeof(struct acpi_table_pptt));
> >>+
> >>+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> >>+		cpu_node = (struct acpi_pptt_processor *)entry;
> >>+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> >>+		    (cpu_node->parent == node_entry))
> >>+			return 0;
> >>+		entry = (struct acpi_subtable_header *)((u8 *)entry + entry->length);
> >>+	}
> >
> >A leaf node is a node with a valid acpi_id corresponding to an MADT
> >entry, right ? By the way, is this function really needed ?
> 
> Yes, because the only way to determine if it is a leaf node is to
> see if there are any references to it elsewhere in the table because
> the nodes point towards the root of the tree (rather than the other
> way).

The question is whether we need to know a node is a leaf, see below.
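Because PPTT nodes only point towards the root, the leaf test in the patch rescans the whole table for every candidate node, which is where the n^2 cost Jeremy mentions below comes from. One way to avoid that, sketched on a simplified model (parent links as array indices, -1 at the root; mark_leaves() is a hypothetical helper), is a single pass that marks every node referenced as someone's parent; whatever stays unmarked is a leaf. The real table stores byte offsets, so this assumes those have already been mapped to indices.

```c
#include <stdbool.h>

/*
 * parent[i] is the index of node i's parent, or -1 for a root.
 * After the call, is_leaf[i] is true if no node names i as its parent.
 * Two linear passes instead of one table rescan per node.
 */
static void mark_leaves(const int *parent, int nr_nodes, bool *is_leaf)
{
	int i;

	for (i = 0; i < nr_nodes; i++)
		is_leaf[i] = true;
	for (i = 0; i < nr_nodes; i++)
		if (parent[i] >= 0)
			is_leaf[parent[i]] = false;
}
```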

> This piece was the primary change for v1->v2.
> 
> >
> >>+	return 1;
> >>+}
> >>+
> >>+/*
> >>+ * Find the subtable entry describing the provided processor
> >>+ */
> >>+static struct acpi_pptt_processor *acpi_find_processor_node(
> >>+	struct acpi_table_header *table_hdr,
> >>+	u32 acpi_cpu_id)
> >>+{
> >>+	struct acpi_subtable_header *entry;
> >>+	unsigned long table_end;
> >>+	struct acpi_pptt_processor *cpu_node;
> >>+
> >>+	table_end = (unsigned long)table_hdr + table_hdr->length;
> >>+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
> >>+						sizeof(struct acpi_table_pptt));
> >>+
> >>+	/* find the processor structure associated with this cpuid */
> >>+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
> >>+		cpu_node = (struct acpi_pptt_processor *)entry;
> >>+
> >>+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> >>+		    acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> >
> >Is the leaf node check necessary ? Or you just need to check the
> >ACPI Processor ID valid flag (as discussed offline) ?
> 
> The valid flag doesn't mean anything for the leaf nodes, so it's the
> only correct way of determining if the node _might_ have a valid
> madt/acpi ID. This actually should have the acpi_cpu_id checked as
> part of the if statement and the leaf node check below because doing
> it this way makes this parse n^2 instead of 2n. Of course in my
> mind, checking the id before we know it might be valid is backwards
> of the "logical" way to do it.

Ok, it is not clearly worded in the specs (we can update it though) but I
think the valid flag must be set for leaf nodes, which would make the
leaf node check useless because you just have to match a PPTT node with
a valid ACPI Processor ID.
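If firmware must set the "ACPI Processor ID valid" flag on every leaf, as argued above (a reading the thread has not settled), the processor lookup becomes one linear scan with no leaf test. A minimal sketch under that assumption: the flag value follows bit 1 of the ACPI 6.2 processor hierarchy node flags, and struct pptt_proc_sketch and find_processor_node() are hypothetical.

```c
#include <stdint.h>

/* Bit 1 of the processor hierarchy node flags: "ACPI Processor ID valid". */
#define SKETCH_PPTT_ACPI_PROCESSOR_ID_VALID	(1u << 1)

struct pptt_proc_sketch {
	uint32_t flags;
	uint32_t acpi_processor_id;
};

/* Return the index of the node describing acpi_cpu_id, or -1 if none does. */
static int find_processor_node(const struct pptt_proc_sketch *nodes,
			       int nr_nodes, uint32_t acpi_cpu_id)
{
	int i;

	for (i = 0; i < nr_nodes; i++) {
		/* An id match only counts when firmware marked the id as valid. */
		if (nodes[i].acpi_processor_id == acpi_cpu_id &&
		    (nodes[i].flags & SKETCH_PPTT_ACPI_PROCESSOR_ID_VALID))
			return i;
	}
	return -1;
}
```

A non-leaf node that happens to carry a zero id without the flag is skipped, which is the case the leaf check in the patch is guarding against.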

Lorenzo

> >>+			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
> >>+				 acpi_cpu_id, cpu_node->acpi_processor_id);
> >
> >Side note: I'd question (some of) these pr_debug() messages
> >
> >>+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
> >>+				/* found the correct entry */
> >>+				pr_debug("match found!\n");
> >
> >Like this one for instance.
> 
> This one is a bit redundant, but I come from the school that I want
> to be able to debug a remote machine. Large blocks of silent code
> are a nightmare, particularly if you have a sysadmin level user
> driving the keyboard/etc.
> 
> >
> >>+				return (struct acpi_pptt_processor *)entry;
> >>+			}
> >>+		}
> >>+
> >>+		if (entry->length == 0) {
> >>+			pr_err("Invalid zero length subtable\n");
> >>+			break;
> >>+		}
> >
> >This should be moved at the beginning of the loop.
> 
> Yah, the intention was to verify the next entry, but good point: if
> its length is 0, the current one is probably invalid.
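Checking the length at the top of the loop means each entry is validated before any of its fields are trusted. A minimal sketch of such a walk over a raw buffer, assuming the generic ACPI subtable header layout (a one-byte type followed by a one-byte length); count_subtables() is a hypothetical helper, and it also rejects an entry shorter than its own header or one that runs past the end of the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic ACPI subtable header: byte 0 is the type, byte 1 the length. */
#define SUBTABLE_HDR_SIZE	2

/* Count subtables of the given type in buf[0..len), or return -1 if malformed. */
static int count_subtables(const uint8_t *buf, size_t len, uint8_t type)
{
	size_t off = 0;
	int count = 0;

	while (off + SUBTABLE_HDR_SIZE <= len) {
		uint8_t entry_type = buf[off];
		uint8_t entry_len = buf[off + 1];

		/*
		 * Validate before use: a length shorter than the header
		 * (including zero, which would never advance) or one that
		 * overruns the buffer means the table is malformed.
		 */
		if (entry_len < SUBTABLE_HDR_SIZE || off + entry_len > len)
			return -1;
		if (entry_type == type)
			count++;
		off += entry_len;
	}
	return count;
}
```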
> 
> >
> >>+		entry = (struct acpi_subtable_header *)
> >>+			((u8 *)entry + entry->length);
> >>+	}
> >>+
> >>+	return NULL;
> >>+}
> >>+
> >>+/*
> >>+ * Given a acpi_pptt_processor node, walk up until we identify the
> >>+ * package that the node is associated with or we run out of levels
> >>+ * to request.
> >>+ */
> >>+static struct acpi_pptt_processor *acpi_find_processor_package_id(
> >>+	struct acpi_table_header *table_hdr,
> >>+	struct acpi_pptt_processor *cpu,
> >>+	int level)
> >>+{
> >>+	struct acpi_pptt_processor *prev_node;
> >>+
> >>+	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
> >
> >I really do not understand what ACPI_PPTT_PHYSICAL_PACKAGE means and
> >more importantly, how it is actually used in this code.
> 
> ?
> 
> Physical package maps to the package_id, which is generally defined
> to mean the "socket" and is used to terminate the cpu topology side
> of the parse.
> 
> >
> >This function is used to get a topology id (that is just a number for
> >a given topology level) for a given level starting from a given leaf
> >node.
> 
> This flag is the one decent part of the spec, because it's the only
> level which is actually guaranteed to mean anything. Because the spec
> requires cache shareability to be described using general processor
> nodes, the number of nodes within a given leg of the tree is mostly
> meaningless: people sprinkle caches around the system, potentially
> including above the "socket" level.
> 
> >Why do we care at all about ACPI_PPTT_PHYSICAL_PACKAGE ?
> 
> Because it gives us a hard mapping to core siblings.
> 
> >
> >>+		pr_debug("level %d\n", level);
> >>+		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
> >>+		if (prev_node == NULL)
> >>+			break;
> >>+		cpu = prev_node;
> >>+		level--;
> >>+	}
> >>+	return cpu;
> >>+}
> >>+
> >>+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
> >>+{
> >>+	int number_of_levels = 0;
> >>+	struct acpi_pptt_processor *cpu;
> >>+
> >>+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> >>+	if (cpu)
> >>+		number_of_levels = acpi_process_node(table_hdr, cpu);
> >>+
> >>+	return number_of_levels;
> >>+}
> >>+
> >>+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
> >>+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
> >>+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
> >>+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
> >>+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
> >>+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
> >>+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
> >>+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
> >>+
> >>+static u8 acpi_cache_type(enum cache_type type)
> >>+{
> >>+	switch (type) {
> >>+	case CACHE_TYPE_DATA:
> >>+		pr_debug("Looking for data cache\n");
> >>+		return ACPI_6_2_CACHE_TYPE_DATA;
> >>+	case CACHE_TYPE_INST:
> >>+		pr_debug("Looking for instruction cache\n");
> >>+		return ACPI_6_2_CACHE_TYPE_INSTR;
> >>+	default:
> >>+		pr_debug("Unknown cache type, assume unified\n");
> >>+	case CACHE_TYPE_UNIFIED:
> >>+		pr_debug("Looking for unified cache\n");
> >>+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
> >>+	}
> >>+}
> >>+
> >>+/* find the ACPI node describing the cache type/level for the given CPU */
> >>+static struct acpi_pptt_cache *acpi_find_cache_node(
> >>+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> >>+	enum cache_type type, unsigned int level,
> >>+	struct acpi_pptt_processor **node)
> >>+{
> >>+	int total_levels = 0;
> >>+	struct acpi_pptt_cache *found = NULL;
> >>+	struct acpi_pptt_processor *cpu_node;
> >>+	u8 acpi_type = acpi_cache_type(type);
> >>+
> >>+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> >>+		 acpi_cpu_id, level, acpi_type);
> >>+
> >>+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> >>+	if (!cpu_node)
> >>+		return NULL;
> >>+
> >>+	do {
> >>+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
> >>+		*node = cpu_node;
> >>+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> >>+	} while ((cpu_node) && (!found));
> >>+
> >>+	return found;
> >>+}
> >>+
> >>+int acpi_find_last_cache_level(unsigned int cpu)
> >>+{
> >>+	u32 acpi_cpu_id;
> >>+	struct acpi_table_header *table;
> >>+	int number_of_levels = 0;
> >>+	acpi_status status;
> >>+
> >>+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
> >>+
> >>+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> >
> >This would break !ARM64.
> 
> >
> >>+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> >>+	if (ACPI_FAILURE(status)) {
> >>+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> 
> Yup, and in a way this does too... Without writing the binding code
> for another arch, where that line should be drawn isn't clear at the
> moment. That's part of the reason I put this in the arm64 directory.
> 
> 
> >>+	} else {
> >>+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
> >>+		acpi_put_table(table);
> >>+	}
> >>+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
> >>+
> >>+	return number_of_levels;
> >>+}
> >>+
> >>+/*
> >>+ * The ACPI spec implies that the fields in the cache structures are used to
> >>+ * extend and correct the information probed from the hardware. In the case
> >>+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> >>+ */
> >>+static void update_cache_properties(struct cacheinfo *this_leaf,
> >>+				    struct acpi_pptt_cache *found_cache,
> >>+				    struct acpi_pptt_processor *cpu_node)
> >>+{
> >>+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
> >>+		this_leaf->size = found_cache->size;
> >>+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
> >>+		this_leaf->coherency_line_size = found_cache->line_size;
> >>+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
> >>+		this_leaf->number_of_sets = found_cache->number_of_sets;
> >>+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
> >>+		this_leaf->ways_of_associativity = found_cache->associativity;
> >>+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
> >>+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> >>+		case ACPI_6_2_CACHE_POLICY_WT:
> >>+			this_leaf->attributes = CACHE_WRITE_THROUGH;
> >>+			break;
> >>+		case ACPI_6_2_CACHE_POLICY_WB:
> >>+			this_leaf->attributes = CACHE_WRITE_BACK;
> >>+			break;
> >>+		default:
> >>+			pr_err("Unknown ACPI cache policy %d\n",
> >>+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
> >>+		}
> >>+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
> >>+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> >>+		case ACPI_6_2_CACHE_READ_ALLOCATE:
> >>+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> >>+			break;
> >>+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
> >>+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> >>+			break;
> >>+		case ACPI_6_2_CACHE_RW_ALLOCATE:
> >>+			this_leaf->attributes |=
> >>+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
> >>+			break;
> >>+		default:
> >>+			pr_err("Unknown ACPI cache allocation policy %d\n",
> >>+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
> >>+		}
> >>+}
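For reference while reading the switch statements above, here is a standalone decoder for the cache structure's Attributes byte, following the ACPI 6.2 layout as I read it: bits 1:0 allocation type (0 read, 1 write, 2 or 3 read and write), bits 3:2 cache type (0 data, 1 instruction, 2 or 3 unified) and bit 4 write policy (0 write-back, 1 write-through). The enum, struct and function names are hypothetical. Both "either value" encodings are folded together here; a decoder that matches only 0x2 would report 0x3 as unknown.

```c
#include <stdint.h>

enum sk_alloc { SK_ALLOC_READ, SK_ALLOC_WRITE, SK_ALLOC_READ_WRITE };
enum sk_ctype { SK_CTYPE_DATA, SK_CTYPE_INSTR, SK_CTYPE_UNIFIED };

struct cache_attr_sketch {
	enum sk_alloc alloc;
	enum sk_ctype type;
	int write_through;	/* 1 = write-through, 0 = write-back */
};

static struct cache_attr_sketch decode_cache_attributes(uint8_t attr)
{
	struct cache_attr_sketch d;
	unsigned int alloc = attr & 0x3;	/* bits 1:0 */
	unsigned int type = (attr >> 2) & 0x3;	/* bits 3:2 */

	/* 0x2 and 0x3 both mean read and write allocate. */
	d.alloc = alloc == 0 ? SK_ALLOC_READ :
		  alloc == 1 ? SK_ALLOC_WRITE : SK_ALLOC_READ_WRITE;
	/* 0x2 and 0x3 both mean unified. */
	d.type = type == 0 ? SK_CTYPE_DATA :
		 type == 1 ? SK_CTYPE_INSTR : SK_CTYPE_UNIFIED;
	d.write_through = (attr >> 4) & 1;	/* bit 4 */
	return d;
}
```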
> >>+
> >>+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> >>+				 unsigned int cpu)
> >>+{
> >>+	struct acpi_pptt_cache *found_cache;
> >>+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> >>+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> >
> >Ditto.
> >
> >>+	struct cacheinfo *this_leaf;
> >>+	unsigned int index = 0;
> >>+	struct acpi_pptt_processor *cpu_node = NULL;
> >>+
> >>+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
> >>+		this_leaf = this_cpu_ci->info_list + index;
> >>+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> >>+						   this_leaf->type,
> >>+						   this_leaf->level,
> >>+						   &cpu_node);
> >>+		pr_debug("found = %p %p\n", found_cache, cpu_node);
> >>+		if (found_cache)
> >>+			update_cache_properties(this_leaf,
> >>+						found_cache,
> >>+						cpu_node);
> >>+
> >>+		index++;
> >>+	}
> >>+}
> >>+
> >>+static int topology_setup_acpi_cpu(struct acpi_table_header *table,
> >>+				    unsigned int cpu, int level)
> >>+{
> >>+	struct acpi_pptt_processor *cpu_node;
> >>+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> >
> >Ditto.
> >
> >>+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> >>+	if (cpu_node) {
> >>+		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
> >
> >If level is 0 there is nothing to do here.
> >
> >>+		/* Only the first level has a guaranteed id */
> >>+		if (level == 0)
> >>+			return cpu_node->acpi_processor_id;
> >>+		return (int)((u8 *)cpu_node - (u8 *)table);
> >
> >Please explain to me the rationale behind this. To me acpi_processor_id
> >is as good as the cpu_node offset in the table to describe the topology
> >ID at a given level, so why special-case level 0?
> 
> Level 0 is the only level guaranteed to have something set in the
> acpi_processor_id field. It's possible that values exist in nodes
> above this one, but they must _all_ be flagged and have matching
> container IDs, and nothing in the spec requires that. This means
> we need a guaranteed way to generate IDs. This was added between
> v2 and v3, after the discussion about making the IDs a little nicer
> for the user.
> 
> 
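A minimal userspace sketch of the ID scheme described above, assuming a simplified, hypothetical node layout: struct pptt_node, node_at() and topology_id() are illustrative names, and each node carries explicit byte offsets instead of living inside a raw PPTT. Level 0 reports the firmware-provided processor ID; any higher level reports the ancestor's byte offset, which is unique and stable for a given table but not sequential.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a PPTT processor node. */
struct pptt_node {
	uint32_t offset;  /* byte offset of this node from the table start */
	uint32_t parent;  /* byte offset of the parent node, 0 if none */
	uint32_t acpi_id; /* only guaranteed meaningful on leaf nodes */
};

/* Find a node by its table offset, as a parent-reference walk would. */
static const struct pptt_node *node_at(const struct pptt_node *tbl, size_t n,
				       uint32_t offset)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].offset == offset)
			return &tbl[i];
	return NULL;
}

/*
 * Topology ID for 'leaf' at 'level' (0 = the leaf itself). Level 0 uses
 * the firmware ID; higher levels use the ancestor's table offset. If the
 * tree runs out of parents, the topmost node reached is used.
 */
static int topology_id(const struct pptt_node *tbl, size_t n,
		       const struct pptt_node *leaf, int level)
{
	const struct pptt_node *node = leaf;

	for (int remaining = level; remaining > 0; remaining--) {
		const struct pptt_node *parent = node_at(tbl, n, node->parent);

		if (!parent)
			break;
		node = parent;
	}
	if (level == 0)
		return (int)node->acpi_id;
	return (int)node->offset;
}
```

With a table of a root (offset 36), a cluster (56) and two leaf CPUs (76 and 96, firmware IDs 0 and 1), level 0 yields 0 and 1, level 1 yields 56 for both CPUs, and any level of 2 or more yields 36. The offsets are made up for illustration.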
> >
> >On top of that, with this ID scheme, we would end up with
> >thread/core/cluster IDs potentially being non-sequential values
> >(depending on the PPTT table layout), which should not be a problem, but
> >we'd better check how people are using them.
> 
> The thread (or core, depending on which is level 0) will have
> firmware-provided IDs; everything else gets somewhat random-looking
> but consistent IDs. I commented earlier in this series that
> "normalizing" them is totally doable, although at the moment only
> the physical_id is really user visible, and that should probably be
> normalized outside of this module, in the arm64 topology parser, if
> we actually want to do it. I'm not sure it's worth the effort, at
> least not as part of the general PPTT changes.
> 
> 
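If the offset-derived IDs were ever normalized (for example in the arm64 topology parser, as suggested above), one simple option is to renumber them densely in first-seen CPU order. normalize_ids() is a hypothetical helper, not code from this series; it only shows that equal raw IDs remain equal after renumbering.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write a dense ID (0, 1, 2, ...) for each raw ID in raw[0..n) to out[],
 * assigning new values in first-seen order so that CPUs with equal raw
 * IDs still share an ID. Returns the number of distinct raw IDs.
 * The O(n^2) scan keeps the sketch short; it is not tuned for large n.
 */
static size_t normalize_ids(const int *raw, int *out, size_t n)
{
	size_t distinct = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Look for an earlier CPU with the same raw ID. */
		for (j = 0; j < i; j++)
			if (raw[j] == raw[i])
				break;

		if (j < i)
			out[i] = out[j];          /* reuse its dense ID */
		else
			out[i] = (int)distinct++; /* first time seen */
	}
	return distinct;
}
```

For raw IDs {56, 56, 96, 96, 56} it produces {0, 0, 1, 1, 0} and returns 2. The quadratic scan is a deliberate simplification; a real implementation might sort or hash instead.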
> >
> >>+	}
> >>+	pr_err_once("PPTT table found, but unable to locate core for %d\n",
> >>+		    cpu);
> >>+	return -ENOENT;
> >>+}
> >>+
> >>+/*
> >>+ * simply assign a ACPI cache entry to each known CPU cache entry
> >>+ * determining which entries are shared is done later.
> >
> >Add a kerneldoc style comment for an external interface.
> 
> That is a good point.
> 
> >
> >>+ */
> >>+int cache_setup_acpi(unsigned int cpu)
> >>+{
> >>+	struct acpi_table_header *table;
> >>+	acpi_status status;
> >>+
> >>+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
> >>+
> >>+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> >>+	if (ACPI_FAILURE(status)) {
> >>+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> >>+		return -ENOENT;
> >>+	}
> >>+
> >>+	cache_setup_acpi_cpu(table, cpu);
> >>+	acpi_put_table(table);
> >>+
> >>+	return status;
> >>+}
> >>+
> >>+/*
> >>+ * Determine a topology unique ID for each thread/core/cluster/socket/etc.
> >>+ * This ID can then be used to group peers.
> >
> >Ditto.
> >
> >>+ */
> >>+int setup_acpi_cpu_topology(unsigned int cpu, int level)
> >>+{
> >>+	struct acpi_table_header *table;
> >>+	acpi_status status;
> >>+	int retval;
> >>+
> >>+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> >>+	if (ACPI_FAILURE(status)) {
> >>+		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
> >>+		return -ENOENT;
> >>+	}
> >>+	retval = topology_setup_acpi_cpu(table, cpu, level);
> >>+	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
> >>+		 cpu, level, retval);
> >>+	acpi_put_table(table);
> >>+
> >>+	return retval;
> >
> >This value is just a token, with no HW meaning whatsoever, and that's
> >where I question the ACPI_PPTT_PHYSICAL_PACKAGE flag usage in retrieving
> >it: you are not looking for a package ID (which has no meaning whatsoever
> >anyway, and I wonder why it was added to the spec at all); you are
> >looking for an ID at a given level.
> 
> If you look at the next patch in the series, to get the top level I
> pass an arbitrarily large value as the "level", which should make the
> walk terminate on the PHYSICAL_PACKAGE node rather than on any
> intermediate node.
> 
> 
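The "arbitrarily large level" approach can be sketched as follows. The loop has the same shape as acpi_find_processor_package_id() in the quoted patch: it climbs one parent per level but stops at the first node flagged as the physical package, so passing INT_MAX returns the package node even when firmware places further nodes above it (for example, one holding a system-level cache). PKG_FLAG, struct pnode and lookup() are simplified, hypothetical stand-ins for ACPI_PPTT_PHYSICAL_PACKAGE, struct acpi_pptt_processor and fetch_pptt_node().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ACPI_PPTT_PHYSICAL_PACKAGE. */
#define PKG_FLAG 0x1u

/* Hypothetical, simplified processor node with byte offsets. */
struct pnode {
	uint32_t offset; /* byte offset of this node in the table */
	uint32_t parent; /* byte offset of the parent, 0 for the root */
	uint32_t flags;
};

/* Stand-in for fetch_pptt_node(): resolve a parent reference. */
static const struct pnode *lookup(const struct pnode *tbl, size_t n,
				  uint32_t offset)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].offset == offset)
			return &tbl[i];
	return NULL;
}

/*
 * Climb at most 'level' parents, stopping early at a package-flagged
 * node or at the root. INT_MAX effectively means "go to the package".
 */
static const struct pnode *walk_to_package(const struct pnode *tbl, size_t n,
					   const struct pnode *cpu, int level)
{
	while (cpu && level && !(cpu->flags & PKG_FLAG)) {
		const struct pnode *prev = lookup(tbl, n, cpu->parent);

		if (!prev)
			break;
		cpu = prev;
		level--;
	}
	return cpu;
}
```

With a system node (36) above a package node (56), a cluster (76) and a leaf CPU (96), level 0 returns the leaf, level 1 the cluster, and INT_MAX the package rather than the root. If no node carried the flag, INT_MAX would simply return the root.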
> >
> >I will comment separately on the cache code, which deserves to be
> >in a separate patch to simplify the review; I avoided repeating
> >review comments that were already reported.
> >
> >Lorenzo
> >
> 


* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-20  9:14         ` Lorenzo Pieralisi
@ 2017-10-20 16:14           ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-20 16:14 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc

Hi,

On 10/20/2017 04:14 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 19, 2017 at 11:13:27AM -0500, Jeremy Linton wrote:
>> On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
>>> On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
>>>> Propagate the topology information from the PPTT tree to the
>>>> cpu_topology array. We can get the thread id, core_id and
>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>> to those concepts. The package_id is flagged in the tree and can be
>>>> found by passing an arbitrarily large level to setup_acpi_cpu_topology()
>>>> which terminates its search when it finds an ACPI node flagged
>>>> as the physical package. If the tree doesn't contain enough
>>>> levels to represent all of thread/core/cod/package then the package
>>>> id will be used for the missing levels.
>>>>
>>>> Since server/ACPI machines are more likely to be multisocket and NUMA,
>>>
>>> I think this stuff is vague enough already, so to start with I would
>>> drop patches 4 and 5 and stop assuming what machines are more likely
>>> to ship with ACPI than DT.
>>>
>>> I am just saying, for the umpteenth time, that these levels have no
>>> architectural meaning _whatsoever_; level is a hierarchy concept
>>> with no architectural meaning attached.
>>
>> ?
>>
>> Did anyone say anything about that? No, I think the only thing being
>> guaranteed here is that the kernel's physical_id maps to an
>> ACPI-defined socket, which seems to be the mindset of pretty much
>> the entire !arm64 community, meaning they are optimizing their
>> software and the kernel with that concept in mind.
>>
>> Are you denying the existence of non-uniformity between threads
>> running on different physical sockets?
> 
> No, I have not explained my POV clearly, apologies.
> 
> AFAIK, the kernel currently deals with 2 (3 if SMT) topology layers.
> 
> 1) thread
> 2) core
> 3) package
> 
> What I wanted to say is that, to simplify this series, you do not need
> to introduce the COD topology level, since it is just another arbitrary
> topology level (i.e. there is no way you can pinpoint which level
> corresponds to COD with PPTT, or DT for the sake of this discussion)
> that would not be used in the kernel (apart from the big.LITTLE cpufreq
> driver and the PSCI checker, whose usage of topology_physical_package_id()
> is questionable anyway).

Oh! But I'm at a loss as to what to do with those two users if I set
the node that has the physical socket flag as the "cluster_id" in the
topology.

Granted, this being ACPI, I don't expect the cpufreq driver to be active
(given CPPC), and the PSCI checker might be ignored? Even so, it's a bit
of a misnomer for what is actually happening. Are we good with this?


> 
> PPTT allows you to define which level corresponds to a package; use
> it to initialize the package topology level (which in the ARM internal
> variables we call cluster) and be done with it.
>
> I do not think that adding another topology level improves anything as
> far as ACPI topology detection is concerned; you are not able to use it
> in the scheduler or from userspace to group CPUs anyway.

Correct, and AFAIK, after having poked a bit at the scheduler, it's sort
of redundant, as the generic cache-sharing levels are more useful anyway.
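A rough sketch of the mapping Lorenzo suggests, assuming a system without SMT: core_id comes from the level 0 ID, the arm64 cluster_id is filled from the package-flagged node (the ID returned for an arbitrarily large level), and CPUs sharing that ID form the core-sibling set. struct cpu_topo, fill_topology() and the plain 32-bit mask are hypothetical simplifications; the real arm64 topology code uses cpumask_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplification of an arm64 per-CPU topology entry. */
struct cpu_topo {
	int thread_id;         /* -1: no SMT in this sketch */
	int core_id;
	int cluster_id;        /* filled with the package-level ID */
	uint32_t core_sibling; /* bit j set: CPU j shares this package */
};

/*
 * core_ids[i]: ID for CPU i at level 0 (the firmware processor ID).
 * pkg_ids[i]:  ID for CPU i at an arbitrarily large level, i.e. the
 *              offset of its package-flagged node.
 * ncpus must not exceed 32 because of the plain 32-bit sibling mask.
 */
static void fill_topology(struct cpu_topo *topo, const int *core_ids,
			  const int *pkg_ids, size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++) {
		topo[i].thread_id = -1;
		topo[i].core_id = core_ids[i];
		topo[i].cluster_id = pkg_ids[i];
		topo[i].core_sibling = 0;
	}

	/* CPUs reporting the same package ID are core siblings (incl. self). */
	for (size_t i = 0; i < ncpus; i++)
		for (size_t j = 0; j < ncpus; j++)
			if (topo[i].cluster_id == topo[j].cluster_id)
				topo[i].core_sibling |= UINT32_C(1) << j;
}
```

For four CPUs whose package IDs are {56, 56, 200, 200}, CPUs 0 and 1 get the sibling mask 0x3 and CPUs 2 and 3 get 0xC; no COD level is needed to build these groups.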

> 
> Does this answer your question?

Yes, other than what to do with the two drivers.

> 
> Thanks,
> Lorenzo
> 


* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-20 16:14           ` Jeremy Linton
@ 2017-10-20 16:42             ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2017-10-20 16:42 UTC (permalink / raw)
  To: Jeremy Linton, Lorenzo Pieralisi
  Cc: Sudeep Holla, linux-acpi, linux-arm-kernel, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc



On 20/10/17 17:14, Jeremy Linton wrote:
> Hi,
> 
> On 10/20/2017 04:14 AM, Lorenzo Pieralisi wrote:
>> On Thu, Oct 19, 2017 at 11:13:27AM -0500, Jeremy Linton wrote:
>>> On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
>>>> On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
>>>>> Propagate the topology information from the PPTT tree to the
>>>>> cpu_topology array. We can get the thread id, core_id and
>>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>>> to those concepts. The package_id is flagged in the tree and can be
>>>>> found by passing an arbitrary large level to setup_acpi_cpu_topology()
>>>>> which terminates its search when it finds an ACPI node flagged
>>>>> as the physical package. If the tree doesn't contain enough
>>>>> levels to represent all of thread/core/cod/package then the package
>>>>> id will be used for the missing levels.
>>>>>
>>>>> Since server/ACPI machines are more likely to be multisocket and NUMA,
>>>>
>>>> I think this stuff is vague enough already so to start with I would
>>>> drop
>>>> patch 4 and 5 and stop assuming what machines are more likely to ship
>>>> with ACPI than DT.
>>>>
>>>> I am just saying, for the umpteenth time, that these levels have no
>>>> architectural meaning _whatsoever_, level is a hierarchy concept
>>>> with no architectural meaning attached.
>>>
>>> ?
>>>
>>> Did anyone say anything about that? No, I think the only thing being
>>> guaranteed here is that the kernel's physical_id maps to an ACPI
>>> defined socket. Which seems to be the mindset of pretty much the
>>> entire !arm64 community meaning they are optimizing their software
>>> and the kernel with that concept in mind.
>>>
>>> Are you denying the existence of non-uniformity between threads
>>> running on different physical sockets?
>>
>> No, I have not explained my POV clearly, apologies.
>>
>> AFAIK, the kernel currently deals with 2 (3 - if SMT) topology layers.
>>
>> 1) thread
>> 2) core
>> 3) package
>>
>> What I wanted to say is, that, to simplify this series, you do not need
>> to introduce the COD topology level, since it is just another arbitrary
>> topology level (ie there is no way you can pinpoint which level
>> corresponds to COD with PPTT - or DT for the sake of this discussion)
>> that would not be used in the kernel (apart from big.LITTLE cpufreq
>> driver and PSCI checker whose usage of topology_physical_package_id() is
>> questionable anyway).

Just thinking out loud here.

1. psci_checker.c: it's only used to get groups of CPUs to achieve
   deeper idle states. It should be easy to get rid of that.
2. big.LITTLE cpufreq: there are 2 users. For scpi, I should be able to
   do what I did for SCMI; for spc, I am wondering if we can hard-code
   it, since it's only used on TC2.
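
psci_checker only needs to know which CPUs belong together so it can
drive them into a deeper, group-wide idle state. A minimal sketch of that
grouping step, assuming plain 64-bit masks instead of the kernel's
cpumask API (build_groups() and MAX_CPUS are hypothetical names), shows
why the meaning of the ID matters: grouping a Juno-like 4+2 layout by
cluster ID gives two groups, while grouping it by socket ID gives one.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 64

/*
 * Build one bitmask per distinct ID in ids[0..nr_cpus-1].
 * Bit N of masks[g] is set when CPU N belongs to group g.
 * Returns the number of groups found.
 */
static int build_groups(const int ids[], int nr_cpus, uint64_t masks[])
{
	int group_id[MAX_CPUS];
	int nr_groups = 0;

	assert(nr_cpus >= 0 && nr_cpus <= MAX_CPUS);
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int g;

		/* Look for an existing group with this CPU's ID. */
		for (g = 0; g < nr_groups; g++)
			if (group_id[g] == ids[cpu])
				break;
		/* None found: open a new, empty group. */
		if (g == nr_groups) {
			group_id[nr_groups] = ids[cpu];
			masks[nr_groups++] = 0;
		}
		masks[g] |= UINT64_C(1) << cpu;
	}
	return nr_groups;
}
```

With cluster IDs {0,0,0,0,1,1} the masks come out as 0x0f and 0x30; with
all-zero socket IDs a single 0x3f mask covers all six CPUs.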

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-12 19:48   ` Jeremy Linton
@ 2017-10-20 19:53     ` Christ, Austin
  -1 siblings, 0 replies; 104+ messages in thread
From: Christ, Austin @ 2017-10-20 19:53 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair

Hey Jeremy,

Quick comment below.

On 10/12/2017 1:48 PM, Jeremy Linton wrote:
> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
> +				    unsigned int cpu, int level)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;

This lookup for the ACPI ID is architecture dependent. Can you use a
function that would work for any user of PPTT and MADT? It may require
writing and exporting the inverse of acpi_get_cpuid(), which is
exported from processor_core.c.

> +
> +	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +	if (cpu_node) {
> +		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
> +		/* Only the first level has a guaranteed id */
> +		if (level == 0)
> +			return cpu_node->acpi_processor_id;
> +		return (int)((u8 *)cpu_node - (u8 *)table);
> +	}
> +	pr_err_once("PPTT table found, but unable to locate core for %d\n",
> +		    cpu);
> +	return -ENOENT;
> +}
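
For any level above 0, the quoted function returns the node's byte offset
from the start of the PPTT instead of acpi_processor_id, which the comment
says is only guaranteed for the first level. Because no two nodes can start
at the same offset, the offset works as a unique, stable ID (PPTT nodes
also reference their parent by offset). A minimal sketch of the idea, using
hypothetical simplified structures in place of the real ACPICA definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the ACPI table structures. */
struct table_header {
	uint32_t length;            /* total table size in bytes */
};

struct proc_node {
	uint32_t parent_offset;     /* offset of the parent node, 0 at the root */
	uint32_t acpi_processor_id; /* may be meaningless above the leaf level */
};

/* Use the node's byte offset within the table as its ID. */
static int node_offset_id(const struct table_header *table,
			  const struct proc_node *node)
{
	ptrdiff_t off = (const uint8_t *)node - (const uint8_t *)table;

	/* The node must lie inside the table, past the header. */
	assert(off >= (ptrdiff_t)sizeof(*table) && (size_t)off < table->length);
	return (int)off;
}
```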

-- 
Austin Christ
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-20 16:14           ` Jeremy Linton
@ 2017-10-20 19:55             ` Jeffrey Hugo
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeffrey Hugo @ 2017-10-20 19:55 UTC (permalink / raw)
  To: Jeremy Linton, Lorenzo Pieralisi
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, wangxiongfeng2, Jonathan.Zhang, ahs3,
	Jayachandran.Nair, austinwc

On 10/20/2017 10:14 AM, Jeremy Linton wrote:
> Hi,
> 
> On 10/20/2017 04:14 AM, Lorenzo Pieralisi wrote:
>> On Thu, Oct 19, 2017 at 11:13:27AM -0500, Jeremy Linton wrote:
>>> On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
>>>> On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
>>>>> Propagate the topology information from the PPTT tree to the
>>>>> cpu_topology array. We can get the thread id, core_id and
>>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>>> to those concepts. The package_id is flagged in the tree and can be
>>>>> found by passing an arbitrary large level to setup_acpi_cpu_topology()
>>>>> which terminates its search when it finds an ACPI node flagged
>>>>> as the physical package. If the tree doesn't contain enough
>>>>> levels to represent all of thread/core/cod/package then the package
>>>>> id will be used for the missing levels.
>>>>>
>>>>> Since server/ACPI machines are more likely to be multisocket and NUMA,
>>>>
>>>> I think this stuff is vague enough already so to start with I would 
>>>> drop
>>>> patch 4 and 5 and stop assuming what machines are more likely to ship
>>>> with ACPI than DT.
>>>>
>>>> I am just saying, for the umpteenth time, that these levels have no
>>>> architectural meaning _whatsoever_, level is a hierarchy concept
>>>> with no architectural meaning attached.
>>>
>>> ?
>>>
>>> Did anyone say anything about that? No, I think the only thing being
>>> guaranteed here is that the kernel's physical_id maps to an ACPI
>>> defined socket. Which seems to be the mindset of pretty much the
>>> entire !arm64 community meaning they are optimizing their software
>>> and the kernel with that concept in mind.
>>>
>>> Are you denying the existence of non-uniformity between threads
>>> running on different physical sockets?
>>
>> No, I have not explained my POV clearly, apologies.
>>
>> AFAIK, the kernel currently deals with 2 (3 - if SMT) topology layers.
>>
>> 1) thread
>> 2) core
>> 3) package
>>
>> What I wanted to say is, that, to simplify this series, you do not need
>> to introduce the COD topology level, since it is just another arbitrary
>> topology level (ie there is no way you can pinpoint which level
>> corresponds to COD with PPTT - or DT for the sake of this discussion)
>> that would not be used in the kernel (apart from big.LITTLE cpufreq
>> driver and PSCI checker whose usage of topology_physical_package_id() is
>> questionable anyway).
> 
> Oh! But, i'm at a loss as to what to do with those two users if I set 
> the node which has the physical socket flag set, as the "cluster_id" in 
> the topology.
> 
> Granted, this being ACPI I don't expect the cpufreq driver to be active 
> (given CPPC) and the psci checker might be ignored? Even so, its a bit 
> of a misnomer what is actually happening. Are we good with this?
> 
> 
>>
>> PPTT allows you to define what level corresponds to a package, use
>> it to initialize the package topology level (that on ARM internal
>> variables we call cluster) and be done with it.
>>
>> I do not think that adding another topology level improves anything as
>> far as ACPI topology detection is concerned, you are not able to use it
>> in the scheduler or from userspace to group CPUs anyway.
> 
> Correct, and AFAIK after having poked a bit at the scheduler its sort of 
> redundant as the generic cache sharing levels are more useful anyway.

What do you mean, it can't be used?  We expect a followup series which 
uses PPTT to define scheduling domains/groups.

The scheduler supports 4 types of levels, with an arbitrary number of
instances of each: NUMA, DIE (package, usually not used with NUMA), MC
(multicore, typically cores which share resources like cache), and SMT
(threads).

Our particular platform has a single socket/package, with multiple
"clusters", each cluster consisting of multiple cores that share caches.
We represent all of this in PPTT, and expect it to be used. Leaf
nodes are cores. The level above is the cluster. The top level is the
package. We expect eventually (and understand that Jeremy is not
tackling this with his current series) that clusters get represented as
MC, so that migrated processes prefer their cache-shared siblings, and
the entire package is represented by DIE.

This will have to come from PPTT since you can't use core_siblings to 
derive this.  Additionally, if we had multiple layers of clustering, we 
would expect each layer to be represented by MC.  Topology.c has none of 
this support today.

PPTT can refer to SLIT/SRAT to determine if a hierarchy level
corresponds to the "Cluster-on-Die" concept of other architectures
(which end up as NUMA nodes in NUMA scheduling domains).

What PPTT will have to do is parse the tree(s), determine what each 
level is - SMT, MC, NUMA, DIE - and then use set_sched_topology() so 
that the scheduler can build up groups/domains appropriately.
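
A minimal sketch of the "determine what each level is" step for a single
leaf-to-root path, assuming the parser already knows which level carries
the physical package flag and whether the leaves are threads
(classify_levels() and enum sched_kind are hypothetical). For the platform
described above (cores as leaves, clusters above them, one package) it
yields MC for the cluster level and DIE for the package; the result would
then be turned into the table handed to set_sched_topology().

```c
#include <assert.h>

enum sched_kind { KIND_SMT, KIND_MC, KIND_DIE };

/*
 * Classify the hierarchy levels above a CPU's leaf node. Level 1 is the
 * leaf's parent and @package_level is the level flagged as the physical
 * package. If the leaves are SMT threads, level 1 (the core) spans an SMT
 * domain; the package spans DIE; every level in between spans an MC domain.
 * Levels above the package (NUMA) are not handled here.
 * Writes package_level entries to out[] and returns that count.
 */
static int classify_levels(int package_level, int leaves_are_threads,
			   enum sched_kind out[])
{
	int n = 0;

	assert(package_level >= 1);
	for (int level = 1; level <= package_level; level++) {
		if (level == 1 && leaves_are_threads)
			out[n++] = KIND_SMT;
		else if (level == package_level)
			out[n++] = KIND_DIE;
		else
			out[n++] = KIND_MC;
	}
	return n;
}
```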


Jeremy, we've tested v3 on our platform.  The topology part works as 
expected, we no longer see lstopo reporting sockets where there are 
none, but the scheduling groups are broken (expected).  Caches still 
don't work right (no sizes reported, and the sched caches are not 
attributed to the cores).  We will likely have additional comments as we 
delve into it.
> 
>>
>> Does this answer your question ?
> Yes, other than what to do with the two drivers.
> 
>>
>> Thanks,
>> Lorenzo
>>
> 


-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-20 19:53     ` Christ, Austin
@ 2017-10-23 21:14       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-23 21:14 UTC (permalink / raw)
  To: Christ, Austin, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair

Hi,

On 10/20/2017 02:53 PM, Christ, Austin wrote:
> Hey Jeremy,
> 
> Quick comment below.
> 
> On 10/12/2017 1:48 PM, Jeremy Linton wrote:
>> +static int topology_setup_acpi_cpu(struct acpi_table_header *table,
>> +                    unsigned int cpu, int level)
>> +{
>> +    struct acpi_pptt_processor *cpu_node;
>> +    u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
> 
> This lookup for the acpi id is architecture dependent. Can you use a 
> function that would work for any user of PPTT and MADT? It may require 
> writing and exporting the inverse lookup of the function 
> acpi_get_cpuid() which is exported from processor_core.c

Sure, I was actually thinking about just passing it into the function,
so it becomes the responsibility of the caller to do the platform-specific
reverse lookup.

> 
>> +
>> +    cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +    if (cpu_node) {
>> +        cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
>> +        /* Only the first level has a guaranteed id */
>> +        if (level == 0)
>> +            return cpu_node->acpi_processor_id;
>> +        return (int)((u8 *)cpu_node - (u8 *)table);
>> +    }
>> +    pr_err_once("PPTT table found, but unable to locate core for %d\n",
>> +            cpu);
>> +    return -ENOENT;
>> +}
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-20 19:55             ` Jeffrey Hugo
@ 2017-10-23 21:26               ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-23 21:26 UTC (permalink / raw)
  To: Jeffrey Hugo, Lorenzo Pieralisi
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, wangxiongfeng2, Jonathan.Zhang, ahs3,
	Jayachandran.Nair, austinwc

Hi,

On 10/20/2017 02:55 PM, Jeffrey Hugo wrote:
> On 10/20/2017 10:14 AM, Jeremy Linton wrote:
>> Hi,
>>
>> On 10/20/2017 04:14 AM, Lorenzo Pieralisi wrote:
>>> On Thu, Oct 19, 2017 at 11:13:27AM -0500, Jeremy Linton wrote:
>>>> On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
>>>>> On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
>>>>>> Propagate the topology information from the PPTT tree to the
>>>>>> cpu_topology array. We can get the thread id, core_id and
>>>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>>>> to those concepts. The package_id is flagged in the tree and can be
>>>>>> found by passing an arbitrary large level to 
>>>>>> setup_acpi_cpu_topology()
>>>>>> which terminates its search when it finds an ACPI node flagged
>>>>>> as the physical package. If the tree doesn't contain enough
>>>>>> levels to represent all of thread/core/cod/package then the package
>>>>>> id will be used for the missing levels.
>>>>>>
>>>>>> Since server/ACPI machines are more likely to be multisocket and 
>>>>>> NUMA,
>>>>>
>>>>> I think this stuff is vague enough already so to start with I would 
>>>>> drop
>>>>> patch 4 and 5 and stop assuming what machines are more likely to ship
>>>>> with ACPI than DT.
>>>>>
>>>>> I am just saying, for the umpteenth time, that these levels have no
>>>>> architectural meaning _whatsoever_, level is a hierarchy concept
>>>>> with no architectural meaning attached.
>>>>
>>>> ?
>>>>
>>>> Did anyone say anything about that? No, I think the only thing being
>>>> guaranteed here is that the kernel's physical_id maps to an ACPI
>>>> defined socket. Which seems to be the mindset of pretty much the
>>>> entire !arm64 community meaning they are optimizing their software
>>>> and the kernel with that concept in mind.
>>>>
>>>> Are you denying the existence of non-uniformity between threads
>>>> running on different physical sockets?
>>>
>>> No, I have not explained my POV clearly, apologies.
>>>
>>> AFAIK, the kernel currently deals with 2 (3 - if SMT) topology layers.
>>>
>>> 1) thread
>>> 2) core
>>> 3) package
>>>
>>> What I wanted to say is, that, to simplify this series, you do not need
>>> to introduce the COD topology level, since it is just another arbitrary
>>> topology level (ie there is no way you can pinpoint which level
>>> corresponds to COD with PPTT - or DT for the sake of this discussion)
>>> that would not be used in the kernel (apart from big.LITTLE cpufreq
>>> driver and PSCI checker whose usage of topology_physical_package_id() is
>>> questionable anyway).
>>
>> Oh! But, i'm at a loss as to what to do with those two users if I set 
>> the node which has the physical socket flag set, as the "cluster_id" 
>> in the topology.
>>
>> Granted, this being ACPI I don't expect the cpufreq driver to be 
>> active (given CPPC) and the psci checker might be ignored? Even so, 
>> its a bit of a misnomer what is actually happening. Are we good with 
>> this?
>>
>>
>>>
>>> PPTT allows you to define what level corresponds to a package, use
>>> it to initialize the package topology level (that on ARM internal
>>> variables we call cluster) and be done with it.
>>>
>>> I do not think that adding another topology level improves anything as
>>> far as ACPI topology detection is concerned, you are not able to use it
>>> in the scheduler or from userspace to group CPUs anyway.
>>
>> Correct, and AFAIK after having poked a bit at the scheduler its sort 
>> of redundant as the generic cache sharing levels are more useful anyway.
> 
> What do you mean, it can't be used?  We expect a followup series which 
> uses PPTT to define scheduling domains/groups.
> 
> The scheduler supports 4 types of levels, with an arbitrary number of 
> instances of each - NUMA, DIE (package, usually not used with NUMA), MC 
> (multicore, typically cores which share resources like cache), SMT 
> (threads).

It turns out to be pretty easy to map individual PPTT "levels" to MC
layers simply by creating a custom sched_domain_topology_level array and
populating it with one MC entry per level. The only thing that changes
between those entries is the "mask" portion.

Whether that is good/bad vs just using a topology like:

static struct sched_domain_topology_level arm64_topology[] = {
#ifdef CONFIG_SCHED_SMT
        { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
        { cpu_cluster_mask, cpu_core_flags, SD_INIT_NAME(CLU) },
#ifdef CONFIG_SCHED_MC
        { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
        { cpu_cpu_mask, SD_INIT_NAME(DIE) },
        { NULL, },
};

and using it on a successful ACPI/PPTT parse, along with a new
cpu_cluster_mask, isn't clear to me either, particularly if one goes in
and starts changing the "cpu_core_flags" to cpu_smt_flags, for starters.


But, as mentioned, I think this is a follow-on patch which meshes with
patches 4/5 here.
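
A minimal sketch of the "one MC entry per PPTT level" idea, using a
hypothetical userspace stand-in for struct sched_domain_topology_level
that keeps only the mask callback and the name (the real structure also
carries a flags callback). Because the mask callback only receives a CPU
number, each level needs its own accessor; the macro stamps those out, and
only the mask differs between the MC entries. level_mask, build_topology()
and the accessors are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS   64
#define MAX_LEVELS 4

/* Hypothetical stand-in for struct sched_domain_topology_level. */
struct topo_level {
	uint64_t (*mask)(int cpu);  /* CPUs that share this level with @cpu */
	const char *name;
};

/* Per-level sibling masks, assumed to be filled in by a PPTT parse. */
static uint64_t level_mask[MAX_LEVELS][MAX_CPUS];

/* The callback only gets a CPU number, so each level needs its own accessor. */
#define DEFINE_LEVEL_MASK(n)				\
	static uint64_t level##n##_mask(int cpu)	\
	{						\
		return level_mask[n][cpu];		\
	}

DEFINE_LEVEL_MASK(0)
DEFINE_LEVEL_MASK(1)
DEFINE_LEVEL_MASK(2)
DEFINE_LEVEL_MASK(3)

static uint64_t (*const level_fns[MAX_LEVELS])(int cpu) = {
	level0_mask, level1_mask, level2_mask, level3_mask,
};

/*
 * Emit one MC entry per intermediate PPTT level, a DIE entry for the
 * package, and a NULL terminator. @out needs room for nr_mc_levels + 2
 * entries. Returns the number of non-terminator entries.
 */
static size_t build_topology(struct topo_level *out, int nr_mc_levels,
			     uint64_t (*die_mask)(int cpu))
{
	size_t n = 0;

	assert(nr_mc_levels >= 0 && nr_mc_levels <= MAX_LEVELS);
	for (int i = 0; i < nr_mc_levels; i++)
		out[n++] = (struct topo_level){ level_fns[i], "MC" };
	out[n++] = (struct topo_level){ die_mask, "DIE" };
	out[n] = (struct topo_level){ NULL, NULL };
	return n;
}
```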



> 
> Our particular platform has a single socket/package, with multiple 
> "clusters", each cluster consisting of multiple cores that share caches. 
>   We represent all of this in PPTT, and expect it to be used.  Leaf 
> nodes are cores.  The level above is the cluster.  The top level is the 
> package.  We expect eventually (and understand that Jeremy is not 
> tackling this with his current series) that clusters get represented MC 
> so that migrated processes prefer their cache-shared siblings, and the 
> entire package is represented by DIE.
> 
> This will have to come from PPTT since you can't use core_siblings to 
> derive this.  Additionally, if we had multiple layers of clustering, we 
> would expect each layer to be represented by MC.  Topology.c has none of 
> this support today.
> 
> PPTT can refer to SLIT/SRAT to determine if a hirearchy level 
> corresponds to the "Cluster-on-Die" concept of other architectures 
> (which end up as NUMA nodes in NUMA scheduling domains).
> 
> What PPTT will have to do is parse the tree(s), determine what each 
> level is - SMT, MC, NUMA, DIE - and then use set_sched_topology() so 
> that the scheduler can build up groups/domains appropriately.
> 
> 
> Jeremy, we've tested v3 on our platform.  The topology part works as 
> expected, we no longer see lstopo reporting sockets where there are 
> none, but the scheduling groups are broken (expected).  Caches still 
> don't work right (no sizes reported, and the sched caches are not 
> attributed to the cores).  We will likely have additional comments as we 
> delve into it.
>>
>>>
>>> Does this answer your question ?
>> Yes, other than what to do with the two drivers.
>>
>>>
>>> Thanks,
>>> Lorenzo
>>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
@ 2017-10-23 21:26               ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2017-10-23 21:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 10/20/2017 02:55 PM, Jeffrey Hugo wrote:
> On 10/20/2017 10:14 AM, Jeremy Linton wrote:
>> Hi,
>>
>> On 10/20/2017 04:14 AM, Lorenzo Pieralisi wrote:
>>> On Thu, Oct 19, 2017 at 11:13:27AM -0500, Jeremy Linton wrote:
>>>> On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
>>>>> On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
>>>>>> Propagate the topology information from the PPTT tree to the
>>>>>> cpu_topology array. We can get the thread id, core_id and
>>>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>>>> to those concepts. The package_id is flagged in the tree and can be
>>>>>> found by passing an arbitrarily large level to setup_acpi_cpu_topology()
>>>>>> which terminates its search when it finds an ACPI node flagged
>>>>>> as the physical package. If the tree doesn't contain enough
>>>>>> levels to represent all of thread/core/cod/package then the package
>>>>>> id will be used for the missing levels.
>>>>>>
>>>>>> Since server/ACPI machines are more likely to be multisocket and NUMA,
>>>>>
>>>>> I think this stuff is vague enough already so to start with I would drop
>>>>> patches 4 and 5 and stop assuming what machines are more likely to ship
>>>>> with ACPI than DT.
>>>>>
>>>>> I am just saying, for the umpteenth time, that these levels have no
>>>>> architectural meaning _whatsoever_, level is a hierarchy concept
>>>>> with no architectural meaning attached.
>>>>
>>>> ?
>>>>
>>>> Did anyone say anything about that? No, I think the only thing being
>>>> guaranteed here is that the kernel's physical_id maps to an ACPI
>>>> defined socket. Which seems to be the mindset of pretty much the
>>>> entire !arm64 community meaning they are optimizing their software
>>>> and the kernel with that concept in mind.
>>>>
>>>> Are you denying the existence of non-uniformity between threads
>>>> running on different physical sockets?
>>>
>>> No, I have not explained my POV clearly, apologies.
>>>
>>> AFAIK, the kernel currently deals with 2 (3 - if SMT) topology layers.
>>>
>>> 1) thread
>>> 2) core
>>> 3) package
>>>
>>> What I wanted to say is, that, to simplify this series, you do not need
>>> to introduce the COD topology level, since it is just another arbitrary
>>> topology level (ie there is no way you can pinpoint which level
>>> corresponds to COD with PPTT - or DT for the sake of this discussion)
>>> that would not be used in the kernel (apart from big.LITTLE cpufreq
>>> driver and PSCI checker whose usage of topology_physical_package_id() is
>>> questionable anyway).
>>
>> Oh! But, I'm at a loss as to what to do with those two users if I set
>> the node which has the physical socket flag set, as the "cluster_id"
>> in the topology.
>>
>> Granted, this being ACPI I don't expect the cpufreq driver to be
>> active (given CPPC) and the psci checker might be ignored? Even so,
>> it's a bit of a misnomer what is actually happening. Are we good with
>> this?
>>
>>
>>>
>>> PPTT allows you to define what level corresponds to a package, use
>>> it to initialize the package topology level (that on ARM internal
>>> variables we call cluster) and be done with it.
>>>
>>> I do not think that adding another topology level improves anything as
>>> far as ACPI topology detection is concerned, you are not able to use it
>>> in the scheduler or from userspace to group CPUs anyway.
>>
>> Correct, and AFAIK after having poked a bit at the scheduler it's sort
>> of redundant as the generic cache sharing levels are more useful anyway.
> 
> What do you mean, it can't be used?? We expect a followup series which 
> uses PPTT to define scheduling domains/groups.
> 
> The scheduler supports 4 types of levels, with an arbitrary number of 
> instances of each - NUMA, DIE (package, usually not used with NUMA), MC 
> (multicore, typically cores which share resources like cache), SMT 
> (threads).

It turns out to be pretty easy to map individual PPTT "levels" to MC 
layers simply by creating a custom sched_domain_topology_level and 
populating it with an equal number of MC layers. The only thing that 
changes is the "mask" portion of each entry.
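As a rough illustration of that mapping, the sketch below models a PPTT hierarchy as a flattened parent array (a hypothetical layout, not the kernel's ACPI structures) and computes, for a given level, the mask of CPUs that share the same ancestor. Each stacked MC-style entry would then differ only in which level's mask it returns. The names topo_model, ancestor_at() and level_mask() are illustrative, and at most 64 CPUs are assumed so a mask fits in a uint64_t.

```c
#include <stdint.h>

/*
 * Hypothetical flattened model of a PPTT processor hierarchy:
 * parent[n] is the index of node n's parent, or -1 for a root;
 * leaf[cpu] is the index of the leaf node describing that CPU.
 */
struct topo_model {
	const int *parent;
	const int *leaf;
	int ncpus;		/* assumed <= 64 so a mask fits in uint64_t */
};

/* Walk 'level' steps up from a CPU's leaf, stopping early at a root. */
static int ancestor_at(const struct topo_model *m, int cpu, int level)
{
	int node = m->leaf[cpu];

	while (level-- > 0 && m->parent[node] >= 0)
		node = m->parent[node];
	return node;
}

/*
 * Mask of all CPUs that share cpu's ancestor at 'level'. One such mask
 * per level is the only thing that would change between the stacked
 * MC-style scheduler entries.
 */
static uint64_t level_mask(const struct topo_model *m, int cpu, int level)
{
	int target = ancestor_at(m, cpu, level);
	uint64_t mask = 0;

	for (int c = 0; c < m->ncpus; c++)
		if (ancestor_at(m, c, level) == target)
			mask |= UINT64_C(1) << c;
	return mask;
}
```

For example, with one package node above two clusters of two cores each, level 0 yields each CPU on its own, level 1 yields its cluster pair, and level 2 (or any larger level) yields all four CPUs.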

Whether that is good/bad vs just using a topology like:

static struct sched_domain_topology_level arm64_topology[] = {
#ifdef CONFIG_SCHED_SMT
        { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
        { cpu_cluster_mask, cpu_core_flags, SD_INIT_NAME(CLU) },
#ifdef CONFIG_SCHED_MC
        { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
        { cpu_cpu_mask, SD_INIT_NAME(DIE) },
        { NULL, },
};

and using it after a successful ACPI/PPTT parse, along with a new
cpu_cluster_mask, isn't clear to me either, particularly once one starts
changing, for example, the "cpu_core_flags" to cpu_smt_flags.


But as mentioned I think this is a follow-on patch which meshes with
patches 4/5 here.



> 
> Our particular platform has a single socket/package, with multiple 
> "clusters", each cluster consisting of multiple cores that share caches. 
>   We represent all of this in PPTT, and expect it to be used.  Leaf
> nodes are cores.  The level above is the cluster.  The top level is the
> package.  We expect eventually (and understand that Jeremy is not
> tackling this with his current series) that clusters get represented MC 
> so that migrated processes prefer their cache-shared siblings, and the 
> entire package is represented by DIE.
> 
> This will have to come from PPTT since you can't use core_siblings to 
> derive this.  Additionally, if we had multiple layers of clustering, we
> would expect each layer to be represented by MC.  Topology.c has none of
> this support today.
> 
> PPTT can refer to SLIT/SRAT to determine if a hierarchy level
> corresponds to the "Cluster-on-Die" concept of other architectures 
> (which end up as NUMA nodes in NUMA scheduling domains).
> 
> What PPTT will have to do is parse the tree(s), determine what each 
> level is - SMT, MC, NUMA, DIE - and then use set_sched_topology() so 
> that the scheduler can build up groups/domains appropriately.
> 
> 
> Jeremy, we've tested v3 on our platform.  The topology part works as
> expected, we no longer see lstopo reporting sockets where there are
> none, but the scheduling groups are broken (expected).  Caches still
> don't work right (no sizes reported, and the sched caches are not
> attributed to the cores).  We will likely have additional comments as we
> delve into it.
>>
>>>
>>> Does this answer your question ?
>> Yes, other than what to do with the two drivers.
>>
>>>
>>> Thanks,
>>> Lorenzo
>>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-10-19 10:25                 ` John Garry
@ 2017-10-27  5:21                   ` Tomasz Nowicki
  -1 siblings, 0 replies; 104+ messages in thread
From: Tomasz Nowicki @ 2017-10-27  5:21 UTC (permalink / raw)
  To: John Garry, Tomasz Nowicki, Jeremy Linton, linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair,
	lorenzo.pieralisi, austinwc, linux-pm, jhugo, gregkh,
	sudeep.holla, rjw, linux-kernel, will.deacon, wangxiongfeng2,
	viresh.kumar, hanjun.guo, catalin.marinas, ahs3,
	linux-arm-kernel

Hi John,

On 19.10.2017 12:25, John Garry wrote:
> On 19/10/2017 06:18, Tomasz Nowicki wrote:
>>>
>>> Summary:
>>>
>>> I'm not at all happy with this specification's attempt to leave out
>>> pieces of information which make parsing things more deterministic. In
>>> this case I'm happy to demote the message level, but not remove it
>>> entirely but I do think the obvious case you list shouldn't be the
>>> default one.
>>>
>>> Lastly:
>>>
>>> I'm assuming the final result is that the table is actually being
>>> parsed correctly despite the ugly message?
>>
>> Indeed, the ThunderX2 PPTT table is being parsed so that topology shown
>> in lstopo and lscpu is correct.
> 
> Hi Tomasz,
> 
> Can you share the lscpu output? Does it have cluster info? I did not 
> think that lscpu has a concept of clustering.
> 
> I would say that the per-cpu cluster index sysfs entry needs to be added to
> drivers/base/arch_topology.c (and other appropriate code under
> GENERIC_ARCH_TOPOLOGY) to support this.

Here is what I get:

tn@val2-11 [~]$ lscpu -ap
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting from zero.
# CPU,Core,Socket,Node,,L1d,L1i,L2,L3
[...]
1,0,0,0,,0,0,0,0
[...]

so yes, no cluster info.
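Any consumer of this parsable format sees only the columns named in the header (CPU, Core, Socket, Node and the caches), so there is no place for a cluster level to show up. The sketch below shows one way to split such a data line in C. It is illustrative only: the function name is made up, the empty column that lscpu prints before the cache columns is stored as -1, and comment lines starting with '#' are assumed to be skipped by the caller.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split one data line of `lscpu -p` style output into integer columns.
 * Empty columns (such as the one between Node and L1d in the header
 * above) are stored as -1. strtok() is avoided on purpose: it would
 * merge the ",," and shift every cache column left by one.
 * Returns the number of columns parsed, or -1 if there are more than max.
 */
static int parse_lscpu_line(const char *line, long *cols, int max)
{
	int n = 0;
	const char *p = line;

	for (;;) {
		const char *comma = strchr(p, ',');
		/* The last field ends at the newline (if any) or the string end. */
		size_t len = comma ? (size_t)(comma - p) : strcspn(p, "\n");

		if (n == max)
			return -1;
		cols[n++] = len ? strtol(p, NULL, 10) : -1;
		if (!comma)
			break;
		p = comma + 1;
	}
	return n;
}
```

Applied to the line `1,0,0,0,,0,0,0,0` above, this yields nine columns with the fifth one marked empty.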

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 0/7] Support PPTT for ARM64
  2017-10-12 19:48 ` Jeremy Linton
@ 2017-10-31 12:46   ` Jon Masters
  -1 siblings, 0 replies; 104+ messages in thread
From: Jon Masters @ 2017-10-31 12:46 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc

On 10/12/2017 03:48 PM, Jeremy Linton wrote:

> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
> used to describe the processor and cache topology. Ideally it is
> used to extend/override information provided by the hardware, but
> right now ARM64 is entirely dependent on firmware provided tables.
> 
> This patch parses the table for the cache topology and CPU topology.
> For the latter we also add an additional topology_cod_id() macro,
> and a package_id for arm64. Initially the physical id will match
> the cluster id, but we update users of the cluster to utilize
> the new macro. When we enable ACPI/PPTT for arm64 we map the socket
> to the physical id as the remainder of the kernel expects.

Just wanted to thank you for doing this Jeremy. As you know, we're
tracking these patches and working with multiple vendors to ensure that
firmware has accurate PPTTs populated to match. We're expecting to pull
these patches and replace our current RHEL-only kludge asap. RHEL
currently has to kludge topology based upon magic "known" meanings of
the MPIDRs on various server platforms. It's (known to be) ugly and is
one of the reasons that we pushed for what became PPTT.

Beyond scheduler efficiency, in general, it's very important that Arm
systems can correctly report x86-style industry topology conventions -
especially sockets - since (and I told Arm this years ago, and other
non-Linux vendors backed me up) it's typical on server platforms to use
either "memory" or "number of sockets" when making licensing and
subscription calculations in various tooling. This became a problem
early on even with X-Gene1 and Seattle showing as 8 socket boxes ;)
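A tool of the kind described here might derive its socket count from the number of distinct package IDs, for example by reading each CPU's /sys/devices/system/cpu/cpuN/topology/physical_package_id. The sketch below shows only that counting step (file reading omitted; the function name is illustrative). It makes the point concrete: whatever ends up in the package ID directly determines the socket count such a tool reports.

```c
#include <stddef.h>

/*
 * Count distinct package IDs collected per CPU. IDs are treated as
 * opaque tags: nothing is assumed about their range or ordering.
 */
static size_t count_sockets(const int *pkg_id, size_t ncpus)
{
	size_t sockets = 0;

	for (size_t i = 0; i < ncpus; i++) {
		size_t j;

		/* Count pkg_id[i] only the first time it appears. */
		for (j = 0; j < i; j++)
			if (pkg_id[j] == pkg_id[i])
				break;
		if (j == i)
			sockets++;
	}
	return sockets;
}
```

If each of eight clusters in a single-socket part were reported with its own package ID, this count would come out as eight sockets rather than one.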

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-10-20  9:22         ` Lorenzo Pieralisi
@ 2017-11-01 20:29           ` Al Stone
  -1 siblings, 0 replies; 104+ messages in thread
From: Al Stone @ 2017-11-01 20:29 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	Jayachandran.Nair, austinwc

On 10/20/2017 03:22 AM, Lorenzo Pieralisi wrote:
> On Thu, Oct 19, 2017 at 11:54:22AM -0500, Jeremy Linton wrote:
> 
> [...]
> 
>>>> +			cpu_topology[cpu].core_id   = topology_id;
>>>> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
>>>> +			cpu_topology[cpu].cluster_id = topology_id;
>>>> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
>>>
>>> If you want a package id (that's just a package tag to group cores), you
>>> should not use a large level because you know how setup_acpi_cpu_topology()
>>> works, you should add an API that allows you to retrieve the package id
>>> (so that you can use the ACPI_PPTT_PHYSICAL_PACKAGE flag consistently,
>>> whatever it represents).
>>
>> I don't think the spec requires the use of PHYSICAL_PACKAGE... Am I
>> misreading it? Which means we need to "pick" a node level to
>> represent the physical package if one doesn't exist...
> 
> The specs define a means to detect if a given PPTT node corresponds to a
> package (I am refraining from stating again that to me that's not clean
> cut what a package is _architecturally_, I think you know my POV by now)
> and that's what you need to use to retrieve a packageid for a given cpu,
> if I understand the aim of the physical package flag.
> 
> Either that or that flag is completely useless.
> 
> Lorenzo
> 
> ACPI 6.2 - Table 5-151 (page 248)
> Physical package
> -----------------
> Set to 1 if this node of the processor topology represents the boundary
> of a physical package, whether socketed or surface mounted.  Set to 0 if
> this instance of the processor topology does not represent the boundary
> of a physical package.
> 

I've been following the discussion and I'm not sure I understand what the
confusion is around having a physical package ID.  Since I'm the one that
insisted it be in the spec, I'd be glad to clarify anything.  My apologies
for not saying anything sooner but things IRL have been very complicated
of late.

What was intended was a simple flag that was meant to tell me if a CPU ID
(this could be a CPU, a cluster, a processor container -- I don't really
care which) is *also* an actual physical device on a motherboard.  That is
the only intent; there was no architectural meaning intended at all -- that
is what the PPTT structures are for, in conjunction with any DSDT information
uncovered later in the boot process.

However, in the broader server ecosystem, this can be incredibly useful.  There
are a significant number of software products sold or supported that base their
fees on the number of physical sockets in use.  There have been in the past (and
may be in the near future) machines where the cost of the lease on the machine
is determined by how many physical sockets (or even CPUs) are in use, even if
the machine has many more available.

Some vendors also include FRU (Field Replaceable Unit) location info in their
ACPI tables.  So, for example, one or more CPUs or caches might fail in one
physical package, which is then reported to a maintenance system of some sort
that tells some human which of the physical sockets on what motherboard needs a
replacement device, or it's simply noted and shut off until it's time to replace
the entire server, or perhaps it's logged and used in an algorithm to predict
when the server might fail completely.

So, that's why the flag exists in the spec.  It seems to make sense to me to
have a package ID as part of struct cpu_topology -- it might even be really
handy for CPU hotplug.  If you don't, it seems to me a whole separate struct
would be needed with more cpumasks to show who belongs to what physical package;
that might be okay but seems unnecessarily complicated to me.
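Here is a minimal sketch of that trade-off. It assumes a trimmed-down, hypothetical per-CPU record rather than the real arm64 struct cpu_topology. Once package_id is stored per CPU, the set of CPUs in the same package can be derived from it instead of being tracked in a separate structure. A real implementation would more likely precompute the result into a cpumask alongside the existing sibling masks.

```c
#include <stdint.h>

/* Hypothetical per-CPU record; not the kernel's struct cpu_topology. */
struct cpu_topo {
	int thread_id;
	int core_id;
	int cluster_id;
	int package_id;
};

/*
 * Mask of CPUs whose package_id matches that of 'cpu'.
 * Assumes at most 64 CPUs so the mask fits in a uint64_t.
 */
static uint64_t package_siblings(const struct cpu_topo *topo, int ncpus,
				 int cpu)
{
	uint64_t mask = 0;

	for (int c = 0; c < ncpus; c++)
		if (topo[c].package_id == topo[cpu].package_id)
			mask |= UINT64_C(1) << c;
	return mask;
}
```

With two clusters in package 0 (CPUs 0 to 3) and one cluster in package 1 (CPUs 4 and 5), CPU 1's package mask covers CPUs 0 to 3 even though they span two clusters.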

You can also tell me that I have completely missed the point of the discussion
so far :-).  But if you do, you have to tell me what I missed.

Hope this helps clarify...

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Red Hat, Inc.
ahs3@redhat.com
-----------------------------------

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-11-01 20:29           ` Al Stone
@ 2017-11-02 10:48             ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-11-02 10:48 UTC (permalink / raw)
  To: Al Stone
  Cc: Jeremy Linton, linux-acpi, linux-arm-kernel, sudeep.holla,
	hanjun.guo, rjw, will.deacon, catalin.marinas, gregkh,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, Jayachandran.Nair, austinwc

On Wed, Nov 01, 2017 at 02:29:26PM -0600, Al Stone wrote:
> On 10/20/2017 03:22 AM, Lorenzo Pieralisi wrote:
> > On Thu, Oct 19, 2017 at 11:54:22AM -0500, Jeremy Linton wrote:
> > 
> > [...]
> > 
> >>>> +			cpu_topology[cpu].core_id   = topology_id;
> >>>> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
> >>>> +			cpu_topology[cpu].cluster_id = topology_id;
> >>>> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
> >>>
> >>> If you want a package id (that's just a package tag to group cores), you
> >>> should not use a large level because you know how setup_acpi_cpu_topology()
> >>> works, you should add an API that allows you to retrieve the package id
> >>> (so that you can use the ACPI_PPTT_PHYSICAL_PACKAGE flag consistently,
> >>> whatever it represents).
> >>
> >> I don't think the spec requires the use of PHYSICAL_PACKAGE... Am I
> >> misreading it? Which means we need to "pick" a node level to
> >> represent the physical package if one doesn't exist...
> > 
> > The specs define a means to detect if a given PPTT node corresponds to a
> > package (I am refraining from stating again that to me that's not clean
> > cut what a package is _architecturally_, I think you know my POV by now)
> > and that's what you need to use to retrieve a packageid for a given cpu,
> > if I understand the aim of the physical package flag.
> > 
> > Either that or that flag is completely useless.
> > 
> > Lorenzo
> > 
> > ACPI 6.2 - Table 5-151 (page 248)
> > Physical package
> > -----------------
> > Set to 1 if this node of the processor topology represents the boundary
> > of a physical package, whether socketed or surface mounted.  Set to 0 if
> > this instance of the processor topology does not represent the boundary
> > of a physical package.
> > 
> 
> I've been following the discussion and I'm not sure I understand what the
> confusion is around having a physical package ID.  Since I'm the one that
> insisted it be in the spec, I'd be glad to clarify anything.  My apologies
> for not saying anything sooner but things IRL have been very complicated
> of late.
> 
> What was intended was a simple flag that was meant to tell me if a CPU ID
> (this could be a CPU, a cluster, a processor container -- I don't really
> care which) is *also* an actual physical device on a motherboard.  That is
> the only intent; there was no architectural meaning intended at all -- that
> is what the PPTT structures are for, in conjunction with any DSDT information
> uncovered later in the boot process.
> 
> However, in the broader server ecosystem, this can be incredibly useful.  There
> are a significant number of software products sold or supported that base their
> fees on the number of physical sockets in use.  There have been in the past (and
> may be in the near future) machines where the cost of the lease on the machine
> is determined by how many physical sockets (or even CPUs) are in use, even if
> the machine has many more available.
> 
> Some vendors also include FRU (Field Replaceable Unit) location info in their
> ACPI tables.  So, for example, one or more CPUs or caches might fail in one
> physical package, which is then reported to a maintenance system of some sort
> that tells some human which of the physical sockets on what motherboard needs a
> replacement device, or it's simply noted and shut off until it's time to replace
> the entire server, or perhaps it's logged and used in an algorithm to predict
> when the server might fail completely.
> 
> So, that's why the flag exists in the spec.  It seems to make sense to me to
> have a package ID as part of struct cpu_topology -- it might even be really
> handy for CPU hotplug.  If you don't, it seems to me a whole separate struct
> would be needed with more cpumasks to show who belongs to what physical package;
> that might be okay but seems unnecessarily complicated to me.
> 
> You can also tell me that I have completely missed the point of the discussion
> so far :-).  But if you do, you have to tell me what I missed.
> 
> Hope this helps clarify...

Hi Al,

yes it does.

I think we agree that package ID has a HW/architectural meaning on x86,
it has none on PPTT (ie it totally depends on how PPTT is enumerated).

That's the first remark. So, if the package flag is used to group CPUs
and provide the topology package hierarchy to the kernel/userspace, fine;
if it is to be used to provide scheduler/userspace with an ID that can
identify a HW "component" of sorts, it is not fine, because the topology
package ID is a SW construction on ARM systems relying on PPTT
(and DT - by the way).

So, to group CPUs and call them a package, fine by me (with a hope
FW developers won't play too much with that package flag to make
things work but use it consistently instead).

Having said that, all I asked is that, given that we _know_ (thanks
to the PPTT flag) the package boundary, let's use it to initialize
the topology package level and that's where this patch series should
stop IMHO.
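Here is a minimal sketch of that approach. It uses a hypothetical flattened node array rather than the real PPTT structures: walk from a CPU's leaf node towards the root and stop at the first node carrying the physical-package flag. As Jeremy notes earlier in the thread, firmware is not obliged to set the flag, so the sketch falls back to the root node. That fallback is a policy assumption, not something the spec states. The flag value assumes "Physical package" is bit 0 of the processor node flags, and the names below are illustrative.

```c
#include <stdint.h>

/* Assumed stand-in for the spec's "Physical package" flag (bit 0). */
#define PPTT_FLAG_PHYSICAL_PACKAGE 0x1u

/* Hypothetical flattened processor node. */
struct pptt_node {
	int parent;		/* index of parent node, -1 for a root */
	uint32_t flags;
};

/*
 * Return the index of the first node on the path from 'leaf' to the
 * root that carries the physical-package flag. If no node on the path
 * is flagged, fall back to the root of the tree.
 */
static int find_package_node(const struct pptt_node *nodes, int leaf)
{
	int node = leaf;

	for (;;) {
		if (nodes[node].flags & PPTT_FLAG_PHYSICAL_PACKAGE)
			return node;
		if (nodes[node].parent < 0)
			return node;	/* unflagged path: use the root */
		node = nodes[node].parent;
	}
}
```

The returned node (or some tag derived from it) could then serve as the package_id for every CPU beneath it, without guessing a level number.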

For the time being, I see no point in adding another arbitrary topology
level (ie COD) with no architectural meaning, as I said, this is vague
enough already and there is legacy (and DT systems) to take into account
too.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.
@ 2017-11-02 10:48             ` Lorenzo Pieralisi
  0 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2017-11-02 10:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 01, 2017 at 02:29:26PM -0600, Al Stone wrote:
> On 10/20/2017 03:22 AM, Lorenzo Pieralisi wrote:
> > On Thu, Oct 19, 2017 at 11:54:22AM -0500, Jeremy Linton wrote:
> > 
> > [...]
> > 
> >>>> +			cpu_topology[cpu].core_id   = topology_id;
> >>>> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
> >>>> +			cpu_topology[cpu].cluster_id = topology_id;
> >>>> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
> >>>
> >>> If you want a package id (that's just a package tag to group cores), you
> >>> should not use a large level because you know how setup_acpi_cpu_topology()
> >>> works, you should add an API that allows you to retrieve the package id
> >>> (so that you can use the ACPI_PPTT_PHYSICAL_PACKAGE flag consistently,
> >>> whatever it represents).
> >>
> >> I don't think the spec requires the use of PHYSICAL_PACKAGE... Am I
> >> misreading it? Which means we need to "pick" a node level to
> >> represent the physical package if one doesn't exist...
> > 
> > The specs define a means to detect if a given PPTT node corresponds to a
> > package (I am refraining from stating again that to me that's not clean
> > cut what a package is _architecturally_, I think you know my POV by now)
> > and that's what you need to use to retrieve a packageid for a given cpu,
> > if I understand the aim of the physical package flag.
> > 
> > Either that or that flag is completely useless.
> > 
> > Lorenzo
> > 
> > ACPI 6.2 - Table 5-151 (page 248)
> > Physical package
> > -----------------
> > Set to 1 if this node of the processor topology represents the boundary
> > of a physical package, whether socketed or surface mounted.  Set to 0 if
> > this instance of the processor topology does not represent the boundary
> > of a physical package.
> > 
> 
> I've been following the discussion and I'm not sure I understand what the
> confusion is around having a physical package ID.  Since I'm the one that
> insisted it be in the spec, I'd be glad to clarify anything.  My apologies
> for not saying anything sooner but things IRL have been very complicated
> of late.
> 
> What was intended was a simple flag that was meant to tell me if a CPU ID
> (this could be a CPU, a cluster, a processor container -- I don't really
> care which) is *also* an actual physical device on a motherboard.  That is
> the only intent; there was no architectural meaning intended at all -- that
> is what the PPTT structures are for, in conjunction with any DSDT information
> uncovered later in the boot process.
> 
> However, in the broader server ecosystem, this can be incredibly useful.  There
> are a significant number of software products sold or supported that base their
> fees on the number of physical sockets in use.  There have been in the past (and
> may be in the near future) machines where the cost of the lease on the machine
> is determined by how many physical sockets (or even CPUs) are in use, even if
> the machine has many more available.
> 
> Some vendors also include FRU (Field Replaceable Unit) location info in their
> ACPI tables.  So, for example, one or more CPUs or caches might fail in one
> physical package, which is then reported to a maintenance system of some sort
> that tells some human which of the physical sockets on what motherboard needs a
> replacement device, or it's simply noted and shut off until it's time to replace
> the entire server, or perhaps it's logged and used in an algorithm to predict
> when the server might fail completely.
> 
> So, that's why the flag exists in the spec.  It seems to make sense to me to
> have a package ID as part of struct cpu_topology -- it might even be really
> handy for CPU hotplug.  If you don't, it seems to me a whole separate struct
> would be needed with more cpumasks to show who belongs to what physical package;
> that might be okay but seems unnecessarily complicated to me.
> 
> You can also tell me that I have completely missed the point of the discussion
> so far :-).  But if you do, you have to tell me what I missed.
> 
> Hope this helps clarify...

Hi Al,

yes it does.

I think we agree that the package ID has a HW/architectural meaning on x86,
but it has none in PPTT (i.e. it depends entirely on how the PPTT is enumerated).

That's the first remark. So, if the package flag is used to group CPUs
and provide the topology package hierarchy to the kernel/userspace, fine;
if it is used to provide the scheduler/userspace with an ID that
identifies a HW "component" of sorts, it is not fine, because the topology
package ID is a SW construct on ARM systems relying on PPTT
(and on DT, by the way).

So, grouping CPUs and calling them a package is fine by me (in the hope
that FW developers won't play with that package flag to make things
work, but will use it consistently instead).
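
Grouping CPUs by a package ID, along the lines of Al's suggestion to carry it in struct cpu_topology, needs little more than a per-CPU ID and a sibling mask. A sketch, assuming a hypothetical pared-down struct cpu_topo_sketch and at most 32 CPUs so that a uint32_t can stand in for the kernel's cpumask; because the IDs are only tags, count_packages() reports how many packages the firmware describes, not necessarily how many sockets the hardware has.

```c
#include <stdint.h>

/* Hypothetical, pared-down per-CPU topology record (not the kernel's
 * struct cpu_topology). Supports at most 32 CPUs. */
struct cpu_topo_sketch {
	int package_id;        /* opaque tag from firmware, no HW meaning */
	uint32_t package_mask; /* bit n set => CPU n is in the same package */
};

/* Fill package_mask for every CPU by comparing package IDs. */
static void build_package_masks(struct cpu_topo_sketch *topo, int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		topo[cpu].package_mask = 0;
		for (int other = 0; other < nr_cpus; other++)
			if (topo[other].package_id == topo[cpu].package_id)
				topo[cpu].package_mask |= 1u << other;
	}
}

/* Count distinct packages: a CPU starts a new package when no
 * lower-numbered CPU appears in its package mask. */
static int count_packages(const struct cpu_topo_sketch *topo, int nr_cpus)
{
	int packages = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		uint32_t lower = topo[cpu].package_mask & ((1u << cpu) - 1);

		if (!lower)
			packages++;
	}
	return packages;
}
```

With six CPUs tagged 0x40 (CPUs 0-3) and 0x80 (CPUs 4-5), CPU 0 ends up with mask 0x0f, CPU 4 with mask 0x30, and two packages are counted.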

Having said that, all I asked is that, given that we _know_ the package
boundary (thanks to the PPTT flag), we use it to initialize the
topology package level, and that's where this patch series should
stop IMHO.

For the time being, I see no point in adding another arbitrary topology
level (i.e. COD, cluster on die) with no architectural meaning; as I said,
this is vague enough already, and there are legacy (and DT) systems to take
into account too.

Thanks,
Lorenzo


end of thread, other threads:[~2017-11-02 10:48 UTC | newest]

Thread overview: 104+ messages
2017-10-12 19:48 [PATCH v3 0/7] Support PPTT for ARM64 Jeremy Linton
2017-10-12 19:48 ` [PATCH v3 1/7] ACPI/PPTT: Add Processor Properties Topology Table parsing Jeremy Linton
2017-10-13  9:56   ` Julien Thierry
2017-10-13 22:41     ` Jeremy Linton
2017-10-13 14:23   ` tn
2017-10-13 19:58     ` Jeremy Linton
2017-10-16 14:24   ` John Garry
2017-10-17 13:25   ` Tomasz Nowicki
2017-10-17 15:22     ` Jeremy Linton
2017-10-18  1:10       ` Xiongfeng Wang
2017-10-18  5:39       ` Tomasz Nowicki
2017-10-18 10:24         ` Tomasz Nowicki
2017-10-18 17:30           ` Jeremy Linton
2017-10-19  5:18             ` Tomasz Nowicki
2017-10-19 10:25               ` John Garry
2017-10-27  5:21                 ` Tomasz Nowicki
2017-10-19 14:24               ` Jeremy Linton
2017-10-19 10:22   ` Lorenzo Pieralisi
2017-10-19 15:43     ` Jeremy Linton
2017-10-20 10:15       ` Lorenzo Pieralisi
2017-10-20 19:53   ` Christ, Austin
2017-10-23 21:14     ` Jeremy Linton
2017-10-12 19:48 ` [PATCH v3 2/7] ACPI: Enable PPTT support on ARM64 Jeremy Linton
2017-10-13  9:53   ` Hanjun Guo
2017-10-13 17:51     ` Jeremy Linton
2017-10-18 16:47   ` Lorenzo Pieralisi
2017-10-18 17:38     ` Jeremy Linton
2017-10-19  9:12       ` Lorenzo Pieralisi
2017-10-12 19:48 ` [PATCH v3 3/7] drivers: base: cacheinfo: arm64: Add support for ACPI based firmware tables Jeremy Linton
2017-10-19 15:20   ` Lorenzo Pieralisi
2017-10-19 15:52     ` Jeremy Linton
2017-10-12 19:48 ` [PATCH v3 4/7] Topology: Add cluster on die macros and arm64 decoding Jeremy Linton
2017-10-12 19:48 ` [PATCH v3 5/7] arm64: Fixup users of topology_physical_package_id Jeremy Linton
2017-10-12 19:48 ` [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology Jeremy Linton
2017-10-19 15:56   ` Lorenzo Pieralisi
2017-10-19 16:13     ` Jeremy Linton
2017-10-20  9:14       ` Lorenzo Pieralisi
2017-10-20 16:14         ` Jeremy Linton
2017-10-20 16:42           ` Sudeep Holla
2017-10-20 19:55           ` Jeffrey Hugo
2017-10-23 21:26             ` Jeremy Linton
2017-10-19 16:54     ` Jeremy Linton
2017-10-20  9:22       ` Lorenzo Pieralisi
2017-11-01 20:29         ` Al Stone
2017-11-02 10:48           ` Lorenzo Pieralisi
2017-10-12 19:48 ` [PATCH v3 7/7] ACPI: Add PPTT to injectable table list Jeremy Linton
2017-10-13 11:08 ` [PATCH v3 0/7] Support PPTT for ARM64 John Garry
2017-10-13 19:34   ` Jeremy Linton
2017-10-31 12:46 ` Jon Masters
