* [PATCH 0/6] Support PPTT for ARM64
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
used to describe the processor and cache topologies. Ideally it is
used to extend/override information probed from the hardware, but
right now ARM64 is entirely dependent on firmware-provided tables.

This patch set parses the table for the cache topology and CPU topology.
For the latter we also add a topology_cod_id() (cluster on die) macro,
and a package_id for arm64. Initially the physical id will match
the cluster id, but we update users of the cluster id to utilize
the new macro. Once PPTT is enabled for arm64, the cluster and
socket ids start to differ. Because of this we also make some dynamic
decisions about mapping thread/core/cod/socket to the thread/core/socket
levels used by the scheduler.

For example on juno:

[root@mammon-juno-rh topology]# lstopo-no-graphics
Machine (7048MB)
  Package L#0
    L2 L#0 (1024KB) + Core L#0
      L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
      L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
      L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
      L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
    L2 L#1 (2048KB) + Core L#1
      L1d L#4 (32KB) + L1i L#4 (48KB) + PU L#4 (P#4)
      L1d L#5 (32KB) + L1i L#5 (48KB) + PU L#5 (P#5)
  HostBridge L#0
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 1095:3132
            Block(Disk) L#0 "sda"
        PCIBridge
          PCI 1002:68f9
            GPU L#1 "renderD128"
            GPU L#2 "card0"
            GPU L#3 "controlD64"
        PCIBridge
          PCI 11ab:4380
            Net L#4 "enp8s0"


Jeremy Linton (6):
  ACPI/PPTT: Add Processor Properties Topology Table parsing
  ACPI: Enable PPTT support on ARM64
  drivers: base: cacheinfo: arm64: Add support for ACPI based firmware
    tables
  Topology: Add cluster on die macros and arm64 decoding
  arm64: Fixup users of topology_physical_package_id
  arm64: topology: Enable ACPI/PPTT based CPU topology.

 arch/arm64/Kconfig                |   1 +
 arch/arm64/include/asm/topology.h |   4 +-
 arch/arm64/kernel/cacheinfo.c     |  23 +-
 arch/arm64/kernel/topology.c      |  76 +++++-
 drivers/acpi/Makefile             |   1 +
 drivers/acpi/arm64/Kconfig        |   3 +
 drivers/acpi/pptt.c               | 508 ++++++++++++++++++++++++++++++++++++++
 drivers/base/cacheinfo.c          |  17 +-
 drivers/clk/clk-mb86s7x.c         |   2 +-
 drivers/cpufreq/arm_big_little.c  |   2 +-
 drivers/firmware/psci_checker.c   |   2 +-
 include/linux/cacheinfo.h         |  10 +-
 include/linux/topology.h          |   5 +
 13 files changed, 634 insertions(+), 20 deletions(-)
 create mode 100644 drivers/acpi/pptt.c

-- 
2.13.5




* [PATCH 1/6] ACPI/PPTT: Add Processor Properties Topology Table parsing
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

ACPI 6.2 adds a new table, the PPTT, which describes how processing
units are related to each other in a tree-like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level(), as well as fill out each
core's cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch-specific
code.

Further, report peers in the topology using setup_acpi_cpu_topology(),
which returns a unique ID for each processing unit at a given level
in the tree. These unique IDs can then be used to match related
processing units which exist as threads, COD (clusters
on die), within a given package, etc.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c | 507 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 507 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
new file mode 100644
index 000000000000..a70b83bd8328
--- /dev/null
+++ b/drivers/acpi/pptt.c
@@ -0,0 +1,507 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (!pptt_ref)
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+		      sizeof(u32) * resource);
+
+	return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * Given a PPTT resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found and continue the search.
+ * Once the entire cache branch has been walked, return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if (*found != NULL)
+				pr_err("Found duplicate cache level/type, unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified, and we find that level/type, abort
+ * processing and return the acpi_pptt_cache structure.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * We are looking for the max depth. Since it's possible for a
+		 * given node to have resources with differing depths, verify
+		 * that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * Given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/*
+ * Find the subtable entry describing the provided processor
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
+			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
+				 acpi_cpu_id, cpu_node->acpi_processor_id);
+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
+				/* found the correct entry */
+				pr_debug("match found!\n");
+				return (struct acpi_pptt_processor *)entry;
+			}
+		}
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return NULL;
+}
+
+/*
+ * Count the total number of processor nodes that are marked as physical
+ * packages. This should equal the number of sockets in the machine.
+ */
+static int acpi_count_socket_nodes(struct acpi_table_header *table_hdr)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+	int number_of_sockets = 0;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* count processor structures with PHYSICAL_PACKAGE set */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_PHYSICAL_PACKAGE))
+			number_of_sockets++;
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return number_of_sockets;
+}
+
+
+/*
+ * Given an acpi_pptt_processor node, walk up until we identify the
+ * package that the node is associated with or we run out of levels
+ * to request.
+ */
+static struct acpi_pptt_processor *acpi_find_processor_package_id(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu,
+	int level)
+{
+	struct acpi_pptt_processor *prev_node;
+
+	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
+		pr_debug("level %d\n", level);
+		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
+		if (prev_node == NULL)
+			break;
+		cpu = prev_node;
+		level--;
+	}
+	return cpu;
+}
+
+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
+
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_6_2_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_6_2_CACHE_TYPE_INSTR;
+	default:
+		pr_debug("Unknown cache type, assume unified\n");
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (!cpu_node)
+		return NULL;
+
+	do {
+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while ((cpu_node) && (!found));
+
+	return found;
+}
+
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache)
+{
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->size;
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->line_size;
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->associativity;
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_6_2_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_6_2_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		default:
+			pr_err("Unknown ACPI cache policy %d\n",
+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
+		}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_6_2_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_RW_ALLOCATE:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
+			break;
+		default:
+			pr_err("Unknown ACPI cache allocation policy %d\n",
+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
+		}
+}
+
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level);
+		pr_debug("found = %p\n", found_cache);
+		if (found_cache)
+			update_cache_properties(this_leaf, found_cache);
+
+		index++;
+	}
+}
+
+static int topology_setup_acpi_cpu(struct acpi_table_header *table,
+				    unsigned int cpu, int level)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+
+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+	if (cpu_node) {
+		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
+		return (int)((u8 *)cpu_node - (u8 *)table);
+	}
+	pr_err_once("PPTT table found, but unable to locate core for %d\n",
+		    cpu);
+	return -ENOENT;
+}
+
+/*
+ * Simply assign an ACPI cache entry to each known CPU cache entry;
+ * determining which entries are shared is done later.
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return 0;
+}
+
+/*
+ * Determine a topology unique ID for each thread/core/cluster/socket/etc.
+ * This ID can then be used to group peers.
+ */
+int setup_acpi_cpu_topology(unsigned int cpu, int level)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = topology_setup_acpi_cpu(table, cpu, level);
+	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
+		 cpu, level, retval);
+	acpi_put_table(table);
+
+	return retval;
+}
+
+/*
+ * Walk the PPTT, count the number of sockets we detect
+ */
+int acpi_multisocket_count(void)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval = 0;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, socket topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = acpi_count_socket_nodes(table);
+	acpi_put_table(table);
+
+	return retval;
+}
-- 
2.13.5



* [PATCH 1/6] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-09-14 18:49   ` Jeremy Linton
  0 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

ACPI 6.2 adds a new table, which describes how processing units
are related to each other in tree like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level() as well as fill out the individual
cores cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch specific
code.

Further, report peers in the topology using setup_acpi_cpu_topology()
to report a unique ID for each processing unit at a given level
in the tree. These unique id's can then be used to match related
processing units which exist as threads, COD (clusters
on die), within a given package, etc.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c | 507 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 507 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
new file mode 100644
index 000000000000..a70b83bd8328
--- /dev/null
+++ b/drivers/acpi/pptt.c
@@ -0,0 +1,507 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (!pptt_ref)
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+		      sizeof(u32) * resource);
+
+	return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * given a pptt resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found, and continue the search.
+ * Once the entire cache branch has been walked return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if (*found != NULL)
+				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified, and we find that level/type, abort
+ * processing and return the acpi_pptt_cache structure.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * we are looking for the max depth. Since its potentially
+		 * possible for a given node to have resources with differing
+		 * depths verify that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/*
+ * Find the subtable entry describing the provided processor
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
+			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
+				 acpi_cpu_id, cpu_node->acpi_processor_id);
+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
+				/* found the correct entry */
+				pr_debug("match found!\n");
+				return (struct acpi_pptt_processor *)entry;
+			}
+		}
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return NULL;
+}
+
+/*
+ * Count the total number of processor nodes that are marked as physical
+ * packages. This should equal the number of sockets in the machine.
+ */
+static int acpi_count_socket_nodes(struct acpi_table_header *table_hdr)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+	int number_of_sockets = 0;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* count processor structures with PHYSICAL_PACKAGE set */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_PHYSICAL_PACKAGE))
+			number_of_sockets++;
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return number_of_sockets;
+}
+
+
+/*
+ * Given an acpi_pptt_processor node, walk up until we identify the
+ * package that the node is associated with, or we run out of levels
+ * to request.
+ */
+static struct acpi_pptt_processor *acpi_find_processor_package_id(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu,
+	int level)
+{
+	struct acpi_pptt_processor *prev_node;
+
+	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
+		pr_debug("level %d\n", level);
+		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
+		if (prev_node == NULL)
+			break;
+		cpu = prev_node;
+		level--;
+	}
+	return cpu;
+}
+
+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
+
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_6_2_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_6_2_CACHE_TYPE_INSTR;
+	default:
+		pr_debug("Unknown cache type, assume unified\n");
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (!cpu_node)
+		return NULL;
+
+	do {
+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while ((cpu_node) && (!found));
+
+	return found;
+}
+
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache)
+{
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->size;
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->line_size;
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->associativity;
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_6_2_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_6_2_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		default:
+			pr_err("Unknown ACPI cache policy %d\n",
+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
+		}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_6_2_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_RW_ALLOCATE:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
+			break;
+		default:
+			pr_err("Unknown ACPI cache allocation policy %d\n",
+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
+		}
+}
+
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level);
+		pr_debug("found = %p\n", found_cache);
+		if (found_cache)
+			update_cache_properties(this_leaf, found_cache);
+
+		index++;
+	}
+}
+
+static int topology_setup_acpi_cpu(struct acpi_table_header *table,
+				    unsigned int cpu, int level)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+
+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+	if (cpu_node) {
+		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
+		return (int)((u8 *)cpu_node - (u8 *)table);
+	}
+	pr_err_once("PPTT table found, but unable to locate core for %d\n",
+		    cpu);
+	return -ENOENT;
+}
+
+/*
+ * Simply assign an ACPI cache entry to each known CPU cache entry;
+ * determining which entries are shared is done later.
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return status;
+}
+
+/*
+ * Determine a topology unique ID for each thread/core/cluster/socket/etc.
+ * This ID can then be used to group peers.
+ */
+int setup_acpi_cpu_topology(unsigned int cpu, int level)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = topology_setup_acpi_cpu(table, cpu, level);
+	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
+		 cpu, level, retval);
+	acpi_put_table(table);
+
+	return retval;
+}
+
+/*
+ * Walk the PPTT, count the number of sockets we detect
+ */
+int acpi_multisocket_count(void)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval = 0;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, socket topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = acpi_count_socket_nodes(table);
+	acpi_put_table(table);
+
+	return retval;
+}
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/6] ACPI: Enable PPTT support on ARM64
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

Now that we have a PPTT parser, let's build it in preparation
for its use on arm64.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/Kconfig         | 1 +
 drivers/acpi/Makefile      | 1 +
 drivers/acpi/arm64/Kconfig | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..68c9d1289735 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -7,6 +7,7 @@ config ARM64
 	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
 	select ACPI_MCFG if ACPI
 	select ACPI_SPCR_TABLE if ACPI
+	select ACPI_PPTT if ACPI
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 90265ab4437a..c92a0c937551 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
 obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
 obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
+obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
 
 # processor has its own "processor." module_param namespace
 processor-y			:= processor_driver.o
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 5a6f80fce0d6..74b855a669ea 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -7,3 +7,6 @@ config ACPI_IORT
 
 config ACPI_GTDT
 	bool
+
+config ACPI_PPTT
+	bool
\ No newline at end of file
-- 
2.13.5



* [PATCH 3/6] drivers: base: cacheinfo: arm64: Add support for ACPI based firmware tables
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

The /sys cache entries should support ACPI/PPTT-generated cache
topology information. Let's detect ACPI systems and call an
arch-specific cache_setup_acpi() routine to update the
hardware-probed cache topology.

For arm64, if ACPI is enabled, determine the max number of cache
levels and populate them using a PPTT table if one is available.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/cacheinfo.c | 23 ++++++++++++++++++-----
 drivers/acpi/pptt.c           |  1 +
 drivers/base/cacheinfo.c      | 17 +++++++++++------
 include/linux/cacheinfo.h     | 10 ++++++++--
 4 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 380f2e2fbed5..2e2cf0d312ba 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -17,6 +17,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/acpi.h>
 #include <linux/cacheinfo.h>
 #include <linux/of.h>
 
@@ -44,9 +45,17 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 	this_leaf->type = type;
 }
 
+#ifndef CONFIG_ACPI
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	/* ACPI kernels should be built with PPTT support */
+	return 0;
+}
+#endif
+
 static int __init_cache_level(unsigned int cpu)
 {
-	unsigned int ctype, level, leaves, of_level;
+	unsigned int ctype, level, leaves, fw_level;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 
 	for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
@@ -59,15 +68,19 @@ static int __init_cache_level(unsigned int cpu)
 		leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
 	}
 
-	of_level = of_find_last_cache_level(cpu);
-	if (level < of_level) {
+	if (acpi_disabled)
+		fw_level = of_find_last_cache_level(cpu);
+	else
+		fw_level = acpi_find_last_cache_level(cpu);
+
+	if (level < fw_level) {
 		/*
 		 * some external caches not specified in CLIDR_EL1
 		 * the information may be available in the device tree
 		 * only unified external caches are considered here
 		 */
-		leaves += (of_level - level);
-		level = of_level;
+		leaves += (fw_level - level);
+		level = fw_level;
 	}
 
 	this_cpu_ci->num_levels = level;
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index a70b83bd8328..c1f0eb741e86 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -364,6 +364,7 @@ int acpi_find_last_cache_level(unsigned int cpu)
 static void update_cache_properties(struct cacheinfo *this_leaf,
 				    struct acpi_pptt_cache *found_cache)
 {
+	this_leaf->firmware_node = found_cache;
 	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
 		this_leaf->size = found_cache->size;
 	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index eb3af2739537..8eca279e50d1 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -86,7 +86,7 @@ static int cache_setup_of_node(unsigned int cpu)
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
 {
-	return sib_leaf->of_node == this_leaf->of_node;
+	return sib_leaf->firmware_node == this_leaf->firmware_node;
 }
 
 /* OF properties to query for a given cache type */
@@ -215,6 +215,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 }
 #endif
 
+int __weak cache_setup_acpi(unsigned int cpu)
+{
+	return -ENOTSUPP;
+}
+
 static int cache_shared_cpu_map_setup(unsigned int cpu)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
@@ -225,11 +230,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
 	if (this_cpu_ci->cpu_map_populated)
 		return 0;
 
-	if (of_have_populated_dt())
+	if (!acpi_disabled)
+		ret = cache_setup_acpi(cpu);
+	else if (of_have_populated_dt())
 		ret = cache_setup_of_node(cpu);
-	else if (!acpi_disabled)
-		/* No cache property/hierarchy support yet in ACPI */
-		ret = -ENOTSUPP;
+
 	if (ret)
 		return ret;
 
@@ -286,7 +291,7 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 
 static void cache_override_properties(unsigned int cpu)
 {
-	if (of_have_populated_dt())
+	if (acpi_disabled && of_have_populated_dt())
 		return cache_of_override_properties(cpu);
 }
 
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 6a524bf6a06d..0114eb9ab67b 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -36,6 +36,9 @@ enum cache_type {
  * @of_node: if devicetree is used, this represents either the cpu node in
  *	case there's no explicit cache node or the cache node itself in the
  *	device tree
+ * @firmware_node: Shared with of_node. When not using DT, this may contain
+ *	pointers to other firmware based values. Particularly ACPI/PPTT
+ *	unique values.
  * @disable_sysfs: indicates whether this node is visible to the user via
  *	sysfs or not
  * @priv: pointer to any private data structure specific to particular
@@ -64,8 +67,10 @@ struct cacheinfo {
 #define CACHE_ALLOCATE_POLICY_MASK	\
 	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
 #define CACHE_ID		BIT(4)
-
-	struct device_node *of_node;
+	union {
+		struct device_node *of_node;
+		void *firmware_node;
+	};
 	bool disable_sysfs;
 	void *priv;
 };
@@ -98,6 +103,7 @@ int func(unsigned int cpu)					\
 struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
+int acpi_find_last_cache_level(unsigned int cpu);
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
-- 
2.13.5



* [PATCH 4/6] Topology: Add cluster on die macros and arm64 decoding
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

Many modern machines have cluster-on-die (COD) non-uniformity
in addition to the traditional multi-socket architectures. Reusing
the multi-socket or NUMA-on-die concepts for these (as arm64 does)
breaks down when presented with actual multi-socket/COD machines.
Similar problems are also visible on some x86 machines, so it
seems appropriate to start abstracting these topologies and
making them visible.

To start, a topology_cod_id() macro is added, which defaults to
returning the same information as topology_physical_package_id().
Moving forward we can start to split out the differences.

For arm64, an additional package_id is added to the cpu_topology array.
Initially this will be equal to the cluster_id as well.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/include/asm/topology.h | 4 +++-
 arch/arm64/kernel/topology.c      | 8 ++++++--
 include/linux/topology.h          | 3 +++
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 8b57339823e9..bd7517960d39 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -7,13 +7,15 @@ struct cpu_topology {
 	int thread_id;
 	int core_id;
 	int cluster_id;
+	int package_id;
 	cpumask_t thread_sibling;
 	cpumask_t core_sibling;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
 
-#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
+#define topology_physical_package_id(cpu)	(cpu_topology[cpu].package_id)
+#define topology_cod_id(cpu)		(cpu_topology[cpu].cluster_id)
 #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..9147e5b6326d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -67,6 +67,8 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			leaf = false;
 			cpu = get_cpu_for_node(t);
 			if (cpu >= 0) {
+				/* maintain DT cluster == package behavior */
+				cpu_topology[cpu].package_id = cluster_id;
 				cpu_topology[cpu].cluster_id = cluster_id;
 				cpu_topology[cpu].core_id = core_id;
 				cpu_topology[cpu].thread_id = i;
@@ -88,7 +90,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			       core);
 			return -EINVAL;
 		}
-
+		cpu_topology[cpu].package_id = cluster_id;
 		cpu_topology[cpu].cluster_id = cluster_id;
 		cpu_topology[cpu].core_id = core_id;
 	} else if (leaf) {
@@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
 	for_each_possible_cpu(cpu) {
 		cpu_topo = &cpu_topology[cpu];
 
-		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
+		if (cpuid_topo->package_id != cpu_topo->package_id)
 			continue;
 
 		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
@@ -273,6 +275,7 @@ void store_cpu_topology(unsigned int cpuid)
 					 MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 |
 					 MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16;
 	}
+	cpuid_topo->package_id = cpuid_topo->cluster_id;
 
 	pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
 		 cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
@@ -292,6 +295,7 @@ static void __init reset_cpu_topology(void)
 		cpu_topo->thread_id = -1;
 		cpu_topo->core_id = 0;
 		cpu_topo->cluster_id = -1;
+		cpu_topo->package_id = -1;
 
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
diff --git a/include/linux/topology.h b/include/linux/topology.h
index cb0775e1ee4b..4660749a7303 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
+#ifndef topology_cod_id				/* cluster on die */
+#define topology_cod_id(cpu)			topology_physical_package_id(cpu)
+#endif
 #ifndef topology_core_id
 #define topology_core_id(cpu)			((void)(cpu), 0)
 #endif
-- 
2.13.5



* [PATCH 5/6] arm64: Fixup users of topology_physical_package_id
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

There are a few arm64 specific users (cpufreq, psci, etc.) which really
want the cluster id rather than topology_physical_package_id(). Let's
convert those users to topology_cod_id(). That way, when we start
differentiating the socket/cluster, they will continue to behave correctly.
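The generic fallback added in the previous patch means unconverted architectures see no behavior change: where topology_cod_id() isn't defined, it evaluates to the physical package id. A minimal sketch (the stub package macro and the helper are invented for illustration, not kernel code):

```c
#include <assert.h>

/* Hypothetical stand-in for an arch that reports package id 7 for any CPU. */
#define topology_physical_package_id(cpu)	((void)(cpu), 7)

/* As in include/linux/topology.h: default cluster-on-die to the package. */
#ifndef topology_cod_id
#define topology_cod_id(cpu)	topology_physical_package_id(cpu)
#endif

/* A caller like raw_cpu_to_cluster() after this conversion. */
static int cluster_of(int cpu)
{
	return topology_cod_id(cpu);
}
```

On arm64 after the previous patch, topology_cod_id() instead resolves to the distinct cluster_id field, so these callers keep getting the cluster even once package != cluster.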

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/clk/clk-mb86s7x.c        | 2 +-
 drivers/cpufreq/arm_big_little.c | 2 +-
 drivers/firmware/psci_checker.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk-mb86s7x.c b/drivers/clk/clk-mb86s7x.c
index 2a83a3ff1d09..da4b456f9afc 100644
--- a/drivers/clk/clk-mb86s7x.c
+++ b/drivers/clk/clk-mb86s7x.c
@@ -338,7 +338,7 @@ static struct clk_hw *mb86s7x_clclk_register(struct device *cpu_dev)
 		return ERR_PTR(-ENOMEM);
 
 	clc->hw.init = &init;
-	clc->cluster = topology_physical_package_id(cpu_dev->id);
+	clc->cluster = topology_cod_id(cpu_dev->id);
 
 	init.name = dev_name(cpu_dev);
 	init.ops = &clk_clc_ops;
diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index 17504129fd77..6ee69b3820de 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -72,7 +72,7 @@ static struct mutex cluster_lock[MAX_CLUSTERS];
 
 static inline int raw_cpu_to_cluster(int cpu)
 {
-	return topology_physical_package_id(cpu);
+	return topology_cod_id(cpu);
 }
 
 static inline int cpu_to_cluster(int cpu)
diff --git a/drivers/firmware/psci_checker.c b/drivers/firmware/psci_checker.c
index 6523ce962865..a9465f5d344a 100644
--- a/drivers/firmware/psci_checker.c
+++ b/drivers/firmware/psci_checker.c
@@ -202,7 +202,7 @@ static int hotplug_tests(void)
 	 */
 	for (i = 0; i < nb_cluster; ++i) {
 		int cluster_id =
-			topology_physical_package_id(cpumask_any(clusters[i]));
+			topology_cod_id(cpumask_any(clusters[i]));
 		ssize_t len = cpumap_print_to_pagebuf(true, page_buf,
 						      clusters[i]);
 		/* Remove trailing newline. */
-- 
2.13.5


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 6/6] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread_id, core_id and
cluster_id by assuming certain levels of the PPTT tree correspond
to those concepts. The package_id is flagged in the tree and can be
found by passing an arbitrarily large level to setup_acpi_cpu_topology(),
which terminates its search when it finds an ACPI node flagged
as the physical package. If the tree doesn't contain enough
levels to represent all of thread/core/cod/package, then the package
id will be used for the missing levels.

Since arm64 machines can have 3 distinct topology levels, and the
scheduler only handles sockets/threads well today, we compromise
by collapsing into one of three different configurations. These are
thread/socket, thread/cluster or cluster/socket, depending on whether
the machine has threading and multiple sockets, threading in a single
socket, or no threading at all.
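The collapse decision can be sketched as a small standalone function (illustrative only; the function name and return strings are not from the kernel):

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model of the compromise above: pick which two topology levels
 * are presented to the scheduler as "thread" and "socket".
 */
static const char *sched_mapping(int threaded, int nr_sockets)
{
	if (threaded && nr_sockets > 1)
		return "thread/socket";		/* MT cores, multiple sockets */
	if (threaded)
		return "thread/cluster";	/* MT cores, single socket */
	return "cluster/socket";		/* no threading */
}
```

This matches the three branches of the is_threaded/is_multisocket tests in parse_acpi_topology() below.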

This code is loosely based on a combination of code from:
Xiongfeng Wang <wangxiongfeng2@huawei.com>
John Garry <john.garry@huawei.com>
Jeffrey Hugo <jhugo@codeaurora.org>

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/topology.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/topology.h     |  2 ++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 9147e5b6326d..8ee5cc5ba9bd 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -11,6 +11,7 @@
  * for more details.
  */
 
+#include <linux/acpi.h>
 #include <linux/arch_topology.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -22,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
+#include <linux/smp.h>
 #include <linux/string.h>
 
 #include <asm/cpu.h>
@@ -304,6 +306,68 @@ static void __init reset_cpu_topology(void)
 	}
 }
 
+#ifdef CONFIG_ACPI
+/*
+ * Propagate the topology information of the processor_topology_node tree to the
+ * cpu_topology array.
+ */
+static int __init parse_acpi_topology(void)
+{
+	u64 is_threaded;
+	int is_multisocket;
+	int cpu;
+	int topology_id;
+	/* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
+	const int max_topo = 0xFF;
+
+	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
+	is_multisocket = acpi_multisocket_count();
+	if (is_multisocket < 0)
+		return is_multisocket;
+
+	for_each_possible_cpu(cpu) {
+		topology_id = setup_acpi_cpu_topology(cpu, 0);
+		if (topology_id < 0)
+			return topology_id;
+
+		if ((is_threaded) && (is_multisocket > 1)) {
+			/* MT per core, and multiple sockets */
+			cpu_topology[cpu].thread_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id   = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 2);
+			cpu_topology[cpu].cluster_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
+			cpu_topology[cpu].package_id = topology_id;
+		} else if (is_threaded) {
+			/* multiple threads, but only a single socket */
+			cpu_topology[cpu].thread_id  = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id    = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 2);
+			cpu_topology[cpu].cluster_id = topology_id;
+			cpu_topology[cpu].package_id = topology_id;
+		} else {
+			/* no threads, clusters behave like threads */
+			cpu_topology[cpu].thread_id  = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id    = topology_id;
+			cpu_topology[cpu].cluster_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
+			cpu_topology[cpu].package_id = topology_id;
+		}
+	}
+	return 0;
+}
+
+#else
+static int __init parse_acpi_topology(void)
+{
+	/* ACPI kernels should be built with PPTT support */
+	return -EINVAL;
+}
+#endif
+
 void __init init_cpu_topology(void)
 {
 	reset_cpu_topology();
@@ -312,6 +376,8 @@ void __init init_cpu_topology(void)
 	 * Discard anything that was parsed if we hit an error so we
 	 * don't use partial information.
 	 */
-	if (of_have_populated_dt() && parse_dt_topology())
+	if ((!acpi_disabled) && parse_acpi_topology())
+		reset_cpu_topology();
+	else if (of_have_populated_dt() && parse_dt_topology())
 		reset_cpu_topology();
 }
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4660749a7303..08bf736be7c1 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -43,6 +43,8 @@
 		if (nr_cpus_node(node))
 
 int arch_update_cpu_topology(void);
+int setup_acpi_cpu_topology(unsigned int cpu, int level);
+int acpi_multisocket_count(void);
 
 /* Conform to ACPI 2.0 SLIT distance definitions */
 #define LOCAL_DISTANCE		10
-- 
2.13.5


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 0/6] Support PPTT for ARM64
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
used to describe the processor and cache topologies. Ideally it is
used to extend/override information provided by the hardware, but
right now ARM64 is entirely dependent on firmware provided tables.

This patch set parses the table for the cache topology and CPU topology.
For the latter we also add an additional topology_cod_id() macro,
and a package_id for arm64. Initially the physical id will match
the cluster id, but we update users of the cluster id to utilize
the new macro. When we enable PPTT for arm64, the cluster/socket
ids start to differ. Because of this, we also make some dynamic decisions
about mapping thread/core/cod/socket onto the thread/socket topology
used by the scheduler.

For example on juno:

[root@mammon-juno-rh topology]# lstopo-no-graphics
Machine (7048MB)
  Package L#0
    L2 L#0 (1024KB) + Core L#0
      L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
      L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
      L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
      L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
    L2 L#1 (2048KB) + Core L#1
      L1d L#4 (32KB) + L1i L#4 (48KB) + PU L#4 (P#4)
      L1d L#5 (32KB) + L1i L#5 (48KB) + PU L#5 (P#5)
  HostBridge L#0
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 1095:3132
            Block(Disk) L#0 "sda"
        PCIBridge
          PCI 1002:68f9
            GPU L#1 "renderD128"
            GPU L#2 "card0"
            GPU L#3 "controlD64"
        PCIBridge
          PCI 11ab:4380
            Net L#4 "enp8s0"


Jeremy Linton (6):
  ACPI/PPTT: Add Processor Properties Topology Table parsing
  ACPI: Enable PPTT support on ARM64
  drivers: base: cacheinfo: arm64: Add support for ACPI based firmware
    tables
  Topology: Add cluster on die macros and arm64 decoding
  arm64: Fixup users of topology_physical_package_id
  arm64: topology: Enable ACPI/PPTT based CPU topology.

 arch/arm64/Kconfig                |   1 +
 arch/arm64/include/asm/topology.h |   4 +-
 arch/arm64/kernel/cacheinfo.c     |  23 +-
 arch/arm64/kernel/topology.c      |  76 +++++-
 drivers/acpi/Makefile             |   1 +
 drivers/acpi/arm64/Kconfig        |   3 +
 drivers/acpi/pptt.c               | 508 ++++++++++++++++++++++++++++++++++++++
 drivers/base/cacheinfo.c          |  17 +-
 drivers/clk/clk-mb86s7x.c         |   2 +-
 drivers/cpufreq/arm_big_little.c  |   2 +-
 drivers/firmware/psci_checker.c   |   2 +-
 include/linux/cacheinfo.h         |  10 +-
 include/linux/topology.h          |   5 +
 13 files changed, 634 insertions(+), 20 deletions(-)
 create mode 100644 drivers/acpi/pptt.c

-- 
2.13.5


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/6] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

ACPI 6.2 adds a new table, which describes how processing units
are related to each other in a tree-like fashion. Caches are
also sprinkled throughout the tree and describe their properties
in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level(), as well as fill out the individual
core's cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch specific
code.

Further, report peers in the topology using setup_acpi_cpu_topology()
to report a unique ID for each processing unit at a given level
in the tree. These unique IDs can then be used to match related
processing units which exist as threads, COD (clusters
on die), within a given package, etc.
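The level-counting idea can be illustrated with a toy, self-contained model (the structure and function names here are invented, not the driver's actual types): each cache node points at the next level, and the total is found by following the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a PPTT cache node's next_level_of_cache reference. */
struct toy_cache {
	struct toy_cache *next_level;
};

/* Walk the chain from a core's first-level cache, counting levels. */
static int toy_last_cache_level(struct toy_cache *c)
{
	int level = 0;

	while (c) {
		level++;
		c = c->next_level;
	}
	return level;
}
```

The real parser below does the same walk but against relative byte offsets inside the table, validating each reference before following it, and additionally repeats the walk at every ancestor processor node.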

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c | 507 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 507 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
new file mode 100644
index 000000000000..a70b83bd8328
--- /dev/null
+++ b/drivers/acpi/pptt.c
@@ -0,0 +1,507 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (!pptt_ref)
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+		      sizeof(u32) * resource);
+
+	return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * given a pptt resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found, and continue the search.
+ * Once the entire cache branch has been walked return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if (*found != NULL)
+				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified, and we find that level/type, abort
+ * processing and return the acpi_pptt_cache structure.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * We are looking for the max depth. Since it's potentially
+		 * possible for a given node to have resources with differing
+		 * depths, verify that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * Given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/*
+ * Find the subtable entry describing the provided processor
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
+			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
+				 acpi_cpu_id, cpu_node->acpi_processor_id);
+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
+				/* found the correct entry */
+				pr_debug("match found!\n");
+				return (struct acpi_pptt_processor *)entry;
+			}
+		}
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return NULL;
+}
+
+/*
+ * Count the total number of processor nodes that are marked as physical
+ * packages. This should equal the number of sockets in the machine.
+ */
+static int acpi_count_socket_nodes(struct acpi_table_header *table_hdr)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+	int number_of_sockets = 0;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* count processor structures with PHYSICAL_PACKAGE set */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_PHYSICAL_PACKAGE))
+			number_of_sockets++;
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return number_of_sockets;
+}
+
+
+/*
+ * Given an acpi_pptt_processor node, walk up until we identify the
+ * package that the node is associated with, or we run out of levels
+ * to request.
+ */
+static struct acpi_pptt_processor *acpi_find_processor_package_id(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu,
+	int level)
+{
+	struct acpi_pptt_processor *prev_node;
+
+	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
+		pr_debug("level %d\n", level);
+		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
+		if (prev_node == NULL)
+			break;
+		cpu = prev_node;
+		level--;
+	}
+	return cpu;
+}
+
+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
+
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_6_2_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_6_2_CACHE_TYPE_INSTR;
+	default:
+		pr_debug("Unknown cache type, assume unified\n");
+		/* fall through */
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (!cpu_node)
+		return NULL;
+
+	do {
+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while ((cpu_node) && (!found));
+
+	return found;
+}
+
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache)
+{
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->size;
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->line_size;
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->associativity;
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_6_2_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_6_2_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		default:
+			pr_err("Unknown ACPI cache policy %d\n",
+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
+		}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_6_2_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_RW_ALLOCATE:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
+			break;
+		default:
+			pr_err("Unknown ACPI cache allocation policy %d\n",
+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
+		}
+}
+
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level);
+		pr_debug("found = %p\n", found_cache);
+		if (found_cache)
+			update_cache_properties(this_leaf, found_cache);
+
+		index++;
+	}
+}
+
+static int topology_setup_acpi_cpu(struct acpi_table_header *table,
+				    unsigned int cpu, int level)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+
+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+	if (cpu_node) {
+		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
+		return (int)((u8 *)cpu_node - (u8 *)table);
+	}
+	pr_err_once("PPTT table found, but unable to locate core for %d\n",
+		    cpu);
+	return -ENOENT;
+}
+
+/*
+ * Simply assign an ACPI cache entry to each known CPU cache entry;
+ * determining which entries are shared is done later.
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return status;
+}
+
+/*
+ * Determine a topology unique ID for each thread/core/cluster/socket/etc.
+ * This ID can then be used to group peers.
+ */
+int setup_acpi_cpu_topology(unsigned int cpu, int level)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = topology_setup_acpi_cpu(table, cpu, level);
+	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
+		 cpu, level, retval);
+	acpi_put_table(table);
+
+	return retval;
+}
+
+/*
+ * Walk the PPTT, count the number of sockets we detect
+ */
+int acpi_multisocket_count(void)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval = 0;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, socket topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = acpi_count_socket_nodes(table);
+	acpi_put_table(table);
+
+	return retval;
+}
-- 
2.13.5


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 1/6] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2017-09-14 18:49   ` Jeremy Linton
  0 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

ACPI 6.2 adds a new table, which describes how processing units
are related to each other in a tree-like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level() as well as fill out the individual
core's cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch specific
code.
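
The level counting above boils down to a parent walk that accumulates the
cache levels attached to each node on the path to the root. The sketch
below models that walk outside the kernel; the structure, field names,
and demo topology are invented for illustration and do not match the
actual ACPI/PPTT definitions:

```c
/*
 * Illustrative stand-in for a PPTT processor node: each node records
 * its parent (-1 for the root) and how many cache levels are attached
 * directly to it.  These names are invented for this sketch.
 */
struct pptt_node {
	int parent;
	int cache_levels;
};

/*
 * Minimal model of the "walk up and accumulate" logic: count the cache
 * levels private to the starting node, then add the levels found at
 * each ancestor on the way to the root.
 */
static int total_cache_levels(const struct pptt_node *nodes, int idx)
{
	int total = 0;

	while (idx >= 0) {
		total += nodes[idx].cache_levels;
		idx = nodes[idx].parent;
	}
	return total;
}

static int demo_levels(void)
{
	/* package (shared L3) -> cluster (L2) -> core (L1) */
	const struct pptt_node nodes[] = {
		{ .parent = -1, .cache_levels = 1 },	/* package: L3 */
		{ .parent = 0,  .cache_levels = 1 },	/* cluster: L2 */
		{ .parent = 1,  .cache_levels = 1 },	/* core: L1    */
	};

	return total_cache_levels(nodes, 2);	/* L1 + L2 + L3 */
}
```

Starting from the core, the walk visits cluster and package, reporting
three cache levels for this demo topology.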

Further, report peers in the topology using setup_acpi_cpu_topology()
to report a unique ID for each processing unit at a given level
in the tree. These unique IDs can then be used to match related
processing units which exist as threads, clusters on die (COD),
cores within a given package, etc.
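
The unique-ID walk can be sketched as follows; the types, flag, and
example tree are hypothetical stand-ins (the real parser uses the
node's offset into the PPTT as the ID, modelled here by the array
index):

```c
/* Illustrative stand-in for a PPTT node used for topology IDs. */
struct tnode {
	int parent;		/* index of parent node, -1 for none */
	int physical_package;	/* PHYSICAL_PACKAGE flag equivalent  */
};

/*
 * Model of the package-id walk: starting from a processing unit, step
 * upward at most 'level' times, stopping early at a node marked as a
 * physical package.  The index of the node we stop on serves as the
 * unique ID for grouping peers.
 */
static int topology_id(const struct tnode *t, int idx, int level)
{
	while (idx >= 0 && level && !t[idx].physical_package) {
		if (t[idx].parent < 0)
			break;
		idx = t[idx].parent;
		level--;
	}
	return idx;
}

static int demo_same_cluster(void)
{
	const struct tnode t[] = {
		{ -1, 1 },	/* 0: package             */
		{ 0,  0 },	/* 1: cluster A           */
		{ 1,  0 },	/* 2: core in cluster A   */
		{ 1,  0 },	/* 3: core in cluster A   */
		{ 0,  0 },	/* 4: cluster B           */
		{ 4,  0 },	/* 5: core in cluster B   */
	};

	/* one level up, both cluster-A cores share an ID ...        */
	return topology_id(t, 2, 1) == topology_id(t, 3, 1) &&
	       /* ... which differs from a cluster-B core ...        */
	       topology_id(t, 2, 1) != topology_id(t, 5, 1) &&
	       /* ... and a large level saturates at the package.    */
	       topology_id(t, 2, 8) == 0;
}
```

Peers at a given level are then matched simply by comparing IDs.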

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c | 507 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 507 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
new file mode 100644
index 000000000000..a70b83bd8328
--- /dev/null
+++ b/drivers/acpi/pptt.c
@@ -0,0 +1,507 @@
+/*
+ * Copyright (C) 2017, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ */
+
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (!pptt_ref)
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr + pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr, pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = *(u32 *)((u8 *)node + sizeof(struct acpi_pptt_processor) +
+		      sizeof(u32) * resource);
+
+	return fetch_pptt_subtable(table_hdr, ref);
+}
+
+/*
+ * Given a PPTT resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level and type match, then we set *found and continue the search.
+ * Once the entire cache branch has been walked, return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if (*found != NULL)
+				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified, and we find that level/type, abort
+ * processing and return the acpi_pptt_cache structure.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * We are looking for the max depth. Since it is possible
+		 * for a given node to have resources with differing
+		 * depths, verify that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * Given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/*
+ * Find the subtable entry describing the provided processor
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID)) {
+			pr_debug("checking phy_cpu_id %d against acpi id %d\n",
+				 acpi_cpu_id, cpu_node->acpi_processor_id);
+			if (acpi_cpu_id == cpu_node->acpi_processor_id) {
+				/* found the correct entry */
+				pr_debug("match found!\n");
+				return (struct acpi_pptt_processor *)entry;
+			}
+		}
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return NULL;
+}
+
+/*
+ * Count the total number of processor nodes that are marked as physical
+ * packages. This should equal the number of sockets in the machine.
+ */
+static int acpi_count_socket_nodes(struct acpi_table_header *table_hdr)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+	int number_of_sockets = 0;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = (struct acpi_subtable_header *)((u8 *)table_hdr +
+						sizeof(struct acpi_table_pptt));
+
+	/* count processor structures with PHYSICAL_PACKAGE set */
+	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->flags & ACPI_PPTT_PHYSICAL_PACKAGE))
+			number_of_sockets++;
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		entry = (struct acpi_subtable_header *)
+			((u8 *)entry + entry->length);
+	}
+
+	return number_of_sockets;
+}
+
+
+/*
+ * Given an acpi_pptt_processor node, walk up until we identify the
+ * package that the node is associated with, or we run out of levels
+ * to request.
+ */
+static struct acpi_pptt_processor *acpi_find_processor_package_id(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu,
+	int level)
+{
+	struct acpi_pptt_processor *prev_node;
+
+	while (cpu && level && !(cpu->flags & ACPI_PPTT_PHYSICAL_PACKAGE)) {
+		pr_debug("level %d\n", level);
+		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
+		if (prev_node == NULL)
+			break;
+		cpu = prev_node;
+		level--;
+	}
+	return cpu;
+}
+
+static int acpi_parse_pptt(struct acpi_table_header *table_hdr, u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+#define ACPI_6_2_CACHE_TYPE_DATA		      (0x0)
+#define ACPI_6_2_CACHE_TYPE_INSTR		      (1<<2)
+#define ACPI_6_2_CACHE_TYPE_UNIFIED		      (1<<3)
+#define ACPI_6_2_CACHE_POLICY_WB		      (0x0)
+#define ACPI_6_2_CACHE_POLICY_WT		      (1<<4)
+#define ACPI_6_2_CACHE_READ_ALLOCATE		      (0x0)
+#define ACPI_6_2_CACHE_WRITE_ALLOCATE		      (0x01)
+#define ACPI_6_2_CACHE_RW_ALLOCATE		      (0x02)
+
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_6_2_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_6_2_CACHE_TYPE_INSTR;
+	default:
+		pr_debug("Unknown cache type, assume unified\n");
+		/* fall through */
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		return ACPI_6_2_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (!cpu_node)
+		return NULL;
+
+	do {
+		found = acpi_find_cache_level(table_hdr, cpu_node, &total_levels, level, acpi_type);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while ((cpu_node) && (!found));
+
+	return found;
+}
+
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_parse_pptt(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64 the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache)
+{
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
+		this_leaf->size = found_cache->size;
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
+		this_leaf->coherency_line_size = found_cache->line_size;
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID)
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID)
+		this_leaf->ways_of_associativity = found_cache->associativity;
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_6_2_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_6_2_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		default:
+			pr_err("Unknown ACPI cache policy %d\n",
+			      found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY);
+		}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID)
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_6_2_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_6_2_CACHE_RW_ALLOCATE:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE|CACHE_WRITE_ALLOCATE;
+			break;
+		default:
+			pr_err("Unknown ACPI cache allocation policy %d\n",
+			   found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE);
+		}
+}
+
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level);
+		pr_debug("found = %p\n", found_cache);
+		if (found_cache)
+			update_cache_properties(this_leaf, found_cache);
+
+		index++;
+	}
+}
+
+static int topology_setup_acpi_cpu(struct acpi_table_header *table,
+				    unsigned int cpu, int level)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_cpu_id = acpi_cpu_get_madt_gicc(cpu)->uid;
+
+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+	if (cpu_node) {
+		cpu_node = acpi_find_processor_package_id(table, cpu_node, level);
+		return (int)((u8 *)cpu_node - (u8 *)table);
+	}
+	pr_err_once("PPTT table found, but unable to locate core for %d\n",
+		    cpu);
+	return -ENOENT;
+}
+
+/*
+ * Simply assign an ACPI cache entry to each known CPU cache entry;
+ * determining which entries are shared is done later.
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return status;
+}
+
+/*
+ * Determine a topology unique ID for each thread/core/cluster/socket/etc.
+ * This ID can then be used to group peers.
+ */
+int setup_acpi_cpu_topology(unsigned int cpu, int level)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = topology_setup_acpi_cpu(table, cpu, level);
+	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
+		 cpu, level, retval);
+	acpi_put_table(table);
+
+	return retval;
+}
+
+/*
+ * Walk the PPTT, count the number of sockets we detect
+ */
+int acpi_multisocket_count(void)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval = 0;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, socket topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = acpi_count_socket_nodes(table);
+	acpi_put_table(table);
+
+	return retval;
+}
-- 
2.13.5


* [PATCH 2/6] ACPI: Enable PPTT support on ARM64
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

Now that we have a PPTT parser, in preparation for its use
on arm64, let's build it.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/Kconfig         | 1 +
 drivers/acpi/Makefile      | 1 +
 drivers/acpi/arm64/Kconfig | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..68c9d1289735 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -7,6 +7,7 @@ config ARM64
 	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
 	select ACPI_MCFG if ACPI
 	select ACPI_SPCR_TABLE if ACPI
+	select ACPI_PPTT if ACPI
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 90265ab4437a..c92a0c937551 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
 obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
 obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
+obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
 
 # processor has its own "processor." module_param namespace
 processor-y			:= processor_driver.o
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 5a6f80fce0d6..74b855a669ea 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -7,3 +7,6 @@ config ACPI_IORT
 
 config ACPI_GTDT
 	bool
+
+config ACPI_PPTT
+	bool
\ No newline at end of file
-- 
2.13.5



* [PATCH 2/6] ACPI: Enable PPTT support on ARM64
@ 2017-09-14 18:49   ` Jeremy Linton
  0 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

Now that we have a PPTT parser, in preparation for its use
on arm64, let's build it.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/Kconfig         | 1 +
 drivers/acpi/Makefile      | 1 +
 drivers/acpi/arm64/Kconfig | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..68c9d1289735 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -7,6 +7,7 @@ config ARM64
 	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
 	select ACPI_MCFG if ACPI
 	select ACPI_SPCR_TABLE if ACPI
+	select ACPI_PPTT if ACPI
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 90265ab4437a..c92a0c937551 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
 obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
 obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
+obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
 
 # processor has its own "processor." module_param namespace
 processor-y			:= processor_driver.o
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 5a6f80fce0d6..74b855a669ea 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -7,3 +7,6 @@ config ACPI_IORT
 
 config ACPI_GTDT
 	bool
+
+config ACPI_PPTT
+	bool
\ No newline at end of file
-- 
2.13.5


* [PATCH 3/6] drivers: base: cacheinfo: arm64: Add support for ACPI based firmware tables
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

The /sys cache entries should support ACPI/PPTT generated cache
topology information. Let's detect ACPI systems and call
an arch-specific cache_setup_acpi() routine to update the hardware
probed cache topology.

For arm64, if ACPI is enabled, determine the max number of cache
levels and populate them using a PPTT table if one is available.
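
The level-merge step this patch adds to __init_cache_level() can be
captured as a tiny helper. This is a sketch of the arithmetic only,
with invented names; the real code reads CLIDR_EL1 and the firmware
tables rather than taking the counts as parameters:

```c
/*
 * Sketch of the merge performed in __init_cache_level(): if firmware
 * (DT or PPTT) reports more cache levels than were probed from
 * CLIDR_EL1, the extra levels are treated as unified external caches,
 * so one leaf is added per extra level.
 */
static void merge_fw_cache_levels(unsigned int *level, unsigned int *leaves,
				  unsigned int fw_level)
{
	if (*level < fw_level) {
		*leaves += fw_level - *level;
		*level = fw_level;
	}
}

static unsigned int demo_leaves(void)
{
	/* hardware probes 2 levels (L1 d+i, L2 unified) = 3 leaves */
	unsigned int level = 2, leaves = 3;

	/* firmware additionally describes an external unified L3 */
	merge_fw_cache_levels(&level, &leaves, 3);
	return level * 100 + leaves;	/* encode both for the demo */
}
```

With a firmware-described L3, the demo ends up with 3 levels and 4
leaves; if firmware reports no extra levels, the probed values are
left untouched.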

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/cacheinfo.c | 23 ++++++++++++++++++-----
 drivers/acpi/pptt.c           |  1 +
 drivers/base/cacheinfo.c      | 17 +++++++++++------
 include/linux/cacheinfo.h     | 10 ++++++++--
 4 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 380f2e2fbed5..2e2cf0d312ba 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -17,6 +17,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/acpi.h>
 #include <linux/cacheinfo.h>
 #include <linux/of.h>
 
@@ -44,9 +45,17 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 	this_leaf->type = type;
 }
 
+#ifndef CONFIG_ACPI
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	/* ACPI kernels should be built with PPTT support */
+	return 0;
+}
+#endif
+
 static int __init_cache_level(unsigned int cpu)
 {
-	unsigned int ctype, level, leaves, of_level;
+	unsigned int ctype, level, leaves, fw_level;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 
 	for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
@@ -59,15 +68,19 @@ static int __init_cache_level(unsigned int cpu)
 		leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
 	}
 
-	of_level = of_find_last_cache_level(cpu);
-	if (level < of_level) {
+	if (acpi_disabled)
+		fw_level = of_find_last_cache_level(cpu);
+	else
+		fw_level = acpi_find_last_cache_level(cpu);
+
+	if (level < fw_level) {
 		/*
 		 * some external caches not specified in CLIDR_EL1
 		 * the information may be available in the device tree
 		 * only unified external caches are considered here
 		 */
-		leaves += (of_level - level);
-		level = of_level;
+		leaves += (fw_level - level);
+		level = fw_level;
 	}
 
 	this_cpu_ci->num_levels = level;
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index a70b83bd8328..c1f0eb741e86 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -364,6 +364,7 @@ int acpi_find_last_cache_level(unsigned int cpu)
 static void update_cache_properties(struct cacheinfo *this_leaf,
 				    struct acpi_pptt_cache *found_cache)
 {
+	this_leaf->firmware_node = found_cache;
 	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID)
 		this_leaf->size = found_cache->size;
 	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index eb3af2739537..8eca279e50d1 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -86,7 +86,7 @@ static int cache_setup_of_node(unsigned int cpu)
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
 {
-	return sib_leaf->of_node == this_leaf->of_node;
+	return sib_leaf->firmware_node == this_leaf->firmware_node;
 }
 
 /* OF properties to query for a given cache type */
@@ -215,6 +215,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 }
 #endif
 
+int __weak cache_setup_acpi(unsigned int cpu)
+{
+	return -ENOTSUPP;
+}
+
 static int cache_shared_cpu_map_setup(unsigned int cpu)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
@@ -225,11 +230,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
 	if (this_cpu_ci->cpu_map_populated)
 		return 0;
 
-	if (of_have_populated_dt())
+	if (!acpi_disabled)
+		ret = cache_setup_acpi(cpu);
+	else if (of_have_populated_dt())
 		ret = cache_setup_of_node(cpu);
-	else if (!acpi_disabled)
-		/* No cache property/hierarchy support yet in ACPI */
-		ret = -ENOTSUPP;
+
 	if (ret)
 		return ret;
 
@@ -286,7 +291,7 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 
 static void cache_override_properties(unsigned int cpu)
 {
-	if (of_have_populated_dt())
+	if (acpi_disabled && of_have_populated_dt())
 		return cache_of_override_properties(cpu);
 }
 
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 6a524bf6a06d..0114eb9ab67b 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -36,6 +36,9 @@ enum cache_type {
  * @of_node: if devicetree is used, this represents either the cpu node in
  *	case there's no explicit cache node or the cache node itself in the
  *	device tree
+ * @firmware_node: Shared with of_node. When not using DT, this may contain
+ *	pointers to other firmware based values. Particularly ACPI/PPTT
+ *	unique values.
  * @disable_sysfs: indicates whether this node is visible to the user via
  *	sysfs or not
  * @priv: pointer to any private data structure specific to particular
@@ -64,8 +67,10 @@ struct cacheinfo {
 #define CACHE_ALLOCATE_POLICY_MASK	\
 	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
 #define CACHE_ID		BIT(4)
-
-	struct device_node *of_node;
+	union {
+		struct device_node *of_node;
+		void *firmware_node;
+	};
 	bool disable_sysfs;
 	void *priv;
 };
@@ -98,6 +103,7 @@ int func(unsigned int cpu)					\
 struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
+int acpi_find_last_cache_level(unsigned int cpu);
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
-- 
2.13.5
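The hunk above replaces the DT-only of_find_last_cache_level() call with a firmware-neutral selection. A minimal userspace sketch of that selection logic (with stub functions and hypothetical return values standing in for the real kernel helpers) might look like:

```c
#include <stdbool.h>

/* Stubs standing in for the kernel firmware queries; values are hypothetical. */
static unsigned int of_find_last_cache_level_stub(unsigned int cpu)
{
	(void)cpu;
	return 2;	/* pretend DT describes an L2 */
}

static unsigned int acpi_find_last_cache_level_stub(unsigned int cpu)
{
	(void)cpu;
	return 3;	/* pretend PPTT describes an L3 */
}

/*
 * Mirror of the patched __init_cache_level() logic: take the firmware
 * cache-level bound from DT when ACPI is disabled, from ACPI/PPTT
 * otherwise, then extend the CLIDR_EL1-derived level only if firmware
 * reports more levels than the hardware registers do.
 */
static unsigned int last_cache_level(unsigned int cpu, bool acpi_disabled,
				     unsigned int hw_level)
{
	unsigned int fw_level;

	if (acpi_disabled)
		fw_level = of_find_last_cache_level_stub(cpu);
	else
		fw_level = acpi_find_last_cache_level_stub(cpu);

	return hw_level < fw_level ? fw_level : hw_level;
}
```

This is only a sketch of the control flow, not the kernel code itself; the real helpers also adjust the leaf count when extending the level.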


^ permalink raw reply related	[flat|nested] 42+ messages in thread
* [PATCH 4/6] Topology: Add cluster on die macros and arm64 decoding
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

Many modern machines have cluster on die (COD) non-uniformity
as well as the traditional multi-socket architectures. Reusing
the multi-socket or NUMA on die concepts for these (as arm64 does)
breaks down when presented with actual multi-socket/COD machines.
Similar problems are also visible on some x86 machines, so it
seems appropriate to start abstracting these topologies and
making them visible.

To start, a topology_cod_id() macro is added which defaults to returning
the same information as topology_physical_package_id(). Moving forward
we can start to split out the differences.

For arm64, an additional package_id is added to the cpu_topology array.
Initially this will be equal to the cluster_id as well.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/include/asm/topology.h | 4 +++-
 arch/arm64/kernel/topology.c      | 8 ++++++--
 include/linux/topology.h          | 3 +++
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 8b57339823e9..bd7517960d39 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -7,13 +7,15 @@ struct cpu_topology {
 	int thread_id;
 	int core_id;
 	int cluster_id;
+	int package_id;
 	cpumask_t thread_sibling;
 	cpumask_t core_sibling;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
 
-#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
+#define topology_physical_package_id(cpu)	(cpu_topology[cpu].package_id)
+#define topology_cod_id(cpu)		(cpu_topology[cpu].cluster_id)
 #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..9147e5b6326d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -67,6 +67,8 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			leaf = false;
 			cpu = get_cpu_for_node(t);
 			if (cpu >= 0) {
+				/* maintain DT cluster == package behavior */
+				cpu_topology[cpu].package_id = cluster_id;
 				cpu_topology[cpu].cluster_id = cluster_id;
 				cpu_topology[cpu].core_id = core_id;
 				cpu_topology[cpu].thread_id = i;
@@ -88,7 +90,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			       core);
 			return -EINVAL;
 		}
-
+		cpu_topology[cpu].package_id = cluster_id;
 		cpu_topology[cpu].cluster_id = cluster_id;
 		cpu_topology[cpu].core_id = core_id;
 	} else if (leaf) {
@@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
 	for_each_possible_cpu(cpu) {
 		cpu_topo = &cpu_topology[cpu];
 
-		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
+		if (cpuid_topo->package_id != cpu_topo->package_id)
 			continue;
 
 		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
@@ -273,6 +275,7 @@ void store_cpu_topology(unsigned int cpuid)
 					 MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 |
 					 MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16;
 	}
+	cpuid_topo->package_id = cpuid_topo->cluster_id;
 
 	pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
 		 cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
@@ -292,6 +295,7 @@ static void __init reset_cpu_topology(void)
 		cpu_topo->thread_id = -1;
 		cpu_topo->core_id = 0;
 		cpu_topo->cluster_id = -1;
+		cpu_topo->package_id = -1;
 
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
diff --git a/include/linux/topology.h b/include/linux/topology.h
index cb0775e1ee4b..4660749a7303 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
+#ifndef topology_cod_id				/* cluster on die */
+#define topology_cod_id(cpu)			topology_physical_package_id(cpu)
+#endif
 #ifndef topology_core_id
 #define topology_core_id(cpu)			((void)(cpu), 0)
 #endif
-- 
2.13.5
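The include/linux/topology.h hunk above relies on the kernel's usual fallback-macro convention: an architecture that does not define topology_cod_id() transparently gets the package id instead. A standalone sketch of that pattern (with a toy package-id function in place of the real per-arch macro) is:

```c
/*
 * Sketch of the fallback pattern used by include/linux/topology.h:
 * if an architecture does not provide topology_cod_id(), it defaults
 * to topology_physical_package_id(). The id values here are toys.
 */
static int fake_package_id(int cpu)
{
	(void)cpu;
	return 1;	/* pretend every CPU sits in package 1 */
}

#define topology_physical_package_id(cpu)	fake_package_id(cpu)

#ifndef topology_cod_id				/* cluster on die */
#define topology_cod_id(cpu)		topology_physical_package_id(cpu)
#endif

/* On an arch without its own definition, COD id == package id. */
static int cod_of(int cpu)
{
	return topology_cod_id(cpu);
}
```

Only arm64 overrides the macro in this series; everyone else keeps the package-id behavior, which is why the later patches can convert call sites unconditionally.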



* [PATCH 5/6] arm64: Fixup users of topology_physical_package_id
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

There are a few arm64-specific users (cpufreq, psci, etc.) which really
want the cluster rather than topology_physical_package_id(). Let's
convert those users to topology_cod_id(). That way, when we start
differentiating the socket/cluster, they will continue to behave correctly.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/clk/clk-mb86s7x.c        | 2 +-
 drivers/cpufreq/arm_big_little.c | 2 +-
 drivers/firmware/psci_checker.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk-mb86s7x.c b/drivers/clk/clk-mb86s7x.c
index 2a83a3ff1d09..da4b456f9afc 100644
--- a/drivers/clk/clk-mb86s7x.c
+++ b/drivers/clk/clk-mb86s7x.c
@@ -338,7 +338,7 @@ static struct clk_hw *mb86s7x_clclk_register(struct device *cpu_dev)
 		return ERR_PTR(-ENOMEM);
 
 	clc->hw.init = &init;
-	clc->cluster = topology_physical_package_id(cpu_dev->id);
+	clc->cluster = topology_cod_id(cpu_dev->id);
 
 	init.name = dev_name(cpu_dev);
 	init.ops = &clk_clc_ops;
diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index 17504129fd77..6ee69b3820de 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -72,7 +72,7 @@ static struct mutex cluster_lock[MAX_CLUSTERS];
 
 static inline int raw_cpu_to_cluster(int cpu)
 {
-	return topology_physical_package_id(cpu);
+	return topology_cod_id(cpu);
 }
 
 static inline int cpu_to_cluster(int cpu)
diff --git a/drivers/firmware/psci_checker.c b/drivers/firmware/psci_checker.c
index 6523ce962865..a9465f5d344a 100644
--- a/drivers/firmware/psci_checker.c
+++ b/drivers/firmware/psci_checker.c
@@ -202,7 +202,7 @@ static int hotplug_tests(void)
 	 */
 	for (i = 0; i < nb_cluster; ++i) {
 		int cluster_id =
-			topology_physical_package_id(cpumask_any(clusters[i]));
+			topology_cod_id(cpumask_any(clusters[i]));
 		ssize_t len = cpumap_print_to_pagebuf(true, page_buf,
 						      clusters[i]);
 		/* Remove trailing newline. */
-- 
2.13.5



* [PATCH 6/6] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-14 18:49   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas, Jeremy Linton

Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread id, core_id and
cluster_id by assuming certain levels of the PPTT tree correspond
to those concepts. The package_id is flagged in the tree and can be
found by passing an arbitrarily large level to setup_acpi_cpu_topology()
which terminates its search when it finds an ACPI node flagged
as the physical package. If the tree doesn't contain enough
levels to represent all of thread/core/cod/package then the package
id will be used for the missing levels.

Since arm64 machines can have three distinct topology levels, and the
scheduler only handles sockets/threads well today, we compromise
by collapsing into one of three different configurations. These are
thread/socket, thread/cluster, or cluster/socket, depending on whether
the machine has threading and multiple sockets, threading in a single
socket, or no threading at all.

This code is loosely based on a combination of code from:
Xiongfeng Wang <wangxiongfeng2@huawei.com>
John Garry <john.garry@huawei.com>
Jeffrey Hugo <jhugo@codeaurora.org>

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/topology.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/topology.h     |  2 ++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 9147e5b6326d..8ee5cc5ba9bd 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -11,6 +11,7 @@
  * for more details.
  */
 
+#include <linux/acpi.h>
 #include <linux/arch_topology.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -22,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
+#include <linux/smp.h>
 #include <linux/string.h>
 
 #include <asm/cpu.h>
@@ -304,6 +306,68 @@ static void __init reset_cpu_topology(void)
 	}
 }
 
+#ifdef CONFIG_ACPI
+/*
+ * Propagate the topology information of the processor_topology_node tree to the
+ * cpu_topology array.
+ */
+static int __init parse_acpi_topology(void)
+{
+	u64 is_threaded;
+	int is_multisocket;
+	int cpu;
+	int topology_id;
+	/* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
+	const int max_topo = 0xFF;
+
+	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
+	is_multisocket = acpi_multisocket_count();
+	if (is_multisocket < 0)
+		return is_multisocket;
+
+	for_each_possible_cpu(cpu) {
+		topology_id = setup_acpi_cpu_topology(cpu, 0);
+		if (topology_id < 0)
+			return topology_id;
+
+		if ((is_threaded) && (is_multisocket > 1)) {
+			/* MT per core, and multiple sockets */
+			cpu_topology[cpu].thread_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id   = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 2);
+			cpu_topology[cpu].cluster_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
+			cpu_topology[cpu].package_id = topology_id;
+		} else if (is_threaded) {
+			/* multiple threads, but only a single socket */
+			cpu_topology[cpu].thread_id  = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id    = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 2);
+			cpu_topology[cpu].cluster_id = topology_id;
+			cpu_topology[cpu].package_id = topology_id;
+		} else {
+			/* no threads, clusters behave like threads */
+			cpu_topology[cpu].thread_id  = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id    = topology_id;
+			cpu_topology[cpu].cluster_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
+			cpu_topology[cpu].package_id = topology_id;
+		}
+	}
+	return 0;
+}
+
+#else
+static int __init parse_acpi_topology(void)
+{
+	/* ACPI kernels should be built with PPTT support */
+	return -EINVAL;
+}
+#endif
+
 void __init init_cpu_topology(void)
 {
 	reset_cpu_topology();
@@ -312,6 +376,8 @@ void __init init_cpu_topology(void)
 	 * Discard anything that was parsed if we hit an error so we
 	 * don't use partial information.
 	 */
-	if (of_have_populated_dt() && parse_dt_topology())
+	if ((!acpi_disabled) && parse_acpi_topology())
+		reset_cpu_topology();
+	else if (of_have_populated_dt() && parse_dt_topology())
 		reset_cpu_topology();
 }
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4660749a7303..08bf736be7c1 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -43,6 +43,8 @@
 		if (nr_cpus_node(node))
 
 int arch_update_cpu_topology(void);
+int setup_acpi_cpu_topology(unsigned int cpu, int level);
+int acpi_multisocket_count(void);
 
 /* Conform to ACPI 2.0 SLIT distance definitions */
 #define LOCAL_DISTANCE		10
-- 
2.13.5
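The three-way collapse the commit message describes is the heart of parse_acpi_topology(). A minimal sketch of just that per-machine decision (independent of PPTT parsing; the enum names are illustrative, not kernel symbols) could be:

```c
#include <stdbool.h>

/* The three scheduler-facing layouts the commit message describes. */
enum topo_config {
	THREAD_SOCKET,	/* MT per core and multiple sockets	*/
	THREAD_CLUSTER,	/* MT per core, single socket		*/
	CLUSTER_SOCKET,	/* no MT: clusters act like threads	*/
};

/*
 * Sketch of the choice parse_acpi_topology() makes before mapping PPTT
 * levels onto thread/core/cluster/package: which pair of levels the
 * scheduler will actually see as "thread" and "socket".
 */
static enum topo_config pick_config(bool is_threaded, int socket_count)
{
	if (is_threaded && socket_count > 1)
		return THREAD_SOCKET;
	if (is_threaded)
		return THREAD_CLUSTER;
	return CLUSTER_SOCKET;
}
```

In the patch itself, is_threaded comes from the MPIDR MT bit and the socket count from acpi_multisocket_count(); the three branches of the for_each_possible_cpu() loop correspond to these three cases.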



* [PATCH 6/6] arm64: topology: Enable ACPI/PPTT based CPU topology.
@ 2017-09-14 18:49   ` Jeremy Linton
  0 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-14 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread id, core_id and
cluster_id by assuming certain levels of the PPTT tree correspond
to those concepts. The package_id is flagged in the tree and can be
found by passing an arbitrary large level to setup_acpi_cpu_topology()
which terminates its search when it finds an ACPI node flagged
as the physical package. If the tree doesn't contain enough
levels to represent all of thread/core/cod/package then the package
id will be used for the missing levels.

Since arm64 machines can have 3 distinct topology levels, and the
scheduler only handles sockets/threads well today, we compromise
by collapsing into one of three diffrent configurations. These are
thread/socket, thread/cluster or cluster/socket depending on whether
the machine has threading and multisocket, threading in a single
socket, or doesn't have threading.

This code is loosely based on a combination of code from:
Xiongfeng Wang <wangxiongfeng2@huawei.com>
John Garry <john.garry@huawei.com>
Jeffrey Hugo <jhugo@codeaurora.org>

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/topology.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/topology.h     |  2 ++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 9147e5b6326d..8ee5cc5ba9bd 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -11,6 +11,7 @@
  * for more details.
  */
 
+#include <linux/acpi.h>
 #include <linux/arch_topology.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -22,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
+#include <linux/smp.h>
 #include <linux/string.h>
 
 #include <asm/cpu.h>
@@ -304,6 +306,68 @@ static void __init reset_cpu_topology(void)
 	}
 }
 
+#ifdef CONFIG_ACPI
+/*
+ * Propagate the topology information of the processor_topology_node tree to the
+ * cpu_topology array.
+ */
+static int __init parse_acpi_topology(void)
+{
+	u64 is_threaded;
+	int is_multisocket;
+	int cpu;
+	int topology_id;
+	/* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
+	const int max_topo = 0xFF;
+
+	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
+	is_multisocket = acpi_multisocket_count();
+	if (is_multisocket < 0)
+		return is_multisocket;
+
+	for_each_possible_cpu(cpu) {
+		topology_id = setup_acpi_cpu_topology(cpu, 0);
+		if (topology_id < 0)
+			return topology_id;
+
+		if ((is_threaded) && (is_multisocket > 1)) {
+			/* MT per core, and multiple sockets */
+			cpu_topology[cpu].thread_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id   = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 2);
+			cpu_topology[cpu].cluster_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
+			cpu_topology[cpu].package_id = topology_id;
+		} else if (is_threaded) {
+			/* mutltiple threads, but only a single socket */
+			cpu_topology[cpu].thread_id  = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id    = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 2);
+			cpu_topology[cpu].cluster_id = topology_id;
+			cpu_topology[cpu].package_id = topology_id;
+		} else {
+			/* no threads, clusters behave like threads */
+			cpu_topology[cpu].thread_id  = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id    = topology_id;
+			cpu_topology[cpu].cluster_id = topology_id;
+			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
+			cpu_topology[cpu].package_id = topology_id;
+		}
+	}
+	return 0;
+}
+
+#else
+static int __init parse_acpi_topology(void)
+{
+	/* ACPI kernels should be built with PPTT support */
+	return -EINVAL;
+}
+#endif
+
 void __init init_cpu_topology(void)
 {
 	reset_cpu_topology();
@@ -312,6 +376,8 @@ void __init init_cpu_topology(void)
 	 * Discard anything that was parsed if we hit an error so we
 	 * don't use partial information.
 	 */
-	if (of_have_populated_dt() && parse_dt_topology())
+	if ((!acpi_disabled) && parse_acpi_topology())
+		reset_cpu_topology();
+	else if (of_have_populated_dt() && parse_dt_topology())
 		reset_cpu_topology();
 }
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4660749a7303..08bf736be7c1 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -43,6 +43,8 @@
 		if (nr_cpus_node(node))
 
 int arch_update_cpu_topology(void);
+int setup_acpi_cpu_topology(unsigned int cpu, int level);
+int acpi_multisocket_count(void);
 
 /* Conform to ACPI 2.0 SLIT distance definitions */
 #define LOCAL_DISTANCE		10
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/6] Support PPTT for ARM64
  2017-09-14 18:49 ` Jeremy Linton
@ 2017-09-15 17:05   ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-15 17:05 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, wangxiongfeng2, hanjun.guo, jhugo, john.garry,
	austinwc, sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas

On 09/14/2017 01:49 PM, Jeremy Linton wrote:
> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
> used to describe the processor and cache topologies. Ideally it is
> used to extend/override information provided by the hardware, but
> right now ARM64 is entirely dependent on firmware provided tables.

Hi,

So there is a problem with this patch set when several cpu nodes 
reference a single PPTT cache node with the intention that the 
resulting caches are private, not shared. The code uses the node 
reference to determine whether caches are shared, so it mistakenly 
concludes that all of the (say, L1) caches are shared because they 
share a PPTT cache node.

It's a fairly small tweak; I will re-post this set.




> 
> This patch parses the table for the cache topology and CPU topology.
> For the latter we also add an additional topology_cod_id() macro,
> and a package_id for arm64. Initially the physical id will match
> the cluster id, but we update users of the cluster to utilize
> the new macro. When we enable PPTT for the arm64 the cluster/socket
> starts to differ. Because of this we also make some dynamic decisions
> about mapping thread/core/cod/socket to the thread/socket used by the
> scheduler.
> 
> For example on juno:
> 
> [root@mammon-juno-rh topology]# lstopo-no-graphics
> Machine (7048MB)
>    Package L#0
>      L2 L#0 (1024KB) + Core L#0
>        L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
>        L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
>        L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
>        L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
>      L2 L#1 (2048KB) + Core L#1
>        L1d L#4 (32KB) + L1i L#4 (48KB) + PU L#4 (P#4)
>        L1d L#5 (32KB) + L1i L#5 (48KB) + PU L#5 (P#5)
>    HostBridge L#0
>      PCIBridge
>        PCIBridge
>          PCIBridge
>            PCI 1095:3132
>              Block(Disk) L#0 "sda"
>          PCIBridge
>            PCI 1002:68f9
>              GPU L#1 "renderD128"
>              GPU L#2 "card0"
>              GPU L#3 "controlD64"
>          PCIBridge
>            PCI 11ab:4380
>              Net L#4 "enp8s0"
> 
> 
> Jeremy Linton (6):
>    ACPI/PPTT: Add Processor Properties Topology Table parsing
>    ACPI: Enable PPTT support on ARM64
>    drivers: base: cacheinfo: arm64: Add support for ACPI based firmware
>      tables
>    Topology: Add cluster on die macros and arm64 decoding
>    arm64: Fixup users of topology_physical_package_id
>    arm64: topology: Enable ACPI/PPTT based CPU topology.
> 
>   arch/arm64/Kconfig                |   1 +
>   arch/arm64/include/asm/topology.h |   4 +-
>   arch/arm64/kernel/cacheinfo.c     |  23 +-
>   arch/arm64/kernel/topology.c      |  76 +++++-
>   drivers/acpi/Makefile             |   1 +
>   drivers/acpi/arm64/Kconfig        |   3 +
>   drivers/acpi/pptt.c               | 508 ++++++++++++++++++++++++++++++++++++++
>   drivers/base/cacheinfo.c          |  17 +-
>   drivers/clk/clk-mb86s7x.c         |   2 +-
>   drivers/cpufreq/arm_big_little.c  |   2 +-
>   drivers/firmware/psci_checker.c   |   2 +-
>   include/linux/cacheinfo.h         |  10 +-
>   include/linux/topology.h          |   5 +
>   13 files changed, 634 insertions(+), 20 deletions(-)
>   create mode 100644 drivers/acpi/pptt.c
> 




* Re: [PATCH 6/6] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-09-14 18:49   ` Jeremy Linton
@ 2017-09-18  1:37     ` Xiongfeng Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiongfeng Wang @ 2017-09-18  1:37 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: linux-arm-kernel, hanjun.guo, jhugo, john.garry, austinwc,
	sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas

Hi Jeremy,

On 2017/9/15 2:49, Jeremy Linton wrote:
> Propagate the topology information from the PPTT tree to the
> cpu_topology array. We can get the thread id, core_id and
> cluster_id by assuming certain levels of the PPTT tree correspond
> to those concepts. The package_id is flagged in the tree and can be
> found by passing an arbitrary large level to setup_acpi_cpu_topology()
> which terminates its search when it finds an ACPI node flagged
> as the physical package. If the tree doesn't contain enough
> levels to represent all of thread/core/cod/package then the package
> id will be used for the missing levels.
> 
> Since arm64 machines can have 3 distinct topology levels, and the
> scheduler only handles sockets/threads well today, we compromise
> by collapsing into one of three different configurations. These are
> thread/socket, thread/cluster or cluster/socket depending on whether
> the machine has threading and multisocket, threading in a single
> socket, or doesn't have threading.
> 
> This code is loosely based on a combination of code from:
> Xiongfeng Wang <wangxiongfeng2@huawei.com>
> John Garry <john.garry@huawei.com>
> Jeffrey Hugo <jhugo@codeaurora.org>
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/kernel/topology.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/topology.h     |  2 ++
>  2 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 9147e5b6326d..8ee5cc5ba9bd 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -11,6 +11,7 @@
>   * for more details.
>   */
>  
> +#include <linux/acpi.h>
>  #include <linux/arch_topology.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> @@ -22,6 +23,7 @@
>  #include <linux/sched.h>
>  #include <linux/sched/topology.h>
>  #include <linux/slab.h>
> +#include <linux/smp.h>
>  #include <linux/string.h>
>  
>  #include <asm/cpu.h>
> @@ -304,6 +306,68 @@ static void __init reset_cpu_topology(void)
>  	}
>  }
>  
> +#ifdef CONFIG_ACPI
> +/*
> + * Propagate the topology information of the processor_topology_node tree to the
> + * cpu_topology array.
> + */
> +static int __init parse_acpi_topology(void)
> +{
> +	u64 is_threaded;
> +	int is_multisocket;
> +	int cpu;
> +	int topology_id;
> +	/* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
> +	const int max_topo = 0xFF;
> +
> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> +	is_multisocket = acpi_multisocket_count();
> +	if (is_multisocket < 0)
> +		return is_multisocket;
> +
> +	for_each_possible_cpu(cpu) {
> +		topology_id = setup_acpi_cpu_topology(cpu, 0);
> +		if (topology_id < 0)
> +			return topology_id;
> +
> +		if ((is_threaded) && (is_multisocket > 1)) {
> +			/* MT per core, and multiple sockets */
> +			cpu_topology[cpu].thread_id = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
> +			cpu_topology[cpu].core_id   = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
> +			cpu_topology[cpu].cluster_id = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
> +			cpu_topology[cpu].package_id = topology_id;
> +		} else if (is_threaded) {
> +			/* multiple threads, but only a single socket */
> +			cpu_topology[cpu].thread_id  = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
> +			cpu_topology[cpu].core_id    = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
> +			cpu_topology[cpu].cluster_id = topology_id;
> +			cpu_topology[cpu].package_id = topology_id;
> +		} else {
> +			/* no threads, clusters behave like threads */
> +			cpu_topology[cpu].thread_id  = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
> +			cpu_topology[cpu].core_id    = topology_id;
> +			cpu_topology[cpu].cluster_id = topology_id;
> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
> +			cpu_topology[cpu].package_id = topology_id;

I cannot understand why we should consider cores in a cluster as threads. The scheduler will
be affected a lot by this, and 'lstopo' may display wrong information.

Thanks,
Xiongfeng Wang

> +		}
> +	}
> +	return 0;
> +}
> +
> +#else
> +static int __init parse_acpi_topology(void)
> +{
> +	/* ACPI kernels should be built with PPTT support */
> +	return -EINVAL;
> +}
> +#endif
> +
>  void __init init_cpu_topology(void)
>  {
>  	reset_cpu_topology();
> @@ -312,6 +376,8 @@ void __init init_cpu_topology(void)
>  	 * Discard anything that was parsed if we hit an error so we
>  	 * don't use partial information.
>  	 */
> -	if (of_have_populated_dt() && parse_dt_topology())
> +	if ((!acpi_disabled) && parse_acpi_topology())
> +		reset_cpu_topology();
> +	else if (of_have_populated_dt() && parse_dt_topology())
>  		reset_cpu_topology();
>  }
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 4660749a7303..08bf736be7c1 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -43,6 +43,8 @@
>  		if (nr_cpus_node(node))
>  
>  int arch_update_cpu_topology(void);
> +int setup_acpi_cpu_topology(unsigned int cpu, int level);
> +int acpi_multisocket_count(void);
>  
>  /* Conform to ACPI 2.0 SLIT distance definitions */
>  #define LOCAL_DISTANCE		10
> 




* Re: [PATCH 4/6] Topology: Add cluster on die macros and arm64 decoding
  2017-09-14 18:49   ` Jeremy Linton
@ 2017-09-18  1:50     ` Xiongfeng Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiongfeng Wang @ 2017-09-18  1:50 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: lorenzo.pieralisi, austinwc, jhugo, will.deacon, john.garry, rjw,
	hanjun.guo, sudeep.holla, catalin.marinas, linux-arm-kernel

Hi Jeremy,

On 2017/9/15 2:49, Jeremy Linton wrote:
> Many modern machines have cluster on die (COD) non-uniformity
> as well as the traditional multi-socket architectures. Reusing
> the multi-socket or NUMA on die concepts for these (as arm64 does)
> breaks down when presented with actual multi-socket/COD machines.
> Similar problems are also visible on some x86 machines, so it
> seems appropriate to start abstracting and making these topologies
> visible.
> 
> To start, a topology_cod_id() macro is added which defaults to returning
> the same information as topology_physical_package_id(). Moving forward
> we can start to split out the differences.
> 
> For arm64, an additional package_id is added to the cpu_topology array.
> Initially this will be equal to the cluster_id as well.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/include/asm/topology.h | 4 +++-
>  arch/arm64/kernel/topology.c      | 8 ++++++--
>  include/linux/topology.h          | 3 +++
>  3 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index 8b57339823e9..bd7517960d39 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -7,13 +7,15 @@ struct cpu_topology {
>  	int thread_id;
>  	int core_id;
>  	int cluster_id;
> +	int package_id;
>  	cpumask_t thread_sibling;
>  	cpumask_t core_sibling;
>  };

'core_sibling' will be updated by 'update_siblings_masks()' to represent the cores in a cluster.
Can we add a cpumask_t field to represent the cores in a package, so that 'lstopo' can use it
to display the right information?

Thanks,
Xiongfeng Wang

>  
>  extern struct cpu_topology cpu_topology[NR_CPUS];
>  
> -#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
> +#define topology_physical_package_id(cpu)	(cpu_topology[cpu].package_id)
> +#define topology_cod_id(cpu)		(cpu_topology[cpu].cluster_id)
>  #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
>  #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
>  #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 8d48b233e6ce..9147e5b6326d 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -67,6 +67,8 @@ static int __init parse_core(struct device_node *core, int cluster_id,
>  			leaf = false;
>  			cpu = get_cpu_for_node(t);
>  			if (cpu >= 0) {
> +				/* maintain DT cluster == package behavior */
> +				cpu_topology[cpu].package_id = cluster_id;
>  				cpu_topology[cpu].cluster_id = cluster_id;
>  				cpu_topology[cpu].core_id = core_id;
>  				cpu_topology[cpu].thread_id = i;
> @@ -88,7 +90,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
>  			       core);
>  			return -EINVAL;
>  		}
> -
> +		cpu_topology[cpu].package_id = cluster_id;
>  		cpu_topology[cpu].cluster_id = cluster_id;
>  		cpu_topology[cpu].core_id = core_id;
>  	} else if (leaf) {
> @@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
>  	for_each_possible_cpu(cpu) {
>  		cpu_topo = &cpu_topology[cpu];
>  
> -		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
> +		if (cpuid_topo->package_id != cpu_topo->package_id)
>  			continue;
>  
>  		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
> @@ -273,6 +275,7 @@ void store_cpu_topology(unsigned int cpuid)
>  					 MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 |
>  					 MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16;
>  	}
> +	cpuid_topo->package_id = cpuid_topo->cluster_id;
>  
>  	pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
>  		 cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
> @@ -292,6 +295,7 @@ static void __init reset_cpu_topology(void)
>  		cpu_topo->thread_id = -1;
>  		cpu_topo->core_id = 0;
>  		cpu_topo->cluster_id = -1;
> +		cpu_topo->package_id = -1;
>  
>  		cpumask_clear(&cpu_topo->core_sibling);
>  		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index cb0775e1ee4b..4660749a7303 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
>  #ifndef topology_physical_package_id
>  #define topology_physical_package_id(cpu)	((void)(cpu), -1)
>  #endif
> +#ifndef topology_cod_id				/* cluster on die */
> +#define topology_cod_id(cpu)			topology_physical_package_id(cpu)
> +#endif
>  #ifndef topology_core_id
>  #define topology_core_id(cpu)			((void)(cpu), 0)
>  #endif
> 



* Re: [PATCH 4/6] Topology: Add cluster on die macros and arm64 decoding
  2017-09-18  1:50     ` Xiongfeng Wang
@ 2017-09-18 18:54       ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-18 18:54 UTC (permalink / raw)
  To: Xiongfeng Wang, linux-acpi
  Cc: lorenzo.pieralisi, austinwc, jhugo, will.deacon, john.garry, rjw,
	hanjun.guo, sudeep.holla, catalin.marinas, linux-arm-kernel

Hi,


On 09/17/2017 08:50 PM, Xiongfeng Wang wrote:
> Hi Jeremy,
> 
> On 2017/9/15 2:49, Jeremy Linton wrote:
>> Many modern machines have cluster on die (COD) non-uniformity
>> as well as the traditional multi-socket architectures. Reusing
>> the multi-socket or NUMA on die concepts for these (as arm64 does)
>> breaks down when presented with actual multi-socket/COD machines.
>> Similar, problems are also visible on some x86 machines so it
>> seems appropriate to start abstracting and making these topologies
>> visible.
>>
>> To start, a topology_cod_id() macro is added which defaults to returning
>> the same information as topology_physical_package_id(). Moving forward
>> we can start to spit out the differences.
>>
>> For arm64, an additional package_id is added to the cpu_topology array.
>> Initially this will be equal to the cluster_id as well.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   arch/arm64/include/asm/topology.h | 4 +++-
>>   arch/arm64/kernel/topology.c      | 8 ++++++--
>>   include/linux/topology.h          | 3 +++
>>   3 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
>> index 8b57339823e9..bd7517960d39 100644
>> --- a/arch/arm64/include/asm/topology.h
>> +++ b/arch/arm64/include/asm/topology.h
>> @@ -7,13 +7,15 @@ struct cpu_topology {
>>   	int thread_id;
>>   	int core_id;
>>   	int cluster_id;
>> +	int package_id;
>>   	cpumask_t thread_sibling;
>>   	cpumask_t core_sibling;
>>   };
> 
> 'core_sibling' will be updated by 'update_siblings_masks()' to represent cores in a cluster;
> Can we add a cpumask_t field to represent cores in a package? So that 'lstopo' can use this
> cpumask_t to display the right information.

So, the change below modifies update_siblings_masks() to utilize the 
package_id. Per the ABI, ..cpuX/topology/physical_package_id is 
shared between core_siblings/core_siblings_list. What 
physical_package_id means can vary per architecture, but the siblings 
list needs to be the cores with the same physical_package (AFAIK, feel 
free to correct my understanding). That rule should be enforced by this 
patch set.

I suspect that if you're running these patches and the lstopo output 
looks strange, it's because you're on a machine where the thread_id has 
been assigned the cluster_id in the later patch set.


> 
> Thanks,
> Xiongfeng Wang
> 
>>   
>>   extern struct cpu_topology cpu_topology[NR_CPUS];
>>   
>> -#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
>> +#define topology_physical_package_id(cpu)	(cpu_topology[cpu].package_id)
>> +#define topology_cod_id(cpu)		(cpu_topology[cpu].cluster_id)
>>   #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
>>   #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
>>   #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 8d48b233e6ce..9147e5b6326d 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -67,6 +67,8 @@ static int __init parse_core(struct device_node *core, int cluster_id,
>>   			leaf = false;
>>   			cpu = get_cpu_for_node(t);
>>   			if (cpu >= 0) {
>> +				/* maintain DT cluster == package behavior */
>> +				cpu_topology[cpu].package_id = cluster_id;
>>   				cpu_topology[cpu].cluster_id = cluster_id;
>>   				cpu_topology[cpu].core_id = core_id;
>>   				cpu_topology[cpu].thread_id = i;
>> @@ -88,7 +90,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
>>   			       core);
>>   			return -EINVAL;
>>   		}
>> -
>> +		cpu_topology[cpu].package_id = cluster_id;
>>   		cpu_topology[cpu].cluster_id = cluster_id;
>>   		cpu_topology[cpu].core_id = core_id;
>>   	} else if (leaf) {
>> @@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
>>   	for_each_possible_cpu(cpu) {
>>   		cpu_topo = &cpu_topology[cpu];
>>   
>> -		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
>> +		if (cpuid_topo->package_id != cpu_topo->package_id)

(note here that core_siblings now reflect the package_id rather than the 
cluster_id. This only matters if cluster_id!=package_id).

>>   			continue;
>>   
>>   		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
>> @@ -273,6 +275,7 @@ void store_cpu_topology(unsigned int cpuid)
>>   					 MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 |
>>   					 MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16;
>>   	}
>> +	cpuid_topo->package_id = cpuid_topo->cluster_id;
>>   
>>   	pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
>>   		 cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
>> @@ -292,6 +295,7 @@ static void __init reset_cpu_topology(void)
>>   		cpu_topo->thread_id = -1;
>>   		cpu_topo->core_id = 0;
>>   		cpu_topo->cluster_id = -1;
>> +		cpu_topo->package_id = -1;
>>   
>>   		cpumask_clear(&cpu_topo->core_sibling);
>>   		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>> index cb0775e1ee4b..4660749a7303 100644
>> --- a/include/linux/topology.h
>> +++ b/include/linux/topology.h
>> @@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
>>   #ifndef topology_physical_package_id
>>   #define topology_physical_package_id(cpu)	((void)(cpu), -1)
>>   #endif
>> +#ifndef topology_cod_id				/* cluster on die */
>> +#define topology_cod_id(cpu)			topology_physical_package_id(cpu)
>> +#endif
>>   #ifndef topology_core_id
>>   #define topology_core_id(cpu)			((void)(cpu), 0)
>>   #endif
>>
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/6] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-09-18  1:37     ` Xiongfeng Wang
@ 2017-09-18 19:02       ` Jeremy Linton
  -1 siblings, 0 replies; 42+ messages in thread
From: Jeremy Linton @ 2017-09-18 19:02 UTC (permalink / raw)
  To: Xiongfeng Wang, linux-acpi
  Cc: linux-arm-kernel, hanjun.guo, jhugo, john.garry, austinwc,
	sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas

On 09/17/2017 08:37 PM, Xiongfeng Wang wrote:
> Hi Jeremy,
> 
> On 2017/9/15 2:49, Jeremy Linton wrote:
>> Propagate the topology information from the PPTT tree to the
>> cpu_topology array. We can get the thread id, core_id and
>> cluster_id by assuming certain levels of the PPTT tree correspond
>> to those concepts. The package_id is flagged in the tree and can be
>> found by passing an arbitrary large level to setup_acpi_cpu_topology()
>> which terminates its search when it finds an ACPI node flagged
>> as the physical package. If the tree doesn't contain enough
>> levels to represent all of thread/core/cod/package then the package
>> id will be used for the missing levels.
>>
>> Since arm64 machines can have 3 distinct topology levels, and the
>> scheduler only handles sockets/threads well today, we compromise
>> by collapsing into one of three different configurations. These are
>> thread/socket, thread/cluster or cluster/socket depending on whether
>> the machine has threading and multisocket, threading in a single
>> socket, or doesn't have threading.
>>
>> This code is loosely based on a combination of code from:
>> Xiongfeng Wang <wangxiongfeng2@huawei.com>
>> John Garry <john.garry@huawei.com>
>> Jeffrey Hugo <jhugo@codeaurora.org>
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   arch/arm64/kernel/topology.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
>>   include/linux/topology.h     |  2 ++
>>   2 files changed, 69 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 9147e5b6326d..8ee5cc5ba9bd 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -11,6 +11,7 @@
>>    * for more details.
>>    */
>>   
>> +#include <linux/acpi.h>
>>   #include <linux/arch_topology.h>
>>   #include <linux/cpu.h>
>>   #include <linux/cpumask.h>
>> @@ -22,6 +23,7 @@
>>   #include <linux/sched.h>
>>   #include <linux/sched/topology.h>
>>   #include <linux/slab.h>
>> +#include <linux/smp.h>
>>   #include <linux/string.h>
>>   
>>   #include <asm/cpu.h>
>> @@ -304,6 +306,68 @@ static void __init reset_cpu_topology(void)
>>   	}
>>   }
>>   
>> +#ifdef CONFIG_ACPI
>> +/*
>> + * Propagate the topology information of the processor_topology_node tree to the
>> + * cpu_topology array.
>> + */
>> +static int __init parse_acpi_topology(void)
>> +{
>> +	u64 is_threaded;
>> +	int is_multisocket;
>> +	int cpu;
>> +	int topology_id;
>> +	/* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
>> +	const int max_topo = 0xFF;
>> +
>> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>> +	is_multisocket = acpi_multisocket_count();
>> +	if (is_multisocket < 0)
>> +		return is_multisocket;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		topology_id = setup_acpi_cpu_topology(cpu, 0);
>> +		if (topology_id < 0)
>> +			return topology_id;
>> +
>> +		if ((is_threaded) && (is_multisocket > 1)) {
>> +			/* MT per core, and multiple sockets */
>> +			cpu_topology[cpu].thread_id = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
>> +			cpu_topology[cpu].core_id   = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
>> +			cpu_topology[cpu].cluster_id = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
>> +			cpu_topology[cpu].package_id = topology_id;
>> +		} else if (is_threaded) {
>> +			/* mutltiple threads, but only a single socket */
>> +			cpu_topology[cpu].thread_id  = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
>> +			cpu_topology[cpu].core_id    = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, 2);
>> +			cpu_topology[cpu].cluster_id = topology_id;
>> +			cpu_topology[cpu].package_id = topology_id;
>> +		} else {
>> +			/* no threads, clusters behave like threads */
>> +			cpu_topology[cpu].thread_id  = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, 1);
>> +			cpu_topology[cpu].core_id    = topology_id;
>> +			cpu_topology[cpu].cluster_id = topology_id;
>> +			topology_id = setup_acpi_cpu_topology(cpu, max_topo);
>> +			cpu_topology[cpu].package_id = topology_id;
> 
> I cannot understand why we should consider cores in a cluster as threads. The scheduler will
> be affected a lot by this, and 'lstopo' may display wrong information.

My take is that we shouldn't be discarding the cluster information 
because it's extremely valuable. In many ways it seems that clustered 
cores have, at a high level, similar performance characteristics to 
threads (AKA, cores in a cluster have high performance when sharing 
data, but for problems with little sharing it's more advantageous to 
first schedule those threads to differing clusters). Although, how much 
effect this has vs the MC cache priorities in the scheduler isn't 
apparent to me.

Anyway, lstopo doesn't currently know about anything beyond 
package/thread, except for the book. The question is, do we want to 
misuse the book_id to represent sockets and continue to use cluster_id 
as the physical_package_id? I don't think that is a better plan than 
what I've done here.


The bottom line is that, after having looked at the scheduler a bit, I 
suspect that thread=cluster for machines without MT doesn't really 
matter much. So, the next version I'm just going to collapse this into 
what everyone expects, socket=socket and thread=thread, for ACPI users 
(which are more likely to have NUMA and multisocket at this point). The 
cluster knowledge is still somewhat visible to the scheduler via the 
cache topology.




> 
> Thanks,
> Xiongfeng Wang
> 
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +#else
>> +static int __init parse_acpi_topology(void)
>> +{
>> +	/*ACPI kernels should be built with PPTT support*/
>> +	return -EINVAL;
>> +}
>> +#endif
>> +
>>   void __init init_cpu_topology(void)
>>   {
>>   	reset_cpu_topology();
>> @@ -312,6 +376,8 @@ void __init init_cpu_topology(void)
>>   	 * Discard anything that was parsed if we hit an error so we
>>   	 * don't use partial information.
>>   	 */
>> -	if (of_have_populated_dt() && parse_dt_topology())
>> +	if ((!acpi_disabled) && parse_acpi_topology())
>> +		reset_cpu_topology();
>> +	else if (of_have_populated_dt() && parse_dt_topology())
>>   		reset_cpu_topology();
>>   }
>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>> index 4660749a7303..08bf736be7c1 100644
>> --- a/include/linux/topology.h
>> +++ b/include/linux/topology.h
>> @@ -43,6 +43,8 @@
>>   		if (nr_cpus_node(node))
>>   
>>   int arch_update_cpu_topology(void);
>> +int setup_acpi_cpu_topology(unsigned int cpu, int level);
>> +int acpi_multisocket_count(void);
>>   
>>   /* Conform to ACPI 2.0 SLIT distance definitions */
>>   #define LOCAL_DISTANCE		10
>>
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/6] Topology: Add cluster on die macros and arm64 decoding
  2017-09-18 18:54       ` Jeremy Linton
@ 2017-09-19  1:03         ` Xiongfeng Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiongfeng Wang @ 2017-09-19  1:03 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: lorenzo.pieralisi, austinwc, jhugo, will.deacon, john.garry, rjw,
	hanjun.guo, sudeep.holla, catalin.marinas, linux-arm-kernel

Hi Jeremy,

On 2017/9/19 2:54, Jeremy Linton wrote:
> Hi,
> 
> 
> On 09/17/2017 08:50 PM, Xiongfeng Wang wrote:
>> Hi Jeremy,
>>
>> On 2017/9/15 2:49, Jeremy Linton wrote:
>>> Many modern machines have cluster on die (COD) non-uniformity
>>> as well as the traditional multi-socket architectures. Reusing
>>> the multi-socket or NUMA on die concepts for these (as arm64 does)
>>> breaks down when presented with actual multi-socket/COD machines.
>>> Similar problems are also visible on some x86 machines, so it
>>> seems appropriate to start abstracting and making these topologies
>>> visible.
>>>
>>> To start, a topology_cod_id() macro is added which defaults to returning
>>> the same information as topology_physical_package_id(). Moving forward
>>> we can start to split out the differences.
>>>
>>> For arm64, an additional package_id is added to the cpu_topology array.
>>> Initially this will be equal to the cluster_id as well.
>>>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   arch/arm64/include/asm/topology.h | 4 +++-
>>>   arch/arm64/kernel/topology.c      | 8 ++++++--
>>>   include/linux/topology.h          | 3 +++
>>>   3 files changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
>>> index 8b57339823e9..bd7517960d39 100644
>>> --- a/arch/arm64/include/asm/topology.h
>>> +++ b/arch/arm64/include/asm/topology.h
>>> @@ -7,13 +7,15 @@ struct cpu_topology {
>>>       int thread_id;
>>>       int core_id;
>>>       int cluster_id;
>>> +    int package_id;
>>>       cpumask_t thread_sibling;
>>>       cpumask_t core_sibling;
>>>   };
>>
>> 'core_sibling' will be updated by 'update_siblings_masks()' to represent cores in a cluster;
>> Can we add a cpumask_t field to represent cores in a package? So that 'lstopo' can use this
>> cpumask_t to display the right information.
> 
> So, the change below modifies update_siblings_masks() to utilize the package_id. Per the ABI, ..cpuX/topology/physical_package_id is shared between core_siblings/core_siblings_list. What physical_package_id means can vary per architecture, but the siblings list needs to be the cores with the same physical_package (AFAIK, feel free to correct my understanding). That rule should be enforced by this patch set.
> 
> I suspect that if you're running these patches and the lstopo output looks strange, it's because you're on a machine where the thread_id has been assigned the cluster_id in the later patch set.
> 
Sorry, I didn't notice your change in 'update_siblings_masks()' before, so 'core_sibling' now represents the cores in a package.
But we may still need another cpumask_t field to represent the cores in a cluster, so that the scheduler can build a sched_domain
containing only the cores in one cluster.
> 
>>
>> Thanks,
>> Xiongfeng Wang
>>
>>>     extern struct cpu_topology cpu_topology[NR_CPUS];
>>>   -#define topology_physical_package_id(cpu)    (cpu_topology[cpu].cluster_id)
>>> +#define topology_physical_package_id(cpu)    (cpu_topology[cpu].package_id)
>>> +#define topology_cod_id(cpu)        (cpu_topology[cpu].cluster_id)
>>>   #define topology_core_id(cpu)        (cpu_topology[cpu].core_id)
>>>   #define topology_core_cpumask(cpu)    (&cpu_topology[cpu].core_sibling)
>>>   #define topology_sibling_cpumask(cpu)    (&cpu_topology[cpu].thread_sibling)
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 8d48b233e6ce..9147e5b6326d 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -67,6 +67,8 @@ static int __init parse_core(struct device_node *core, int cluster_id,
>>>               leaf = false;
>>>               cpu = get_cpu_for_node(t);
>>>               if (cpu >= 0) {
>>> +                /* maintain DT cluster == package behavior */
>>> +                cpu_topology[cpu].package_id = cluster_id;
>>>                   cpu_topology[cpu].cluster_id = cluster_id;
>>>                   cpu_topology[cpu].core_id = core_id;
>>>                   cpu_topology[cpu].thread_id = i;
>>> @@ -88,7 +90,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
>>>                      core);
>>>               return -EINVAL;
>>>           }
>>> -
>>> +        cpu_topology[cpu].package_id = cluster_id;
>>>           cpu_topology[cpu].cluster_id = cluster_id;
>>>           cpu_topology[cpu].core_id = core_id;
>>>       } else if (leaf) {
>>> @@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
>>>       for_each_possible_cpu(cpu) {
>>>           cpu_topo = &cpu_topology[cpu];
* [PATCH 4/6] Topology: Add cluster on die macros and arm64 decoding
@ 2017-09-19  1:03         ` Xiongfeng Wang
  0 siblings, 0 replies; 42+ messages in thread
From: Xiongfeng Wang @ 2017-09-19  1:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jeremy,

On 2017/9/19 2:54, Jeremy Linton wrote:
> Hi,
> 
> 
> On 09/17/2017 08:50 PM, Xiongfeng Wang wrote:
>> Hi Jeremy,
>>
>> On 2017/9/15 2:49, Jeremy Linton wrote:
>>> Many modern machines have cluster on die (COD) non-uniformity
>>> as well as the traditional multi-socket architectures. Reusing
>>> the multi-socket or NUMA on die concepts for these (as arm64 does)
>>> breaks down when presented with actual multi-socket/COD machines.
>>> Similarly, problems are also visible on some x86 machines, so it
>>> seems appropriate to start abstracting and making these topologies
>>> visible.
>>>
>>> To start, a topology_cod_id() macro is added which defaults to returning
>>> the same information as topology_physical_package_id(). Moving forward
>>> we can start to split out the differences.
>>>
>>> For arm64, an additional package_id is added to the cpu_topology array.
>>> Initially this will be equal to the cluster_id as well.
>>>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   arch/arm64/include/asm/topology.h | 4 +++-
>>>   arch/arm64/kernel/topology.c      | 8 ++++++--
>>>   include/linux/topology.h          | 3 +++
>>>   3 files changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
>>> index 8b57339823e9..bd7517960d39 100644
>>> --- a/arch/arm64/include/asm/topology.h
>>> +++ b/arch/arm64/include/asm/topology.h
>>> @@ -7,13 +7,15 @@ struct cpu_topology {
>>>       int thread_id;
>>>       int core_id;
>>>       int cluster_id;
>>> +    int package_id;
>>>       cpumask_t thread_sibling;
>>>       cpumask_t core_sibling;
>>>   };
>>
>> 'core_sibling' will be updated by 'update_siblings_masks()' to represent the cores in a cluster;
>> can we add a cpumask_t field to represent the cores in a package, so that 'lstopo' can use that
>> cpumask_t to display the right information?
> 
> So, the change below modifies update_siblings_masks() to utilize the package_id. Per the ABI, ..cpuX/topology/physical_package_id is shared between core_siblings/core_siblings_list. What physical_package_id means can vary per architecture, but the siblings list needs to be the cores in the same physical_package (AFAIK, feel free to correct my understanding). That rule should be enforced by this patch set.
> 
> I suspect that if you're running these patches and the lstopo output looks strange, it's because you're on a machine where the thread_id has been assigned the cluster_id in the later patch set.
> 
Sorry, I didn't notice your change in 'update_siblings_masks()' before; 'core_sibling' now represents the cores in a package.
But we may need another cpumask_t field to represent the cores in a cluster, so that the scheduler can use it to build a sched_domain
containing only the cores in one cluster.
> 
>>
>> Thanks,
>> Xiongfeng Wang
>>
>>>     extern struct cpu_topology cpu_topology[NR_CPUS];
>>>   -#define topology_physical_package_id(cpu)    (cpu_topology[cpu].cluster_id)
>>> +#define topology_physical_package_id(cpu)    (cpu_topology[cpu].package_id)
>>> +#define topology_cod_id(cpu)        (cpu_topology[cpu].cluster_id)
>>>   #define topology_core_id(cpu)        (cpu_topology[cpu].core_id)
>>>   #define topology_core_cpumask(cpu)    (&cpu_topology[cpu].core_sibling)
>>>   #define topology_sibling_cpumask(cpu)    (&cpu_topology[cpu].thread_sibling)
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 8d48b233e6ce..9147e5b6326d 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -67,6 +67,8 @@ static int __init parse_core(struct device_node *core, int cluster_id,
>>>               leaf = false;
>>>               cpu = get_cpu_for_node(t);
>>>               if (cpu >= 0) {
>>> +                /* maintain DT cluster == package behavior */
>>> +                cpu_topology[cpu].package_id = cluster_id;
>>>                   cpu_topology[cpu].cluster_id = cluster_id;
>>>                   cpu_topology[cpu].core_id = core_id;
>>>                   cpu_topology[cpu].thread_id = i;
>>> @@ -88,7 +90,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
>>>                      core);
>>>               return -EINVAL;
>>>           }
>>> -
>>> +        cpu_topology[cpu].package_id = cluster_id;
>>>           cpu_topology[cpu].cluster_id = cluster_id;
>>>           cpu_topology[cpu].core_id = core_id;
>>>       } else if (leaf) {
>>> @@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
>>>       for_each_possible_cpu(cpu) {
>>>           cpu_topo = &cpu_topology[cpu];
>>>   -        if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
>>> +        if (cpuid_topo->package_id != cpu_topo->package_id)
> 
> (note here that core_siblings now reflects the package_id rather than the cluster_id; this only matters if cluster_id != package_id).
> 
>>>               continue;
>>>             cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
>>> @@ -273,6 +275,7 @@ void store_cpu_topology(unsigned int cpuid)
>>>                        MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 |
>>>                        MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16;
>>>       }
>>> +    cpuid_topo->package_id = cpuid_topo->cluster_id;
>>>         pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
>>>            cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
>>> @@ -292,6 +295,7 @@ static void __init reset_cpu_topology(void)
>>>           cpu_topo->thread_id = -1;
>>>           cpu_topo->core_id = 0;
>>>           cpu_topo->cluster_id = -1;
>>> +        cpu_topo->package_id = -1;
>>>             cpumask_clear(&cpu_topo->core_sibling);
>>>           cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
>>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>>> index cb0775e1ee4b..4660749a7303 100644
>>> --- a/include/linux/topology.h
>>> +++ b/include/linux/topology.h
>>> @@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
>>>   #ifndef topology_physical_package_id
>>>   #define topology_physical_package_id(cpu)    ((void)(cpu), -1)
>>>   #endif
>>> +#ifndef topology_cod_id                /* cluster on die */
>>> +#define topology_cod_id(cpu)            topology_physical_package_id(cpu)
>>> +#endif
>>>   #ifndef topology_core_id
>>>   #define topology_core_id(cpu)            ((void)(cpu), 0)
>>>   #endif
>>>
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
> 
> 
> .
> 


* Re: [PATCH 6/6] arm64: topology: Enable ACPI/PPTT based CPU topology.
  2017-09-18 19:02       ` Jeremy Linton
@ 2017-09-19  1:41         ` Xiongfeng Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiongfeng Wang @ 2017-09-19  1:41 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: linux-arm-kernel, hanjun.guo, jhugo, john.garry, austinwc,
	sudeep.holla, lorenzo.pieralisi, rjw, will.deacon,
	catalin.marinas

Hi Jeremy,

On 2017/9/19 3:02, Jeremy Linton wrote:
> On 09/17/2017 08:37 PM, Xiongfeng Wang wrote:
>> Hi Jeremy,
>>
>> On 2017/9/15 2:49, Jeremy Linton wrote:
>>> Propagate the topology information from the PPTT tree to the
>>> cpu_topology array. We can get the thread id, core_id and
>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>> to those concepts. The package_id is flagged in the tree and can be
>>> found by passing an arbitrary large level to setup_acpi_cpu_topology()
>>> which terminates its search when it finds an ACPI node flagged
>>> as the physical package. If the tree doesn't contain enough
>>> levels to represent all of thread/core/cod/package then the package
>>> id will be used for the missing levels.
>>>
>>> Since arm64 machines can have 3 distinct topology levels, and the
>>> scheduler only handles sockets/threads well today, we compromise
>>> by collapsing into one of three different configurations. These are
>>> thread/socket, thread/cluster or cluster/socket depending on whether
>>> the machine has threading and multisocket, threading in a single
>>> socket, or doesn't have threading.
>>>
>>> This code is loosely based on a combination of code from:
>>> Xiongfeng Wang <wangxiongfeng2@huawei.com>
>>> John Garry <john.garry@huawei.com>
>>> Jeffrey Hugo <jhugo@codeaurora.org>
>>>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   arch/arm64/kernel/topology.c | 68 +++++++++++++++++++++++++++++++++++++++++++-
>>>   include/linux/topology.h     |  2 ++
>>>   2 files changed, 69 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 9147e5b6326d..8ee5cc5ba9bd 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -11,6 +11,7 @@
>>>    * for more details.
>>>    */
>>>   +#include <linux/acpi.h>
>>>   #include <linux/arch_topology.h>
>>>   #include <linux/cpu.h>
>>>   #include <linux/cpumask.h>
>>> @@ -22,6 +23,7 @@
>>>   #include <linux/sched.h>
>>>   #include <linux/sched/topology.h>
>>>   #include <linux/slab.h>
>>> +#include <linux/smp.h>
>>>   #include <linux/string.h>
>>>     #include <asm/cpu.h>
>>> @@ -304,6 +306,68 @@ static void __init reset_cpu_topology(void)
>>>       }
>>>   }
>>>   +#ifdef CONFIG_ACPI
>>> +/*
>>> + * Propagate the topology information of the processor_topology_node tree to the
>>> + * cpu_topology array.
>>> + */
>>> +static int __init parse_acpi_topology(void)
>>> +{
>>> +    u64 is_threaded;
>>> +    int is_multisocket;
>>> +    int cpu;
>>> +    int topology_id;
>>> +    /* set a large depth, to hit ACPI_PPTT_PHYSICAL_PACKAGE if one exists */
>>> +    const int max_topo = 0xFF;
>>> +
>>> +    is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>>> +    is_multisocket = acpi_multisocket_count();
>>> +    if (is_multisocket < 0)
>>> +        return is_multisocket;
>>> +
>>> +    for_each_possible_cpu(cpu) {
>>> +        topology_id = setup_acpi_cpu_topology(cpu, 0);
>>> +        if (topology_id < 0)
>>> +            return topology_id;
>>> +
>>> +        if ((is_threaded) && (is_multisocket > 1)) {
>>> +            /* MT per core, and multiple sockets */
>>> +            cpu_topology[cpu].thread_id = topology_id;
>>> +            topology_id = setup_acpi_cpu_topology(cpu, 1);
>>> +            cpu_topology[cpu].core_id   = topology_id;
>>> +            topology_id = setup_acpi_cpu_topology(cpu, 2);
>>> +            cpu_topology[cpu].cluster_id = topology_id;
>>> +            topology_id = setup_acpi_cpu_topology(cpu, max_topo);
>>> +            cpu_topology[cpu].package_id = topology_id;
>>> +        } else if (is_threaded) {
>>> +            /* multiple threads, but only a single socket */
>>> +            cpu_topology[cpu].thread_id  = topology_id;
>>> +            topology_id = setup_acpi_cpu_topology(cpu, 1);
>>> +            cpu_topology[cpu].core_id    = topology_id;
>>> +            topology_id = setup_acpi_cpu_topology(cpu, 2);
>>> +            cpu_topology[cpu].cluster_id = topology_id;
>>> +            cpu_topology[cpu].package_id = topology_id;
>>> +        } else {
>>> +            /* no threads, clusters behave like threads */
>>> +            cpu_topology[cpu].thread_id  = topology_id;
>>> +            topology_id = setup_acpi_cpu_topology(cpu, 1);
>>> +            cpu_topology[cpu].core_id    = topology_id;
>>> +            cpu_topology[cpu].cluster_id = topology_id;
>>> +            topology_id = setup_acpi_cpu_topology(cpu, max_topo);
>>> +            cpu_topology[cpu].package_id = topology_id;
>>
>> I cannot understand why we should consider cores in a cluster as threads. The scheduler will
>> be affected a lot by this, and 'lstopo' may display wrong information.
> 
> My take is that we shouldn't be discarding the cluster information, because it's extremely valuable. In many ways clustered cores seem to have, at a high level,
> performance characteristics similar to threads (i.e., cores in a cluster perform well when sharing data, but for problems with little sharing it is more advantageous to
> first schedule those threads to differing clusters). Although, how much effect this has vs the MC cache priorities in the scheduler isn't apparent to me.
The code for sched_domain building on arm64 is shown below. 'cpu_smt_mask' uses 'thread_sibling' in struct cpu_topology, and 'cpu_coregroup_mask' uses 'core_sibling'.
But the arm64 defconfig does not include 'CONFIG_SCHED_SMT'. If we add a *_sibling field in struct cpu_topology to represent the cores in one cluster, and change
'cpu_coregroup_mask' to use that field, we can build a sched_domain containing only the cores in one cluster.

static struct sched_domain_topology_level default_topology[] = {
#ifdef CONFIG_SCHED_SMT
        { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
        { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
        { cpu_cpu_mask, SD_INIT_NAME(DIE) },
        { NULL, },
};

> 
> Anyway, lstopo doesn't currently know about anything beyond package/thread, except for the book. The question is, do we want to misuse the book_id to represent sockets and 
> continue to use cluster_id as the physical_package_id? I don't think that is a better plan than what I've done here.
> 
Sorry, I didn't know much about the book_id. To my understanding, 'lstopo' uses the information from sysfs. So I searched the Linux code for 'book_id' and found that
'book_id' seems to be used only by the s390 architecture.
> 
> The bottom line is that, after having looked at the scheduler a bit, I suspect that thread=cluster for machines without MT doesn't really matter much. So, in the next version
> I'm just going to collapse this into what everyone expects, socket=socket and thread=thread, for ACPI users (which are more likely to have NUMA and multisocket at this point). The
> cluster knowledge is still somewhat visible to the scheduler via the cache topology.
> 
> 
> 
> 
>>
>> Thanks,
>> Xiongfeng Wang
>>
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +#else
>>> +static int __init parse_acpi_topology(void)
>>> +{
>>> +    /*ACPI kernels should be built with PPTT support*/
>>> +    return -EINVAL;
>>> +}
>>> +#endif
>>> +
>>>   void __init init_cpu_topology(void)
>>>   {
>>>       reset_cpu_topology();
>>> @@ -312,6 +376,8 @@ void __init init_cpu_topology(void)
>>>        * Discard anything that was parsed if we hit an error so we
>>>        * don't use partial information.
>>>        */
>>> -    if (of_have_populated_dt() && parse_dt_topology())
>>> +    if ((!acpi_disabled) && parse_acpi_topology())
>>> +        reset_cpu_topology();
>>> +    else if (of_have_populated_dt() && parse_dt_topology())
>>>           reset_cpu_topology();
>>>   }
>>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>>> index 4660749a7303..08bf736be7c1 100644
>>> --- a/include/linux/topology.h
>>> +++ b/include/linux/topology.h
>>> @@ -43,6 +43,8 @@
>>>           if (nr_cpus_node(node))
>>>     int arch_update_cpu_topology(void);
>>> +int setup_acpi_cpu_topology(unsigned int cpu, int level);
>>> +int acpi_multisocket_count(void);
>>>     /* Conform to ACPI 2.0 SLIT distance definitions */
>>>   #define LOCAL_DISTANCE        10
>>>
>>
> 
> 
> .
> 




end of thread, other threads:[~2017-09-19  1:42 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-14 18:49 [PATCH 0/6] Support PPTT for ARM64 Jeremy Linton
2017-09-14 18:49 ` Jeremy Linton
2017-09-14 18:49 ` [PATCH 1/6] ACPI/PPTT: Add Processor Properties Topology Table parsing Jeremy Linton
2017-09-14 18:49   ` Jeremy Linton
2017-09-14 18:49 ` [PATCH 2/6] ACPI: Enable PPTT support on ARM64 Jeremy Linton
2017-09-14 18:49   ` Jeremy Linton
2017-09-14 18:49 ` [PATCH 3/6] drivers: base: cacheinfo: arm64: Add support for ACPI based firmware tables Jeremy Linton
2017-09-14 18:49   ` Jeremy Linton
2017-09-14 18:49 ` [PATCH 4/6] Topology: Add cluster on die macros and arm64 decoding Jeremy Linton
2017-09-14 18:49   ` Jeremy Linton
2017-09-18  1:50   ` Xiongfeng Wang
2017-09-18  1:50     ` Xiongfeng Wang
2017-09-18 18:54     ` Jeremy Linton
2017-09-18 18:54       ` Jeremy Linton
2017-09-19  1:03       ` Xiongfeng Wang
2017-09-19  1:03         ` Xiongfeng Wang
2017-09-14 18:49 ` [PATCH 5/6] arm64: Fixup users of topology_physical_package_id Jeremy Linton
2017-09-14 18:49   ` Jeremy Linton
2017-09-14 18:49 ` [PATCH 6/6] arm64: topology: Enable ACPI/PPTT based CPU topology Jeremy Linton
2017-09-14 18:49   ` Jeremy Linton
2017-09-18  1:37   ` Xiongfeng Wang
2017-09-18  1:37     ` Xiongfeng Wang
2017-09-18 19:02     ` Jeremy Linton
2017-09-18 19:02       ` Jeremy Linton
2017-09-19  1:41       ` Xiongfeng Wang
2017-09-19  1:41         ` Xiongfeng Wang
2017-09-15 17:05 ` [PATCH 0/6] Support PPTT for ARM64 Jeremy Linton
2017-09-15 17:05   ` Jeremy Linton
