* [PATCH v6 00/12]  Support PPTT for ARM64
@ 2018-01-13  0:59 ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton

This patch set is dependent on "[14/15] ACPICA: ACPI 6.2: Additional
PPTT flags" https://patchwork.kernel.org/patch/10064191/

ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
used to describe the processor and cache topology. Ideally it is
used to extend/override information provided by the hardware, but
right now ARM64 is entirely dependent on firmware-provided tables.

This patch set parses the table for the cache topology and CPU topology.
When we enable ACPI/PPTT for arm64, we map the physical_id to the
PPTT node flagged as the physical package by the firmware.
This results in topologies that match what the remainder of the
system expects.
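The package-id lookup described above can be sketched as follows. This is a
hypothetical, simplified model: the real parser walks struct
acpi_pptt_processor entries by byte offset inside the ACPI table, whereas here
the hierarchy nodes live in a flat array, "parent" is an index (-1 at the
root), and the flag macro is illustrative rather than the ACPICA name.

```c
#include <assert.h>
#include <stdint.h>

#define PPTT_PHYSICAL_PACKAGE (1u << 0)	/* stand-in for the ACPI flag */

struct pptt_node {
	uint32_t flags;	/* e.g. PPTT_PHYSICAL_PACKAGE */
	int parent;	/* index of the parent node, -1 for the root */
};

/*
 * Starting from a leaf (CPU) node, climb toward the root and return the
 * index of the first node flagged as the physical package; that node
 * becomes the CPU's package id. Return -1 if the firmware never set the
 * flag anywhere on the path.
 */
static int find_physical_package(const struct pptt_node *nodes, int leaf)
{
	int i;

	for (i = leaf; i >= 0; i = nodes[i].parent)
		if (nodes[i].flags & PPTT_PHYSICAL_PACKAGE)
			return i;
	return -1;
}
```

With a package node at the root, a cluster below it, and two cpu leaves, both
leaves resolve to the same package index, which is what gives lstopo the
single "Package L#0" shown below.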

For example on juno:
[root@mammon-juno-rh topology]# lstopo-no-graphics
  Package L#0
    L2 L#0 (1024KB)
      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
    L2 L#1 (2048KB)
      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
  HostBridge L#0
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 1095:3132
            Block(Disk) L#0 "sda"
        PCIBridge
          PCI 1002:68f9
            GPU L#1 "renderD128"
            GPU L#2 "card0"
            GPU L#3 "controlD64"
        PCIBridge
          PCI 11ab:4380
            Net L#4 "enp8s0"


Git tree at:
http://linux-arm.org/git?p=linux-jlinton.git
branch: pptt_v6

v5->v6:
Add additional patches which refactor how the initial DT code sets
  up the cacheinfo structure so that it is not as dependent on the
  of_node stored in that tree. Once that is done, we rename it
  for use with the ACPI code.

Additionally, there were a fair number of minor name/location/etc.
  tweaks scattered about, made in response to review comments.

v4->v5:
Update the cache type from NOCACHE to UNIFIED when all the cache
  attributes we update are valid. This fixes a problem where caches
  which are entirely created by the PPTT don't show up in lstopo.
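The NOCACHE-to-UNIFIED promotion can be illustrated with a small sketch. The
names and the enum here are illustrative stand-ins, not the kernel's exact
cacheinfo API; the point is only the rule: a cache described solely by the
PPTT starts out typeless, and once every attribute we update is valid it is
promoted so that tools like lstopo will display it.

```c
#include <assert.h>

enum cache_type { CACHE_TYPE_NOCACHE, CACHE_TYPE_UNIFIED };

struct cache_leaf {
	enum cache_type type;
	unsigned int size, line_size, nr_sets;	/* filled in from PPTT */
};

/* Promote a firmware-only cache to UNIFIED once all attributes are set. */
static void promote_pptt_cache(struct cache_leaf *leaf)
{
	if (leaf->type == CACHE_TYPE_NOCACHE &&
	    leaf->size && leaf->line_size && leaf->nr_sets)
		leaf->type = CACHE_TYPE_UNIFIED;
}
```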

Give the PPTT its own firmware_node in the cache structure instead of
  sharing it with the of_node.

Move some pieces around between patches.

v3->v4:
Suppress the "Found duplicate cache level/type..." message if the
  duplicate cache entry is actually a duplicate node. This allows cases
  like L1I and L1D nodes that point at the same L2 node to be accepted
  without the warning.
  
Remove cluster/physical split code. Add a patch to rename cluster_id
  so that it is clear the topology may not be referring to a cluster.
  
Add additional ACPICA patch for the PPTT cache properties. This matches
  an outstanding ACPICA pull that should be merged in the near future.
  
Replace a number of (struct*)((u8*)ptr+offset) constructs with ACPI_ADD_PTR

Split out the topology parsing into an additional patch.

Tweak the cpu topology code to terminate on either a level or a flag.
  Add an additional function which retrieves the physical package id
  for a given cpu.

Various other comments/tweaks.

v2->v3:

Remove valid bit check on leaf nodes. Now simply being a leaf node
  is sufficient to verify the processor id against the ACPI
  processor ids (obtained from the MADT).

Use the acpi processor for the "level 0" id. This makes the
  /sys-visible core/thread ids more human-readable if the firmware uses
  small consecutive values for processor ids.

Added PPTT to the list of injectable ACPI tables.

Fix bug which kept the code from using the processor node as intended
  in v2, caused by misuse of git rebase/fixup.

v1->v2:

The parser keys off the acpi_pptt_processor node to determine
  unique caches rather than the acpi_pptt_cache referenced by the
  processor node. This allows PPTT tables which "share" cache nodes
  across cpu nodes, despite the caches not actually being shared.
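The distinction above can be sketched with hypothetical stand-ins (the real
parser works on struct acpi_pptt_processor and struct acpi_pptt_cache; these
structs and function names are illustrative only): if uniqueness were keyed on
the cache descriptor alone, two private L1s whose cpu nodes happen to
reference one shared table entry would wrongly be merged into a single shared
cache.

```c
#include <assert.h>
#include <stdbool.h>

struct cache_desc { int level; };		/* shared table entry */
struct cpu_node   { const struct cache_desc *l1; };

/* Keying on the descriptor alone merges private caches that reuse it. */
static bool same_cache_by_descriptor(const struct cpu_node *a,
				     const struct cpu_node *b)
{
	return a->l1 == b->l1;
}

/*
 * Keying on the processor node that referenced the descriptor keeps such
 * caches distinct: only leaves reached from the same cpu node are treated
 * as one cache instance.
 */
static bool same_cache_by_processor(const struct cpu_node *a,
				    const struct cpu_node *b)
{
	return a == b && a->l1 == b->l1;
}
```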

Jeremy Linton (12):
  drivers: base: cacheinfo: move cache_setup_of_node()
  drivers: base: cacheinfo: setup DT cache properties early
  cacheinfo: rename of_node to fw_unique
  arm64/acpi: Create arch specific cpu to acpi id helper
  ACPI/PPTT: Add Processor Properties Topology Table parsing
  ACPI: Enable PPTT support on ARM64
  drivers: base cacheinfo: Add support for ACPI based firmware tables
  arm64: Add support for ACPI based firmware tables
  ACPI/PPTT: Add topology parsing code
  arm64: topology: rename cluster_id
  arm64: topology: enable ACPI/PPTT based CPU topology
  ACPI: Add PPTT to injectable table list

 arch/arm64/Kconfig                |   1 +
 arch/arm64/include/asm/acpi.h     |   4 +
 arch/arm64/include/asm/topology.h |   4 +-
 arch/arm64/kernel/cacheinfo.c     |  15 +-
 arch/arm64/kernel/topology.c      |  73 ++++-
 arch/riscv/kernel/cacheinfo.c     |   3 +-
 drivers/acpi/Kconfig              |   3 +
 drivers/acpi/Makefile             |   1 +
 drivers/acpi/pptt.c               | 592 ++++++++++++++++++++++++++++++++++++++
 drivers/acpi/tables.c             |   3 +-
 drivers/base/cacheinfo.c          | 159 +++++-----
 include/linux/acpi.h              |   3 +
 include/linux/cacheinfo.h         |  18 +-
 13 files changed, 772 insertions(+), 107 deletions(-)
 create mode 100644 drivers/acpi/pptt.c

-- 
2.13.5


^ permalink raw reply	[flat|nested] 104+ messages in thread


* [PATCH v6 01/12] drivers: base: cacheinfo: move cache_setup_of_node()
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton

In preparation for the next patch, and to aid in
review of that patch, let's move cache_setup_of_node()
farther down in the file without any changes.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/base/cacheinfo.c | 80 ++++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 07532d83be0b..a883a213fcd5 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -43,46 +43,6 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu)
 }
 
 #ifdef CONFIG_OF
-static int cache_setup_of_node(unsigned int cpu)
-{
-	struct device_node *np;
-	struct cacheinfo *this_leaf;
-	struct device *cpu_dev = get_cpu_device(cpu);
-	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-	unsigned int index = 0;
-
-	/* skip if of_node is already populated */
-	if (this_cpu_ci->info_list->of_node)
-		return 0;
-
-	if (!cpu_dev) {
-		pr_err("No cpu device for CPU %d\n", cpu);
-		return -ENODEV;
-	}
-	np = cpu_dev->of_node;
-	if (!np) {
-		pr_err("Failed to find cpu%d device node\n", cpu);
-		return -ENOENT;
-	}
-
-	while (index < cache_leaves(cpu)) {
-		this_leaf = this_cpu_ci->info_list + index;
-		if (this_leaf->level != 1)
-			np = of_find_next_cache_node(np);
-		else
-			np = of_node_get(np);/* cpu node itself */
-		if (!np)
-			break;
-		this_leaf->of_node = np;
-		index++;
-	}
-
-	if (index != cache_leaves(cpu)) /* not all OF nodes populated */
-		return -ENOENT;
-
-	return 0;
-}
-
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
 {
@@ -213,6 +173,46 @@ static void cache_of_override_properties(unsigned int cpu)
 		cache_associativity(this_leaf);
 	}
 }
+
+static int cache_setup_of_node(unsigned int cpu)
+{
+	struct device_node *np;
+	struct cacheinfo *this_leaf;
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	unsigned int index = 0;
+
+	/* skip if of_node is already populated */
+	if (this_cpu_ci->info_list->of_node)
+		return 0;
+
+	if (!cpu_dev) {
+		pr_err("No cpu device for CPU %d\n", cpu);
+		return -ENODEV;
+	}
+	np = cpu_dev->of_node;
+	if (!np) {
+		pr_err("Failed to find cpu%d device node\n", cpu);
+		return -ENOENT;
+	}
+
+	while (index < cache_leaves(cpu)) {
+		this_leaf = this_cpu_ci->info_list + index;
+		if (this_leaf->level != 1)
+			np = of_find_next_cache_node(np);
+		else
+			np = of_node_get(np);/* cpu node itself */
+		if (!np)
+			break;
+		this_leaf->of_node = np;
+		index++;
+	}
+
+	if (index != cache_leaves(cpu)) /* not all OF nodes populated */
+		return -ENOENT;
+
+	return 0;
+}
 #else
 static void cache_of_override_properties(unsigned int cpu) { }
 static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
-- 
2.13.5



* [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, austinwc, catalin.marinas,
	Palmer Dabbelt, will.deacon, morten.rasmussen, vkilari,
	Jayachandran.Nair, lorenzo.pieralisi, jhugo, wangxiongfeng2,
	viresh.kumar, lenb, linux-pm, ahs3, linux-arm-kernel, gregkh,
	rjw, linux-kernel, Jeremy Linton, hanjun.guo, Albert Ou,
	sudeep.holla

The original intent in cacheinfo was that an architecture
specific populate_cache_leaves() would probe the hardware
and then cache_shared_cpu_map_setup() and
cache_override_properties() would provide firmware help to
extend/expand upon what was probed. Arm64 was really
the only architecture that was working this way, and
with the removal of most of the hardware probing logic it
became clear that it was possible to simplify the logic a bit.

This patch combines the walk of the DT nodes with the
code updating the cache size/line_size and nr_sets.
cache_override_properties() (which was DT specific) is
then removed. The result is that cacheinfo.of_node is
no longer used as a temporary place to hold DT references
for future calls that update cache properties. That change
helps to clarify its one remaining use (matching
cacheinfo nodes that represent shared caches) which
will be used by the ACPI/PPTT code in the following patches.

Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/riscv/kernel/cacheinfo.c |  1 +
 drivers/base/cacheinfo.c      | 65 +++++++++++++++++++------------------------
 include/linux/cacheinfo.h     |  1 +
 3 files changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 10ed2749e246..6f4500233cf8 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 		CACHE_WRITE_BACK
 		| CACHE_READ_ALLOCATE
 		| CACHE_WRITE_ALLOCATE;
+	cache_of_set_props(this_leaf, node);
 }
 
 static int __init_cache_level(unsigned int cpu)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index a883a213fcd5..fc0d42bbd9eb 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -43,6 +43,7 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu)
 }
 
 #ifdef CONFIG_OF
+
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
 {
@@ -82,7 +83,7 @@ static inline int get_cacheinfo_idx(enum cache_type type)
 	return type;
 }
 
-static void cache_size(struct cacheinfo *this_leaf)
+static void cache_size(struct cacheinfo *this_leaf, struct device_node *np)
 {
 	const char *propname;
 	const __be32 *cache_size;
@@ -91,13 +92,14 @@ static void cache_size(struct cacheinfo *this_leaf)
 	ct_idx = get_cacheinfo_idx(this_leaf->type);
 	propname = cache_type_info[ct_idx].size_prop;
 
-	cache_size = of_get_property(this_leaf->of_node, propname, NULL);
+	cache_size = of_get_property(np, propname, NULL);
 	if (cache_size)
 		this_leaf->size = of_read_number(cache_size, 1);
 }
 
 /* not cache_line_size() because that's a macro in include/linux/cache.h */
-static void cache_get_line_size(struct cacheinfo *this_leaf)
+static void cache_get_line_size(struct cacheinfo *this_leaf,
+				struct device_node *np)
 {
 	const __be32 *line_size;
 	int i, lim, ct_idx;
@@ -109,7 +111,7 @@ static void cache_get_line_size(struct cacheinfo *this_leaf)
 		const char *propname;
 
 		propname = cache_type_info[ct_idx].line_size_props[i];
-		line_size = of_get_property(this_leaf->of_node, propname, NULL);
+		line_size = of_get_property(np, propname, NULL);
 		if (line_size)
 			break;
 	}
@@ -118,7 +120,7 @@ static void cache_get_line_size(struct cacheinfo *this_leaf)
 		this_leaf->coherency_line_size = of_read_number(line_size, 1);
 }
 
-static void cache_nr_sets(struct cacheinfo *this_leaf)
+static void cache_nr_sets(struct cacheinfo *this_leaf, struct device_node *np)
 {
 	const char *propname;
 	const __be32 *nr_sets;
@@ -127,7 +129,7 @@ static void cache_nr_sets(struct cacheinfo *this_leaf)
 	ct_idx = get_cacheinfo_idx(this_leaf->type);
 	propname = cache_type_info[ct_idx].nr_sets_prop;
 
-	nr_sets = of_get_property(this_leaf->of_node, propname, NULL);
+	nr_sets = of_get_property(np, propname, NULL);
 	if (nr_sets)
 		this_leaf->number_of_sets = of_read_number(nr_sets, 1);
 }
@@ -146,32 +148,26 @@ static void cache_associativity(struct cacheinfo *this_leaf)
 		this_leaf->ways_of_associativity = (size / nr_sets) / line_size;
 }
 
-static bool cache_node_is_unified(struct cacheinfo *this_leaf)
+static bool cache_node_is_unified(struct cacheinfo *this_leaf,
+				  struct device_node *np)
 {
-	return of_property_read_bool(this_leaf->of_node, "cache-unified");
+	return of_property_read_bool(np, "cache-unified");
 }
 
-static void cache_of_override_properties(unsigned int cpu)
+void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np)
 {
-	int index;
-	struct cacheinfo *this_leaf;
-	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-
-	for (index = 0; index < cache_leaves(cpu); index++) {
-		this_leaf = this_cpu_ci->info_list + index;
-		/*
-		 * init_cache_level must setup the cache level correctly
-		 * overriding the architecturally specified levels, so
-		 * if type is NONE at this stage, it should be unified
-		 */
-		if (this_leaf->type == CACHE_TYPE_NOCACHE &&
-		    cache_node_is_unified(this_leaf))
-			this_leaf->type = CACHE_TYPE_UNIFIED;
-		cache_size(this_leaf);
-		cache_get_line_size(this_leaf);
-		cache_nr_sets(this_leaf);
-		cache_associativity(this_leaf);
-	}
+	/*
+	 * init_cache_level must setup the cache level correctly
+	 * overriding the architecturally specified levels, so
+	 * if type is NONE at this stage, it should be unified
+	 */
+	if (this_leaf->type == CACHE_TYPE_NOCACHE &&
+	    cache_node_is_unified(this_leaf, np))
+		this_leaf->type = CACHE_TYPE_UNIFIED;
+	cache_size(this_leaf, np);
+	cache_get_line_size(this_leaf, np);
+	cache_nr_sets(this_leaf, np);
+	cache_associativity(this_leaf);
 }
 
 static int cache_setup_of_node(unsigned int cpu)
@@ -204,6 +200,7 @@ static int cache_setup_of_node(unsigned int cpu)
 			np = of_node_get(np);/* cpu node itself */
 		if (!np)
 			break;
+		cache_of_set_props(this_leaf, np);
 		this_leaf->of_node = np;
 		index++;
 	}
@@ -214,7 +211,6 @@ static int cache_setup_of_node(unsigned int cpu)
 	return 0;
 }
 #else
-static void cache_of_override_properties(unsigned int cpu) { }
 static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
@@ -297,12 +293,6 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 	}
 }
 
-static void cache_override_properties(unsigned int cpu)
-{
-	if (of_have_populated_dt())
-		return cache_of_override_properties(cpu);
-}
-
 static void free_cache_attributes(unsigned int cpu)
 {
 	if (!per_cpu_cacheinfo(cpu))
@@ -336,6 +326,10 @@ static int detect_cache_attributes(unsigned int cpu)
 	if (per_cpu_cacheinfo(cpu) == NULL)
 		return -ENOMEM;
 
+	/*
+	 * populate_cache_leaves() may completely setup the cache leaves and
+	 * shared_cpu_map or it may leave it partially setup.
+	 */
 	ret = populate_cache_leaves(cpu);
 	if (ret)
 		goto free_ci;
@@ -349,7 +343,6 @@ static int detect_cache_attributes(unsigned int cpu)
 		goto free_ci;
 	}
 
-	cache_override_properties(cpu);
 	return 0;
 
 free_ci:
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 3d9805297cda..d35299a590a4 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -99,6 +99,7 @@ int func(unsigned int cpu)					\
 struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
+void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
-- 
2.13.5


 	if (ret)
 		goto free_ci;
@@ -349,7 +343,6 @@ static int detect_cache_attributes(unsigned int cpu)
 		goto free_ci;
 	}
 
-	cache_override_properties(cpu);
 	return 0;
 
 free_ci:
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 3d9805297cda..d35299a590a4 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -99,6 +99,7 @@ int func(unsigned int cpu)					\
 struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
+void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
@ 2018-01-13  0:59   ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-arm-kernel

The original intent in cacheinfo was that an architecture-specific
populate_cache_leaves() would probe the hardware
and then cache_shared_cpu_map_setup() and
cache_override_properties() would provide firmware help to
extend/expand upon what was probed. Arm64 was really
the only architecture working this way, and with the
removal of most of the hardware probing logic it became
clear that the logic could be simplified.

This patch combines the walk of the DT nodes with the
code updating the cache size/line_size and nr_sets.
cache_override_properties() (which was DT specific) is
then removed. The result is that cacheinfo.of_node is
no longer used as a temporary place to hold DT references
for future calls that update cache properties. That change
helps to clarify its one remaining use (matching
cacheinfo nodes that represent shared caches), which
will be used by the ACPI/PPTT code in the following patches.

Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/riscv/kernel/cacheinfo.c |  1 +
 drivers/base/cacheinfo.c      | 65 +++++++++++++++++++------------------------
 include/linux/cacheinfo.h     |  1 +
 3 files changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 10ed2749e246..6f4500233cf8 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 		CACHE_WRITE_BACK
 		| CACHE_READ_ALLOCATE
 		| CACHE_WRITE_ALLOCATE;
+	cache_of_set_props(this_leaf, node);
 }
 
 static int __init_cache_level(unsigned int cpu)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index a883a213fcd5..fc0d42bbd9eb 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -43,6 +43,7 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu)
 }
 
 #ifdef CONFIG_OF
+
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
 {
@@ -82,7 +83,7 @@ static inline int get_cacheinfo_idx(enum cache_type type)
 	return type;
 }
 
-static void cache_size(struct cacheinfo *this_leaf)
+static void cache_size(struct cacheinfo *this_leaf, struct device_node *np)
 {
 	const char *propname;
 	const __be32 *cache_size;
@@ -91,13 +92,14 @@ static void cache_size(struct cacheinfo *this_leaf)
 	ct_idx = get_cacheinfo_idx(this_leaf->type);
 	propname = cache_type_info[ct_idx].size_prop;
 
-	cache_size = of_get_property(this_leaf->of_node, propname, NULL);
+	cache_size = of_get_property(np, propname, NULL);
 	if (cache_size)
 		this_leaf->size = of_read_number(cache_size, 1);
 }
 
 /* not cache_line_size() because that's a macro in include/linux/cache.h */
-static void cache_get_line_size(struct cacheinfo *this_leaf)
+static void cache_get_line_size(struct cacheinfo *this_leaf,
+				struct device_node *np)
 {
 	const __be32 *line_size;
 	int i, lim, ct_idx;
@@ -109,7 +111,7 @@ static void cache_get_line_size(struct cacheinfo *this_leaf)
 		const char *propname;
 
 		propname = cache_type_info[ct_idx].line_size_props[i];
-		line_size = of_get_property(this_leaf->of_node, propname, NULL);
+		line_size = of_get_property(np, propname, NULL);
 		if (line_size)
 			break;
 	}
@@ -118,7 +120,7 @@ static void cache_get_line_size(struct cacheinfo *this_leaf)
 		this_leaf->coherency_line_size = of_read_number(line_size, 1);
 }
 
-static void cache_nr_sets(struct cacheinfo *this_leaf)
+static void cache_nr_sets(struct cacheinfo *this_leaf, struct device_node *np)
 {
 	const char *propname;
 	const __be32 *nr_sets;
@@ -127,7 +129,7 @@ static void cache_nr_sets(struct cacheinfo *this_leaf)
 	ct_idx = get_cacheinfo_idx(this_leaf->type);
 	propname = cache_type_info[ct_idx].nr_sets_prop;
 
-	nr_sets = of_get_property(this_leaf->of_node, propname, NULL);
+	nr_sets = of_get_property(np, propname, NULL);
 	if (nr_sets)
 		this_leaf->number_of_sets = of_read_number(nr_sets, 1);
 }
@@ -146,32 +148,26 @@ static void cache_associativity(struct cacheinfo *this_leaf)
 		this_leaf->ways_of_associativity = (size / nr_sets) / line_size;
 }
 
-static bool cache_node_is_unified(struct cacheinfo *this_leaf)
+static bool cache_node_is_unified(struct cacheinfo *this_leaf,
+				  struct device_node *np)
 {
-	return of_property_read_bool(this_leaf->of_node, "cache-unified");
+	return of_property_read_bool(np, "cache-unified");
 }
 
-static void cache_of_override_properties(unsigned int cpu)
+void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np)
 {
-	int index;
-	struct cacheinfo *this_leaf;
-	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-
-	for (index = 0; index < cache_leaves(cpu); index++) {
-		this_leaf = this_cpu_ci->info_list + index;
-		/*
-		 * init_cache_level must setup the cache level correctly
-		 * overriding the architecturally specified levels, so
-		 * if type is NONE at this stage, it should be unified
-		 */
-		if (this_leaf->type == CACHE_TYPE_NOCACHE &&
-		    cache_node_is_unified(this_leaf))
-			this_leaf->type = CACHE_TYPE_UNIFIED;
-		cache_size(this_leaf);
-		cache_get_line_size(this_leaf);
-		cache_nr_sets(this_leaf);
-		cache_associativity(this_leaf);
-	}
+	/*
+	 * init_cache_level must setup the cache level correctly
+	 * overriding the architecturally specified levels, so
+	 * if type is NONE at this stage, it should be unified
+	 */
+	if (this_leaf->type == CACHE_TYPE_NOCACHE &&
+	    cache_node_is_unified(this_leaf, np))
+		this_leaf->type = CACHE_TYPE_UNIFIED;
+	cache_size(this_leaf, np);
+	cache_get_line_size(this_leaf, np);
+	cache_nr_sets(this_leaf, np);
+	cache_associativity(this_leaf);
 }
 
 static int cache_setup_of_node(unsigned int cpu)
@@ -204,6 +200,7 @@ static int cache_setup_of_node(unsigned int cpu)
 			np = of_node_get(np);/* cpu node itself */
 		if (!np)
 			break;
+		cache_of_set_props(this_leaf, np);
 		this_leaf->of_node = np;
 		index++;
 	}
@@ -214,7 +211,6 @@ static int cache_setup_of_node(unsigned int cpu)
 	return 0;
 }
 #else
-static void cache_of_override_properties(unsigned int cpu) { }
 static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
@@ -297,12 +293,6 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 	}
 }
 
-static void cache_override_properties(unsigned int cpu)
-{
-	if (of_have_populated_dt())
-		return cache_of_override_properties(cpu);
-}
-
 static void free_cache_attributes(unsigned int cpu)
 {
 	if (!per_cpu_cacheinfo(cpu))
@@ -336,6 +326,10 @@ static int detect_cache_attributes(unsigned int cpu)
 	if (per_cpu_cacheinfo(cpu) == NULL)
 		return -ENOMEM;
 
+	/*
+	 * populate_cache_leaves() may completely setup the cache leaves and
+	 * shared_cpu_map or it may leave it partially setup.
+	 */
 	ret = populate_cache_leaves(cpu);
 	if (ret)
 		goto free_ci;
@@ -349,7 +343,6 @@ static int detect_cache_attributes(unsigned int cpu)
 		goto free_ci;
 	}
 
-	cache_override_properties(cpu);
 	return 0;
 
 free_ci:
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 3d9805297cda..d35299a590a4 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -99,6 +99,7 @@ int func(unsigned int cpu)					\
 struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
+void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 03/12] cacheinfo: rename of_node to fw_unique
  2018-01-13  0:59 ` Jeremy Linton
  (?)
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, austinwc, catalin.marinas,
	Palmer Dabbelt, will.deacon, morten.rasmussen, vkilari,
	Jayachandran.Nair, lorenzo.pieralisi, jhugo, wangxiongfeng2,
	viresh.kumar, lenb, linux-pm, ahs3, linux-arm-kernel, gregkh,
	rjw, linux-kernel, Jeremy Linton, hanjun.guo, Albert Ou,
	sudeep.holla

Rename of_node and change its type to indicate that it is
a generic pointer which is generally only used
for comparison purposes. In a later patch we will store
an ACPI/PPTT "token" pointer in fw_unique so that
the code which builds the shared cpu masks can be reused.

Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/riscv/kernel/cacheinfo.c |  2 +-
 drivers/base/cacheinfo.c      | 16 +++++++++-------
 include/linux/cacheinfo.h     |  8 +++-----
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 6f4500233cf8..6b0219af88d4 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -20,7 +20,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 			 struct device_node *node,
 			 enum cache_type type, unsigned int level)
 {
-	this_leaf->of_node = node;
+	this_leaf->fw_unique = node;
 	this_leaf->level = level;
 	this_leaf->type = type;
 	/* not a sector cache */
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index fc0d42bbd9eb..217aa90fb036 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -47,7 +47,7 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu)
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
 {
-	return sib_leaf->of_node == this_leaf->of_node;
+	return sib_leaf->fw_unique == this_leaf->fw_unique;
 }
 
 /* OF properties to query for a given cache type */
@@ -178,9 +178,10 @@ static int cache_setup_of_node(unsigned int cpu)
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	unsigned int index = 0;
 
-	/* skip if of_node is already populated */
-	if (this_cpu_ci->info_list->of_node)
+	/* skip if fw_unique is already populated */
+	if (this_cpu_ci->info_list->fw_unique) {
 		return 0;
+	}
 
 	if (!cpu_dev) {
 		pr_err("No cpu device for CPU %d\n", cpu);
@@ -201,7 +202,7 @@ static int cache_setup_of_node(unsigned int cpu)
 		if (!np)
 			break;
 		cache_of_set_props(this_leaf, np);
-		this_leaf->of_node = np;
+		this_leaf->fw_unique = np;
 		index++;
 	}
 
@@ -289,7 +290,7 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 			cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
 			cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
 		}
-		of_node_put(this_leaf->of_node);
+		of_node_put(this_leaf->fw_unique);
 	}
 }
 
@@ -334,8 +335,9 @@ static int detect_cache_attributes(unsigned int cpu)
 	if (ret)
 		goto free_ci;
 	/*
-	 * For systems using DT for cache hierarchy, of_node and shared_cpu_map
-	 * will be set up here only if they are not populated already
+	 * For systems using DT for cache hierarchy, fw_unique
+	 * and shared_cpu_map will be set up here only if they are
+	 * not populated already
 	 */
 	ret = cache_shared_cpu_map_setup(cpu);
 	if (ret) {
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index d35299a590a4..6f2e6c87b64c 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -34,9 +34,8 @@ enum cache_type {
  * @shared_cpu_map: logical cpumask representing all the cpus sharing
  *	this cache node
  * @attributes: bitfield representing various cache attributes
- * @of_node: if devicetree is used, this represents either the cpu node in
- *	case there's no explicit cache node or the cache node itself in the
- *	device tree
+ * @fw_unique: Unique value used to determine if different cacheinfo
+ *	structures represent a single hardware cache instance.
  * @disable_sysfs: indicates whether this node is visible to the user via
  *	sysfs or not
  * @priv: pointer to any private data structure specific to particular
@@ -65,8 +64,7 @@ struct cacheinfo {
 #define CACHE_ALLOCATE_POLICY_MASK	\
 	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
 #define CACHE_ID		BIT(4)
-
-	struct device_node *of_node;
+	void *fw_unique;
 	bool disable_sysfs;
 	void *priv;
 };
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 04/12] arm64/acpi: Create arch specific cpu to acpi id helper
  2018-01-13  0:59 ` Jeremy Linton
  (?)
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, austinwc, catalin.marinas,
	will.deacon, morten.rasmussen, vkilari, Jayachandran.Nair,
	lorenzo.pieralisi, jhugo, wangxiongfeng2, viresh.kumar, lenb,
	linux-pm, ahs3, linux-arm-kernel, gregkh, rjw, linux-kernel,
	Jeremy Linton, hanjun.guo, sudeep.holla

It's helpful to be able to look up the acpi_processor_id associated
with a logical CPU. Provide an arm64 helper to do this.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/include/asm/acpi.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 32f465a80e4e..0db62a4cbce2 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -86,6 +86,10 @@ static inline bool acpi_has_cpu_in_madt(void)
 }
 
 struct acpi_madt_generic_interrupt *acpi_cpu_get_madt_gicc(int cpu);
+static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
+{
+	return	acpi_cpu_get_madt_gicc(cpu)->uid;
+}
 
 static inline void arch_fix_phys_package_id(int num, u32 slot) { }
 void __init acpi_init_cpus(void);
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton

ACPI 6.2 adds a new table, which describes how processing units
are related to each other in a tree-like fashion. Cache nodes are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level(), as well as to fill out the
individual core's cache information with cache_setup_acpi() once
the cpu_cacheinfo structure has been populated by the
arch-specific code.

An additional patch later in the set adds the ability to report
peers in the topology using find_acpi_cpu_topology()
to report a unique ID for each processing unit at a given level
in the tree. These unique IDs can then be used to match related
processing units which exist as threads, clusters on die (COD),
within a given package, etc.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c | 476 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 476 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
new file mode 100644
index 000000000000..2c4b3ed862a8
--- /dev/null
+++ b/drivers/acpi/pptt.c
@@ -0,0 +1,476 @@
+/*
+ * Copyright (C) 2018, ARM
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file implements parsing of Processor Properties Topology Table (PPTT)
+ * which is optionally used to describe the processor and cache topology.
+ * Due to the relative pointers used throughout the table, this doesn't
+ * leverage the existing subtable parsing in the kernel.
+ *
+ * The PPTT structure is an inverted tree, with each node potentially
+ * holding one or two inverted tree data structures describing
+ * the caches available at that level. Each cache structure optionally
+ * contains properties describing the cache at a given level which can be
+ * used to override hardware probed values.
+ */
+#define pr_fmt(fmt) "ACPI PPTT: " fmt
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <acpi/processor.h>
+
+/* total number of attributes checked by the properties code */
+#define PPTT_CHECKED_ATTRIBUTES 6
+
+/*
+ * Given the PPTT table, find and verify that the subtable entry
+ * is located within the table
+ */
+static struct acpi_subtable_header *fetch_pptt_subtable(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	struct acpi_subtable_header *entry;
+
+	/* there isn't a subtable at reference 0 */
+	if (pptt_ref < sizeof(struct acpi_subtable_header))
+		return NULL;
+
+	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
+		return NULL;
+
+	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr, pptt_ref);
+
+	if (pptt_ref + entry->length > table_hdr->length)
+		return NULL;
+
+	return entry;
+}
+
+static struct acpi_pptt_processor *fetch_pptt_node(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr,
+								 pptt_ref);
+}
+
+static struct acpi_pptt_cache *fetch_pptt_cache(
+	struct acpi_table_header *table_hdr, u32 pptt_ref)
+{
+	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr,
+							     pptt_ref);
+}
+
+static struct acpi_subtable_header *acpi_get_pptt_resource(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *node, int resource)
+{
+	u32 *ref;
+
+	if (resource >= node->number_of_priv_resources)
+		return NULL;
+
+	ref = ACPI_ADD_PTR(u32, node, sizeof(struct acpi_pptt_processor));
+	ref += resource;
+
+	return fetch_pptt_subtable(table_hdr, *ref);
+}
+
+/*
+ * Attempt to find a given cache level, while counting the max number
+ * of cache levels for the cache node.
+ *
+ * Given a pptt resource, verify that it is a cache node, then walk
+ * down each level of caches, counting how many levels are found
+ * as well as checking the cache type (icache, dcache, unified). If a
+ * level & type match, then we set found, and continue the search.
+ * Once the entire cache branch has been walked return its max
+ * depth.
+ */
+static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
+				int local_level,
+				struct acpi_subtable_header *res,
+				struct acpi_pptt_cache **found,
+				int level, int type)
+{
+	struct acpi_pptt_cache *cache;
+
+	if (res->type != ACPI_PPTT_TYPE_CACHE)
+		return 0;
+
+	cache = (struct acpi_pptt_cache *) res;
+	while (cache) {
+		local_level++;
+
+		if ((local_level == level) &&
+		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
+		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
+			if ((*found != NULL) && (cache != *found))
+				pr_err("Found duplicate cache level/type, unable to determine uniqueness\n");
+
+			pr_debug("Found cache @ level %d\n", level);
+			*found = cache;
+			/*
+			 * continue looking at this node's resource list
+			 * to verify that we don't find a duplicate
+			 * cache node.
+			 */
+		}
+		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
+	}
+	return local_level;
+}
+
+/*
+ * Given a CPU node look for cache levels that exist at this level, and then
+ * for each cache node, count how many levels exist below (logically above) it.
+ * If a level and type are specified, and we find that level/type, abort
+ * processing and return the acpi_pptt_cache structure.
+ */
+static struct acpi_pptt_cache *acpi_find_cache_level(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu_node,
+	int *starting_level, int level, int type)
+{
+	struct acpi_subtable_header *res;
+	int number_of_levels = *starting_level;
+	int resource = 0;
+	struct acpi_pptt_cache *ret = NULL;
+	int local_level;
+
+	/* walk down from processor node */
+	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
+		resource++;
+
+		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
+						   res, &ret, level, type);
+		/*
+		 * We are looking for the max depth. Since it's possible for a
+		 * given node to have resources with differing depths, verify
+		 * that the depth we have found is the largest.
+		 */
+		if (number_of_levels < local_level)
+			number_of_levels = local_level;
+	}
+	if (number_of_levels > *starting_level)
+		*starting_level = number_of_levels;
+
+	return ret;
+}
+
+/*
+ * Given a processor node containing a processing unit, walk into it and count
+ * how many levels exist solely for it, and then walk up each level until we hit
+ * the root node (ignore the package level because it may be possible to have
+ * caches that exist across packages). Count the number of cache levels that
+ * exist at each level on the way up.
+ */
+static int acpi_process_node(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node)
+{
+	int total_levels = 0;
+
+	do {
+		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	} while (cpu_node);
+
+	return total_levels;
+}
+
+/*
+ * Determine if the *node parameter is a leaf node by iterating the
+ * PPTT table, looking for nodes which reference it.
+ * Return 0 if we find a node referencing the passed node,
+ * or 1 if we don't.
+ */
+static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
+			       struct acpi_pptt_processor *node)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	u32 node_entry;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	node_entry = ACPI_PTR_DIFF(node, table_hdr);
+	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+			     sizeof(struct acpi_table_pptt));
+
+	while ((unsigned long)(entry + 1) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (cpu_node->parent == node_entry))
+			return 0;
+		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+				     entry->length);
+	}
+	return 1;
+}
+
+/*
+ * Find the subtable entry describing the provided processor.
+ * This is done by iterating the PPTT table looking for processor nodes
+ * which have an acpi_processor_id that matches the acpi_cpu_id parameter
+ * passed into the function. If we find a node that matches this criterion,
+ * we verify that it's a leaf node in the topology rather than depending
+ * on the valid flag, which doesn't need to be set for leaf nodes.
+ */
+static struct acpi_pptt_processor *acpi_find_processor_node(
+	struct acpi_table_header *table_hdr,
+	u32 acpi_cpu_id)
+{
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	struct acpi_pptt_processor *cpu_node;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+			     sizeof(struct acpi_table_pptt));
+
+	/* find the processor structure associated with this cpuid */
+	while ((unsigned long)(entry + 1) < table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+
+		if (entry->length == 0) {
+			pr_err("Invalid zero length subtable\n");
+			break;
+		}
+		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
+		    (acpi_cpu_id == cpu_node->acpi_processor_id) &&
+		     acpi_pptt_leaf_node(table_hdr, cpu_node)) {
+			return (struct acpi_pptt_processor *)entry;
+		}
+
+		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+				     entry->length);
+	}
+
+	return NULL;
+}
+
+static int acpi_find_cache_levels(struct acpi_table_header *table_hdr,
+				  u32 acpi_cpu_id)
+{
+	int number_of_levels = 0;
+	struct acpi_pptt_processor *cpu;
+
+	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+	if (cpu)
+		number_of_levels = acpi_process_node(table_hdr, cpu);
+
+	return number_of_levels;
+}
+
+/* Convert the Linux cache_type to an ACPI PPTT cache type value */
+static u8 acpi_cache_type(enum cache_type type)
+{
+	switch (type) {
+	case CACHE_TYPE_DATA:
+		pr_debug("Looking for data cache\n");
+		return ACPI_PPTT_CACHE_TYPE_DATA;
+	case CACHE_TYPE_INST:
+		pr_debug("Looking for instruction cache\n");
+		return ACPI_PPTT_CACHE_TYPE_INSTR;
+	default:
+	case CACHE_TYPE_UNIFIED:
+		pr_debug("Looking for unified cache\n");
+		/*
+		 * It is important that ACPI_PPTT_CACHE_TYPE_UNIFIED
+		 * contains the bit pattern that will match both
+		 * ACPI unified bit patterns because we use it later
+		 * to match both cases.
+		 */
+		return ACPI_PPTT_CACHE_TYPE_UNIFIED;
+	}
+}
+
+/* find the ACPI node describing the cache type/level for the given CPU */
+static struct acpi_pptt_cache *acpi_find_cache_node(
+	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
+	enum cache_type type, unsigned int level,
+	struct acpi_pptt_processor **node)
+{
+	int total_levels = 0;
+	struct acpi_pptt_cache *found = NULL;
+	struct acpi_pptt_processor *cpu_node;
+	u8 acpi_type = acpi_cache_type(type);
+
+	pr_debug("Looking for CPU %d's level %d cache type %d\n",
+		 acpi_cpu_id, level, acpi_type);
+
+	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
+
+	while ((cpu_node) && (!found)) {
+		found = acpi_find_cache_level(table_hdr, cpu_node,
+					      &total_levels, level, acpi_type);
+		*node = cpu_node;
+		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+	}
+
+	return found;
+}
+
+/*
+ * The ACPI spec implies that the fields in the cache structures are used to
+ * extend and correct the information probed from the hardware. In the case
+ * of arm64, the CCSIDR probing has been removed because it might be incorrect.
+ */
+static void update_cache_properties(struct cacheinfo *this_leaf,
+				    struct acpi_pptt_cache *found_cache,
+				    struct acpi_pptt_processor *cpu_node)
+{
+	int valid_flags = 0;
+
+	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
+		this_leaf->size = found_cache->size;
+		valid_flags++;
+	}
+	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID) {
+		this_leaf->coherency_line_size = found_cache->line_size;
+		valid_flags++;
+	}
+	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID) {
+		this_leaf->number_of_sets = found_cache->number_of_sets;
+		valid_flags++;
+	}
+	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID) {
+		this_leaf->ways_of_associativity = found_cache->associativity;
+		valid_flags++;
+	}
+	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
+		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
+		case ACPI_PPTT_CACHE_POLICY_WT:
+			this_leaf->attributes = CACHE_WRITE_THROUGH;
+			break;
+		case ACPI_PPTT_CACHE_POLICY_WB:
+			this_leaf->attributes = CACHE_WRITE_BACK;
+			break;
+		}
+		valid_flags++;
+	}
+	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
+		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
+		case ACPI_PPTT_CACHE_READ_ALLOCATE:
+			this_leaf->attributes |= CACHE_READ_ALLOCATE;
+			break;
+		case ACPI_PPTT_CACHE_WRITE_ALLOCATE:
+			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
+			break;
+		case ACPI_PPTT_CACHE_RW_ALLOCATE:
+		case ACPI_PPTT_CACHE_RW_ALLOCATE_ALT:
+			this_leaf->attributes |=
+				CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE;
+			break;
+		}
+		valid_flags++;
+	}
+	/*
+	 * If all the above flags are valid and the cache type is NOCACHE,
+	 * update the cache type as well.
+	 */
+	if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
+	    (valid_flags == PPTT_CHECKED_ATTRIBUTES))
+		this_leaf->type = CACHE_TYPE_UNIFIED;
+}
+
+/*
+ * Update the kernel cache information for each level of cache
+ * associated with the given acpi cpu.
+ */
+static void cache_setup_acpi_cpu(struct acpi_table_header *table,
+				 unsigned int cpu)
+{
+	struct acpi_pptt_cache *found_cache;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	struct cacheinfo *this_leaf;
+	unsigned int index = 0;
+	struct acpi_pptt_processor *cpu_node = NULL;
+
+	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
+		this_leaf = this_cpu_ci->info_list + index;
+		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
+						   this_leaf->type,
+						   this_leaf->level,
+						   &cpu_node);
+		pr_debug("found = %p %p\n", found_cache, cpu_node);
+		if (found_cache)
+			update_cache_properties(this_leaf,
+						found_cache,
+						cpu_node);
+
+		index++;
+	}
+}
+
+/**
+ * acpi_find_last_cache_level() - Determines the number of cache levels for a PE
+ * @cpu: Kernel logical cpu number
+ *
+ * Given a logical cpu number, returns the number of levels of cache represented
+ * in the PPTT. Errors caused by lack of a PPTT table, or otherwise, return 0
+ * indicating we didn't find any cache levels.
+ *
+ * Return: Cache levels visible to this core.
+ */
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	u32 acpi_cpu_id;
+	struct acpi_table_header *table;
+	int number_of_levels = 0;
+	acpi_status status;
+
+	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
+
+	acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+	} else {
+		number_of_levels = acpi_find_cache_levels(table, acpi_cpu_id);
+		acpi_put_table(table);
+	}
+	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
+
+	return number_of_levels;
+}
+
+/**
+ * cache_setup_acpi() - Override CPU cache topology with data from the PPTT
+ * @cpu: Kernel logical cpu number
+ *
+ * Updates the global cache info provided by cpu_get_cacheinfo()
+ * when there are valid properties in the acpi_pptt_cache nodes. A
+ * successful parse may not result in any updates if none of the
+ * cache levels have any valid flags set. Further, a unique value is
+ * associated with each known CPU cache entry. This unique value
+ * can be used to determine whether caches are shared between CPUs.
+ *
+ * Return: -ENOENT on failure to find table, or 0 on success
+ */
+int cache_setup_acpi(unsigned int cpu)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+
+	pr_debug("Cache Setup ACPI cpu %d\n", cpu);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
+		return -ENOENT;
+	}
+
+	cache_setup_acpi_cpu(table, cpu);
+	acpi_put_table(table);
+
+	return status;
+}
-- 
2.13.5


^ permalink raw reply related	[flat|nested] 104+ messages in thread
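The table walk in the patch above hinges on two ideas: every subtable is reached through a relative byte offset that must be bounds-checked against the table length (fetch_pptt_subtable()), and cache levels are counted by following each node's next_level_of_cache chain (acpi_pptt_walk_cache()). The sketch below models both outside the kernel; the struct layouts are simplified stand-ins, not the real ACPICA definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the ACPI subtable layouts; these are
 * illustrative, not the real ACPICA struct definitions. */
struct hdr {
	uint8_t type;
	uint8_t length;
};

struct cache_node {
	struct hdr h;
	uint32_t next_level_of_cache;	/* byte offset into the table */
};

/* Resolve a relative byte offset into the table with the same three
 * bounds checks fetch_pptt_subtable() performs. */
static struct hdr *fetch_subtable(uint8_t *tbl, uint32_t tbl_len, uint32_t ref)
{
	struct hdr *entry;

	if (ref < sizeof(struct hdr))	/* reference 0 is never a subtable */
		return NULL;
	if (ref + sizeof(struct hdr) > tbl_len)
		return NULL;
	entry = (struct hdr *)(tbl + ref);
	if (ref + entry->length > tbl_len)
		return NULL;
	return entry;
}

/* Count levels along a next_level_of_cache chain, the way
 * acpi_pptt_walk_cache() increments local_level. */
static int count_cache_levels(uint8_t *tbl, uint32_t tbl_len, uint32_t ref)
{
	int level = 0;
	struct cache_node *cache =
		(struct cache_node *)fetch_subtable(tbl, tbl_len, ref);

	while (cache) {
		level++;
		cache = (struct cache_node *)fetch_subtable(tbl, tbl_len,
					cache->next_level_of_cache);
	}
	return level;
}
```

A chain terminated with a zero reference stops cleanly because offset 0 always points back into the table header, which the first check rejects.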

* [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing
@ 2018-01-13  0:59   ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-arm-kernel

ACPI 6.2 adds a new table, which describes how processing units
are related to each other in a tree-like fashion. Caches are
also sprinkled throughout the tree and describe the properties
of the caches in relation to other caches and processing units.

Add the code to parse the cache hierarchy and report the total
number of levels of cache for a given core using
acpi_find_last_cache_level() as well as fill out the individual
cores cache information with cache_setup_acpi() once the
cpu_cacheinfo structure has been populated by the arch specific
code.

An additional patch later in the set adds the ability to report
peers in the topology using find_acpi_cpu_topology()
to report a unique ID for each processing unit at a given level
in the tree. These unique IDs can then be used to match related
processing units which exist as threads, COD (clusters
on die), within a given package, etc.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c | 476 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 476 insertions(+)
 create mode 100644 drivers/acpi/pptt.c

-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread
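One subtlety in the parser above: a processor node is treated as a leaf not because its valid flag says so, but because no other node in the table names it as a parent (acpi_pptt_leaf_node()). Stripped of the subtable iteration, that check reduces to the model below; the offsets used are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* A processor node, reduced to the two fields the leaf test needs:
 * its own byte offset within the table and its parent reference. */
struct proc_node {
	uint32_t offset;
	uint32_t parent;
};

/* Model of acpi_pptt_leaf_node(): a node is a leaf when no other
 * processor node names its offset as parent. */
static int is_leaf(const struct proc_node *nodes, int count, uint32_t offset)
{
	int i;

	for (i = 0; i < count; i++)
		if (nodes[i].parent == offset)
			return 0;
	return 1;
}
```

This is why acpi_find_processor_node() can match an acpi_processor_id without trusting the ACPI_PPTT_ACPI_PROCESSOR_ID_VALID flag, which need not be set on leaf nodes.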

* [PATCH v6 06/12] ACPI: Enable PPTT support on ARM64
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: mark.rutland, Jonathan.Zhang, austinwc, catalin.marinas,
	will.deacon, morten.rasmussen, vkilari, Jayachandran.Nair,
	lorenzo.pieralisi, jhugo, wangxiongfeng2, viresh.kumar, lenb,
	linux-pm, ahs3, linux-arm-kernel, gregkh, rjw, linux-kernel,
	Jeremy Linton, hanjun.guo, sudeep.holla

Now that we have a PPTT parser, in preparation for its use
on arm64, let's build it.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/Kconfig    | 1 +
 drivers/acpi/Kconfig  | 3 +++
 drivers/acpi/Makefile | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e1414f..23bf30319d31 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -7,6 +7,7 @@ config ARM64
 	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
 	select ACPI_MCFG if ACPI
 	select ACPI_SPCR_TABLE if ACPI
+	select ACPI_PPTT if ACPI
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 46505396869e..df7aebf0af0e 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -545,6 +545,9 @@ config ACPI_CONFIGFS
 
 if ARM64
 source "drivers/acpi/arm64/Kconfig"
+
+config ACPI_PPTT
+	bool
 endif
 
 config TPS68470_PMIC_OPREGION
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 41954a601989..b6056b566df4 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
 obj-$(CONFIG_ACPI_SPCR_TABLE)	+= spcr.o
 obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
+obj-$(CONFIG_ACPI_PPTT) 	+= pptt.o
 
 # processor has its own "processor." module_param namespace
 processor-y			:= processor_driver.o
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 07/12] drivers: base cacheinfo: Add support for ACPI based firmware tables
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton

Add an entry to struct cacheinfo to maintain a reference to the PPTT
node, which can be used to match identical caches across cores. Also
stub out cache_setup_acpi() so that individual architectures can
enable ACPI topology parsing.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c       |  1 +
 drivers/base/cacheinfo.c  | 20 +++++++++++++-------
 include/linux/cacheinfo.h |  9 +++++++++
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 2c4b3ed862a8..4f5ab19c3a08 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -329,6 +329,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
 {
 	int valid_flags = 0;
 
+	this_leaf->fw_unique = cpu_node;
 	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
 		this_leaf->size = found_cache->size;
 		valid_flags++;
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 217aa90fb036..ee51e33cc37c 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -208,16 +208,16 @@ static int cache_setup_of_node(unsigned int cpu)
 
 	if (index != cache_leaves(cpu)) /* not all OF nodes populated */
 		return -ENOENT;
-
 	return 0;
 }
+
 #else
 static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
 static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 					   struct cacheinfo *sib_leaf)
 {
 	/*
-	 * For non-DT systems, assume unique level 1 cache, system-wide
+	 * For non-DT/ACPI systems, assume unique level 1 caches, system-wide
 	 * shared caches for all other levels. This will be used only if
 	 * arch specific code has not populated shared_cpu_map
 	 */
@@ -225,6 +225,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
 }
 #endif
 
+int __weak cache_setup_acpi(unsigned int cpu)
+{
+	return -ENOTSUPP;
+}
+
 static int cache_shared_cpu_map_setup(unsigned int cpu)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
@@ -235,11 +240,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
 	if (this_cpu_ci->cpu_map_populated)
 		return 0;
 
-	if (of_have_populated_dt())
+	if (!acpi_disabled)
+		ret = cache_setup_acpi(cpu);
+	else if (of_have_populated_dt())
 		ret = cache_setup_of_node(cpu);
-	else if (!acpi_disabled)
-		/* No cache property/hierarchy support yet in ACPI */
-		ret = -ENOTSUPP;
+
 	if (ret)
 		return ret;
 
@@ -290,7 +295,8 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
 			cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
 			cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
 		}
-		of_node_put(this_leaf->fw_unique);
+		if (of_have_populated_dt())
+			of_node_put(this_leaf->fw_unique);
 	}
 }
 
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 6f2e6c87b64c..65b0ae30016e 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -98,6 +98,15 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
 void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
+int cache_setup_acpi(unsigned int cpu);
+int acpi_find_last_cache_level(unsigned int cpu);
+#ifndef CONFIG_ACPI
+int acpi_find_last_cache_level(unsigned int cpu)
+{
+	/*ACPI kernels should be built with PPTT support*/
+	return 0;
+}
+#endif
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 08/12] arm64: Add support for ACPI based firmware tables
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton

The /sys cache entries should support ACPI/PPTT generated cache
topology information. Let's detect ACPI systems and call an
arch-specific cache_setup_acpi() routine to update the
hardware-probed cache topology.

For arm64, if ACPI is enabled, determine the max number of cache
levels and populate them using the PPTT table if one is available.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/cacheinfo.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 380f2e2fbed5..0bf0a835122f 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -17,6 +17,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/acpi.h>
 #include <linux/cacheinfo.h>
 #include <linux/of.h>
 
@@ -46,7 +47,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
 
 static int __init_cache_level(unsigned int cpu)
 {
-	unsigned int ctype, level, leaves, of_level;
+	unsigned int ctype, level, leaves, fw_level;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 
 	for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
@@ -59,15 +60,19 @@ static int __init_cache_level(unsigned int cpu)
 		leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
 	}
 
-	of_level = of_find_last_cache_level(cpu);
-	if (level < of_level) {
+	if (acpi_disabled)
+		fw_level = of_find_last_cache_level(cpu);
+	else
+		fw_level = acpi_find_last_cache_level(cpu);
+
+	if (level < fw_level) {
 		/*
 		 * some external caches not specified in CLIDR_EL1
 		 * the information may be available in the device tree
 		 * only unified external caches are considered here
 		 */
-		leaves += (of_level - level);
-		level = of_level;
+		leaves += (fw_level - level);
+		level = fw_level;
 	}
 
 	this_cpu_ci->num_levels = level;
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 09/12] ACPI/PPTT: Add topology parsing code
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton

The PPTT can be used to determine the groupings of CPUs at
given levels in the system. Let's add a few routines to the PPTT
parsing code that return a unique ID for each unique level in the
processor hierarchy. These IDs can then be matched to build
thread/core/cluster/die/package/etc. mappings for each processing
element in the system.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/pptt.c  | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |   3 ++
 2 files changed, 118 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 4f5ab19c3a08..83d89d683f16 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -412,6 +412,79 @@ static void cache_setup_acpi_cpu(struct acpi_table_header *table,
 	}
 }
 
+/* Passing level values greater than this will result in search termination */
+#define PPTT_ABORT_PACKAGE 0xFF
+
+/*
+ * Given an acpi_pptt_processor node, walk up until we identify the
+ * package that the node is associated with, or we run out of levels
+ * to request or the search is terminated with a flag match
+ * The level parameter also serves to limit possible loops within the tree.
+ */
+static struct acpi_pptt_processor *acpi_find_processor_package_id(
+	struct acpi_table_header *table_hdr,
+	struct acpi_pptt_processor *cpu,
+	int level, int flag)
+{
+	struct acpi_pptt_processor *prev_node;
+
+	while (cpu && level) {
+		if (cpu->flags & flag)
+			break;
+		pr_debug("level %d\n", level);
+		prev_node = fetch_pptt_node(table_hdr, cpu->parent);
+		if (prev_node == NULL)
+			break;
+		cpu = prev_node;
+		level--;
+	}
+	return cpu;
+}
+
+/*
+ * Get a unique value given a cpu, and a topology level, that can be
+ * matched to determine which cpus share common topological features
+ * at that level.
+ */
+static int topology_get_acpi_cpu_tag(struct acpi_table_header *table,
+				     unsigned int cpu, int level, int flag)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+
+	cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+	if (cpu_node) {
+		cpu_node = acpi_find_processor_package_id(table, cpu_node,
+							  level, flag);
+		/* Only the first level has a guaranteed id */
+		if (level == 0)
+			return cpu_node->acpi_processor_id;
+		return (int)((u8 *)cpu_node - (u8 *)table);
+	}
+	pr_err_once("PPTT table found, but unable to locate core for %d\n",
+		    cpu);
+	return -ENOENT;
+}
+
+static int find_acpi_cpu_topology_tag(unsigned int cpu, int level, int flag)
+{
+	struct acpi_table_header *table;
+	acpi_status status;
+	int retval;
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
+	if (ACPI_FAILURE(status)) {
+		pr_err_once("No PPTT table found, cpu topology may be inaccurate\n");
+		return -ENOENT;
+	}
+	retval = topology_get_acpi_cpu_tag(table, cpu, level, flag);
+	pr_debug("Topology Setup ACPI cpu %d, level %d ret = %d\n",
+		 cpu, level, retval);
+	acpi_put_table(table);
+
+	return retval;
+}
+
 /**
  * acpi_find_last_cache_level() - Determines the number of cache levels for a PE
  * @cpu: Kernel logical cpu number
@@ -475,3 +548,45 @@ int cache_setup_acpi(unsigned int cpu)
 
 	return status;
 }
+
+/**
+ * find_acpi_cpu_topology() - Determine a unique topology value for a given cpu
+ * @cpu: Kernel logical cpu number
+ * @level: The topological level for which we would like a unique ID
+ *
+ * Determine a topology unique ID for each thread/core/cluster/mc_grouping
+ * /socket/etc. This ID can then be used to group peers, which will have
+ * matching ids.
+ *
+ * The search terminates when either the requested level is found or
+ * we reach a root node. Levels beyond the termination point will return the
+ * same unique ID. The unique id for level 0 is the acpi processor id. All
+ * other levels beyond this use a generated value to uniquely identify
+ * a topological feature.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cpu cannot be found.
+ * Otherwise returns a value which represents a unique topological feature.
+ */
+int find_acpi_cpu_topology(unsigned int cpu, int level)
+{
+	return find_acpi_cpu_topology_tag(cpu, level, 0);
+}
+
+/**
+ * find_acpi_cpu_topology_package() - Determine a unique cpu package value
+ * @cpu: Kernel logical cpu number
+ *
+ * Determine a topology unique package ID for the given cpu.
+ * This ID can then be used to group peers, which will have matching ids.
+ *
+ * The search terminates when either a level is found with the PHYSICAL_PACKAGE
+ * flag set or we reach a root node.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cpu cannot be found.
+ * Otherwise returns a value which represents the package for this cpu.
+ */
+int find_acpi_cpu_topology_package(unsigned int cpu)
+{
+	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
+					  ACPI_PPTT_PHYSICAL_PACKAGE);
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index dc1ebfeeb5ec..117d13934487 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1266,4 +1266,7 @@ static inline int lpit_read_residency_count_address(u64 *address)
 }
 #endif
 
+int find_acpi_cpu_topology(unsigned int cpu, int level);
+int find_acpi_cpu_topology_package(unsigned int cpu);
+
 #endif	/*_LINUX_ACPI_H*/
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 10/12] arm64: topology: rename cluster_id
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton, Vincent Guittot,
	Dietmar Eggemann, Juri Lelli

Let's match the name of the arm64 topology field
to the kernel macro that uses it.

Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/include/asm/topology.h |  4 ++--
 arch/arm64/kernel/topology.c      | 27 ++++++++++++++-------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index c4f2d50491eb..6b10459e6905 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -7,14 +7,14 @@
 struct cpu_topology {
 	int thread_id;
 	int core_id;
-	int cluster_id;
+	int package_id;
 	cpumask_t thread_sibling;
 	cpumask_t core_sibling;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
 
-#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
+#define topology_physical_package_id(cpu)	(cpu_topology[cpu].package_id)
 #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..7b06e263fdd1 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -51,7 +51,7 @@ static int __init get_cpu_for_node(struct device_node *node)
 	return -1;
 }
 
-static int __init parse_core(struct device_node *core, int cluster_id,
+static int __init parse_core(struct device_node *core, int package_id,
 			     int core_id)
 {
 	char name[10];
@@ -67,7 +67,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			leaf = false;
 			cpu = get_cpu_for_node(t);
 			if (cpu >= 0) {
-				cpu_topology[cpu].cluster_id = cluster_id;
+				cpu_topology[cpu].package_id = package_id;
 				cpu_topology[cpu].core_id = core_id;
 				cpu_topology[cpu].thread_id = i;
 			} else {
@@ -89,7 +89,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			return -EINVAL;
 		}
 
-		cpu_topology[cpu].cluster_id = cluster_id;
+		cpu_topology[cpu].package_id = package_id;
 		cpu_topology[cpu].core_id = core_id;
 	} else if (leaf) {
 		pr_err("%pOF: Can't get CPU for leaf core\n", core);
@@ -105,7 +105,7 @@ static int __init parse_cluster(struct device_node *cluster, int depth)
 	bool leaf = true;
 	bool has_cores = false;
 	struct device_node *c;
-	static int cluster_id __initdata;
+	static int package_id __initdata;
 	int core_id = 0;
 	int i, ret;
 
@@ -144,7 +144,7 @@ static int __init parse_cluster(struct device_node *cluster, int depth)
 			}
 
 			if (leaf) {
-				ret = parse_core(c, cluster_id, core_id++);
+				ret = parse_core(c, package_id, core_id++);
 			} else {
 				pr_err("%pOF: Non-leaf cluster with core %s\n",
 				       cluster, name);
@@ -162,7 +162,7 @@ static int __init parse_cluster(struct device_node *cluster, int depth)
 		pr_warn("%pOF: empty cluster\n", cluster);
 
 	if (leaf)
-		cluster_id++;
+		package_id++;
 
 	return 0;
 }
@@ -198,7 +198,7 @@ static int __init parse_dt_topology(void)
 	 * only mark cores described in the DT as possible.
 	 */
 	for_each_possible_cpu(cpu)
-		if (cpu_topology[cpu].cluster_id == -1)
+		if (cpu_topology[cpu].package_id == -1)
 			ret = -EINVAL;
 
 out_map:
@@ -228,7 +228,7 @@ static void update_siblings_masks(unsigned int cpuid)
 	for_each_possible_cpu(cpu) {
 		cpu_topo = &cpu_topology[cpu];
 
-		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
+		if (cpuid_topo->package_id != cpu_topo->package_id)
 			continue;
 
 		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
@@ -249,7 +249,7 @@ void store_cpu_topology(unsigned int cpuid)
 	struct cpu_topology *cpuid_topo = &cpu_topology[cpuid];
 	u64 mpidr;
 
-	if (cpuid_topo->cluster_id != -1)
+	if (cpuid_topo->package_id != -1)
 		goto topology_populated;
 
 	mpidr = read_cpuid_mpidr();
@@ -263,19 +263,19 @@ void store_cpu_topology(unsigned int cpuid)
 		/* Multiprocessor system : Multi-threads per core */
 		cpuid_topo->thread_id  = MPIDR_AFFINITY_LEVEL(mpidr, 0);
 		cpuid_topo->core_id    = MPIDR_AFFINITY_LEVEL(mpidr, 1);
-		cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) |
+		cpuid_topo->package_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) |
 					 MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8;
 	} else {
 		/* Multiprocessor system : Single-thread per core */
 		cpuid_topo->thread_id  = -1;
 		cpuid_topo->core_id    = MPIDR_AFFINITY_LEVEL(mpidr, 0);
-		cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 1) |
+		cpuid_topo->package_id = MPIDR_AFFINITY_LEVEL(mpidr, 1) |
 					 MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 |
 					 MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16;
 	}
 
 	pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
-		 cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
+		 cpuid, cpuid_topo->package_id, cpuid_topo->core_id,
 		 cpuid_topo->thread_id, mpidr);
 
 topology_populated:
@@ -291,7 +291,7 @@ static void __init reset_cpu_topology(void)
 
 		cpu_topo->thread_id = -1;
 		cpu_topo->core_id = 0;
-		cpu_topo->cluster_id = -1;
+		cpu_topo->package_id = -1;
 
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
@@ -300,6 +300,7 @@ static void __init reset_cpu_topology(void)
 	}
 }
 
+
 void __init init_cpu_topology(void)
 {
 	reset_cpu_topology();
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton, Juri Lelli

Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread_id, core_id, and
cluster_id by assuming certain levels of the PPTT tree correspond
to those concepts. The package_id is flagged in the tree and can be
found by calling find_acpi_cpu_topology_package() which terminates
its search when it finds an ACPI node flagged as the physical
package. If the tree doesn't contain enough levels to represent
all of the requested levels then the root node will be returned
for all subsequent levels.

Cc: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 7b06e263fdd1..ce8ec7fd6b32 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -11,6 +11,7 @@
  * for more details.
  */
 
+#include <linux/acpi.h>
 #include <linux/arch_topology.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -22,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
+#include <linux/smp.h>
 #include <linux/string.h>
 
 #include <asm/cpu.h>
@@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
 	}
 }
 
+#ifdef CONFIG_ACPI
+/*
+ * Propagate the topology information of the processor_topology_node tree to the
+ * cpu_topology array.
+ */
+static int __init parse_acpi_topology(void)
+{
+	bool is_threaded;
+	int cpu, topology_id;
+
+	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
+
+	for_each_possible_cpu(cpu) {
+		topology_id = find_acpi_cpu_topology(cpu, 0);
+		if (topology_id < 0)
+			return topology_id;
+
+		if (is_threaded) {
+			cpu_topology[cpu].thread_id = topology_id;
+			topology_id = find_acpi_cpu_topology(cpu, 1);
+			cpu_topology[cpu].core_id   = topology_id;
+			topology_id = find_acpi_cpu_topology_package(cpu);
+			cpu_topology[cpu].package_id = topology_id;
+		} else {
+			cpu_topology[cpu].thread_id  = -1;
+			cpu_topology[cpu].core_id    = topology_id;
+			topology_id = find_acpi_cpu_topology_package(cpu);
+			cpu_topology[cpu].package_id = topology_id;
+		}
+	}
+
+	return 0;
+}
+
+#else
+static inline int __init parse_acpi_topology(void)
+{
+	return -EINVAL;
+}
+#endif
 
 void __init init_cpu_topology(void)
 {
@@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
 	 * Discard anything that was parsed if we hit an error so we
 	 * don't use partial information.
 	 */
-	if (of_have_populated_dt() && parse_dt_topology())
+	if ((!acpi_disabled) && parse_acpi_topology())
+		reset_cpu_topology();
+	else if (of_have_populated_dt() && parse_dt_topology())
 		reset_cpu_topology();
 }
-- 
2.13.5


* [PATCH v6 12/12] ACPI: Add PPTT to injectable table list
  2018-01-13  0:59 ` Jeremy Linton
@ 2018-01-13  0:59   ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-13  0:59 UTC (permalink / raw)
  To: linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Jeremy Linton, Geoffrey Blake

Add ACPI_SIG_PPTT to the table so that initrds can override the
system topology.

Signed-off-by: Geoffrey Blake <geoffrey.blake@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/acpi/tables.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 80ce2a7d224b..6d254450115b 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -456,7 +456,8 @@ static const char * const table_sigs[] = {
 	ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
 	ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
 	ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
+	ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, ACPI_SIG_PPTT,
+	NULL };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
-- 
2.13.5


* Re: [PATCH v6 01/12] drivers: base: cacheinfo: move cache_setup_of_node()
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-15 12:23     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 12:23 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:09PM -0600, Jeremy Linton wrote:
> In preparation for the next patch, and to aid in
> review of that patch, lets move cache_setup_of_node
> farther down in the module without any changes.
> 
  ^^^^
s/farther/further/

Makes sense

Acked-by: Sudeep Holla <sudeep.holla@arm.com>

--
Regards,
Sudeep


* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-15 12:33     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 12:33 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Palmer Dabbelt, Albert Ou, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
> The original intent in cacheinfo was that an architecture
> specific populate_cache_leaves() would probe the hardware
> and then cache_shared_cpu_map_setup() and
> cache_override_properties() would provide firmware help to
> extend/expand upon what was probed. Arm64 was really
> the only architecture that was working this way, and
> with the removal of most of the hardware probing logic it
> became clear that it was possible to simplify the logic a bit.
> 
> This patch combines the walk of the DT nodes with the
> code updating the cache size/line_size and nr_sets.
> cache_override_properties() (which was DT specific) is
> then removed. The result is that cacheinfo.of_node is
> no longer used as a temporary place to hold DT references
> for future calls that update cache properties. That change
> helps to clarify its one remaining use (matching
> cacheinfo nodes that represent shared caches) which
> will be used by the ACPI/PPTT code in the following patches.
> 
> Cc: Palmer Dabbelt <palmer@sifive.com>
> Cc: Albert Ou <albert@sifive.com>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/riscv/kernel/cacheinfo.c |  1 +
>  drivers/base/cacheinfo.c      | 65 +++++++++++++++++++------------------------
>  include/linux/cacheinfo.h     |  1 +
>  3 files changed, 31 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
> index 10ed2749e246..6f4500233cf8 100644
> --- a/arch/riscv/kernel/cacheinfo.c
> +++ b/arch/riscv/kernel/cacheinfo.c
> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>  		CACHE_WRITE_BACK
>  		| CACHE_READ_ALLOCATE
>  		| CACHE_WRITE_ALLOCATE;
> +	cache_of_set_props(this_leaf, node);

This may be necessary, but can it be done as a later patch? So far
nothing is added that may break riscv, IIUC.

Palmer, Albert,

Can you confirm? Also, as far as I can see, we can thin down the
arch-specific implementation on riscv if it is just using DT like
ARM64. Apologies if I am missing something; I thought it worth checking.

[...]

> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> index 3d9805297cda..d35299a590a4 100644
> --- a/include/linux/cacheinfo.h
> +++ b/include/linux/cacheinfo.h
> @@ -99,6 +99,7 @@ int func(unsigned int cpu)					\
>  struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
>  int init_cache_level(unsigned int cpu);
>  int populate_cache_leaves(unsigned int cpu);
> +void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
>

IIUC riscv is the only user for this outside of cacheinfo.c, right?
Hopefully we can get rid of it.

Other than that, it looks OK. I will wait for a response from the
riscv team so that these riscv-related changes can be dropped or
moved to a later patch if really needed.

--
Regards,
Sudeep


* Re: [PATCH v6 03/12] cacheinfo: rename of_node to fw_unique
  2018-01-13  0:59   ` Jeremy Linton
  (?)
@ 2018-01-15 12:36     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 12:36 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: mark.rutland, Jonathan.Zhang, austinwc, viresh.kumar,
	Palmer Dabbelt, will.deacon, morten.rasmussen, vkilari,
	Jayachandran.Nair, lorenzo.pieralisi, jhugo, wangxiongfeng2,
	linux-acpi, catalin.marinas, lenb, linux-pm, ahs3,
	linux-arm-kernel, gregkh, rjw, linux-kernel, hanjun.guo,
	Albert Ou, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:11PM -0600, Jeremy Linton wrote:
> Rename and change the type of of_node to indicate
> it is a generic pointer which is generally only used
> for comparison purposes. In a later patch we will apply
> an ACPI/PPTT "token" pointer in fw_unique so that
> the code which builds the shared cpu masks can be reused.
> 

[Nit] as used in the commit log, I prefer fw_token over fw_unique.
But I am fine if you/others prefer fw_unique.

Otherwise,

Acked-by: Sudeep Holla <sudeep.holla@arm.com>

--
Regards,
Sudeep


* Re: [PATCH v6 04/12] arm64/acpi: Create arch specific cpu to acpi id helper
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-15 13:46     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 13:46 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:12PM -0600, Jeremy Linton wrote:
> It's helpful to be able to look up the acpi_processor_id associated
> with a logical CPU. Provide an arm64 helper to do this.
> 

Makes sense to me, but I am wondering whether we need a weak
implementation for CONFIG_ACPI. Anyway, I will check when this gets
used in the series.

Not sure whether this can be squashed into the patch using it; anyway,
it looks good on its own.

Acked-by: Sudeep Holla <sudeep.holla@arm.com>

--
Regards,
Sudeep


* Re: [PATCH v6 06/12] ACPI: Enable PPTT support on ARM64
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-15 13:52     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 13:52 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:14PM -0600, Jeremy Linton wrote:
> Now that we have a PPTT parser, in preparation for its use
> on arm64, let's build it.
> 

Acked-by: Sudeep Holla <sudeep.holla@arm.com>

--
Regards,
Sudeep


* Re: [PATCH v6 08/12] arm64: Add support for ACPI based firmware tables
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-15 13:54     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 13:54 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:16PM -0600, Jeremy Linton wrote:
> The /sys cache entries should support ACPI/PPTT generated cache
> topology information. Let's detect ACPI systems and call
> an arch-specific cache_setup_acpi() routine to update the
> hardware-probed cache topology.
> 
> For arm64, if ACPI is enabled, determine the max number of cache
> levels and populate them using the PPTT table if one is available.
> 

Looks good,

Acked-by: Sudeep Holla <sudeep.holla@arm.com>

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-15 14:58     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 14:58 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:13PM -0600, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in a tree-like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level(), as well as fill out the individual
> cores' cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch-specific
> code.
> 
> An additional patch later in the set adds the ability to report
> peers in the topology using find_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique IDs can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  drivers/acpi/pptt.c | 476 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 476 insertions(+)
>  create mode 100644 drivers/acpi/pptt.c
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..2c4b3ed862a8
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,0 +1,476 @@
> +/*
> + * Copyright (C) 2018, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + *
> + * The PPTT structure is an inverted tree, with each node potentially
> + * holding one or two inverted tree data structures describing
> + * the caches available at that level. Each cache structure optionally
> + * contains properties describing the cache at a given level which can be
> + * used to override hardware probed values.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/* total number of attributes checked by the properties code */
> +#define PPTT_CHECKED_ATTRIBUTES 6

See comment on this below. If we retain this, move it closer to the usage so
that it's easier to understand what it actually stands for.

> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (pptt_ref < sizeof(struct acpi_subtable_header))
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr, pptt_ref);
> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr,
> +								 pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr,
> +							     pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 *ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = ACPI_ADD_PTR(u32, node, sizeof(struct acpi_pptt_processor));
> +	ref += resource;
> +
> +	return fetch_pptt_subtable(table_hdr, *ref);
> +}
> +
> +/*
> + * Attempt to find a given cache level, while counting the max number
> + * of cache levels for the cache node.
> + *
> + * Given a pptt resource, verify that it is a cache node, then walk
> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;
> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> +			if ((*found != NULL) && (cache != *found))
> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
> +
> +			pr_debug("Found cache @ level %d\n", level);
> +			*found = cache;
> +			/*
> +			 * continue looking at this node's resource list
> +			 * to verify that we don't find a duplicate
> +			 * cache node.
> +			 */
> +		}
> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> +	}
> +	return local_level;
> +}
> +
> +/*
> + * Given a CPU node look for cache levels that exist at this level, and then
> + * for each cache node, count how many levels exist below (logically above) it.
> + * If a level and type are specified, and we find that level/type, abort
> + * processing and return the acpi_pptt_cache structure.
> + */
> +static struct acpi_pptt_cache *acpi_find_cache_level(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *cpu_node,
> +	int *starting_level, int level, int type)
> +{
> +	struct acpi_subtable_header *res;
> +	int number_of_levels = *starting_level;
> +	int resource = 0;
> +	struct acpi_pptt_cache *ret = NULL;
> +	int local_level;
> +
> +	/* walk down from processor node */
> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
> +		resource++;
> +
> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
> +						   res, &ret, level, type);
> +		/*
> +		 * We are looking for the max depth. Since it's possible
> +		 * for a given node to have resources with differing
> +		 * depths, verify that the depth we have found is the largest.
> +		 */
> +		if (number_of_levels < local_level)
> +			number_of_levels = local_level;
> +	}
> +	if (number_of_levels > *starting_level)
> +		*starting_level = number_of_levels;
> +
> +	return ret;
> +}
> +
> +/*
> + * Given a processor node containing a processing unit, walk into it and count
> + * how many levels exist solely for it, and then walk up each level until we hit
> + * the root node (ignore the package level because it may be possible to have
> + * caches that exist across packages). Count the number of cache levels that
> + * exist at each level on the way up.
> + */
> +static int acpi_process_node(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node)
> +{
> +	int total_levels = 0;
> +
> +	do {
> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	} while (cpu_node);
> +
> +	return total_levels;
> +}
> +
> +/*
> + * Determine if the *node parameter is a leaf node by iterating the
> + * PPTT table, looking for nodes which reference it.
> + * Return 0 if we find a node referencing the passed node,
> + * or 1 if we don't.
> + */
> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
> +			       struct acpi_pptt_processor *node)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	u32 node_entry;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	node_entry = ACPI_PTR_DIFF(node, table_hdr);
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> +			     sizeof(struct acpi_table_pptt));
> +
> +	while ((unsigned long)(entry + 1) < table_end) {

Is the entry + 1 check sufficient to access an entry of that length?
Shouldn't that be entry + sizeof(struct acpi_pptt_processor), so that
we are sure it's a valid entry?
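To make the concern concrete, here is a compilable sketch of the stricter
check (struct layouts and names below are stand-ins for the real ACPICA
struct acpi_subtable_header / struct acpi_pptt_processor definitions, not
the kernel's actual code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in layouts for illustration; the real definitions are ACPICA's
 * struct acpi_subtable_header and struct acpi_pptt_processor. */
struct subtable_header {
	uint8_t type;
	uint8_t length;
};

struct pptt_processor {
	struct subtable_header header;
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;
	uint32_t acpi_processor_id;
	uint32_t number_of_priv_resources;
};

/*
 * "entry + 1 < table_end" only guarantees the two header bytes are in
 * bounds; dereferencing processor-specific fields such as ->parent
 * needs room for the whole structure (sizeof the struct, not sizeof a
 * pointer to it).
 */
static int entry_fits(uintptr_t entry, uintptr_t table_end, size_t need)
{
	return entry <= table_end && table_end - entry >= need;
}
```

The point being that the loop bound should reflect the largest structure
actually dereferenced before entry->length has been validated.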

> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (cpu_node->parent == node_entry))
> +			return 0;
> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> +				     entry->length);
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Find the subtable entry describing the provided processor.
> + * This is done by iterating the PPTT table looking for processor nodes
> + * which have an acpi_processor_id that matches the acpi_cpu_id parameter
> + * passed into the function. If we find a node that matches these criteria,
> + * we verify that it's a leaf node in the topology rather than depending
> + * on the valid flag, which doesn't need to be set for leaf nodes.
> + */
> +static struct acpi_pptt_processor *acpi_find_processor_node(
> +	struct acpi_table_header *table_hdr,
> +	u32 acpi_cpu_id)
> +{
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	struct acpi_pptt_processor *cpu_node;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> +			     sizeof(struct acpi_table_pptt));
> +
> +	/* find the processor structure associated with this cpuid */
> +	while ((unsigned long)(entry + 1) < table_end) {

Same comment as above on entry + 1.

> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +
> +		if (entry->length == 0) {
> +			pr_err("Invalid zero length subtable\n");
> +			break;
> +		}
> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
> +		    (acpi_cpu_id == cpu_node->acpi_processor_id) &&
> +		     acpi_pptt_leaf_node(table_hdr, cpu_node)) {
> +			return (struct acpi_pptt_processor *)entry;
> +		}
> +
> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> +				     entry->length);
> +	}
> +
> +	return NULL;
> +}
> +
> +static int acpi_find_cache_levels(struct acpi_table_header *table_hdr,
> +				  u32 acpi_cpu_id)
> +{
> +	int number_of_levels = 0;
> +	struct acpi_pptt_processor *cpu;
> +
> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +	if (cpu)
> +		number_of_levels = acpi_process_node(table_hdr, cpu);
> +
> +	return number_of_levels;
> +}
> +
> +/* Convert the Linux cache_type to an ACPI PPTT cache type value */
> +static u8 acpi_cache_type(enum cache_type type)
> +{

[nit] Just wondering if we can avoid this with some static mapping:

static u8 acpi_cache_type[] = {
        [CACHE_TYPE_NONE] = 0,
        [CACHE_TYPE_DATA] = ACPI_PPTT_CACHE_TYPE_DATA,
        [CACHE_TYPE_INST] = ACPI_PPTT_CACHE_TYPE_INSTR,
        [CACHE_TYPE_UNIFIED] = ACPI_PPTT_CACHE_TYPE_UNIFIED,
};
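One caveat with the lookup-table route, sketched here with stand-in
constants (the enum values, macros, and the to_acpi_cache_type /
cache_type_map names are illustrative, not the kernel's; the array would
also need a name that doesn't clash with the acpi_cache_type() function
it replaces):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values for illustration; the real enum cache_type lives in
 * linux/cacheinfo.h and the ACPI_PPTT_CACHE_TYPE_* macros in the ACPICA
 * headers. */
enum cache_type {
	CACHE_TYPE_NOCACHE = 0,
	CACHE_TYPE_INST = 1,
	CACHE_TYPE_DATA = 2,
	CACHE_TYPE_UNIFIED = 4,
};

#define ACPI_PPTT_CACHE_TYPE_DATA	0x0
#define ACPI_PPTT_CACHE_TYPE_INSTR	0x1
#define ACPI_PPTT_CACHE_TYPE_UNIFIED	0x2

static const uint8_t cache_type_map[] = {
	[CACHE_TYPE_DATA]    = ACPI_PPTT_CACHE_TYPE_DATA,
	[CACHE_TYPE_INST]    = ACPI_PPTT_CACHE_TYPE_INSTR,
	[CACHE_TYPE_UNIFIED] = ACPI_PPTT_CACHE_TYPE_UNIFIED,
};

static uint8_t to_acpi_cache_type(enum cache_type type)
{
	/*
	 * Unset array slots read as 0, which happens to equal
	 * ACPI_PPTT_CACHE_TYPE_DATA, so the switch's default-to-UNIFIED
	 * behaviour has to stay explicit rather than rely on the bare
	 * array lookup.
	 */
	switch (type) {
	case CACHE_TYPE_DATA:
	case CACHE_TYPE_INST:
		return cache_type_map[type];
	default:
		return ACPI_PPTT_CACHE_TYPE_UNIFIED;
	}
}
```

So the table alone can't reproduce the default/UNIFIED fall-through
without an extra range check around the lookup.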

> +	switch (type) {
> +	case CACHE_TYPE_DATA:
> +		pr_debug("Looking for data cache\n");
> +		return ACPI_PPTT_CACHE_TYPE_DATA;
> +	case CACHE_TYPE_INST:
> +		pr_debug("Looking for instruction cache\n");
> +		return ACPI_PPTT_CACHE_TYPE_INSTR;
> +	default:
> +	case CACHE_TYPE_UNIFIED:
> +		pr_debug("Looking for unified cache\n");
> +		/*
> +		 * It is important that ACPI_PPTT_CACHE_TYPE_UNIFIED
> +		 * contains the bit pattern that will match both
> +		 * ACPI unified bit patterns because we use it later
> +		 * to match both cases.
> +		 */
> +		return ACPI_PPTT_CACHE_TYPE_UNIFIED;
> +	}
> +}
> +
> +/* find the ACPI node describing the cache type/level for the given CPU */
> +static struct acpi_pptt_cache *acpi_find_cache_node(
> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
> +	enum cache_type type, unsigned int level,
> +	struct acpi_pptt_processor **node)
> +{
> +	int total_levels = 0;
> +	struct acpi_pptt_cache *found = NULL;
> +	struct acpi_pptt_processor *cpu_node;
> +	u8 acpi_type = acpi_cache_type(type);
> +
> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
> +		 acpi_cpu_id, level, acpi_type);
> +
> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
> +
> +	while ((cpu_node) && (!found)) {
> +		found = acpi_find_cache_level(table_hdr, cpu_node,
> +					      &total_levels, level, acpi_type);
> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	}
> +
> +	return found;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.

Though ARM64 is the only user now, that may become obsolete, so better to
drop that comment.

> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
> +{
> +	int valid_flags = 0;
> +
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
> +		this_leaf->size = found_cache->size;
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID) {
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID) {
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID) {
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_PPTT_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_PPTT_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		}
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_PPTT_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_PPTT_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_PPTT_CACHE_RW_ALLOCATE:
> +		case ACPI_PPTT_CACHE_RW_ALLOCATE_ALT:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE;
> +			break;
> +		}
> +		valid_flags++;
> +	}
> +	/*
> +	 * If all the above flags are valid, and the cache type is NOCACHE
> +	 * update the cache type as well.
> +	 */

I am not sure it makes sense to mandate at least the last two (allocation
type and write policy). They can be optional.
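A mask-based variant of that suggestion could look like the sketch below
(the flag names and bit positions are stand-ins for the ACPICA
ACPI_PPTT_*_VALID definitions; the point is only that a required-bits
mask replaces the valid_flags counter and leaves the two policy
attributes optional):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag bits for illustration; the real ACPI_PPTT_*_VALID
 * definitions come from the ACPICA PPTT headers. */
#define SIZE_PROPERTY_VALID	(1u << 0)
#define NUMBER_OF_SETS_VALID	(1u << 1)
#define ASSOCIATIVITY_VALID	(1u << 2)
#define ALLOCATION_TYPE_VALID	(1u << 3)
#define WRITE_POLICY_VALID	(1u << 5)
#define LINE_SIZE_VALID		(1u << 6)

/*
 * Require only the four geometry attributes before promoting an unknown
 * (NOCACHE) leaf to UNIFIED; allocation type and write policy may be
 * present but are not mandated.
 */
#define PPTT_REQUIRED_FLAGS						\
	(SIZE_PROPERTY_VALID | NUMBER_OF_SETS_VALID |			\
	 ASSOCIATIVITY_VALID | LINE_SIZE_VALID)

static int may_promote_to_unified(uint32_t flags)
{
	return (flags & PPTT_REQUIRED_FLAGS) == PPTT_REQUIRED_FLAGS;
}
```

A mask also documents exactly which attributes gate the promotion, which
the bare counter does not.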

> +	if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
> +	    (valid_flags == PPTT_CHECKED_ATTRIBUTES))
> +		this_leaf->type = CACHE_TYPE_UNIFIED;
> +}
> +
> +/*
> + * Update the kernel cache information for each level of cache
> + * associated with the given acpi cpu.
> + */
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +	struct acpi_pptt_processor *cpu_node = NULL;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level,
> +						   &cpu_node);
> +		pr_debug("found = %p %p\n", found_cache, cpu_node);
> +		if (found_cache)
> +			update_cache_properties(this_leaf,
> +						found_cache,
> +						cpu_node);

[nit] unnecessary line break ?

> +
> +		index++;
> +	}
> +}
> +
> +/**
> + * acpi_find_last_cache_level() - Determines the number of cache levels for a PE

[nit] PE ? I think you mean processing element, but that's too ARM ARM thingy
:), can you s/PE/CPU ?

> + * @cpu: Kernel logical cpu number
> + *
> + * Given a logical cpu number, returns the number of levels of cache represented
> + * in the PPTT. Errors caused by the lack of a PPTT table, or otherwise,
> + * return 0, indicating we didn't find any cache levels.
> + *
> + * Return: Cache levels visible to this core.
> + */
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
> +
> +	acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +	} else {
> +		number_of_levels = acpi_find_cache_levels(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
> +
> +	return number_of_levels;
> +}
> +
> +/**
> + * cache_setup_acpi() - Override CPU cache topology with data from the PPTT

[nit]			  ^^^^ may be override/setup or just setup ?

> + * @cpu: Kernel logical cpu number

[nit] kernel is implicit, no ?

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

> +					      &total_levels, level, acpi_type);
> +		*node = cpu_node;
> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +	}
> +
> +	return found;
> +}
> +
> +/*
> + * The ACPI spec implies that the fields in the cache structures are used to
> + * extend and correct the information probed from the hardware. In the case
> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.

Though ARM64 is the only user now, the comment may become obsolete, so
better to drop it.

> + */
> +static void update_cache_properties(struct cacheinfo *this_leaf,
> +				    struct acpi_pptt_cache *found_cache,
> +				    struct acpi_pptt_processor *cpu_node)
> +{
> +	int valid_flags = 0;
> +
> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
> +		this_leaf->size = found_cache->size;
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID) {
> +		this_leaf->coherency_line_size = found_cache->line_size;
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID) {
> +		this_leaf->number_of_sets = found_cache->number_of_sets;
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID) {
> +		this_leaf->ways_of_associativity = found_cache->associativity;
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
> +		case ACPI_PPTT_CACHE_POLICY_WT:
> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
> +			break;
> +		case ACPI_PPTT_CACHE_POLICY_WB:
> +			this_leaf->attributes = CACHE_WRITE_BACK;
> +			break;
> +		}
> +		valid_flags++;
> +	}
> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
> +		case ACPI_PPTT_CACHE_READ_ALLOCATE:
> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
> +			break;
> +		case ACPI_PPTT_CACHE_WRITE_ALLOCATE:
> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
> +			break;
> +		case ACPI_PPTT_CACHE_RW_ALLOCATE:
> +		case ACPI_PPTT_CACHE_RW_ALLOCATE_ALT:
> +			this_leaf->attributes |=
> +				CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE;
> +			break;
> +		}
> +		valid_flags++;
> +	}
> +	/*
> +	 * If all the above flags are valid, and the cache type is NOCACHE
> +	 * update the cache type as well.
> +	 */

I am not sure it makes sense to mandate at least the last 2 (read allocate
and write policy). They can be optional.

> +	if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
> +	    (valid_flags == PPTT_CHECKED_ATTRIBUTES))
> +		this_leaf->type = CACHE_TYPE_UNIFIED;
> +}
> +
> +/*
> + * Update the kernel cache information for each level of cache
> + * associated with the given acpi cpu.
> + */
> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
> +				 unsigned int cpu)
> +{
> +	struct acpi_pptt_cache *found_cache;
> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +	struct cacheinfo *this_leaf;
> +	unsigned int index = 0;
> +	struct acpi_pptt_processor *cpu_node = NULL;
> +
> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
> +		this_leaf = this_cpu_ci->info_list + index;
> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						   this_leaf->type,
> +						   this_leaf->level,
> +						   &cpu_node);
> +		pr_debug("found = %p %p\n", found_cache, cpu_node);
> +		if (found_cache)
> +			update_cache_properties(this_leaf,
> +						found_cache,
> +						cpu_node);

[nit] unnecessary line break ?

> +
> +		index++;
> +	}
> +}
> +
> +/**
> + * acpi_find_last_cache_level() - Determines the number of cache levels for a PE

[nit] PE? I think you mean processing element, but that's too much of an ARM
ARM thing :), can you s/PE/CPU/ ?

> + * @cpu: Kernel logical cpu number
> + *
> + * Given a logical cpu number, returns the number of levels of cache represented
> + * in the PPTT. Errors caused by lack of a PPTT table, or otherwise, return 0
> + * indicating we didn't find any cache levels.
> + *
> + * Return: Cache levels visible to this core.
> + */
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	u32 acpi_cpu_id;
> +	struct acpi_table_header *table;
> +	int number_of_levels = 0;
> +	acpi_status status;
> +
> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
> +
> +	acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
> +	if (ACPI_FAILURE(status)) {
> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
> +	} else {
> +		number_of_levels = acpi_find_cache_levels(table, acpi_cpu_id);
> +		acpi_put_table(table);
> +	}
> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
> +
> +	return number_of_levels;
> +}
> +
> +/**
> + * cache_setup_acpi() - Override CPU cache topology with data from the PPTT

[nit]			  ^^^^ maybe override/setup or just setup?

> + * @cpu: Kernel logical cpu number

[nit] kernel is implicit, no ?

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 07/12] drivers: base cacheinfo: Add support for ACPI based firmware tables
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-15 15:06     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 15:06 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:15PM -0600, Jeremy Linton wrote:
> Add a entry to to struct cacheinfo to maintain a reference to the PPTT
> node which can be used to match identical caches across cores. Also
> stub out cache_setup_acpi() so that individual architectures can
> enable ACPI topology parsing.
>

You need to reword the above message as you no longer add a new entry.
Other than a minor nit below, this looks good.

With those changes done,
Acked-by: Sudeep Holla <sudeep.holla@arm.com>

> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  drivers/acpi/pptt.c       |  1 +
>  drivers/base/cacheinfo.c  | 20 +++++++++++++-------
>  include/linux/cacheinfo.h |  9 +++++++++
>  3 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 2c4b3ed862a8..4f5ab19c3a08 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -329,6 +329,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>  {
>  	int valid_flags = 0;
>  
> +	this_leaf->fw_unique = cpu_node;
>  	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
>  		this_leaf->size = found_cache->size;
>  		valid_flags++;
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index 217aa90fb036..ee51e33cc37c 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -208,16 +208,16 @@ static int cache_setup_of_node(unsigned int cpu)
>  
>  	if (index != cache_leaves(cpu)) /* not all OF nodes populated */
>  		return -ENOENT;
> -
>  	return 0;
>  }
> +

[nit] maybe it looks better now, but it's still unnecessary; please drop it
in the context of the $subject patch.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-15 15:48     ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-15 15:48 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Sudeep Holla

On Fri, Jan 12, 2018 at 06:59:13PM -0600, Jeremy Linton wrote:
> ACPI 6.2 adds a new table, which describes how processing units
> are related to each other in tree like fashion. Caches are
> also sprinkled throughout the tree and describe the properties
> of the caches in relation to other caches and processing units.
> 
> Add the code to parse the cache hierarchy and report the total
> number of levels of cache for a given core using
> acpi_find_last_cache_level() as well as fill out the individual
> cores cache information with cache_setup_acpi() once the
> cpu_cacheinfo structure has been populated by the arch specific
> code.
> 
> An additional patch later in the set adds the ability to report
> peers in the topology using find_acpi_cpu_topology()
> to report a unique ID for each processing unit at a given level
> in the tree. These unique id's can then be used to match related
> processing units which exist as threads, COD (clusters
> on die), within a given package, etc.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  drivers/acpi/pptt.c | 476 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 476 insertions(+)
>  create mode 100644 drivers/acpi/pptt.c
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> new file mode 100644
> index 000000000000..2c4b3ed862a8
> --- /dev/null
> +++ b/drivers/acpi/pptt.c
> @@ -0,0 +1,476 @@
> +/*
> + * Copyright (C) 2018, ARM
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * This file implements parsing of Processor Properties Topology Table (PPTT)
> + * which is optionally used to describe the processor and cache topology.
> + * Due to the relative pointers used throughout the table, this doesn't
> + * leverage the existing subtable parsing in the kernel.
> + *
> + * The PPTT structure is an inverted tree, with each node potentially
> + * holding one or two inverted tree data structures describing
> + * the caches available at that level. Each cache structure optionally
> + * contains properties describing the cache at a given level which can be
> + * used to override hardware probed values.
> + */
> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <acpi/processor.h>
> +
> +/* total number of attributes checked by the properties code */
> +#define PPTT_CHECKED_ATTRIBUTES 6
> +
> +/*
> + * Given the PPTT table, find and verify that the subtable entry
> + * is located within the table
> + */
> +static struct acpi_subtable_header *fetch_pptt_subtable(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	struct acpi_subtable_header *entry;
> +
> +	/* there isn't a subtable at reference 0 */
> +	if (pptt_ref < sizeof(struct acpi_subtable_header))
> +		return NULL;
> +
> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
> +		return NULL;
> +
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr, pptt_ref);
> +
> +	if (pptt_ref + entry->length > table_hdr->length)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +static struct acpi_pptt_processor *fetch_pptt_node(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr,
> +								 pptt_ref);
> +}
> +
> +static struct acpi_pptt_cache *fetch_pptt_cache(
> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
> +{
> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr,
> +							     pptt_ref);
> +}
> +
> +static struct acpi_subtable_header *acpi_get_pptt_resource(
> +	struct acpi_table_header *table_hdr,
> +	struct acpi_pptt_processor *node, int resource)
> +{
> +	u32 *ref;
> +
> +	if (resource >= node->number_of_priv_resources)
> +		return NULL;
> +
> +	ref = ACPI_ADD_PTR(u32, node, sizeof(struct acpi_pptt_processor));
> +	ref += resource;
> +
> +	return fetch_pptt_subtable(table_hdr, *ref);
> +}
> +
> +/*
> + * Attempt to find a given cache level, while counting the max number
> + * of cache levels for the cache node.
> + *
> + * Given a pptt resource, verify that it is a cache node, then walk
> + * down each level of caches, counting how many levels are found
> + * as well as checking the cache type (icache, dcache, unified). If a
> + * level & type match, then we set found, and continue the search.
> + * Once the entire cache branch has been walked return its max
> + * depth.
> + */
> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
> +				int local_level,
> +				struct acpi_subtable_header *res,
> +				struct acpi_pptt_cache **found,
> +				int level, int type)
> +{
> +	struct acpi_pptt_cache *cache;
> +
> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
> +		return 0;
> +
> +	cache = (struct acpi_pptt_cache *) res;
> +	while (cache) {
> +		local_level++;
> +
> +		if ((local_level == level) &&
> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
> +			if ((*found != NULL) && (cache != *found))

I forgot to mention this earlier: I see you have used parentheses too
liberally in quite a few places in this file, like the above 4 lines.
Please drop the unnecessary parentheses.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-15 12:33     ` Sudeep Holla
  (?)
@ 2018-01-15 16:07       ` Palmer Dabbelt
  -1 siblings, 0 replies; 104+ messages in thread
From: Palmer Dabbelt @ 2018-01-15 16:07 UTC (permalink / raw)
  Cc: jeremy.linton, linux-acpi, linux-arm-kernel, hanjun.guo,
	lorenzo.pieralisi, rjw, Will Deacon, catalin.marinas, Greg KH,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen, albert, sudeep.holla

On Mon, 15 Jan 2018 04:33:38 PST (-0800), sudeep.holla@arm.com wrote:
> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>> The original intent in cacheinfo was that an architecture
>> specific populate_cache_leaves() would probe the hardware
>> and then cache_shared_cpu_map_setup() and
>> cache_override_properties() would provide firmware help to
>> extend/expand upon what was probed. Arm64 was really
>> the only architecture that was working this way, and
>> with the removal of most of the hardware probing logic it
>> became clear that it was possible to simplify the logic a bit.
>>
>> This patch combines the walk of the DT nodes with the
>> code updating the cache size/line_size and nr_sets.
>> cache_override_properties() (which was DT specific) is
>> then removed. The result is that cacheinfo.of_node is
>> no longer used as a temporary place to hold DT references
>> for future calls that update cache properties. That change
>> helps to clarify its one remaining use (matching
>> cacheinfo nodes that represent shared caches) which
>> will be used by the ACPI/PPTT code in the following patches.
>>
>> Cc: Palmer Dabbelt <palmer@sifive.com>
>> Cc: Albert Ou <albert@sifive.com>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>  arch/riscv/kernel/cacheinfo.c |  1 +
>>  drivers/base/cacheinfo.c      | 65 +++++++++++++++++++------------------------
>>  include/linux/cacheinfo.h     |  1 +
>>  3 files changed, 31 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
>> index 10ed2749e246..6f4500233cf8 100644
>> --- a/arch/riscv/kernel/cacheinfo.c
>> +++ b/arch/riscv/kernel/cacheinfo.c
>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>  		CACHE_WRITE_BACK
>>  		| CACHE_READ_ALLOCATE
>>  		| CACHE_WRITE_ALLOCATE;
>> +	cache_of_set_props(this_leaf, node);
>
> This may be necessary, but can it be done as a later patch? So far nothing
> is added that may break riscv, IIUC.
>
> Palmer, Albert,
>
> Can you confirm? Also, as I see it, we can thin down the arch-specific
> implementation on riscv if it's just using DT like ARM64. Sorry if
> I am missing something; I just thought of checking.
>
> [...]

Sorry, I guess I'm a bit confused as to what's going on here.  RISC-V uses 
device tree on all Linux systems.

>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>> index 3d9805297cda..d35299a590a4 100644
>> --- a/include/linux/cacheinfo.h
>> +++ b/include/linux/cacheinfo.h
>> @@ -99,6 +99,7 @@ int func(unsigned int cpu)					\
>>  struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
>>  int init_cache_level(unsigned int cpu);
>>  int populate_cache_leaves(unsigned int cpu);
>> +void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
>>
>
> IIUC riscv is the only user for this outside of cacheinfo.c, right?
> Hopefully we can get rid of it.
>
> Other than that, it looks OK. I will wait for a response from the riscv team
> so that these riscv-related changes can be dropped or moved to a later
> patch if really needed.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
@ 2018-01-15 16:07       ` Palmer Dabbelt
  0 siblings, 0 replies; 104+ messages in thread
From: Palmer Dabbelt @ 2018-01-15 16:07 UTC (permalink / raw)
  To: sudeep.holla
  Cc: jeremy.linton, linux-acpi, linux-arm-kernel, hanjun.guo,
	lorenzo.pieralisi, rjw, Will Deacon, catalin.marinas, Greg KH,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen, albert, sudeep.holla

On Mon, 15 Jan 2018 04:33:38 PST (-0800), sudeep.holla@arm.com wrote:
> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>> The original intent in cacheinfo was that an architecture
>> specific populate_cache_leaves() would probe the hardware
>> and then cache_shared_cpu_map_setup() and
>> cache_override_properties() would provide firmware help to
>> extend/expand upon what was probed. Arm64 was really
>> the only architecture that was working this way, and
>> with the removal of most of the hardware probing logic it
>> became clear that it was possible to simplify the logic a bit.
>>
>> This patch combines the walk of the DT nodes with the
>> code updating the cache size/line_size and nr_sets.
>> cache_override_properties() (which was DT specific) is
>> then removed. The result is that cacheinfo.of_node is
>> no longer used as a temporary place to hold DT references
>> for future calls that update cache properties. That change
>> helps to clarify its one remaining use (matching
>> cacheinfo nodes that represent shared caches) which
>> will be used by the ACPI/PPTT code in the following patches.
>>
>> Cc: Palmer Dabbelt <palmer@sifive.com>
>> Cc: Albert Ou <albert@sifive.com>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>  arch/riscv/kernel/cacheinfo.c |  1 +
>>  drivers/base/cacheinfo.c      | 65 +++++++++++++++++++------------------------
>>  include/linux/cacheinfo.h     |  1 +
>>  3 files changed, 31 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
>> index 10ed2749e246..6f4500233cf8 100644
>> --- a/arch/riscv/kernel/cacheinfo.c
>> +++ b/arch/riscv/kernel/cacheinfo.c
>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>  		CACHE_WRITE_BACK
>>  		| CACHE_READ_ALLOCATE
>>  		| CACHE_WRITE_ALLOCATE;
>> +	cache_of_set_props(this_leaf, node);
>
> This may be necessary but can it be done as later patch ? So far nothing
> is added that may break riscv IIUC.
>
> Palmer, Albert,
>
> Can you confirm ? Also, as I see we can thin down arch specific
> implementation on riscv if it's just using DT like ARM64. Sorry if
> I am missing to see something, so thought of checking.
>
> [...]

Sorry, I guess I'm a bit confused as to what's going on here.  RISC-V uses 
device tree on all Linux systems.

>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>> index 3d9805297cda..d35299a590a4 100644
>> --- a/include/linux/cacheinfo.h
>> +++ b/include/linux/cacheinfo.h
>> @@ -99,6 +99,7 @@ int func(unsigned int cpu)					\
>>  struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
>>  int init_cache_level(unsigned int cpu);
>>  int populate_cache_leaves(unsigned int cpu);
>> +void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
>>
>
> IIUC riscv is the only user for this outside of cacheinfo.c, right ?
> Hopefully we can get rid of it.
>
> Other than that, it looks OK. I will wait for response from riscv team
> do that these riscv related changes can be dropped or move to later
> patch if really needed.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
@ 2018-01-15 16:07       ` Palmer Dabbelt
  0 siblings, 0 replies; 104+ messages in thread
From: Palmer Dabbelt @ 2018-01-15 16:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 15 Jan 2018 04:33:38 PST (-0800), sudeep.holla at arm.com wrote:
> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>> The original intent in cacheinfo was that an architecture
>> specific populate_cache_leaves() would probe the hardware
>> and then cache_shared_cpu_map_setup() and
>> cache_override_properties() would provide firmware help to
>> extend/expand upon what was probed. Arm64 was really
>> the only architecture that was working this way, and
>> with the removal of most of the hardware probing logic it
>> became clear that it was possible to simplify the logic a bit.
>>
>> This patch combines the walk of the DT nodes with the
>> code updating the cache size/line_size and nr_sets.
>> cache_override_properties() (which was DT specific) is
>> then removed. The result is that cacheinfo.of_node is
>> no longer used as a temporary place to hold DT references
>> for future calls that update cache properties. That change
>> helps to clarify its one remaining use (matching
>> cacheinfo nodes that represent shared caches) which
>> will be used by the ACPI/PPTT code in the following patches.
>>
>> Cc: Palmer Dabbelt <palmer@sifive.com>
>> Cc: Albert Ou <albert@sifive.com>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>  arch/riscv/kernel/cacheinfo.c |  1 +
>>  drivers/base/cacheinfo.c      | 65 +++++++++++++++++++------------------------
>>  include/linux/cacheinfo.h     |  1 +
>>  3 files changed, 31 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
>> index 10ed2749e246..6f4500233cf8 100644
>> --- a/arch/riscv/kernel/cacheinfo.c
>> +++ b/arch/riscv/kernel/cacheinfo.c
>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>  		CACHE_WRITE_BACK
>>  		| CACHE_READ_ALLOCATE
>>  		| CACHE_WRITE_ALLOCATE;
>> +	cache_of_set_props(this_leaf, node);
>
> This may be necessary but can it be done as later patch ? So far nothing
> is added that may break riscv IIUC.
>
> Palmer, Albert,
>
> Can you confirm ? Also, as I see it, we can thin down the arch specific
> implementation on riscv if it's just using DT like ARM64. Sorry if
> I am missing something, so thought of checking.
>
> [...]

Sorry, I guess I'm a bit confused as to what's going on here.  RISC-V uses 
device tree on all Linux systems.

>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>> index 3d9805297cda..d35299a590a4 100644
>> --- a/include/linux/cacheinfo.h
>> +++ b/include/linux/cacheinfo.h
>> @@ -99,6 +99,7 @@ int func(unsigned int cpu)					\
>>  struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
>>  int init_cache_level(unsigned int cpu);
>>  int populate_cache_leaves(unsigned int cpu);
>> +void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
>>
>
> IIUC riscv is the only user for this outside of cacheinfo.c, right ?
> Hopefully we can get rid of it.
>
> Other than that, it looks OK. I will wait for a response from the riscv
> team so that these riscv related changes can be dropped or moved to a
> later patch if really needed.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2018-01-15 15:48     ` Sudeep Holla
  (?)
@ 2018-01-16 20:22     ` Jeremy Linton
  2018-01-17 18:00         ` Sudeep Holla
  -1 siblings, 1 reply; 104+ messages in thread
From: Jeremy Linton @ 2018-01-16 20:22 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen

Hi,

On 01/15/2018 09:48 AM, Sudeep Holla wrote:
> On Fri, Jan 12, 2018 at 06:59:13PM -0600, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in tree like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> cores cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> An additional patch later in the set adds the ability to report
>> peers in the topology using find_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique id's can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   drivers/acpi/pptt.c | 476 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 476 insertions(+)
>>   create mode 100644 drivers/acpi/pptt.c
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> new file mode 100644
>> index 000000000000..2c4b3ed862a8
>> --- /dev/null
>> +++ b/drivers/acpi/pptt.c
>> @@ -0,0 +1,476 @@
>> +/*
>> + * Copyright (C) 2018, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>> + * which is optionally used to describe the processor and cache topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + *
>> + * The PPTT structure is an inverted tree, with each node potentially
>> + * holding one or two inverted tree data structures describing
>> + * the caches available at that level. Each cache structure optionally
>> + * contains properties describing the cache at a given level which can be
>> + * used to override hardware probed values.
>> + */
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/* total number of attributes checked by the properties code */
>> +#define PPTT_CHECKED_ATTRIBUTES 6
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +
>> +	/* there isn't a subtable at reference 0 */
>> +	if (pptt_ref < sizeof(struct acpi_subtable_header))
>> +		return NULL;
>> +
>> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>> +		return NULL;
>> +
>> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr, pptt_ref);
>> +
>> +	if (pptt_ref + entry->length > table_hdr->length)
>> +		return NULL;
>> +
>> +	return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr,
>> +								 pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr,
>> +							     pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *node, int resource)
>> +{
>> +	u32 *ref;
>> +
>> +	if (resource >= node->number_of_priv_resources)
>> +		return NULL;
>> +
>> +	ref = ACPI_ADD_PTR(u32, node, sizeof(struct acpi_pptt_processor));
>> +	ref += resource;
>> +
>> +	return fetch_pptt_subtable(table_hdr, *ref);
>> +}
>> +
>> +/*
>> + * Attempt to find a given cache level, while counting the max number
>> + * of cache levels for the cache node.
>> + *
>> + * Given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +				int local_level,
>> +				struct acpi_subtable_header *res,
>> +				struct acpi_pptt_cache **found,
>> +				int level, int type)
>> +{
>> +	struct acpi_pptt_cache *cache;
>> +
>> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +		return 0;
>> +
>> +	cache = (struct acpi_pptt_cache *) res;
>> +	while (cache) {
>> +		local_level++;
>> +
>> +		if ((local_level == level) &&
>> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>> +			if ((*found != NULL) && (cache != *found))
> 
> I forgot to mention this earlier, I see you have used parentheses too
> liberally at quite some place in this file like the above 4 line.
> Please drop those unnecessary parentheses.

Yes, those tend to get left over when I remove code. I realized not long 
after posting this that in one of the previous patches I actually added 
a pair where there wasn't one: I first shuffled something into the if 
condition, then a few revisions later dropped it, and when I squashed it 
before posting the result was some pointless churn.

I will rescan the whole set for this.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2018-01-15 14:58     ` Sudeep Holla
  (?)
@ 2018-01-16 20:55     ` Jeremy Linton
  2018-01-17 17:58         ` Sudeep Holla
  -1 siblings, 1 reply; 104+ messages in thread
From: Jeremy Linton @ 2018-01-16 20:55 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen


Hi,

On 01/15/2018 08:58 AM, Sudeep Holla wrote:
> On Fri, Jan 12, 2018 at 06:59:13PM -0600, Jeremy Linton wrote:
>> ACPI 6.2 adds a new table, which describes how processing units
>> are related to each other in tree like fashion. Caches are
>> also sprinkled throughout the tree and describe the properties
>> of the caches in relation to other caches and processing units.
>>
>> Add the code to parse the cache hierarchy and report the total
>> number of levels of cache for a given core using
>> acpi_find_last_cache_level() as well as fill out the individual
>> cores cache information with cache_setup_acpi() once the
>> cpu_cacheinfo structure has been populated by the arch specific
>> code.
>>
>> An additional patch later in the set adds the ability to report
>> peers in the topology using find_acpi_cpu_topology()
>> to report a unique ID for each processing unit at a given level
>> in the tree. These unique id's can then be used to match related
>> processing units which exist as threads, COD (clusters
>> on die), within a given package, etc.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   drivers/acpi/pptt.c | 476 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 476 insertions(+)
>>   create mode 100644 drivers/acpi/pptt.c
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> new file mode 100644
>> index 000000000000..2c4b3ed862a8
>> --- /dev/null
>> +++ b/drivers/acpi/pptt.c
>> @@ -0,0 +1,476 @@
>> +/*
>> + * Copyright (C) 2018, ARM
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * This file implements parsing of Processor Properties Topology Table (PPTT)
>> + * which is optionally used to describe the processor and cache topology.
>> + * Due to the relative pointers used throughout the table, this doesn't
>> + * leverage the existing subtable parsing in the kernel.
>> + *
>> + * The PPTT structure is an inverted tree, with each node potentially
>> + * holding one or two inverted tree data structures describing
>> + * the caches available at that level. Each cache structure optionally
>> + * contains properties describing the cache at a given level which can be
>> + * used to override hardware probed values.
>> + */
>> +#define pr_fmt(fmt) "ACPI PPTT: " fmt
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/cacheinfo.h>
>> +#include <acpi/processor.h>
>> +
>> +/* total number of attributes checked by the properties code */
>> +#define PPTT_CHECKED_ATTRIBUTES 6
> 
> See comment on this below. If we retain this, move it closer to the usage so
> that it's easier to understand what it actually stands for.

Sure.

> 
>> +
>> +/*
>> + * Given the PPTT table, find and verify that the subtable entry
>> + * is located within the table
>> + */
>> +static struct acpi_subtable_header *fetch_pptt_subtable(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +
>> +	/* there isn't a subtable at reference 0 */
>> +	if (pptt_ref < sizeof(struct acpi_subtable_header))
>> +		return NULL;
>> +
>> +	if (pptt_ref + sizeof(struct acpi_subtable_header) > table_hdr->length)
>> +		return NULL;
>> +
>> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr, pptt_ref);
>> +
>> +	if (pptt_ref + entry->length > table_hdr->length)
>> +		return NULL;
>> +
>> +	return entry;
>> +}
>> +
>> +static struct acpi_pptt_processor *fetch_pptt_node(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	return (struct acpi_pptt_processor *)fetch_pptt_subtable(table_hdr,
>> +								 pptt_ref);
>> +}
>> +
>> +static struct acpi_pptt_cache *fetch_pptt_cache(
>> +	struct acpi_table_header *table_hdr, u32 pptt_ref)
>> +{
>> +	return (struct acpi_pptt_cache *)fetch_pptt_subtable(table_hdr,
>> +							     pptt_ref);
>> +}
>> +
>> +static struct acpi_subtable_header *acpi_get_pptt_resource(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *node, int resource)
>> +{
>> +	u32 *ref;
>> +
>> +	if (resource >= node->number_of_priv_resources)
>> +		return NULL;
>> +
>> +	ref = ACPI_ADD_PTR(u32, node, sizeof(struct acpi_pptt_processor));
>> +	ref += resource;
>> +
>> +	return fetch_pptt_subtable(table_hdr, *ref);
>> +}
>> +
>> +/*
>> + * Attempt to find a given cache level, while counting the max number
>> + * of cache levels for the cache node.
>> + *
>> + * Given a pptt resource, verify that it is a cache node, then walk
>> + * down each level of caches, counting how many levels are found
>> + * as well as checking the cache type (icache, dcache, unified). If a
>> + * level & type match, then we set found, and continue the search.
>> + * Once the entire cache branch has been walked return its max
>> + * depth.
>> + */
>> +static int acpi_pptt_walk_cache(struct acpi_table_header *table_hdr,
>> +				int local_level,
>> +				struct acpi_subtable_header *res,
>> +				struct acpi_pptt_cache **found,
>> +				int level, int type)
>> +{
>> +	struct acpi_pptt_cache *cache;
>> +
>> +	if (res->type != ACPI_PPTT_TYPE_CACHE)
>> +		return 0;
>> +
>> +	cache = (struct acpi_pptt_cache *) res;
>> +	while (cache) {
>> +		local_level++;
>> +
>> +		if ((local_level == level) &&
>> +		    (cache->flags & ACPI_PPTT_CACHE_TYPE_VALID) &&
>> +		    ((cache->attributes & ACPI_PPTT_MASK_CACHE_TYPE) == type)) {
>> +			if ((*found != NULL) && (cache != *found))
>> +				pr_err("Found duplicate cache level/type unable to determine uniqueness\n");
>> +
>> +			pr_debug("Found cache @ level %d\n", level);
>> +			*found = cache;
>> +			/*
>> +			 * continue looking at this node's resource list
>> +			 * to verify that we don't find a duplicate
>> +			 * cache node.
>> +			 */
>> +		}
>> +		cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
>> +	}
>> +	return local_level;
>> +}
>> +
>> +/*
>> + * Given a CPU node look for cache levels that exist at this level, and then
>> + * for each cache node, count how many levels exist below (logically above) it.
>> + * If a level and type are specified, and we find that level/type, abort
>> + * processing and return the acpi_pptt_cache structure.
>> + */
>> +static struct acpi_pptt_cache *acpi_find_cache_level(
>> +	struct acpi_table_header *table_hdr,
>> +	struct acpi_pptt_processor *cpu_node,
>> +	int *starting_level, int level, int type)
>> +{
>> +	struct acpi_subtable_header *res;
>> +	int number_of_levels = *starting_level;
>> +	int resource = 0;
>> +	struct acpi_pptt_cache *ret = NULL;
>> +	int local_level;
>> +
>> +	/* walk down from processor node */
>> +	while ((res = acpi_get_pptt_resource(table_hdr, cpu_node, resource))) {
>> +		resource++;
>> +
>> +		local_level = acpi_pptt_walk_cache(table_hdr, *starting_level,
>> +						   res, &ret, level, type);
>> +		/*
>> +		 * we are looking for the max depth. Since its potentially
>> +		 * possible for a given node to have resources with differing
>> +		 * depths verify that the depth we have found is the largest.
>> +		 */
>> +		if (number_of_levels < local_level)
>> +			number_of_levels = local_level;
>> +	}
>> +	if (number_of_levels > *starting_level)
>> +		*starting_level = number_of_levels;
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Given a processor node containing a processing unit, walk into it and count
>> + * how many levels exist solely for it, and then walk up each level until we hit
>> + * the root node (ignore the package level because it may be possible to have
>> + * caches that exist across packages). Count the number of cache levels that
>> + * exist at each level on the way up.
>> + */
>> +static int acpi_process_node(struct acpi_table_header *table_hdr,
>> +			     struct acpi_pptt_processor *cpu_node)
>> +{
>> +	int total_levels = 0;
>> +
>> +	do {
>> +		acpi_find_cache_level(table_hdr, cpu_node, &total_levels, 0, 0);
>> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +	} while (cpu_node);
>> +
>> +	return total_levels;
>> +}
>> +
>> +/*
>> + * Determine if the *node parameter is a leaf node by iterating the
>> + * PPTT table, looking for nodes which reference it.
>> + * Return 0 if we find a node referencing the passed node,
>> + * or 1 if we don't.
>> + */
>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>> +			       struct acpi_pptt_processor *node)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	u32 node_entry;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	node_entry = ACPI_PTR_DIFF(node, table_hdr);
>> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>> +			     sizeof(struct acpi_table_pptt));
>> +
>> +	while ((unsigned long)(entry + 1) < table_end) {
> 
> Is entry + 1 check sufficient to access entry of length ?
> Shouldn't that be entry + sizeof(struct acpi_pptt_processor *) so that
> we are sure it's valid entry ?

All we need is the subtable_header size, which gives us the type/len. As 
we are just scanning the whole table, only touching entry->length, and 
the while() terminates if that runs past table_end, it should be ok. In 
general sizeof(acpi_pptt_processor) isn't right either, since the 
structure only covers the fixed "header" portion of a potentially larger 
entry, due to attached entries extending beyond it.
> 
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    (cpu_node->parent == node_entry))
>> +			return 0;
>> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
>> +				     entry->length);
>> +	}
>> +	return 1;
>> +}
>> +
>> +/*
>> + * Find the subtable entry describing the provided processor.
>> + * This is done by iterating the PPTT table looking for processor nodes
>> + * which have an acpi_processor_id that matches the acpi_cpu_id parameter
>> + * passed into the function. If we find a node that matches this criteria
>> + * we verify that its a leaf node in the topology rather than depending
>> + * on the valid flag, which doesn't need to be set for leaf nodes.
>> + */
>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>> +	struct acpi_table_header *table_hdr,
>> +	u32 acpi_cpu_id)
>> +{
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	struct acpi_pptt_processor *cpu_node;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>> +			     sizeof(struct acpi_table_pptt));
>> +
>> +	/* find the processor structure associated with this cpuid */
>> +	while ((unsigned long)(entry + 1) < table_end) {
> 
> Same comment as above on entry + 
This one is probably less clear than the one above, because we do access 
a full acpi_pptt_processor sized structure, but only after making sure 
that it is actually a processor node. If anything the check should 
probably dereference the length as a second check, aka

while ((unsigned long)(entry + 1) < table_end &&
       (unsigned long)entry + entry->length < table_end)

I think this may have been changed after previous review comments asked 
for the cpu_node assignment to happen earlier, and of course moving the 
leaf_node check into the if condition to avoid a bit of extra processing.

> 
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +
>> +		if (entry->length == 0) {
>> +			pr_err("Invalid zero length subtable\n");
>> +			break;
>> +		}
>> +		if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>> +		    (acpi_cpu_id == cpu_node->acpi_processor_id) &&
>> +		     acpi_pptt_leaf_node(table_hdr, cpu_node)) {
>> +			return (struct acpi_pptt_processor *)entry;
>> +		}
>> +
>> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
>> +				     entry->length);
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int acpi_find_cache_levels(struct acpi_table_header *table_hdr,
>> +				  u32 acpi_cpu_id)
>> +{
>> +	int number_of_levels = 0;
>> +	struct acpi_pptt_processor *cpu;
>> +
>> +	cpu = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +	if (cpu)
>> +		number_of_levels = acpi_process_node(table_hdr, cpu);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +/* Convert the linux cache_type to a ACPI PPTT cache type value */
>> +static u8 acpi_cache_type(enum cache_type type)
>> +{
> 
> [nit] Just wondering if we can avoid this with some static mapping:
> 
> static u8 acpi_cache_type[] = {
>          [CACHE_TYPE_NONE] = 0,
>          [CACHE_TYPE_DATA] = ACPI_PPTT_CACHE_TYPE_DATA,
>          [CACHE_TYPE_INST] = ACPI_PPTT_CACHE_TYPE_INSTR,
>          [CACHE_TYPE_UNIFIED] = ACPI_PPTT_CACHE_TYPE_UNIFIED,
> };

Potentially, but the default case below is important and makes it a 
little less brittle because, as the recent DT commit showed, in your 
table TYPE_NONE actually needs to map to ACPI TYPE_UNIFIED to find the nodes.

Doesn't matter much to me, and I would convert it if the switch() got a 
lot bigger, but right now I tend to think what the code would actually 
look like is a two entry conversion (data/instruction) with a default 
initially set. So a loop for two entries is borderline IMHO.

> 
>> +	switch (type) {
>> +	case CACHE_TYPE_DATA:
>> +		pr_debug("Looking for data cache\n");
>> +		return ACPI_PPTT_CACHE_TYPE_DATA;
>> +	case CACHE_TYPE_INST:
>> +		pr_debug("Looking for instruction cache\n");
>> +		return ACPI_PPTT_CACHE_TYPE_INSTR;
>> +	default:
>> +	case CACHE_TYPE_UNIFIED:
>> +		pr_debug("Looking for unified cache\n");
>> +		/*
>> +		 * It is important that ACPI_PPTT_CACHE_TYPE_UNIFIED
>> +		 * contains the bit pattern that will match both
>> +		 * ACPI unified bit patterns because we use it later
>> +		 * to match both cases.
>> +		 */
>> +		return ACPI_PPTT_CACHE_TYPE_UNIFIED;
>> +	}
>> +}
>> +
>> +/* find the ACPI node describing the cache type/level for the given CPU */
>> +static struct acpi_pptt_cache *acpi_find_cache_node(
>> +	struct acpi_table_header *table_hdr, u32 acpi_cpu_id,
>> +	enum cache_type type, unsigned int level,
>> +	struct acpi_pptt_processor **node)
>> +{
>> +	int total_levels = 0;
>> +	struct acpi_pptt_cache *found = NULL;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	u8 acpi_type = acpi_cache_type(type);
>> +
>> +	pr_debug("Looking for CPU %d's level %d cache type %d\n",
>> +		 acpi_cpu_id, level, acpi_type);
>> +
>> +	cpu_node = acpi_find_processor_node(table_hdr, acpi_cpu_id);
>> +
>> +	while ((cpu_node) && (!found)) {
>> +		found = acpi_find_cache_level(table_hdr, cpu_node,
>> +					      &total_levels, level, acpi_type);
>> +		*node = cpu_node;
>> +		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +	}
>> +
>> +	return found;
>> +}
>> +
>> +/*
>> + * The ACPI spec implies that the fields in the cache structures are used to
>> + * extend and correct the information probed from the hardware. In the case
>> + * of arm64 the CCSIDR probing has been removed because it might be incorrect.
> 
> Though ARM64 is only user now, it may get obsolete, so better to drop that
> comment.

Ok.

> 
>> + */
>> +static void update_cache_properties(struct cacheinfo *this_leaf,
>> +				    struct acpi_pptt_cache *found_cache,
>> +				    struct acpi_pptt_processor *cpu_node)
>> +{
>> +	int valid_flags = 0;
>> +
>> +	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
>> +		this_leaf->size = found_cache->size;
>> +		valid_flags++;
>> +	}
>> +	if (found_cache->flags & ACPI_PPTT_LINE_SIZE_VALID) {
>> +		this_leaf->coherency_line_size = found_cache->line_size;
>> +		valid_flags++;
>> +	}
>> +	if (found_cache->flags & ACPI_PPTT_NUMBER_OF_SETS_VALID) {
>> +		this_leaf->number_of_sets = found_cache->number_of_sets;
>> +		valid_flags++;
>> +	}
>> +	if (found_cache->flags & ACPI_PPTT_ASSOCIATIVITY_VALID) {
>> +		this_leaf->ways_of_associativity = found_cache->associativity;
>> +		valid_flags++;
>> +	}
>> +	if (found_cache->flags & ACPI_PPTT_WRITE_POLICY_VALID) {
>> +		switch (found_cache->attributes & ACPI_PPTT_MASK_WRITE_POLICY) {
>> +		case ACPI_PPTT_CACHE_POLICY_WT:
>> +			this_leaf->attributes = CACHE_WRITE_THROUGH;
>> +			break;
>> +		case ACPI_PPTT_CACHE_POLICY_WB:
>> +			this_leaf->attributes = CACHE_WRITE_BACK;
>> +			break;
>> +		}
>> +		valid_flags++;
>> +	}
>> +	if (found_cache->flags & ACPI_PPTT_ALLOCATION_TYPE_VALID) {
>> +		switch (found_cache->attributes & ACPI_PPTT_MASK_ALLOCATION_TYPE) {
>> +		case ACPI_PPTT_CACHE_READ_ALLOCATE:
>> +			this_leaf->attributes |= CACHE_READ_ALLOCATE;
>> +			break;
>> +		case ACPI_PPTT_CACHE_WRITE_ALLOCATE:
>> +			this_leaf->attributes |= CACHE_WRITE_ALLOCATE;
>> +			break;
>> +		case ACPI_PPTT_CACHE_RW_ALLOCATE:
>> +		case ACPI_PPTT_CACHE_RW_ALLOCATE_ALT:
>> +			this_leaf->attributes |=
>> +				CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE;
>> +			break;
>> +		}
>> +		valid_flags++;
>> +	}
>> +	/*
>> +	 * If all the above flags are valid, and the cache type is NOCACHE
>> +	 * update the cache type as well.
>> +	 */
> 
> I am not sure if it makes sense to mandate at least last 2 (read allocate
> and write policy). They can be optional.

As I mentioned in the previous set, I'm of the opinion that some are 
more useful than others, but to avoid having a discussion about which 
ones, I just decided to check them all. After all, it's not going to 
hurt (AFAIK).

If you're more _sure_ and no one else has an opinion then I will remove 
those two.


> 
>> +	if ((this_leaf->type == CACHE_TYPE_NOCACHE) &&
>> +	    (valid_flags == PPTT_CHECKED_ATTRIBUTES))
>> +		this_leaf->type = CACHE_TYPE_UNIFIED;
>> +}
>> +
>> +/*
>> + * Update the kernel cache information for each level of cache
>> + * associated with the given acpi cpu.
>> + */
>> +static void cache_setup_acpi_cpu(struct acpi_table_header *table,
>> +				 unsigned int cpu)
>> +{
>> +	struct acpi_pptt_cache *found_cache;
>> +	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> +	u32 acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +	struct cacheinfo *this_leaf;
>> +	unsigned int index = 0;
>> +	struct acpi_pptt_processor *cpu_node = NULL;
>> +
>> +	while (index < get_cpu_cacheinfo(cpu)->num_leaves) {
>> +		this_leaf = this_cpu_ci->info_list + index;
>> +		found_cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						   this_leaf->type,
>> +						   this_leaf->level,
>> +						   &cpu_node);
>> +		pr_debug("found = %p %p\n", found_cache, cpu_node);
>> +		if (found_cache)
>> +			update_cache_properties(this_leaf,
>> +						found_cache,
>> +						cpu_node);
> 
> [nit] unnecessary line break ?
> 
>> +
>> +		index++;
>> +	}
>> +}
>> +
>> +/**
>> + * acpi_find_last_cache_level() - Determines the number of cache levels for a PE
> 
> [nit] PE ? I think you mean processing element, but that's too ARM ARM thingy
> :), can you s/PE/CPU ?

Yes, I probably slipped up there.

> 
>> + * @cpu: Kernel logical cpu number
>> + *
>> + * Given a logical cpu number, returns the number of levels of cache represented
>> + * in the PPTT. Errors caused by lack of a PPTT table, or otherwise, return 0
>> + * indicating we didn't find any cache levels.
>> + *
>> + * Return: Cache levels visible to this core.
>> + */
>> +int acpi_find_last_cache_level(unsigned int cpu)
>> +{
>> +	u32 acpi_cpu_id;
>> +	struct acpi_table_header *table;
>> +	int number_of_levels = 0;
>> +	acpi_status status;
>> +
>> +	pr_debug("Cache Setup find last level cpu=%d\n", cpu);
>> +
>> +	acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err_once("No PPTT table found, cache topology may be inaccurate\n");
>> +	} else {
>> +		number_of_levels = acpi_find_cache_levels(table, acpi_cpu_id);
>> +		acpi_put_table(table);
>> +	}
>> +	pr_debug("Cache Setup find last level level=%d\n", number_of_levels);
>> +
>> +	return number_of_levels;
>> +}
>> +
>> +/**
>> + * cache_setup_acpi() - Override CPU cache topology with data from the PPTT
> 
> [nit]			  ^^^^ may be override/setup or just setup ?

I think of it more as "expand upon", or override, but it's obviously 
creating (setting new) things too.


> 
>> + * @cpu: Kernel logical cpu number
> 
> [nit] kernel is implicit, no ?

Probably...

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-15 12:33     ` Sudeep Holla
  (?)
  (?)
@ 2018-01-16 21:07     ` Jeremy Linton
  2018-01-17 18:20         ` Sudeep Holla
  -1 siblings, 1 reply; 104+ messages in thread
From: Jeremy Linton @ 2018-01-16 21:07 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Palmer Dabbelt, Albert Ou

Hi,

On 01/15/2018 06:33 AM, Sudeep Holla wrote:
> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>> The original intent in cacheinfo was that an architecture
>> specific populate_cache_leaves() would probe the hardware
>> and then cache_shared_cpu_map_setup() and
>> cache_override_properties() would provide firmware help to
>> extend/expand upon what was probed. Arm64 was really
>> the only architecture that was working this way, and
>> with the removal of most of the hardware probing logic it
>> became clear that it was possible to simplify the logic a bit.
>>
>> This patch combines the walk of the DT nodes with the
>> code updating the cache size/line_size and nr_sets.
>> cache_override_properties() (which was DT specific) is
>> then removed. The result is that cacheinfo.of_node is
>> no longer used as a temporary place to hold DT references
>> for future calls that update cache properties. That change
>> helps to clarify its one remaining use (matching
>> cacheinfo nodes that represent shared caches) which
>> will be used by the ACPI/PPTT code in the following patches.
>>
>> Cc: Palmer Dabbelt <palmer@sifive.com>
>> Cc: Albert Ou <albert@sifive.com>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   arch/riscv/kernel/cacheinfo.c |  1 +
>>   drivers/base/cacheinfo.c      | 65 +++++++++++++++++++------------------------
>>   include/linux/cacheinfo.h     |  1 +
>>   3 files changed, 31 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
>> index 10ed2749e246..6f4500233cf8 100644
>> --- a/arch/riscv/kernel/cacheinfo.c
>> +++ b/arch/riscv/kernel/cacheinfo.c
>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>   		CACHE_WRITE_BACK
>>   		| CACHE_READ_ALLOCATE
>>   		| CACHE_WRITE_ALLOCATE;
>> +	cache_of_set_props(this_leaf, node);
> 
> This may be necessary but can it be done as later patch ? So far nothing
> is added that may break riscv IIUC.

Well I think you have a bisection issue where the additional information 
will disappear between this patch and wherever we put this code back in.

> 
> Palmer, Albert,
> 
> Can you confirm ? Also, as I see we can thin down arch specific
> implementation on riscv if it's just using DT like ARM64. Sorry if
> I am missing to see something, so thought of checking.
> 
> [...]
> 
>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>> index 3d9805297cda..d35299a590a4 100644
>> --- a/include/linux/cacheinfo.h
>> +++ b/include/linux/cacheinfo.h
>> @@ -99,6 +99,7 @@ int func(unsigned int cpu)					\
>>   struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
>>   int init_cache_level(unsigned int cpu);
>>   int populate_cache_leaves(unsigned int cpu);
>> +void cache_of_set_props(struct cacheinfo *this_leaf, struct device_node *np);
>>
> 
> IIUC riscv is the only user for this outside of cacheinfo.c, right ?
> Hopefully we can get rid of it.

Yes, it should be the only user. I spent some time looking at the other 
users of this code to make sure they weren't doing partial DT setups 
and then having the common code use the resulting DT nodes. Riscv was 
the only case I found like that, and that behavior is fairly new.

I think they are doing it that way in order to get the cache type set 
up earlier (to work around problems like the one recently fixed for 
the NONE->UNIFIED node conversion).


> 
> Other than that, it looks OK. I will wait for response from riscv team
> do that these riscv related changes can be dropped or move to later
> patch if really needed.
> 
> --
> Regards,
> Sudeep
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-15 16:07       ` Palmer Dabbelt
  (?)
  (?)
@ 2018-01-16 21:26       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-16 21:26 UTC (permalink / raw)
  To: Palmer Dabbelt, sudeep.holla
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	Will Deacon, catalin.marinas, Greg KH, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, wangxiongfeng2,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, albert

Hi,

On 01/15/2018 10:07 AM, Palmer Dabbelt wrote:
> On Mon, 15 Jan 2018 04:33:38 PST (-0800), sudeep.holla@arm.com wrote:
>> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>>> The original intent in cacheinfo was that an architecture
>>> specific populate_cache_leaves() would probe the hardware
>>> and then cache_shared_cpu_map_setup() and
>>> cache_override_properties() would provide firmware help to
>>> extend/expand upon what was probed. Arm64 was really
>>> the only architecture that was working this way, and
>>> with the removal of most of the hardware probing logic it
>>> became clear that it was possible to simplify the logic a bit.
>>>
>>> This patch combines the walk of the DT nodes with the
>>> code updating the cache size/line_size and nr_sets.
>>> cache_override_properties() (which was DT specific) is
>>> then removed. The result is that cacheinfo.of_node is
>>> no longer used as a temporary place to hold DT references
>>> for future calls that update cache properties. That change
>>> helps to clarify its one remaining use (matching
>>> cacheinfo nodes that represent shared caches) which
>>> will be used by the ACPI/PPTT code in the following patches.
>>>
>>> Cc: Palmer Dabbelt <palmer@sifive.com>
>>> Cc: Albert Ou <albert@sifive.com>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>  arch/riscv/kernel/cacheinfo.c |  1 +
>>>  drivers/base/cacheinfo.c      | 65 
>>> +++++++++++++++++++------------------------
>>>  include/linux/cacheinfo.h     |  1 +
>>>  3 files changed, 31 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/arch/riscv/kernel/cacheinfo.c 
>>> b/arch/riscv/kernel/cacheinfo.c
>>> index 10ed2749e246..6f4500233cf8 100644
>>> --- a/arch/riscv/kernel/cacheinfo.c
>>> +++ b/arch/riscv/kernel/cacheinfo.c
>>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>>          CACHE_WRITE_BACK
>>>          | CACHE_READ_ALLOCATE
>>>          | CACHE_WRITE_ALLOCATE;
>>> +    cache_of_set_props(this_leaf, node);
>>
>> This may be necessary but can it be done as later patch ? So far nothing
>> is added that may break riscv IIUC.
>>
>> Palmer, Albert,
>>
>> Can you confirm ? Also, as I see we can thin down arch specific
>> implementation on riscv if it's just using DT like ARM64. Sorry if
>> I am missing to see something, so thought of checking.
>>
>> [...]
> 
> Sorry, I guess I'm a bit confused as to what's going on here.  RISC-V 
> uses device tree on all Linux systems.

If I'm understanding the context correctly:

The first part of this patch set just straightens out the DT setup order 
so it happens in a single pass (rather than doing one pass to find the 
DT nodes, then another later on to update the cacheinfo from DT). This 
clarifies/simplifies how the firmware_unique (firmware_token?) field in 
cacheinfo is actually being used.

riscv is a bit odd because it's actually doing some DT manipulation in 
its arch setup (populate_cache_leaves()). I think the thought process is 
that it might be nicer if the common DT code handled whatever required 
that bit of logic to be added to riscv. If that were changed, then it 
might be possible to drop most of the DT cache setup code from the 
riscv/arch tree.


> 
>>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>>> index 3d9805297cda..d35299a590a4 100644
>>> --- a/include/linux/cacheinfo.h
>>> +++ b/include/linux/cacheinfo.h
>>> @@ -99,6 +99,7 @@ int func(unsigned int cpu)                    \
>>>  struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
>>>  int init_cache_level(unsigned int cpu);
>>>  int populate_cache_leaves(unsigned int cpu);
>>> +void cache_of_set_props(struct cacheinfo *this_leaf, struct 
>>> device_node *np);
>>>
>>
>> IIUC riscv is the only user for this outside of cacheinfo.c, right ?
>> Hopefully we can get rid of it.
>>
>> Other than that, it looks OK. I will wait for response from riscv team
>> do that these riscv related changes can be dropped or move to later
>> patch if really needed.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2018-01-16 20:55     ` Jeremy Linton
@ 2018-01-17 17:58         ` Sudeep Holla
  0 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-17 17:58 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Sudeep Holla, linux-acpi, linux-arm-kernel, hanjun.guo,
	lorenzo.pieralisi, rjw, will.deacon, catalin.marinas, gregkh,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen



On 16/01/18 20:55, Jeremy Linton wrote:
> 

[...]

>>> +/*
>>> + * Determine if the *node parameter is a leaf node by iterating the
>>> + * PPTT table, looking for nodes which reference it.
>>> + * Return 0 if we find a node referencing the passed node,
>>> + * or 1 if we don't.
>>> + */
>>> +static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
>>> +                   struct acpi_pptt_processor *node)
>>> +{
>>> +    struct acpi_subtable_header *entry;
>>> +    unsigned long table_end;
>>> +    u32 node_entry;
>>> +    struct acpi_pptt_processor *cpu_node;
>>> +
>>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>>> +    node_entry = ACPI_PTR_DIFF(node, table_hdr);
>>> +    entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>>> +                 sizeof(struct acpi_table_pptt));
>>> +
>>> +    while ((unsigned long)(entry + 1) < table_end) {
>>
>> Is entry + 1 check sufficient to access entry of length ?
>> Shouldn't that be entry + sizeof(struct acpi_pptt_processor *) so that
>> we are sure it's valid entry ?
> 
> All we need is the subtable_header size which gives us the type/len. As
> we are just scanning the whole table touching the entry->length and the
> while() terminates if that is > table_end it should be ok. In general
> sizeof(acpi_pptt_processor) isn't right either since the structure only
> covers a larger "header" portion due to attached entries extending
> beyond it.

OK, understood. In that case it should be at least
entry + sizeof(struct acpi_subtable_header), no?

I did a quick check and acpi_parse_entries_array does exactly the same.
Does it make sense to keep it consistent with that?

Also looking at acpi_parse_entries_array, I recall now that it has some
kind of handler to deal with such table parsing. I think we should be
able to reuse it; I need to stare at the code more to see how :(.
Let me know if you already looked at that and found reasons not to use it.

>>> +        cpu_node = (struct acpi_pptt_processor *)entry;
>>> +        if ((entry->type == ACPI_PPTT_TYPE_PROCESSOR) &&
>>> +            (cpu_node->parent == node_entry))
>>> +            return 0;
>>> +        entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
>>> +                     entry->length);
>>> +    }
>>> +    return 1;
>>> +}
>>> +
>>> +/*
>>> + * Find the subtable entry describing the provided processor.
>>> + * This is done by iterating the PPTT table looking for processor nodes
>>> + * which have an acpi_processor_id that matches the acpi_cpu_id
>>> parameter
>>> + * passed into the function. If we find a node that matches this
>>> criteria
>>> + * we verify that its a leaf node in the topology rather than depending
>>> + * on the valid flag, which doesn't need to be set for leaf nodes.
>>> + */
>>> +static struct acpi_pptt_processor *acpi_find_processor_node(
>>> +    struct acpi_table_header *table_hdr,
>>> +    u32 acpi_cpu_id)
>>> +{
>>> +    struct acpi_subtable_header *entry;
>>> +    unsigned long table_end;
>>> +    struct acpi_pptt_processor *cpu_node;
>>> +
>>> +    table_end = (unsigned long)table_hdr + table_hdr->length;
>>> +    entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>>> +                 sizeof(struct acpi_table_pptt));
>>> +
>>> +    /* find the processor structure associated with this cpuid */
>>> +    while ((unsigned long)(entry + 1) < table_end) {
>>
>> Same comment as above on entry + 
> This one is probably less clear than the one above, because we do access
> a full acpi_pptt_processor sized structure, but only after making sure
> that is actually a processor node. If anything the check should probably
> dereference the len as a second check aka
> 
> while ((entry+1 < table_end) && (entry+1->length < table_end))
> 
> I think this may have been changed after previous review comments asked
> for the cpu_node assignment earlier and of course moving the leaf_node
> check into the if condition to avoid a bit of extra processing.
> 

Makes sense.


>>> +/* Convert the linux cache_type to a ACPI PPTT cache type value */
>>> +static u8 acpi_cache_type(enum cache_type type)
>>> +{
>>
>> [nit] Just wondering if we can avoid this with some static mapping:
>>
>> static u8 acpi_cache_type[] = {
>>          [CACHE_TYPE_NONE] = 0,
>>          [CACHE_TYPE_DATA] = ACPI_PPTT_CACHE_TYPE_DATA,
>>          [CACHE_TYPE_INST] = ACPI_PPTT_CACHE_TYPE_INSTR,
>>          [CACHE_TYPE_UNIFIED] = ACPI_PPTT_CACHE_TYPE_UNIFIED,
>> };
> 
> Potentially, but the default case below is important and makes it a
> little less brittle because, as the recent DT commit, in your table
> TYPE_NONE actually needs to map to ACPI TYPE_UNIFIED to find the nodes.
> 

OK

> Doesn't matter much to me, and I would convert it if the switch() got a
> lot bigger, but right now I tend to think what the code actually would
> look like is a two entry conversion (data/instruction) with a default
> initially set. So a loop for two entries is borderline IMHO.
> 

Sure, that sounds good.


>>> +    /*
>>> +     * If all the above flags are valid, and the cache type is NOCACHE
>>> +     * update the cache type as well.
>>> +     */
>>
>> I am not sure if it makes sense to mandate at least last 2 (read allocate
>> and write policy). They can be optional.
> 
> As I mentioned in the previous set, I'm of the opinion that some are
> more useful than others, but to avoid having a discussion about which
> ones, just decided to do them all. After all, its not going to hurt
> (AFAIK).
> 

Sorry I missed to notice that.

> 
> If your more _sure_ and no one else has an opinion then i will remove
> those two.
>

That was just my opinion based on the possibility that some vendors
don't want to provide that information. We can wait until we come
across tables that are missing these.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing
  2018-01-16 20:22     ` Jeremy Linton
@ 2018-01-17 18:00         ` Sudeep Holla
  0 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-17 18:00 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Sudeep Holla, linux-acpi, linux-arm-kernel, hanjun.guo,
	lorenzo.pieralisi, rjw, will.deacon, catalin.marinas, gregkh,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen



On 16/01/18 20:22, Jeremy Linton wrote:
> Hi,
> 

[..]

>>
>> I forgot to mention this earlier, I see you have used parentheses too
>> liberally at quite some place in this file like the above 4 line.
>> Please drop those unnecessary parentheses.
> 
> Yes, those tend to get leftover when I remove code, I realized not long
> after posting this that in one of the previous patches I actually added
> a pair where there wasn't one because I first shuffled something into
> the if condition, then a few revisions later dropped it, and when I
> squashed it before posting the result was some pointless churn.
> 
> I will rescan the whole set for this.
> 

Better to avoid keeping such things until the end, which results in
churning up the series just for that.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-15 16:07       ` Palmer Dabbelt
@ 2018-01-17 18:08         ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-17 18:08 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Sudeep Holla, jeremy.linton, linux-acpi, linux-arm-kernel,
	hanjun.guo, lorenzo.pieralisi, rjw, Will Deacon, catalin.marinas,
	Greg KH, viresh.kumar, mark.rutland, linux-kernel, linux-pm,
	jhugo, wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen, albert

(Sorry, somehow I missed this email until I saw Jeremy's reply today)

On 15/01/18 16:07, Palmer Dabbelt wrote:
> On Mon, 15 Jan 2018 04:33:38 PST (-0800), sudeep.holla@arm.com wrote:
>> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>>> The original intent in cacheinfo was that an architecture
>>> specific populate_cache_leaves() would probe the hardware
>>> and then cache_shared_cpu_map_setup() and
>>> cache_override_properties() would provide firmware help to
>>> extend/expand upon what was probed. Arm64 was really
>>> the only architecture that was working this way, and
>>> with the removal of most of the hardware probing logic it
>>> became clear that it was possible to simplify the logic a bit.
>>>
>>> This patch combines the walk of the DT nodes with the
>>> code updating the cache size/line_size and nr_sets.
>>> cache_override_properties() (which was DT specific) is
>>> then removed. The result is that cacheinfo.of_node is
>>> no longer used as a temporary place to hold DT references
>>> for future calls that update cache properties. That change
>>> helps to clarify its one remaining use (matching
>>> cacheinfo nodes that represent shared caches) which
>>> will be used by the ACPI/PPTT code in the following patches.
>>>
>>> Cc: Palmer Dabbelt <palmer@sifive.com>
>>> Cc: Albert Ou <albert@sifive.com>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>  arch/riscv/kernel/cacheinfo.c |  1 +
>>>  drivers/base/cacheinfo.c      | 65
>>> +++++++++++++++++++------------------------
>>>  include/linux/cacheinfo.h     |  1 +
>>>  3 files changed, 31 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/arch/riscv/kernel/cacheinfo.c
>>> b/arch/riscv/kernel/cacheinfo.c
>>> index 10ed2749e246..6f4500233cf8 100644
>>> --- a/arch/riscv/kernel/cacheinfo.c
>>> +++ b/arch/riscv/kernel/cacheinfo.c
>>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>>          CACHE_WRITE_BACK
>>>          | CACHE_READ_ALLOCATE
>>>          | CACHE_WRITE_ALLOCATE;
>>> +    cache_of_set_props(this_leaf, node);
>>
>> This may be necessary but can it be done as later patch ? So far nothing
>> is added that may break riscv IIUC.
>>
>> Palmer, Albert,
>>
>> Can you confirm ? Also, as I see we can thin down arch specific
>> implementation on riscv if it's just using DT like ARM64. Sorry if
>> I am missing to see something, so thought of checking.
>>
>> [...]
> 
> Sorry, I guess I'm a bit confused as to what's going on here.  RISC-V
> uses device tree on all Linux systems.
> 

Good. By "thin down", I was thinking of moving the init_cache_level and
populate_cache_leaves implementations of riscv to generic weak functions
under CONFIG_OF. You may even end up deleting riscv's cacheinfo.c.

Just a thought, sorry for not being clear earlier.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-16 21:07     ` Jeremy Linton
@ 2018-01-17 18:20         ` Sudeep Holla
  0 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-17 18:20 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Sudeep Holla, linux-acpi, linux-arm-kernel, hanjun.guo,
	lorenzo.pieralisi, rjw, will.deacon, catalin.marinas, gregkh,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen, Palmer Dabbelt,
	Albert Ou



On 16/01/18 21:07, Jeremy Linton wrote:
> Hi,
> 
> On 01/15/2018 06:33 AM, Sudeep Holla wrote:
>> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>>> The original intent in cacheinfo was that an architecture
>>> specific populate_cache_leaves() would probe the hardware
>>> and then cache_shared_cpu_map_setup() and
>>> cache_override_properties() would provide firmware help to
>>> extend/expand upon what was probed. Arm64 was really
>>> the only architecture that was working this way, and
>>> with the removal of most of the hardware probing logic it
>>> became clear that it was possible to simplify the logic a bit.
>>>
>>> This patch combines the walk of the DT nodes with the
>>> code updating the cache size/line_size and nr_sets.
>>> cache_override_properties() (which was DT specific) is
>>> then removed. The result is that cacheinfo.of_node is
>>> no longer used as a temporary place to hold DT references
>>> for future calls that update cache properties. That change
>>> helps to clarify its one remaining use (matching
>>> cacheinfo nodes that represent shared caches) which
>>> will be used by the ACPI/PPTT code in the following patches.
>>>
>>> Cc: Palmer Dabbelt <palmer@sifive.com>
>>> Cc: Albert Ou <albert@sifive.com>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   arch/riscv/kernel/cacheinfo.c |  1 +
>>>   drivers/base/cacheinfo.c      | 65
>>> +++++++++++++++++++------------------------
>>>   include/linux/cacheinfo.h     |  1 +
>>>   3 files changed, 31 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/arch/riscv/kernel/cacheinfo.c
>>> b/arch/riscv/kernel/cacheinfo.c
>>> index 10ed2749e246..6f4500233cf8 100644
>>> --- a/arch/riscv/kernel/cacheinfo.c
>>> +++ b/arch/riscv/kernel/cacheinfo.c
>>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>>           CACHE_WRITE_BACK
>>>           | CACHE_READ_ALLOCATE
>>>           | CACHE_WRITE_ALLOCATE;
>>> +    cache_of_set_props(this_leaf, node);
>>
>> This may be necessary but can it be done as later patch ? So far nothing
>> is added that may break riscv IIUC.
> 
> Well I think you have a bisection issue where the additional information
> will disappear between this patch and wherever we put this code back in.
> 

Hmm, I am sorry but I fail to see the issue. Before this change,
populate_cache_leaves just populated the info as per ci_leaf_init in
arch/riscv/kernel/cacheinfo.c and cache_override_properties used to fill
the remaining.

After this patch, the same is achieved in cache_shared_cpu_map_setup.

In both cases, it was by the end of detect_cache_attributes, so I see no
issue.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-17 18:20         ` Sudeep Holla
@ 2018-01-17 18:51           ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-17 18:51 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Palmer Dabbelt, Albert Ou

Hi,

On 01/17/2018 12:20 PM, Sudeep Holla wrote:
> 
> 
> On 16/01/18 21:07, Jeremy Linton wrote:
>> Hi,
>>
>> On 01/15/2018 06:33 AM, Sudeep Holla wrote:
>>> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>>>> The original intent in cacheinfo was that an architecture
>>>> specific populate_cache_leaves() would probe the hardware
>>>> and then cache_shared_cpu_map_setup() and
>>>> cache_override_properties() would provide firmware help to
>>>> extend/expand upon what was probed. Arm64 was really
>>>> the only architecture that was working this way, and
>>>> with the removal of most of the hardware probing logic it
>>>> became clear that it was possible to simplify the logic a bit.
>>>>
>>>> This patch combines the walk of the DT nodes with the
>>>> code updating the cache size/line_size and nr_sets.
>>>> cache_override_properties() (which was DT specific) is
>>>> then removed. The result is that cacheinfo.of_node is
>>>> no longer used as a temporary place to hold DT references
>>>> for future calls that update cache properties. That change
>>>> helps to clarify its one remaining use (matching
>>>> cacheinfo nodes that represent shared caches) which
>>>> will be used by the ACPI/PPTT code in the following patches.
>>>>
>>>> Cc: Palmer Dabbelt <palmer@sifive.com>
>>>> Cc: Albert Ou <albert@sifive.com>
>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>> ---
>>>>    arch/riscv/kernel/cacheinfo.c |  1 +
>>>>    drivers/base/cacheinfo.c      | 65
>>>> +++++++++++++++++++------------------------
>>>>    include/linux/cacheinfo.h     |  1 +
>>>>    3 files changed, 31 insertions(+), 36 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/kernel/cacheinfo.c
>>>> b/arch/riscv/kernel/cacheinfo.c
>>>> index 10ed2749e246..6f4500233cf8 100644
>>>> --- a/arch/riscv/kernel/cacheinfo.c
>>>> +++ b/arch/riscv/kernel/cacheinfo.c
>>>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>>>            CACHE_WRITE_BACK
>>>>            | CACHE_READ_ALLOCATE
>>>>            | CACHE_WRITE_ALLOCATE;
>>>> +    cache_of_set_props(this_leaf, node);
>>>
>>> This may be necessary but can it be done as later patch ? So far nothing
>>> is added that may break riscv IIUC.
>>
>> Well I think you have a bisection issue where the additional information
>> will disappear between this patch and wherever we put this code back in.
>>
> 
> Hmm, I am sorry but I fail to see the issue. Before this change,
> populate_cache_leaves just populated the info as per ci_leaf_init in
> arch/riscv/kernel/cacheinfo.c and cache_override_properties used to fill
> the remaining.
> 
> After this patch, the same is achieved in cache_shared_cpu_map_setup.
> 
> In both cases, it was by the end of detect_cache_attributes, so I see no
> issue.
> 


Hi,

I must be misunderstanding something.

AFAIK, the code in cache_setup_of_node() won't call cache_of_set_props() 
because it returns when there is an existing of_node (fw_unique) created 
by the riscv populate_cache_leaves(). That's why I'm making the direct 
call here. If we fail to get that change in place before 
cache_override_properties() is removed, then the fields not set by the 
riscv code (size/etc.) will be missing.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-17 18:51           ` Jeremy Linton
@ 2018-01-18 10:14             ` Sudeep Holla
  -1 siblings, 0 replies; 104+ messages in thread
From: Sudeep Holla @ 2018-01-18 10:14 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Sudeep Holla, linux-acpi, linux-arm-kernel, hanjun.guo,
	lorenzo.pieralisi, rjw, will.deacon, catalin.marinas, gregkh,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen, Palmer Dabbelt,
	Albert Ou



On 17/01/18 18:51, Jeremy Linton wrote:
> Hi,
> 
> On 01/17/2018 12:20 PM, Sudeep Holla wrote:
>>
>>
>> On 16/01/18 21:07, Jeremy Linton wrote:
>>> Hi,
>>>
>>> On 01/15/2018 06:33 AM, Sudeep Holla wrote:
>>>> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>>>>> The original intent in cacheinfo was that an architecture
>>>>> specific populate_cache_leaves() would probe the hardware
>>>>> and then cache_shared_cpu_map_setup() and
>>>>> cache_override_properties() would provide firmware help to
>>>>> extend/expand upon what was probed. Arm64 was really
>>>>> the only architecture that was working this way, and
>>>>> with the removal of most of the hardware probing logic it
>>>>> became clear that it was possible to simplify the logic a bit.
>>>>>
>>>>> This patch combines the walk of the DT nodes with the
>>>>> code updating the cache size/line_size and nr_sets.
>>>>> cache_override_properties() (which was DT specific) is
>>>>> then removed. The result is that cacheinfo.of_node is
>>>>> no longer used as a temporary place to hold DT references
>>>>> for future calls that update cache properties. That change
>>>>> helps to clarify its one remaining use (matching
>>>>> cacheinfo nodes that represent shared caches) which
>>>>> will be used by the ACPI/PPTT code in the following patches.
>>>>>
>>>>> Cc: Palmer Dabbelt <palmer@sifive.com>
>>>>> Cc: Albert Ou <albert@sifive.com>
>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>>> ---
>>>>>    arch/riscv/kernel/cacheinfo.c |  1 +
>>>>>    drivers/base/cacheinfo.c      | 65
>>>>> +++++++++++++++++++------------------------
>>>>>    include/linux/cacheinfo.h     |  1 +
>>>>>    3 files changed, 31 insertions(+), 36 deletions(-)
>>>>>
>>>>> diff --git a/arch/riscv/kernel/cacheinfo.c
>>>>> b/arch/riscv/kernel/cacheinfo.c
>>>>> index 10ed2749e246..6f4500233cf8 100644
>>>>> --- a/arch/riscv/kernel/cacheinfo.c
>>>>> +++ b/arch/riscv/kernel/cacheinfo.c
>>>>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo
>>>>> *this_leaf,
>>>>>            CACHE_WRITE_BACK
>>>>>            | CACHE_READ_ALLOCATE
>>>>>            | CACHE_WRITE_ALLOCATE;
>>>>> +    cache_of_set_props(this_leaf, node);
>>>>
>>>> This may be necessary but can it be done as later patch ? So far
>>>> nothing
>>>> is added that may break riscv IIUC.
>>>
>>> Well I think you have a bisection issue where the additional information
>>> will disappear between this patch and wherever we put this code back in.
>>>
>>
>> Hmm, I am sorry but I fail to see the issue. Before this change,
>> populate_cache_leaves just populated the info as per ci_leaf_init in
>> arch/riscv/kernel/cacheinfo.c and cache_override_properties used to fill
>> the remaining.
>>
>> After this patch, the same is achieved in cache_shared_cpu_map_setup.
>>
>> In both cases, it was by the end of detect_cache_attributes, so I see no
>> issue.
>>
> 
> 
> Hi,
> 
> I must be misunderstanding something.
> 

Looks like I was the one missing something :)

> AFAIK, the code in cache_setup_of_node() won't call cache_of_set_props()
> because it returns when there is an existing of_node (fw_unique) created
> by the riscv populate_cache_leaves(). That's why I'm making the direct
> call here. If we fail to get that change in place before
> cache_override_properties() is removed then the fields not set by the
> riscv code (size/etc) will be missing.

Indeed. I am trying to avoid use of cache_of_set_props() outside the
core cacheinfo code. How about skipping the setup of fw_unique in
ci_leaf_init instead?

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-17 18:08         ` Sudeep Holla
  (?)
@ 2018-01-18 17:36           ` Palmer Dabbelt
  -1 siblings, 0 replies; 104+ messages in thread
From: Palmer Dabbelt @ 2018-01-18 17:36 UTC (permalink / raw)
  Cc: sudeep.holla, jeremy.linton, linux-acpi, linux-arm-kernel,
	hanjun.guo, lorenzo.pieralisi, rjw, Will Deacon, catalin.marinas,
	Greg KH, viresh.kumar, mark.rutland, linux-kernel, linux-pm,
	jhugo, wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen, albert

On Wed, 17 Jan 2018 10:08:27 PST (-0800), sudeep.holla@arm.com wrote:
> (Sorry, somehow I missed this email until I saw Jeremy's reply today)
>
> On 15/01/18 16:07, Palmer Dabbelt wrote:
>> On Mon, 15 Jan 2018 04:33:38 PST (-0800), sudeep.holla@arm.com wrote:
>>> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>>>> The original intent in cacheinfo was that an architecture
>>>> specific populate_cache_leaves() would probe the hardware
>>>> and then cache_shared_cpu_map_setup() and
>>>> cache_override_properties() would provide firmware help to
>>>> extend/expand upon what was probed. Arm64 was really
>>>> the only architecture that was working this way, and
>>>> with the removal of most of the hardware probing logic it
>>>> became clear that it was possible to simplify the logic a bit.
>>>>
>>>> This patch combines the walk of the DT nodes with the
>>>> code updating the cache size/line_size and nr_sets.
>>>> cache_override_properties() (which was DT specific) is
>>>> then removed. The result is that cacheinfo.of_node is
>>>> no longer used as a temporary place to hold DT references
>>>> for future calls that update cache properties. That change
>>>> helps to clarify its one remaining use (matching
>>>> cacheinfo nodes that represent shared caches) which
>>>> will be used by the ACPI/PPTT code in the following patches.
>>>>
>>>> Cc: Palmer Dabbelt <palmer@sifive.com>
>>>> Cc: Albert Ou <albert@sifive.com>
>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>> ---
>>>>  arch/riscv/kernel/cacheinfo.c |  1 +
>>>>  drivers/base/cacheinfo.c      | 65
>>>> +++++++++++++++++++------------------------
>>>>  include/linux/cacheinfo.h     |  1 +
>>>>  3 files changed, 31 insertions(+), 36 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/kernel/cacheinfo.c
>>>> b/arch/riscv/kernel/cacheinfo.c
>>>> index 10ed2749e246..6f4500233cf8 100644
>>>> --- a/arch/riscv/kernel/cacheinfo.c
>>>> +++ b/arch/riscv/kernel/cacheinfo.c
>>>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
>>>>          CACHE_WRITE_BACK
>>>>          | CACHE_READ_ALLOCATE
>>>>          | CACHE_WRITE_ALLOCATE;
>>>> +    cache_of_set_props(this_leaf, node);
>>>
>>> This may be necessary but can it be done as later patch ? So far nothing
>>> is added that may break riscv IIUC.
>>>
>>> Palmer, Albert,
>>>
>>> Can you confirm ? Also, as I see we can thin down arch specific
>>> implementation on riscv if it's just using DT like ARM64. Sorry if
>>> I am missing to see something, so thought of checking.
>>>
>>> [...]
>>
>> Sorry, I guess I'm a bit confused as to what's going on here.  RISC-V
>> uses device tree on all Linux systems.
>>
>
> Good. By thin down, I was thinking of moving the init_cache_level and
> populate_cache_leaves implementations of riscv to generic weak functions
> under CONFIG_OF. You may even end up deleting riscv's cacheinfo.c.
>
> Just a thought, sorry for not being clear earlier.

OK, well, I'm always happy to convert things to generic implementations.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early
  2018-01-18 10:14             ` Sudeep Holla
@ 2018-01-19 23:27               ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-19 23:27 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-acpi, linux-arm-kernel, hanjun.guo, lorenzo.pieralisi, rjw,
	will.deacon, catalin.marinas, gregkh, viresh.kumar, mark.rutland,
	linux-kernel, linux-pm, jhugo, wangxiongfeng2, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Palmer Dabbelt, Albert Ou

Hi,

On 01/18/2018 04:14 AM, Sudeep Holla wrote:
> 
> 
> On 17/01/18 18:51, Jeremy Linton wrote:
>> Hi,
>>
>> On 01/17/2018 12:20 PM, Sudeep Holla wrote:
>>>
>>>
>>> On 16/01/18 21:07, Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 01/15/2018 06:33 AM, Sudeep Holla wrote:
>>>>> On Fri, Jan 12, 2018 at 06:59:10PM -0600, Jeremy Linton wrote:
>>>>>> The original intent in cacheinfo was that an architecture
>>>>>> specific populate_cache_leaves() would probe the hardware
>>>>>> and then cache_shared_cpu_map_setup() and
>>>>>> cache_override_properties() would provide firmware help to
>>>>>> extend/expand upon what was probed. Arm64 was really
>>>>>> the only architecture that was working this way, and
>>>>>> with the removal of most of the hardware probing logic it
>>>>>> became clear that it was possible to simplify the logic a bit.
>>>>>>
>>>>>> This patch combines the walk of the DT nodes with the
>>>>>> code updating the cache size/line_size and nr_sets.
>>>>>> cache_override_properties() (which was DT specific) is
>>>>>> then removed. The result is that cacheinfo.of_node is
>>>>>> no longer used as a temporary place to hold DT references
>>>>>> for future calls that update cache properties. That change
>>>>>> helps to clarify its one remaining use (matching
>>>>>> cacheinfo nodes that represent shared caches) which
>>>>>> will be used by the ACPI/PPTT code in the following patches.
>>>>>>
>>>>>> Cc: Palmer Dabbelt <palmer@sifive.com>
>>>>>> Cc: Albert Ou <albert@sifive.com>
>>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>>>> ---
>>>>>>     arch/riscv/kernel/cacheinfo.c |  1 +
>>>>>>     drivers/base/cacheinfo.c      | 65
>>>>>> +++++++++++++++++++------------------------
>>>>>>     include/linux/cacheinfo.h     |  1 +
>>>>>>     3 files changed, 31 insertions(+), 36 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/riscv/kernel/cacheinfo.c
>>>>>> b/arch/riscv/kernel/cacheinfo.c
>>>>>> index 10ed2749e246..6f4500233cf8 100644
>>>>>> --- a/arch/riscv/kernel/cacheinfo.c
>>>>>> +++ b/arch/riscv/kernel/cacheinfo.c
>>>>>> @@ -30,6 +30,7 @@ static void ci_leaf_init(struct cacheinfo
>>>>>> *this_leaf,
>>>>>>             CACHE_WRITE_BACK
>>>>>>             | CACHE_READ_ALLOCATE
>>>>>>             | CACHE_WRITE_ALLOCATE;
>>>>>> +    cache_of_set_props(this_leaf, node);
>>>>>
>>>>> This may be necessary, but can it be done as a later patch? So far
>>>>> nothing is added that may break riscv, IIUC.
>>>>
>>>> Well I think you have a bisection issue where the additional information
>>>> will disappear between this patch and wherever we put this code back in.
>>>>
>>>
>>> Hmm, I am sorry but I fail to see the issue. Before this change,
>>> populate_cache_leaves just populated the info as per ci_leaf_init in
>>> arch/riscv/kernel/cacheinfo.c and cache_override_properties used to fill
>>> the remaining.
>>>
>>> After this patch, the same is achieved in cache_shared_cpu_map_setup.
>>>
>>> In both cases, it is complete by the end of detect_cache_attributes, so
>>> I see no issue.
>>>
>>
>>
>> Hi,
>>
>> I must be misunderstanding something.
>>
> 
> Looks like I was the one missing something :)
> 
>> AFAIK, the code in cache_setup_of_node() won't call cache_of_set_props()
>> because it returns when there is an existing of_node (fw_unique) created
>> by the riscv populate_cache_leaves(). That's why I'm making the direct
>> call here. If we fail to get that change in place before
>> cache_override_properties() is removed then the fields not set by the
>> riscv code (size/etc) will be missing.
> 
> Indeed. I am trying to avoid use of cache_of_set_props outside.
> How about skipping setting up of fw_unique in ci_leaf_init instead ?
> 

I've been thinking about this, but I'm hesitant because I don't have a 
good test platform for this code. Plus, I'm not 100% sure that the 
results are the same (is it possible that the platform setup node isn't
the same as the one the common code would find?).

I also think I'm getting a little off topic with these patches in 
relation to what the core goal is (adding PPTT parsing).
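For readers following the bisection argument above, here is a self-contained userspace sketch of the riscv hook after the quoted hunk. The struct layouts and the property-walk helper are mocks standing in for the kernel's `struct cacheinfo` and `cache_of_set_props()`; only the shape of the fix is shown, not the real kernel code:

```c
#include <assert.h>   /* for the usage check below */
#include <stddef.h>

/* Mock types; the real definitions live in include/linux/cacheinfo.h. */
struct device_node {
	unsigned int size, line_size, nr_sets;
};

struct cacheinfo {
	struct device_node *fw_unique;  /* was of_node before this series */
	unsigned int level;
	unsigned int size, coherency_line_size, number_of_sets;
};

/* Stand-in for the DT property walk: copy the cache geometry out of
 * the firmware node into the cacheinfo leaf. */
void cache_of_set_props(struct cacheinfo *leaf, struct device_node *np)
{
	leaf->size = np->size;
	leaf->coherency_line_size = np->line_size;
	leaf->number_of_sets = np->nr_sets;
}

/*
 * Mirrors the quoted hunk: because the generic cache_setup_of_node()
 * returns early when fw_unique is already set, riscv fills the
 * properties directly here, so no bisection point between this patch
 * and the removal of cache_override_properties() is left with empty
 * size/line_size/nr_sets fields.
 */
void ci_leaf_init(struct cacheinfo *leaf, struct device_node *np,
		  unsigned int level)
{
	leaf->fw_unique = np;
	leaf->level = level;
	cache_of_set_props(leaf, np);
}
```

Calling `ci_leaf_init()` on a mock node then shows every property populated in one pass, which is the property the bisection concern is about.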



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 07/12] drivers: base cacheinfo: Add support for ACPI based firmware tables
  2018-01-13  0:59   ` Jeremy Linton
@ 2018-01-22 15:50     ` Greg KH
  -1 siblings, 0 replies; 104+ messages in thread
From: Greg KH @ 2018-01-22 15:50 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo,
	lorenzo.pieralisi, rjw, will.deacon, catalin.marinas,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen

On Fri, Jan 12, 2018 at 06:59:15PM -0600, Jeremy Linton wrote:
> Add an entry to struct cacheinfo to maintain a reference to the PPTT
> node which can be used to match identical caches across cores. Also
> stub out cache_setup_acpi() so that individual architectures can
> enable ACPI topology parsing.
> 
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  drivers/acpi/pptt.c       |  1 +
>  drivers/base/cacheinfo.c  | 20 +++++++++++++-------
>  include/linux/cacheinfo.h |  9 +++++++++
>  3 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 2c4b3ed862a8..4f5ab19c3a08 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -329,6 +329,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>  {
>  	int valid_flags = 0;
>  
> +	this_leaf->fw_unique = cpu_node;
>  	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
>  		this_leaf->size = found_cache->size;
>  		valid_flags++;
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index 217aa90fb036..ee51e33cc37c 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -208,16 +208,16 @@ static int cache_setup_of_node(unsigned int cpu)
>  
>  	if (index != cache_leaves(cpu)) /* not all OF nodes populated */
>  		return -ENOENT;
> -
>  	return 0;
>  }
> +

Whitespace changes not needed for this patch :(


>  #else
>  static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
>  static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>  					   struct cacheinfo *sib_leaf)
>  {
>  	/*
> -	 * For non-DT systems, assume unique level 1 cache, system-wide
> +	 * For non-DT/ACPI systems, assume unique level 1 caches, system-wide
>  	 * shared caches for all other levels. This will be used only if
>  	 * arch specific code has not populated shared_cpu_map
>  	 */
> @@ -225,6 +225,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>  }
>  #endif
>  
> +int __weak cache_setup_acpi(unsigned int cpu)
> +{
> +	return -ENOTSUPP;
> +}
> +
>  static int cache_shared_cpu_map_setup(unsigned int cpu)
>  {
>  	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> @@ -235,11 +240,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
>  	if (this_cpu_ci->cpu_map_populated)
>  		return 0;
>  
> -	if (of_have_populated_dt())
> +	if (!acpi_disabled)
> +		ret = cache_setup_acpi(cpu);

Why does acpi go first?  :)

> +	else if (of_have_populated_dt())
>  		ret = cache_setup_of_node(cpu);
> -	else if (!acpi_disabled)
> -		/* No cache property/hierarchy support yet in ACPI */
> -		ret = -ENOTSUPP;
> +
>  	if (ret)
>  		return ret;
>  
> +int acpi_find_last_cache_level(unsigned int cpu)
> +{
> +	/*ACPI kernels should be built with PPTT support*/

Here are some extra ' ' characters, you need them...

thanks,

greg k-h
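As an aside on the `__weak` stub reviewed above: the kernel's `__weak` annotation expands to the GCC attribute shown here, so the override mechanism can be exercised in a plain userspace sketch (this is an illustration, not the kernel code):

```c
#include <assert.h>   /* for the usage check below */

#define ENOTSUPP 524  /* value used by the kernel for this error */

/* Weak default, mirroring the quoted hunk: an architecture that
 * enables ACPI topology parsing supplies a strong cache_setup_acpi()
 * in its own object file, and the linker prefers that definition
 * over this stub at link time. */
__attribute__((weak)) int cache_setup_acpi(unsigned int cpu)
{
	(void)cpu;
	return -ENOTSUPP;
}
```

With no strong definition linked in, a call returns `-ENOTSUPP`; linking an arch-provided strong `cache_setup_acpi()` silently replaces the stub, which is why no `#ifdef` is needed at the call site.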

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 07/12] drivers: base cacheinfo: Add support for ACPI based firmware tables
  2018-01-22 15:50     ` Greg KH
@ 2018-01-22 21:14       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-22 21:14 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-acpi, linux-arm-kernel, sudeep.holla, hanjun.guo,
	lorenzo.pieralisi, rjw, will.deacon, catalin.marinas,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	wangxiongfeng2, Jonathan.Zhang, ahs3, Jayachandran.Nair,
	austinwc, lenb, vkilari, morten.rasmussen

Hi,

Thanks for taking a look at this.

On 01/22/2018 09:50 AM, Greg KH wrote:
> On Fri, Jan 12, 2018 at 06:59:15PM -0600, Jeremy Linton wrote:
>> Add an entry to struct cacheinfo to maintain a reference to the PPTT
>> node which can be used to match identical caches across cores. Also
>> stub out cache_setup_acpi() so that individual architectures can
>> enable ACPI topology parsing.
>>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   drivers/acpi/pptt.c       |  1 +
>>   drivers/base/cacheinfo.c  | 20 +++++++++++++-------
>>   include/linux/cacheinfo.h |  9 +++++++++
>>   3 files changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 2c4b3ed862a8..4f5ab19c3a08 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -329,6 +329,7 @@ static void update_cache_properties(struct cacheinfo *this_leaf,
>>   {
>>   	int valid_flags = 0;
>>   
>> +	this_leaf->fw_unique = cpu_node;
>>   	if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
>>   		this_leaf->size = found_cache->size;
>>   		valid_flags++;
>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>> index 217aa90fb036..ee51e33cc37c 100644
>> --- a/drivers/base/cacheinfo.c
>> +++ b/drivers/base/cacheinfo.c
>> @@ -208,16 +208,16 @@ static int cache_setup_of_node(unsigned int cpu)
>>   
>>   	if (index != cache_leaves(cpu)) /* not all OF nodes populated */
>>   		return -ENOENT;
>> -
>>   	return 0;
>>   }
>> +
> 
> Whitespace changes not needed for this patch :(

Sure.

> 
> 
>>   #else
>>   static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
>>   static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>>   					   struct cacheinfo *sib_leaf)
>>   {
>>   	/*
>> -	 * For non-DT systems, assume unique level 1 cache, system-wide
>> +	 * For non-DT/ACPI systems, assume unique level 1 caches, system-wide
>>   	 * shared caches for all other levels. This will be used only if
>>   	 * arch specific code has not populated shared_cpu_map
>>   	 */
>> @@ -225,6 +225,11 @@ static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>>   }
>>   #endif
>>   
>> +int __weak cache_setup_acpi(unsigned int cpu)
>> +{
>> +	return -ENOTSUPP;
>> +}
>> +
>>   static int cache_shared_cpu_map_setup(unsigned int cpu)
>>   {
>>   	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> @@ -235,11 +240,11 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
>>   	if (this_cpu_ci->cpu_map_populated)
>>   		return 0;
>>   
>> -	if (of_have_populated_dt())
>> +	if (!acpi_disabled)
>> +		ret = cache_setup_acpi(cpu);
> 
> Why does acpi go first?  :)

This sounds like a joke I heard...

OTOH, given that we have machines with both ACPI and DT tables, it
seemed a little clearer and a little more robust to code it so that when
ACPI is enabled we prefer it over the DT information. As long as the
routines which set up of_root are protected by if (acpi_disabled) checks
it should be safe to do it either way.


> 
>> +	else if (of_have_populated_dt())
>>   		ret = cache_setup_of_node(cpu);
>> -	else if (!acpi_disabled)
>> -		/* No cache property/hierarchy support yet in ACPI */
>> -		ret = -ENOTSUPP;
>> +
>>   	if (ret)
>>   		return ret;
>>   
>> +int acpi_find_last_cache_level(unsigned int cpu)
>> +{
>> +	/*ACPI kernels should be built with PPTT support*/
> 
> Here are some extra ' ' characters, you need them...

Oh ok, thanks! :)
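The preference described above (ACPI first when enabled, DT as the fallback) boils down to the dispatch below. This is a userspace sketch with mock flags and marker return values so the chosen path is visible; the real code lives in cache_shared_cpu_map_setup() and uses acpi_disabled and of_have_populated_dt():

```c
#include <assert.h>    /* for the usage checks below */
#include <stdbool.h>

/* Mock firmware state standing in for the kernel globals. */
bool acpi_disabled;
bool have_populated_dt;

/* Marker values (1 = ACPI path, 2 = DT path) purely for the sketch;
 * the real callbacks return 0 on success or a negative errno. */
int cache_setup_acpi_mock(unsigned int cpu)    { (void)cpu; return 1; }
int cache_setup_of_node_mock(unsigned int cpu) { (void)cpu; return 2; }

/*
 * Mirrors the hunk under review: when ACPI is enabled it wins even on
 * machines that also ship DT tables; DT is consulted only otherwise,
 * and a system with neither leaves ret at 0.
 */
int cache_setup_fw_dispatch(unsigned int cpu)
{
	int ret = 0;

	if (!acpi_disabled)
		ret = cache_setup_acpi_mock(cpu);
	else if (have_populated_dt)
		ret = cache_setup_of_node_mock(cpu);

	return ret;
}
```

Flipping the two mock flags walks all three branches, which is the "safe to do it either way, as long as of_root setup is guarded by acpi_disabled" argument in miniature.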



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 07/12] drivers: base cacheinfo: Add support for ACPI based firmware tables
  2018-01-22 21:14       ` Jeremy Linton
  (?)
@ 2018-01-23  0:11         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 104+ messages in thread
From: Rafael J. Wysocki @ 2018-01-23  0:11 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Greg KH, ACPI Devel Maling List, linux-arm-kernel, Sudeep Holla,
	Hanjun Guo, Lorenzo Pieralisi, Rafael J. Wysocki, Will Deacon,
	Catalin Marinas, Viresh Kumar, Mark Rutland,
	Linux Kernel Mailing List, Linux PM, jhugo, wangxiongfeng2,
	Jonathan.Zhang, Al Stone, Jayachandran.Nair, austinwc, Len Brown,
	vkilari, Morten Rasmussen

On Mon, Jan 22, 2018 at 10:14 PM, Jeremy Linton <jeremy.linton@arm.com> wrote:
> Hi,
>
> Thanks for taking a look at this.
>
>
> On 01/22/2018 09:50 AM, Greg KH wrote:
>>
>> On Fri, Jan 12, 2018 at 06:59:15PM -0600, Jeremy Linton wrote:
>>>
>>> Add an entry to struct cacheinfo to maintain a reference to the PPTT
>>> node which can be used to match identical caches across cores. Also
>>> stub out cache_setup_acpi() so that individual architectures can
>>> enable ACPI topology parsing.
>>>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   drivers/acpi/pptt.c       |  1 +
>>>   drivers/base/cacheinfo.c  | 20 +++++++++++++-------
>>>   include/linux/cacheinfo.h |  9 +++++++++
>>>   3 files changed, 23 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>> index 2c4b3ed862a8..4f5ab19c3a08 100644
>>> --- a/drivers/acpi/pptt.c
>>> +++ b/drivers/acpi/pptt.c
>>> @@ -329,6 +329,7 @@ static void update_cache_properties(struct cacheinfo
>>> *this_leaf,
>>>   {
>>>         int valid_flags = 0;
>>>   +     this_leaf->fw_unique = cpu_node;
>>>         if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
>>>                 this_leaf->size = found_cache->size;
>>>                 valid_flags++;
>>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>>> index 217aa90fb036..ee51e33cc37c 100644
>>> --- a/drivers/base/cacheinfo.c
>>> +++ b/drivers/base/cacheinfo.c
>>> @@ -208,16 +208,16 @@ static int cache_setup_of_node(unsigned int cpu)
>>>         if (index != cache_leaves(cpu)) /* not all OF nodes populated */
>>>                 return -ENOENT;
>>> -
>>>         return 0;
>>>   }
>>> +
>>
>>
>> Whitespace changes not needed for this patch :(
>
>
> Sure.
>
>
>>
>>
>>>   #else
>>>   static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
>>>   static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>>>                                            struct cacheinfo *sib_leaf)
>>>   {
>>>         /*
>>> -        * For non-DT systems, assume unique level 1 cache, system-wide
>>> +        * For non-DT/ACPI systems, assume unique level 1 caches,
>>> system-wide
>>>          * shared caches for all other levels. This will be used only if
>>>          * arch specific code has not populated shared_cpu_map
>>>          */
>>> @@ -225,6 +225,11 @@ static inline bool cache_leaves_are_shared(struct
>>> cacheinfo *this_leaf,
>>>   }
>>>   #endif
>>>   +int __weak cache_setup_acpi(unsigned int cpu)
>>> +{
>>> +       return -ENOTSUPP;
>>> +}
>>> +
>>>   static int cache_shared_cpu_map_setup(unsigned int cpu)
>>>   {
>>>         struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>>> @@ -235,11 +240,11 @@ static int cache_shared_cpu_map_setup(unsigned int
>>> cpu)
>>>         if (this_cpu_ci->cpu_map_populated)
>>>                 return 0;
>>>   -     if (of_have_populated_dt())
>>> +       if (!acpi_disabled)
>>> +               ret = cache_setup_acpi(cpu);
>>
>>
>> Why does acpi go first?  :)
>
>
> This sounds like a joke i heard...
>
> OTOH, given that we have machines with both ACPI and DT tables, it seemed a
> little clearer and a little more robust to code it so that when ACPI is
> enabled we prefer it over the DT information. As long as the routines which
> set up of_root are protected by if (acpi_disabled) checks it should be safe
> to do it either way.

I guess adding a comment about that might help.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 07/12] drivers: base cacheinfo: Add support for ACPI based firmware tables
@ 2018-01-23  0:11         ` Rafael J. Wysocki
  0 siblings, 0 replies; 104+ messages in thread
From: Rafael J. Wysocki @ 2018-01-23  0:11 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Greg KH, ACPI Devel Maling List, linux-arm-kernel, Sudeep Holla,
	Hanjun Guo, Lorenzo Pieralisi, Rafael J. Wysocki, Will Deacon,
	Catalin Marinas, Viresh Kumar, Mark Rutland,
	Linux Kernel Mailing List, Linux PM, jhugo, wangxiongfeng2,
	Jonathan.Zhang, Al Stone, Jayachandran.Nair, austinwc, Len Brown,
	vkilari, Morten Rasmussen

On Mon, Jan 22, 2018 at 10:14 PM, Jeremy Linton <jeremy.linton@arm.com> wrote:
> Hi,
>
> Thanks for taking a look at this.
>
>
> On 01/22/2018 09:50 AM, Greg KH wrote:
>>
>> On Fri, Jan 12, 2018 at 06:59:15PM -0600, Jeremy Linton wrote:
>>>
>>> Add a entry to to struct cacheinfo to maintain a reference to the PPTT
>>> node which can be used to match identical caches across cores. Also
>>> stub out cache_setup_acpi() so that individual architectures can
>>> enable ACPI topology parsing.
>>>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   drivers/acpi/pptt.c       |  1 +
>>>   drivers/base/cacheinfo.c  | 20 +++++++++++++-------
>>>   include/linux/cacheinfo.h |  9 +++++++++
>>>   3 files changed, 23 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>> index 2c4b3ed862a8..4f5ab19c3a08 100644
>>> --- a/drivers/acpi/pptt.c
>>> +++ b/drivers/acpi/pptt.c
>>> @@ -329,6 +329,7 @@ static void update_cache_properties(struct cacheinfo
>>> *this_leaf,
>>>   {
>>>         int valid_flags = 0;
>>>   +     this_leaf->fw_unique = cpu_node;
>>>         if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
>>>                 this_leaf->size = found_cache->size;
>>>                 valid_flags++;
>>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>>> index 217aa90fb036..ee51e33cc37c 100644
>>> --- a/drivers/base/cacheinfo.c
>>> +++ b/drivers/base/cacheinfo.c
>>> @@ -208,16 +208,16 @@ static int cache_setup_of_node(unsigned int cpu)
>>>         if (index != cache_leaves(cpu)) /* not all OF nodes populated */
>>>                 return -ENOENT;
>>> -
>>>         return 0;
>>>   }
>>> +
>>
>>
>> Whitespace changes not needed for this patch :(
>
>
> Sure.
>
>
>>
>>
>>>   #else
>>>   static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
>>>   static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>>>                                            struct cacheinfo *sib_leaf)
>>>   {
>>>         /*
>>> -        * For non-DT systems, assume unique level 1 cache, system-wide
>>> +        * For non-DT/ACPI systems, assume unique level 1 caches,
>>> system-wide
>>>          * shared caches for all other levels. This will be used only if
>>>          * arch specific code has not populated shared_cpu_map
>>>          */
>>> @@ -225,6 +225,11 @@ static inline bool cache_leaves_are_shared(struct
>>> cacheinfo *this_leaf,
>>>   }
>>>   #endif
>>>   +int __weak cache_setup_acpi(unsigned int cpu)
>>> +{
>>> +       return -ENOTSUPP;
>>> +}
>>> +
>>>   static int cache_shared_cpu_map_setup(unsigned int cpu)
>>>   {
>>>         struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>>> @@ -235,11 +240,11 @@ static int cache_shared_cpu_map_setup(unsigned int
>>> cpu)
>>>         if (this_cpu_ci->cpu_map_populated)
>>>                 return 0;
>>>   -     if (of_have_populated_dt())
>>> +       if (!acpi_disabled)
>>> +               ret = cache_setup_acpi(cpu);
>>
>>
>> Why does acpi go first?  :)
>
>
> This sounds like a joke i heard...
>
> OTOH, given that we have machines with both ACPI and DT tables, it seemed a
> little clearer and a little more robust to code that so that if ACPI is
> enabled to prefer it over DT information. As long as the routines which set
> of of_root are protected by if (acpi_disabled) checks it should be safe to
> do it either way.

I guess adding a comment about that might help.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v6 07/12] drivers: base cacheinfo: Add support for ACPI based firmware tables
@ 2018-01-23  0:11         ` Rafael J. Wysocki
  0 siblings, 0 replies; 104+ messages in thread
From: Rafael J. Wysocki @ 2018-01-23  0:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 22, 2018 at 10:14 PM, Jeremy Linton <jeremy.linton@arm.com> wrote:
> Hi,
>
> Thanks for taking a look at this.
>
>
> On 01/22/2018 09:50 AM, Greg KH wrote:
>>
>> On Fri, Jan 12, 2018 at 06:59:15PM -0600, Jeremy Linton wrote:
>>>
>>> Add a entry to to struct cacheinfo to maintain a reference to the PPTT
>>> node which can be used to match identical caches across cores. Also
>>> stub out cache_setup_acpi() so that individual architectures can
>>> enable ACPI topology parsing.
>>>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   drivers/acpi/pptt.c       |  1 +
>>>   drivers/base/cacheinfo.c  | 20 +++++++++++++-------
>>>   include/linux/cacheinfo.h |  9 +++++++++
>>>   3 files changed, 23 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>> index 2c4b3ed862a8..4f5ab19c3a08 100644
>>> --- a/drivers/acpi/pptt.c
>>> +++ b/drivers/acpi/pptt.c
>>> @@ -329,6 +329,7 @@ static void update_cache_properties(struct cacheinfo
>>> *this_leaf,
>>>   {
>>>         int valid_flags = 0;
>>>   +     this_leaf->fw_unique = cpu_node;
>>>         if (found_cache->flags & ACPI_PPTT_SIZE_PROPERTY_VALID) {
>>>                 this_leaf->size = found_cache->size;
>>>                 valid_flags++;
>>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>>> index 217aa90fb036..ee51e33cc37c 100644
>>> --- a/drivers/base/cacheinfo.c
>>> +++ b/drivers/base/cacheinfo.c
>>> @@ -208,16 +208,16 @@ static int cache_setup_of_node(unsigned int cpu)
>>>         if (index != cache_leaves(cpu)) /* not all OF nodes populated */
>>>                 return -ENOENT;
>>> -
>>>         return 0;
>>>   }
>>> +
>>
>>
>> Whitespace changes not needed for this patch :(
>
>
> Sure.
>
>
>>
>>
>>>   #else
>>>   static inline int cache_setup_of_node(unsigned int cpu) { return 0; }
>>>   static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
>>>                                            struct cacheinfo *sib_leaf)
>>>   {
>>>         /*
>>> -        * For non-DT systems, assume unique level 1 cache, system-wide
>>> +        * For non-DT/ACPI systems, assume unique level 1 caches,
>>> system-wide
>>>          * shared caches for all other levels. This will be used only if
>>>          * arch specific code has not populated shared_cpu_map
>>>          */
>>> @@ -225,6 +225,11 @@ static inline bool cache_leaves_are_shared(struct
>>> cacheinfo *this_leaf,
>>>   }
>>>   #endif
>>>   +int __weak cache_setup_acpi(unsigned int cpu)
>>> +{
>>> +       return -ENOTSUPP;
>>> +}
>>> +
>>>   static int cache_shared_cpu_map_setup(unsigned int cpu)
>>>   {
>>>         struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>>> @@ -235,11 +240,11 @@ static int cache_shared_cpu_map_setup(unsigned int
>>> cpu)
>>>         if (this_cpu_ci->cpu_map_populated)
>>>                 return 0;
>>>   -     if (of_have_populated_dt())
>>> +       if (!acpi_disabled)
>>> +               ret = cache_setup_acpi(cpu);
>>
>>
>> Why does acpi go first?  :)
>
>
> This sounds like a joke I heard...
>
> OTOH, given that we have machines with both ACPI and DT tables, it seemed a
> little clearer and a little more robust to prefer the ACPI information over
> DT whenever ACPI is enabled. As long as the routines which set of_root are
> protected by if (acpi_disabled) checks, it should be safe to do it either
> way.

I guess adding a comment about that might help.
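As a hedged illustration of that ordering (hypothetical helper and type names, not the kernel's actual interfaces), the selection logic being discussed reduces to something like:

```c
#include <assert.h>
#include <stdbool.h>

enum topo_source { SRC_NONE, SRC_ACPI, SRC_DT };

/*
 * Hypothetical distillation of the source selection in
 * cache_shared_cpu_map_setup(): on systems that ship both ACPI and DT
 * tables, prefer ACPI whenever it is enabled, and fall back to DT only
 * when ACPI is disabled.
 */
static enum topo_source pick_topology_source(bool acpi_disabled,
                                             bool dt_populated)
{
	if (!acpi_disabled)
		return SRC_ACPI;	/* ACPI first: firmware may provide both */
	if (dt_populated)
		return SRC_DT;
	return SRC_NONE;
}
```

The point of the comment would be exactly this: the ordering is deliberate, and is safe as long as of_root is only populated when ACPI is disabled.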

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-01-13  0:59   ` Jeremy Linton
  (?)
@ 2018-01-25 12:15     ` Xiongfeng Wang
  -1 siblings, 0 replies; 104+ messages in thread
From: Xiongfeng Wang @ 2018-01-25 12:15 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Juri Lelli

Hi Jeremy,

I have tested the patch with the newest UEFI. It prints the below error:

[    4.017371] BUG: arch topology borken
[    4.021069] BUG: arch topology borken
[    4.024764] BUG: arch topology borken
[    4.028460] BUG: arch topology borken
[    4.032153] BUG: arch topology borken
[    4.035849] BUG: arch topology borken
[    4.039543] BUG: arch topology borken
[    4.043239] BUG: arch topology borken
[    4.046932] BUG: arch topology borken
[    4.050629] BUG: arch topology borken
[    4.054322] BUG: arch topology borken

I checked the code and found that the newest UEFI sets the PPTT physical_package_flag on a physical package node, and
the NUMA domains (SRAT domains) start from the DIE layer. (The topology of our board is core->cluster->die->package.)

When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.

If we modify the UEFI to make the NUMA sched_domain start from the package layer, then all the topology information
within the package will be discarded. I think we need to build the multi-core sched_domain from the cores within
the cluster instead of the cores within the package; I think that's what 'multi-core' means: multiple cores form a cluster.
If we build the multi-core sched_domain from the cores within a cluster, we need to add fields in struct cpu_topology
to record which cores are in each cluster.
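A minimal sketch of that suggestion (hypothetical field and function names; struct cpu_topology did not carry cluster information at the time), showing a cluster_id sitting between core_id and package_id so an MC sched_domain could be built from cluster siblings:

```c
#include <assert.h>

/*
 * Hypothetical extension of the arm64 per-CPU topology entry: a
 * cluster_id between core_id and package_id, so the multi-core
 * sched_domain could be built from cluster siblings instead of all
 * cores in the package.
 */
struct cpu_topo {
	int thread_id;
	int core_id;
	int cluster_id;		/* proposed: which cluster the core belongs to */
	int package_id;
};

/* Two cores would be MC-domain siblings if they share a cluster. */
static int cluster_siblings(const struct cpu_topo *a,
			    const struct cpu_topo *b)
{
	return a->package_id == b->package_id &&
	       a->cluster_id == b->cluster_id;
}
```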


Thanks,
Xiongfeng

On 2018/1/13 8:59, Jeremy Linton wrote:
> Propagate the topology information from the PPTT tree to the
> cpu_topology array. We can get the thread id, core_id and
> cluster_id by assuming certain levels of the PPTT tree correspond
> to those concepts. The package_id is flagged in the tree and can be
> found by calling find_acpi_cpu_topology_package() which terminates
> its search when it finds an ACPI node flagged as the physical
> package. If the tree doesn't contain enough levels to represent
> all of the requested levels then the root node will be returned
> for all subsequent levels.
> 
> Cc: Juri Lelli <juri.lelli@arm.com>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 7b06e263fdd1..ce8ec7fd6b32 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -11,6 +11,7 @@
>   * for more details.
>   */
>  
> +#include <linux/acpi.h>
>  #include <linux/arch_topology.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> @@ -22,6 +23,7 @@
>  #include <linux/sched.h>
>  #include <linux/sched/topology.h>
>  #include <linux/slab.h>
> +#include <linux/smp.h>
>  #include <linux/string.h>
>  
>  #include <asm/cpu.h>
> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
>  	}
>  }
>  
> +#ifdef CONFIG_ACPI
> +/*
> + * Propagate the topology information of the processor_topology_node tree to the
> + * cpu_topology array.
> + */
> +static int __init parse_acpi_topology(void)
> +{
> +	bool is_threaded;
> +	int cpu, topology_id;
> +
> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> +
> +	for_each_possible_cpu(cpu) {
> +		topology_id = find_acpi_cpu_topology(cpu, 0);
> +		if (topology_id < 0)
> +			return topology_id;
> +
> +		if (is_threaded) {
> +			cpu_topology[cpu].thread_id = topology_id;
> +			topology_id = find_acpi_cpu_topology(cpu, 1);
> +			cpu_topology[cpu].core_id   = topology_id;
> +			topology_id = find_acpi_cpu_topology_package(cpu);
> +			cpu_topology[cpu].package_id = topology_id;
> +		} else {
> +			cpu_topology[cpu].thread_id  = -1;
> +			cpu_topology[cpu].core_id    = topology_id;
> +			topology_id = find_acpi_cpu_topology_package(cpu);
> +			cpu_topology[cpu].package_id = topology_id;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +#else
> +static inline int __init parse_acpi_topology(void)
> +{
> +	return -EINVAL;
> +}
> +#endif
>  
>  void __init init_cpu_topology(void)
>  {
> @@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
>  	 * Discard anything that was parsed if we hit an error so we
>  	 * don't use partial information.
>  	 */
> -	if (of_have_populated_dt() && parse_dt_topology())
> +	if ((!acpi_disabled) && parse_acpi_topology())
> +		reset_cpu_topology();
> +	else if (of_have_populated_dt() && parse_dt_topology())
>  		reset_cpu_topology();
>  }
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-01-25 12:15     ` Xiongfeng Wang
@ 2018-01-25 15:56       ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-01-25 15:56 UTC (permalink / raw)
  To: Xiongfeng Wang, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Juri Lelli

Hi,

On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> Hi Jeremy,
> 
> I have tested the patch with the newest UEFI. It prints the below error:
> 
> [    4.017371] BUG: arch topology borken
> [    4.021069] BUG: arch topology borken
> [    4.024764] BUG: arch topology borken
> [    4.028460] BUG: arch topology borken
> [    4.032153] BUG: arch topology borken
> [    4.035849] BUG: arch topology borken
> [    4.039543] BUG: arch topology borken
> [    4.043239] BUG: arch topology borken
> [    4.046932] BUG: arch topology borken
> [    4.050629] BUG: arch topology borken
> [    4.054322] BUG: arch topology borken
> 
> I checked the code and found that the newest UEFI set PPTT physical_package_flag on a physical package node and
> the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board is core->cluster->die->package).

I commented about that on the EDK2 mailing list. While the current spec 
doesn't explicitly ban having the flag set multiple times between the 
leaf and the root, I consider it a "bug", and there is an effort underway 
to clarify the spec and the use of that flag.
> 
> When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
> and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.

Right. I've mentioned this problem a couple of times.

At the moment, the spec isn't clear about how the proximity domain is 
detected/located within the PPTT topology (a node with a 1:1 
correspondence isn't even required). As you can see from this patch set, 
we are making the general assumption that the proximity domains are at 
the same level as the physical socket. This isn't ideal for NUMA 
topologies, like the D05, that don't align with the physical socket.

There are efforts underway to clarify and expand upon the specification 
to deal with this general problem. The simple solution is another flag 
(say PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could 
be used to find nodes with 1:1 correspondence. At that point we could 
add a fairly trivial patch to correct just the scheduler topology 
without affecting the rest of the system topology code.
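A hedged sketch of the proposal (PPTT_PROXIMITY_DOMAIN is a hypothetical flag name from the discussion, and the structures here are simplified stand-ins, not the ACPICA types): the lookup is the same leaf-to-root walk find_acpi_cpu_topology_package() performs, generalized to any flag bit.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_PHYSICAL_PACKAGE	(1u << 0)
#define FLAG_PROXIMITY_DOMAIN	(1u << 1)	/* hypothetical new flag */

struct pptt_node {
	unsigned int flags;
	struct pptt_node *parent;
	int id;
};

/*
 * Walk leaf-to-root and return the id of the first node carrying the
 * requested flag; -1 if no ancestor is flagged, in which case a caller
 * would fall back to some default (e.g. the package node).
 */
static int find_flagged_ancestor(struct pptt_node *leaf, unsigned int flag)
{
	struct pptt_node *n;

	for (n = leaf; n; n = n->parent)
		if (n->flags & flag)
			return n->id;
	return -1;
}

/*
 * Model the D05-style topology from this thread:
 * core -> cluster -> die (proximity domain) -> package, then resolve
 * the core's ancestor carrying the given flag.
 */
static int demo_lookup(unsigned int flag)
{
	static struct pptt_node pkg  = { FLAG_PHYSICAL_PACKAGE, NULL, 3 };
	static struct pptt_node die  = { FLAG_PROXIMITY_DOMAIN, &pkg, 2 };
	static struct pptt_node clus = { 0, &die, 1 };
	static struct pptt_node core = { 0, &clus, 0 };

	return find_flagged_ancestor(&core, flag);
}
```

With such a flag, the scheduler topology could key off the die node while the package-level topology code stays unchanged.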

> 
> If we modify the UEFI to make NUMA sched_domain start from the layer of package, then all the topology information
> within the package will be discarded. I think we need to build the multi-core sched_domain using the cores within
> the cluster instead of the cores within the package. I think that's what 'multi-core' means. Multi cores form a cluster. I guess.
> If we build the multi-core sched_domain using the cores within a cluster, I think we need to add fields in struct cpu_topology
> to record which cores are in each cluster.

The problem is that there isn't a generic way to identify which level of 
cache sharing is the "correct" top layer MC domain. For one system 
cluster might be appropriate, for another it might be the highest 
caching level within a socket, for another is might be a something in 
between or a group of clusters or LLCs..

Hence the effort to standardize/guarantee a PPTT node that exactly 
matches a SRAT domain. With that, each SoC/system provider has a clearly 
defined method for communicating where they want the proximity domain 
information to begin.

Thanks,

> 
> 
> Thanks,
> Xiongfeng
> 
> On 2018/1/13 8:59, Jeremy Linton wrote:
>> Propagate the topology information from the PPTT tree to the
>> cpu_topology array. We can get the thread id, core_id and
>> cluster_id by assuming certain levels of the PPTT tree correspond
>> to those concepts. The package_id is flagged in the tree and can be
>> found by calling find_acpi_cpu_topology_package() which terminates
>> its search when it finds an ACPI node flagged as the physical
>> package. If the tree doesn't contain enough levels to represent
>> all of the requested levels then the root node will be returned
>> for all subsequent levels.
>>
>> Cc: Juri Lelli <juri.lelli@arm.com>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 7b06e263fdd1..ce8ec7fd6b32 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -11,6 +11,7 @@
>>    * for more details.
>>    */
>>   
>> +#include <linux/acpi.h>
>>   #include <linux/arch_topology.h>
>>   #include <linux/cpu.h>
>>   #include <linux/cpumask.h>
>> @@ -22,6 +23,7 @@
>>   #include <linux/sched.h>
>>   #include <linux/sched/topology.h>
>>   #include <linux/slab.h>
>> +#include <linux/smp.h>
>>   #include <linux/string.h>
>>   
>>   #include <asm/cpu.h>
>> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
>>   	}
>>   }
>>   
>> +#ifdef CONFIG_ACPI
>> +/*
>> + * Propagate the topology information of the processor_topology_node tree to the
>> + * cpu_topology array.
>> + */
>> +static int __init parse_acpi_topology(void)
>> +{
>> +	bool is_threaded;
>> +	int cpu, topology_id;
>> +
>> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		topology_id = find_acpi_cpu_topology(cpu, 0);
>> +		if (topology_id < 0)
>> +			return topology_id;
>> +
>> +		if (is_threaded) {
>> +			cpu_topology[cpu].thread_id = topology_id;
>> +			topology_id = find_acpi_cpu_topology(cpu, 1);
>> +			cpu_topology[cpu].core_id   = topology_id;
>> +			topology_id = find_acpi_cpu_topology_package(cpu);
>> +			cpu_topology[cpu].package_id = topology_id;
>> +		} else {
>> +			cpu_topology[cpu].thread_id  = -1;
>> +			cpu_topology[cpu].core_id    = topology_id;
>> +			topology_id = find_acpi_cpu_topology_package(cpu);
>> +			cpu_topology[cpu].package_id = topology_id;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +#else
>> +static inline int __init parse_acpi_topology(void)
>> +{
>> +	return -EINVAL;
>> +}
>> +#endif
>>   
>>   void __init init_cpu_topology(void)
>>   {
>> @@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
>>   	 * Discard anything that was parsed if we hit an error so we
>>   	 * don't use partial information.
>>   	 */
>> -	if (of_have_populated_dt() && parse_dt_topology())
>> +	if ((!acpi_disabled) && parse_acpi_topology())
>> +		reset_cpu_topology();
>> +	else if (of_have_populated_dt() && parse_dt_topology())
>>   		reset_cpu_topology();
>>   }
>>
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-01-25 15:56       ` Jeremy Linton
  (?)
@ 2018-01-26  4:21         ` Xiongfeng Wang
  -1 siblings, 0 replies; 104+ messages in thread
From: Xiongfeng Wang @ 2018-01-26  4:21 UTC (permalink / raw)
  To: Jeremy Linton, linux-acpi
  Cc: linux-arm-kernel, sudeep.holla, hanjun.guo, lorenzo.pieralisi,
	rjw, will.deacon, catalin.marinas, gregkh, viresh.kumar,
	mark.rutland, linux-kernel, linux-pm, jhugo, Jonathan.Zhang,
	ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Juri Lelli


Hi Jeremy,
On 2018/1/25 23:56, Jeremy Linton wrote:
> Hi,
> 
> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
>> Hi Jeremy,
>>
>> I have tested the patch with the newest UEFI. It prints the below error:
>>
>> [    4.017371] BUG: arch topology borken
>> [    4.021069] BUG: arch topology borken
>> [    4.024764] BUG: arch topology borken
>> [    4.028460] BUG: arch topology borken
>> [    4.032153] BUG: arch topology borken
>> [    4.035849] BUG: arch topology borken
>> [    4.039543] BUG: arch topology borken
>> [    4.043239] BUG: arch topology borken
>> [    4.046932] BUG: arch topology borken
>> [    4.050629] BUG: arch topology borken
>> [    4.054322] BUG: arch topology borken
>>
>> I checked the code and found that the newest UEFI sets the PPTT physical_package_flag on a physical package node, and
>> the NUMA domains (SRAT domains) start at the DIE layer. (The topology of our board is core->cluster->die->package.)
> 
> I commented about that on the EDK2 mailing list. While the current spec doesn't explicitly ban having the flag set multiple times between the leaf and the root, I consider it a "bug" and there is an effort to clarify the spec and the use of that flag.
>>
>> When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
>> and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
>> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.
> 
> Right. I've mentioned this problem a couple of times.
> 
> At the moment, the spec isn't clear about how the proximity domain is detected/located within the PPTT topology (a node with a 1:1 correspondence isn't even required). As you can see from this patch set, we are making the general assumption that the proximity domains are at the same level as the physical socket. This isn't ideal for NUMA topologies, like the D05, that don't align with the physical socket.
> 
> There are efforts underway to clarify and expand upon the specification to deal with this general problem. The simple solution is another flag (say PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to find nodes with 1:1 correspondence. At that point we could add a fairly trivial patch to correct just the scheduler topology without affecting the rest of the system topology code.
> 
>>
>> If we modify the UEFI to make NUMA sched_domain start from the layer of package, then all the topology information
>> within the package will be discarded. I think we need to build the multi-core sched_domain using the cores within
>> the cluster instead of the cores within the package. I think that's what 'multi-core' means: multiple cores form a cluster, I guess.
>> If we build the multi-core sched_domain using the cores within a cluster, I think we need to add fields in struct cpu_topology
>> to record which cores are in each cluster.
> 
> The problem is that there isn't a generic way to identify which level of cache sharing is the "correct" top layer MC domain. For one system, cluster might be appropriate; for another it might be the highest caching level within a socket; for another it might be something in between, or a group of clusters or LLCs.
> 
> Hence the effort to standardize/guarantee a PPTT node that exactly matches a SRAT domain. With that, each SOC/system provider has a clearly defined method for communicating where they want the proximity domain information to begin.
> 
Or maybe we can add a multi-core flag in PPTT to indicate which layer of the topology tree represents the multi-core level,
and also require that the multi-core layer be below the NUMA layer; then we wouldn't need the PPTT_PROXIMITY_DOMAIN flag either.
I think it's reasonable for PPTT to report multi-core topology information.

Thanks,
Xiongfeng

> Thanks,
> 
>>
>>
>> Thanks,
>> Xiongfeng
>>
>> On 2018/1/13 8:59, Jeremy Linton wrote:
>>> Propagate the topology information from the PPTT tree to the
>>> cpu_topology array. We can get the thread id, core_id and
>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>> to those concepts. The package_id is flagged in the tree and can be
>>> found by calling find_acpi_cpu_topology_package() which terminates
>>> its search when it finds an ACPI node flagged as the physical
>>> package. If the tree doesn't contain enough levels to represent
>>> all of the requested levels then the root node will be returned
>>> for all subsequent levels.
>>>
>>> Cc: Juri Lelli <juri.lelli@arm.com>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>> ---
>>>   arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
>>>   1 file changed, 45 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 7b06e263fdd1..ce8ec7fd6b32 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -11,6 +11,7 @@
>>>    * for more details.
>>>    */
>>>   +#include <linux/acpi.h>
>>>   #include <linux/arch_topology.h>
>>>   #include <linux/cpu.h>
>>>   #include <linux/cpumask.h>
>>> @@ -22,6 +23,7 @@
>>>   #include <linux/sched.h>
>>>   #include <linux/sched/topology.h>
>>>   #include <linux/slab.h>
>>> +#include <linux/smp.h>
>>>   #include <linux/string.h>
>>>     #include <asm/cpu.h>
>>> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
>>>       }
>>>   }
>>>   +#ifdef CONFIG_ACPI
>>> +/*
>>> + * Propagate the topology information of the processor_topology_node tree to the
>>> + * cpu_topology array.
>>> + */
>>> +static int __init parse_acpi_topology(void)
>>> +{
>>> +    bool is_threaded;
>>> +    int cpu, topology_id;
>>> +
>>> +    is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>>> +
>>> +    for_each_possible_cpu(cpu) {
>>> +        topology_id = find_acpi_cpu_topology(cpu, 0);
>>> +        if (topology_id < 0)
>>> +            return topology_id;
>>> +
>>> +        if (is_threaded) {
>>> +            cpu_topology[cpu].thread_id = topology_id;
>>> +            topology_id = find_acpi_cpu_topology(cpu, 1);
>>> +            cpu_topology[cpu].core_id   = topology_id;
>>> +            topology_id = find_acpi_cpu_topology_package(cpu);
>>> +            cpu_topology[cpu].package_id = topology_id;
>>> +        } else {
>>> +            cpu_topology[cpu].thread_id  = -1;
>>> +            cpu_topology[cpu].core_id    = topology_id;
>>> +            topology_id = find_acpi_cpu_topology_package(cpu);
>>> +            cpu_topology[cpu].package_id = topology_id;
>>> +        }
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +#else
>>> +static inline int __init parse_acpi_topology(void)
>>> +{
>>> +    return -EINVAL;
>>> +}
>>> +#endif
>>>     void __init init_cpu_topology(void)
>>>   {
>>> @@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
>>>        * Discard anything that was parsed if we hit an error so we
>>>        * don't use partial information.
>>>        */
>>> -    if (of_have_populated_dt() && parse_dt_topology())
>>> +    if ((!acpi_disabled) && parse_acpi_topology())
>>> +        reset_cpu_topology();
>>> +    else if (of_have_populated_dt() && parse_dt_topology())
>>>           reset_cpu_topology();
>>>   }
>>>
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-01-25 15:56       ` Jeremy Linton
  (?)
@ 2018-02-23 11:02         ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2018-02-23 11:02 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: mark.rutland, Jonathan.Zhang, austinwc, catalin.marinas,
	will.deacon, morten.rasmussen, vkilari, Jayachandran.Nair,
	Juri Lelli, jhugo, Xiongfeng Wang, linux-acpi, viresh.kumar,
	lenb, linux-pm, ahs3, linux-arm-kernel, gregkh, rjw,
	linux-kernel, hanjun.guo, sudeep.holla

On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> Hi,
> 
> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >Hi Jeremy,
> >
> >I have tested the patch with the newest UEFI. It prints the below error:
> >
> >[    4.017371] BUG: arch topology borken
> >[    4.021069] BUG: arch topology borken
> >[    4.024764] BUG: arch topology borken
> >[    4.028460] BUG: arch topology borken
> >[    4.032153] BUG: arch topology borken
> >[    4.035849] BUG: arch topology borken
> >[    4.039543] BUG: arch topology borken
> >[    4.043239] BUG: arch topology borken
> >[    4.046932] BUG: arch topology borken
> >[    4.050629] BUG: arch topology borken
> >[    4.054322] BUG: arch topology borken
> >
> >I checked the code and found that the newest UEFI sets the PPTT physical_package_flag on a physical package node, and
> >the NUMA domains (SRAT domains) start at the DIE layer. (The topology of our board is core->cluster->die->package.)
> 
> I commented about that on the EDK2 mailing list. While the current spec
> doesn't explicitly ban having the flag set multiple times between the leaf
> and the root, I consider it a "bug" and there is an effort to clarify the
> spec and the use of that flag.
> >
> >When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
> >and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
> >sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.
> 
> Right. I've mentioned this problem a couple of times.
> 
> At the moment, the spec isn't clear about how the proximity domain is
> detected/located within the PPTT topology (a node with a 1:1 correspondence
> isn't even required). As you can see from this patch set, we are making the
> general assumption that the proximity domains are at the same level as the
> physical socket. This isn't ideal for NUMA topologies, like the D05, that
> don't align with the physical socket.
> 
> There are efforts underway to clarify and expand upon the specification to
> deal with this general problem. The simple solution is another flag (say
> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
> find nodes with 1:1 correspondence. At that point we could add a fairly
> trivial patch to correct just the scheduler topology without affecting the
> rest of the system topology code.

I think Morten asked this already, but isn't this the same end result we
end up with if we remove the DIE level when NUMA-within-package is detected
(instead of using the default_topology[]) and we create our own ARM64
domain hierarchy (with the DIE level removed) through set_sched_topology()
accordingly ?

Put it differently: do we really need to rely on another PPTT flag to
collect this information?
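A minimal sketch of the alternative being raised here, in kernel style. This is a hypothetical fragment, not merged code: the table name is invented, and whether the existing helper masks fit arm64 as-is is an assumption. The idea is an arch-private sched-domain table with no DIE entry, registered via set_sched_topology() once NUMA-within-package is detected:

```c
/* Hypothetical: SMT and MC levels only.  The scheduler appends the NUMA
 * levels itself, so omitting DIE avoids the conflicting package-wide level. */
static struct sched_domain_topology_level arm64_numa_in_package_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
	{ NULL, },
};

/* ...called during topology init once NUMA-in-package is detected: */
set_sched_topology(arm64_numa_in_package_topology);
```

x86 already takes a broadly similar approach for its NUMA-in-package parts.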

I can't merge code that breaks a platform with legitimate firmware
bindings.

Thanks,
Lorenzo

> 
> >
> >If we modify the UEFI to make NUMA sched_domain start from the layer of package, then all the topology information
> >within the package will be discarded. I think we need to build the multi-core sched_domain using the cores within
> >the cluster instead of the cores within the package. I think that's what 'multi-core' means. Multi cores form a cluster. I guess.
> >If we build the multi-core sched_domain using the cores within a cluster, I think we need to add fields in struct cpu_topology
> >to record which cores are in each cluster.
> 
> The problem is that there isn't a generic way to identify which level of
> cache sharing is the "correct" top layer MC domain. For one system, cluster
> might be appropriate; for another it might be the highest caching level
> within a socket; for another it might be something in between, or a group
> of clusters or LLCs.
> 
> Hence the effort to standardize/guarantee a PPTT node that exactly matches a
> SRAT domain. With that, each SOC/system provider has a clearly defined method
> for communicating where they want the proximity domain information to begin.
> 
> Thanks,
> 
> >
> >
> >Thanks,
> >Xiongfeng
> >
> >On 2018/1/13 8:59, Jeremy Linton wrote:
> >>Propagate the topology information from the PPTT tree to the
> >>cpu_topology array. We can get the thread id, core_id and
> >>cluster_id by assuming certain levels of the PPTT tree correspond
> >>to those concepts. The package_id is flagged in the tree and can be
> >>found by calling find_acpi_cpu_topology_package() which terminates
> >>its search when it finds an ACPI node flagged as the physical
> >>package. If the tree doesn't contain enough levels to represent
> >>all of the requested levels then the root node will be returned
> >>for all subsequent levels.
> >>
> >>Cc: Juri Lelli <juri.lelli@arm.com>
> >>Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> >>---
> >>  arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 45 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> >>index 7b06e263fdd1..ce8ec7fd6b32 100644
> >>--- a/arch/arm64/kernel/topology.c
> >>+++ b/arch/arm64/kernel/topology.c
> >>@@ -11,6 +11,7 @@
> >>   * for more details.
> >>   */
> >>+#include <linux/acpi.h>
> >>  #include <linux/arch_topology.h>
> >>  #include <linux/cpu.h>
> >>  #include <linux/cpumask.h>
> >>@@ -22,6 +23,7 @@
> >>  #include <linux/sched.h>
> >>  #include <linux/sched/topology.h>
> >>  #include <linux/slab.h>
> >>+#include <linux/smp.h>
> >>  #include <linux/string.h>
> >>  #include <asm/cpu.h>
> >>@@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
> >>  	}
> >>  }
> >>+#ifdef CONFIG_ACPI
> >>+/*
> >>+ * Propagate the topology information of the processor_topology_node tree to the
> >>+ * cpu_topology array.
> >>+ */
> >>+static int __init parse_acpi_topology(void)
> >>+{
> >>+	bool is_threaded;
> >>+	int cpu, topology_id;
> >>+
> >>+	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> >>+
> >>+	for_each_possible_cpu(cpu) {
> >>+		topology_id = find_acpi_cpu_topology(cpu, 0);
> >>+		if (topology_id < 0)
> >>+			return topology_id;
> >>+
> >>+		if (is_threaded) {
> >>+			cpu_topology[cpu].thread_id = topology_id;
> >>+			topology_id = find_acpi_cpu_topology(cpu, 1);
> >>+			cpu_topology[cpu].core_id   = topology_id;
> >>+			topology_id = find_acpi_cpu_topology_package(cpu);
> >>+			cpu_topology[cpu].package_id = topology_id;
> >>+		} else {
> >>+			cpu_topology[cpu].thread_id  = -1;
> >>+			cpu_topology[cpu].core_id    = topology_id;
> >>+			topology_id = find_acpi_cpu_topology_package(cpu);
> >>+			cpu_topology[cpu].package_id = topology_id;
> >>+		}
> >>+	}
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+#else
> >>+static inline int __init parse_acpi_topology(void)
> >>+{
> >>+	return -EINVAL;
> >>+}
> >>+#endif
> >>  void __init init_cpu_topology(void)
> >>  {
> >>@@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
> >>  	 * Discard anything that was parsed if we hit an error so we
> >>  	 * don't use partial information.
> >>  	 */
> >>-	if (of_have_populated_dt() && parse_dt_topology())
> >>+	if ((!acpi_disabled) && parse_acpi_topology())
> >>+		reset_cpu_topology();
> >>+	else if (of_have_populated_dt() && parse_dt_topology())
> >>  		reset_cpu_topology();
> >>  }
> >>
> >
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
@ 2018-02-23 11:02         ` Lorenzo Pieralisi
  0 siblings, 0 replies; 104+ messages in thread
From: Lorenzo Pieralisi @ 2018-02-23 11:02 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Xiongfeng Wang, linux-acpi, linux-arm-kernel, sudeep.holla,
	hanjun.guo, rjw, will.deacon, catalin.marinas, gregkh,
	viresh.kumar, mark.rutland, linux-kernel, linux-pm, jhugo,
	Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb, vkilari,
	morten.rasmussen, Juri Lelli

On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> Hi,
> 
> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >Hi Jeremy,
> >
> >I have tested the patch with the newest UEFI. It prints the below error:
> >
> >[    4.017371] BUG: arch topology borken
> >[    4.021069] BUG: arch topology borken
> >[    4.024764] BUG: arch topology borken
> >[    4.028460] BUG: arch topology borken
> >[    4.032153] BUG: arch topology borken
> >[    4.035849] BUG: arch topology borken
> >[    4.039543] BUG: arch topology borken
> >[    4.043239] BUG: arch topology borken
> >[    4.046932] BUG: arch topology borken
> >[    4.050629] BUG: arch topology borken
> >[    4.054322] BUG: arch topology borken
> >
> >I checked the code and found that the newest UEFI set PPTT physical_package_flag on a physical package node and
> >the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board is core->cluster->die->package).
> 
> I commented about that on the EDK2 mailing list. While the current spec
> doesn't explicitly ban having the flag set multiple times between the leaf
> and the root I consider it a "bug" and there is an effort to clarify the
> spec and the use of that flag.
> >
> >When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
> >and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
> >sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.
> 
> Right. I've mentioned this problem a couple of times.
> 
> At the moment, the spec isn't clear about how the proximity domain is
> detected/located within the PPTT topology (a node with a 1:1 correspondence
> isn't even required). As you can see from this patch set, we are making the
> general assumption that the proximity domains are at the same level as the
> physical socket. This isn't ideal for NUMA topologies, like the D05, that
> don't align with the physical socket.
> 
> There are efforts underway to clarify and expand upon the specification to
> deal with this general problem. The simple solution is another flag (say
> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
> find nodes with 1:1 correspondence. At that point we could add a fairly
> trivial patch to correct just the scheduler topology without affecting the
> rest of the system topology code.

I think Morten asked already but isn't this the same end result we end
up having if we remove the DIE level if NUMA-within-package is detected
(instead of using the default_topology[]) and we create our own ARM64
domain hierarchy (with DIE level removed) through set_sched_topology()
accordingly ?

Put it differently: do we really need to rely on another PPTT flag to
collect this information ?

I can't merge code that breaks a platform with legitimate firmware
bindings.

Thanks,
Lorenzo

> 
> >
> >If we modify the UEFI to make NUMA sched_domain start from the layer of package, then all the topology information
> >within the package will be discarded. I think we need to build the multi-core sched_domain using the cores within
> >the cluster instead of the cores within the package. I think that's what 'multi-core' means. Multi cores form a cluster. I guess.
> >If we build the multi-core sched_domain using the cores within a cluster, I think we need to add fields in struct cpu_topology
> >to record which cores are in each cluster.
> 
> The problem is that there isn't a generic way to identify which level of
> cache sharing is the "correct" top layer MC domain. For one system cluster
> might be appropriate, for another it might be the highest caching level
> within a socket, for another it might be something in between, or a group
> of clusters or LLCs..
> 
> Hence the effort to standardize/guarantee a PPTT node that exactly matches a
> SRAT domain. With that, each SOC/system provider has a clearly defined method
> for communicating where they want the proximity domain information to begin.
> 
> Thanks,
> 
> >
> >
> >Thanks,
> >Xiongfeng
> >
> >On 2018/1/13 8:59, Jeremy Linton wrote:
> >>Propagate the topology information from the PPTT tree to the
> >>cpu_topology array. We can get the thread id, core_id and
> >>cluster_id by assuming certain levels of the PPTT tree correspond
> >>to those concepts. The package_id is flagged in the tree and can be
> >>found by calling find_acpi_cpu_topology_package() which terminates
> >>its search when it finds an ACPI node flagged as the physical
> >>package. If the tree doesn't contain enough levels to represent
> >>all of the requested levels then the root node will be returned
> >>for all subsequent levels.
> >>
> >>Cc: Juri Lelli <juri.lelli@arm.com>
> >>Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> >>---
> >>  arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 45 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> >>index 7b06e263fdd1..ce8ec7fd6b32 100644
> >>--- a/arch/arm64/kernel/topology.c
> >>+++ b/arch/arm64/kernel/topology.c
> >>@@ -11,6 +11,7 @@
> >>   * for more details.
> >>   */
> >>+#include <linux/acpi.h>
> >>  #include <linux/arch_topology.h>
> >>  #include <linux/cpu.h>
> >>  #include <linux/cpumask.h>
> >>@@ -22,6 +23,7 @@
> >>  #include <linux/sched.h>
> >>  #include <linux/sched/topology.h>
> >>  #include <linux/slab.h>
> >>+#include <linux/smp.h>
> >>  #include <linux/string.h>
> >>  #include <asm/cpu.h>
> >>@@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
> >>  	}
> >>  }
> >>+#ifdef CONFIG_ACPI
> >>+/*
> >>+ * Propagate the topology information of the processor_topology_node tree to the
> >>+ * cpu_topology array.
> >>+ */
> >>+static int __init parse_acpi_topology(void)
> >>+{
> >>+	bool is_threaded;
> >>+	int cpu, topology_id;
> >>+
> >>+	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> >>+
> >>+	for_each_possible_cpu(cpu) {
> >>+		topology_id = find_acpi_cpu_topology(cpu, 0);
> >>+		if (topology_id < 0)
> >>+			return topology_id;
> >>+
> >>+		if (is_threaded) {
> >>+			cpu_topology[cpu].thread_id = topology_id;
> >>+			topology_id = find_acpi_cpu_topology(cpu, 1);
> >>+			cpu_topology[cpu].core_id   = topology_id;
> >>+			topology_id = find_acpi_cpu_topology_package(cpu);
> >>+			cpu_topology[cpu].package_id = topology_id;
> >>+		} else {
> >>+			cpu_topology[cpu].thread_id  = -1;
> >>+			cpu_topology[cpu].core_id    = topology_id;
> >>+			topology_id = find_acpi_cpu_topology_package(cpu);
> >>+			cpu_topology[cpu].package_id = topology_id;
> >>+		}
> >>+	}
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+#else
> >>+static inline int __init parse_acpi_topology(void)
> >>+{
> >>+	return -EINVAL;
> >>+}
> >>+#endif
> >>  void __init init_cpu_topology(void)
> >>  {
> >>@@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
> >>  	 * Discard anything that was parsed if we hit an error so we
> >>  	 * don't use partial information.
> >>  	 */
> >>-	if (of_have_populated_dt() && parse_dt_topology())
> >>+	if ((!acpi_disabled) && parse_acpi_topology())
> >>+		reset_cpu_topology();
> >>+	else if (of_have_populated_dt() && parse_dt_topology())
> >>  		reset_cpu_topology();
> >>  }
> >>
> >
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-02-23 11:02         ` Lorenzo Pieralisi
@ 2018-02-24  3:05           ` Xiongfeng Wang
  0 siblings, 0 replies; 104+ messages in thread
From: Xiongfeng Wang @ 2018-02-24  3:05 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Jeremy Linton
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair, catalin.marinas,
	Juri Lelli, gregkh, jhugo, rjw, linux-pm, will.deacon,
	linux-kernel, morten.rasmussen, linux-acpi, viresh.kumar,
	hanjun.guo, sudeep.holla, austinwc, vkilari, ahs3,
	linux-arm-kernel, lenb


Hi,
On 2018/2/23 19:02, Lorenzo Pieralisi wrote:
> On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
>> Hi,
>>
>> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
>>> Hi Jeremy,
>>>
>>> I have tested the patch with the newest UEFI. It prints the below error:
>>>
>>> [    4.017371] BUG: arch topology borken
>>> [    4.021069] BUG: arch topology borken
>>> [    4.024764] BUG: arch topology borken
>>> [    4.028460] BUG: arch topology borken
>>> [    4.032153] BUG: arch topology borken
>>> [    4.035849] BUG: arch topology borken
>>> [    4.039543] BUG: arch topology borken
>>> [    4.043239] BUG: arch topology borken
>>> [    4.046932] BUG: arch topology borken
>>> [    4.050629] BUG: arch topology borken
>>> [    4.054322] BUG: arch topology borken
>>>
>>> I checked the code and found that the newest UEFI set PPTT physical_package_flag on a physical package node and
>>> the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board is core->cluster->die->package).
>>
>> I commented about that on the EDK2 mailing list. While the current spec
>> doesn't explicitly ban having the flag set multiple times between the leaf
>> and the root I consider it a "bug" and there is an effort to clarify the
>> spec and the use of that flag.
>>>
>>> When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
>>> and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
>>> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.
>>
>> Right. I've mentioned this problem a couple of times.
>>
>> At the moment, the spec isn't clear about how the proximity domain is
>> detected/located within the PPTT topology (a node with a 1:1 correspondence
>> isn't even required). As you can see from this patch set, we are making the
>> general assumption that the proximity domains are at the same level as the
>> physical socket. This isn't ideal for NUMA topologies, like the D05, that
>> don't align with the physical socket.
>>
>> There are efforts underway to clarify and expand upon the specification to
>> deal with this general problem. The simple solution is another flag (say
>> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
>> find nodes with 1:1 correspondence. At that point we could add a fairly
>> trivial patch to correct just the scheduler topology without affecting the
>> rest of the system topology code.
> 
> I think Morten asked already but isn't this the same end result we end
> up having if we remove the DIE level if NUMA-within-package is detected
> (instead of using the default_topology[]) and we create our own ARM64
> domain hierarchy (with DIE level removed) through set_sched_topology()
> accordingly ?
> 
> Put it differently: do we really need to rely on another PPTT flag to
> collect this information ?
> 
> I can't merge code that breaks a platform with legitimate firmware
> bindings.

I think we really need another PPTT flag, from which we can get information
about how to build a multi-core sched_domain. Cache-sharing information alone
is not enough to tell us how to build a multi-core sched_domain.

How about this: we assume the layer just above the lowest layer to be the multi-core layer.
After that flag is added to the ACPI spec, we add another patch to adapt to the change.

Thanks,
Xiongfeng

> 
> Thanks,
> Lorenzo
> 
>>
>>>
>>> If we modify the UEFI to make NUMA sched_domain start from the layer of package, then all the topology information
>>> within the package will be discarded. I think we need to build the multi-core sched_domain using the cores within
>>> the cluster instead of the cores within the package. I think that's what 'multi-core' means. Multi cores form a cluster. I guess.
>>> If we build the multi-core sched_domain using the cores within a cluster, I think we need to add fields in struct cpu_topology
>>> to record which cores are in each cluster.
>>
>> The problem is that there isn't a generic way to identify which level of
>> cache sharing is the "correct" top layer MC domain. For one system cluster
>> might be appropriate, for another it might be the highest caching level
>> within a socket, for another it might be something in between, or a group
>> of clusters or LLCs..
>>
>> Hence the effort to standardize/guarantee a PPTT node that exactly matches a
>> SRAT domain. With that, each SOC/system provider has a clearly defined method
>> for communicating where they want the proximity domain information to begin.
>>
>> Thanks,
>>
>>>
>>>
>>> Thanks,
>>> Xiongfeng
>>>
>>> On 2018/1/13 8:59, Jeremy Linton wrote:
>>>> Propagate the topology information from the PPTT tree to the
>>>> cpu_topology array. We can get the thread id, core_id and
>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>> to those concepts. The package_id is flagged in the tree and can be
>>>> found by calling find_acpi_cpu_topology_package() which terminates
>>>> its search when it finds an ACPI node flagged as the physical
>>>> package. If the tree doesn't contain enough levels to represent
>>>> all of the requested levels then the root node will be returned
>>>> for all subsequent levels.
>>>>
>>>> Cc: Juri Lelli <juri.lelli@arm.com>
>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>> ---
>>>>  arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 45 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>> index 7b06e263fdd1..ce8ec7fd6b32 100644
>>>> --- a/arch/arm64/kernel/topology.c
>>>> +++ b/arch/arm64/kernel/topology.c
>>>> @@ -11,6 +11,7 @@
>>>>   * for more details.
>>>>   */
>>>> +#include <linux/acpi.h>
>>>>  #include <linux/arch_topology.h>
>>>>  #include <linux/cpu.h>
>>>>  #include <linux/cpumask.h>
>>>> @@ -22,6 +23,7 @@
>>>>  #include <linux/sched.h>
>>>>  #include <linux/sched/topology.h>
>>>>  #include <linux/slab.h>
>>>> +#include <linux/smp.h>
>>>>  #include <linux/string.h>
>>>>  #include <asm/cpu.h>
>>>> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
>>>>  	}
>>>>  }
>>>> +#ifdef CONFIG_ACPI
>>>> +/*
>>>> + * Propagate the topology information of the processor_topology_node tree to the
>>>> + * cpu_topology array.
>>>> + */
>>>> +static int __init parse_acpi_topology(void)
>>>> +{
>>>> +	bool is_threaded;
>>>> +	int cpu, topology_id;
>>>> +
>>>> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>>>> +
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		topology_id = find_acpi_cpu_topology(cpu, 0);
>>>> +		if (topology_id < 0)
>>>> +			return topology_id;
>>>> +
>>>> +		if (is_threaded) {
>>>> +			cpu_topology[cpu].thread_id = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology(cpu, 1);
>>>> +			cpu_topology[cpu].core_id   = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology_package(cpu);
>>>> +			cpu_topology[cpu].package_id = topology_id;
>>>> +		} else {
>>>> +			cpu_topology[cpu].thread_id  = -1;
>>>> +			cpu_topology[cpu].core_id    = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology_package(cpu);
>>>> +			cpu_topology[cpu].package_id = topology_id;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +#else
>>>> +static inline int __init parse_acpi_topology(void)
>>>> +{
>>>> +	return -EINVAL;
>>>> +}
>>>> +#endif
>>>>  void __init init_cpu_topology(void)
>>>>  {
>>>> @@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
>>>>  	 * Discard anything that was parsed if we hit an error so we
>>>>  	 * don't use partial information.
>>>>  	 */
>>>> -	if (of_have_populated_dt() && parse_dt_topology())
>>>> +	if ((!acpi_disabled) && parse_acpi_topology())
>>>> +		reset_cpu_topology();
>>>> +	else if (of_have_populated_dt() && parse_dt_topology())
>>>>  		reset_cpu_topology();
>>>>  }
>>>>
>>>
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-02-23 11:02         ` Lorenzo Pieralisi
  (?)
@ 2018-02-24  4:37           ` Jeremy Linton
  -1 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-02-24  4:37 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: mark.rutland, Jonathan.Zhang, austinwc, catalin.marinas,
	will.deacon, morten.rasmussen, vkilari, Jayachandran.Nair,
	Juri Lelli, jhugo, Xiongfeng Wang, linux-acpi, viresh.kumar,
	lenb, linux-pm, ahs3, linux-arm-kernel, gregkh, rjw,
	linux-kernel, hanjun.guo, sudeep.holla

On 02/23/2018 05:02 AM, Lorenzo Pieralisi wrote:
> On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
>> Hi,
>>
>> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
>>> Hi Jeremy,
>>>
>>> I have tested the patch with the newest UEFI. It prints the below error:
>>>
>>> [    4.017371] BUG: arch topology borken
>>> [    4.021069] BUG: arch topology borken
>>> [    4.024764] BUG: arch topology borken
>>> [    4.028460] BUG: arch topology borken
>>> [    4.032153] BUG: arch topology borken
>>> [    4.035849] BUG: arch topology borken
>>> [    4.039543] BUG: arch topology borken
>>> [    4.043239] BUG: arch topology borken
>>> [    4.046932] BUG: arch topology borken
>>> [    4.050629] BUG: arch topology borken
>>> [    4.054322] BUG: arch topology borken
>>>
>>> I checked the code and found that the newest UEFI set PPTT physical_package_flag on a physical package node and
>>> the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board is core->cluster->die->package).
>>
>> I commented about that on the EDK2 mailing list. While the current spec
>> doesn't explicitly ban having the flag set multiple times between the leaf
>> and the root I consider it a "bug" and there is an effort to clarify the
>> spec and the use of that flag.
>>>
>>> When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
>>> and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
>>> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.
>>
>> Right. I've mentioned this problem a couple of times.
>>
>> At the moment, the spec isn't clear about how the proximity domain is
>> detected/located within the PPTT topology (a node with a 1:1 correspondence
>> isn't even required). As you can see from this patch set, we are making the
>> general assumption that the proximity domains are at the same level as the
>> physical socket. This isn't ideal for NUMA topologies, like the D05, that
>> don't align with the physical socket.
>>
>> There are efforts underway to clarify and expand upon the specification to
>> deal with this general problem. The simple solution is another flag (say
>> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
>> find nodes with 1:1 correspondence. At that point we could add a fairly
>> trivial patch to correct just the scheduler topology without affecting the
>> rest of the system topology code.
> 
> I think Morten asked already but isn't this the same end result we end
> up having if we remove the DIE level if NUMA-within-package is detected
> (instead of using the default_topology[]) and we create our own ARM64
> domain hierarchy (with DIE level removed) through set_sched_topology()
> accordingly ?

I'm not sure what removing the DIE level does for you, but it's not 
really the problem AFAIK; the problem is that the MC layer is larger 
than the NUMA domains.

> 
> Put it differently: do we really need to rely on another PPTT flag to
> collect this information ?

Strictly no, and I have a partial patch around here I've been meaning 
to flesh out which uses the early node information to detect whether 
there are nodes smaller than the package. Initially I claimed I was 
going to stay away from making scheduler topology changes in this 
patch set, but it seems that at least providing a patch which does the 
minimal bits is in the cards. The PXN flag is more of a shortcut to 
finding the cache levels at or below the NUMA domains than any hard 
requirement. The same goes for the request someone else made for a 
leaf-node flag (or node ordering) to avoid multiple passes over the 
table: that would simplify the posted code a bit, but the code works 
without it.


> I can't merge code that breaks a platform with legitimate firmware
> bindings.

"Breaks" in this case means a BUG warning that shows up right before 
the scheduler "corrects" the offending domain.

Basically, as I've mentioned a few times, this patch set corrects the 
existing topology problems; in doing so it uncovers issues with the 
way we are mapping that topology for the scheduler. That is actually 
not a difficult thing to fix; my original assumption was that we 
would already be discussing the finer points of the scheduler changes 
by now, but we are still here.

Anyway, I was planning on posting a v7 this week, but time flies... I 
will include a further scheduler tweak to work around the inverted 
NUMA domain problem in that set early next week.

Thanks,

> 
> Thanks,
> Lorenzo
> 
>>
>>>
>>> If we modify the UEFI to make NUMA sched_domain start from the layer of package, then all the topology information
>>> within the package will be discarded. I think we need to build the multi-core sched_domain using the cores within
>>> the cluster instead of the cores within the package. I think that's what 'multi-core' means. Multi cores form a cluster. I guess.
>>> If we build the multi-core sched_domain using the cores within a cluster, I think we need to add fields in struct cpu_topology
>>> to record which cores are in each cluster.
>>
>> The problem is that there isn't a generic way to identify which level of
>> cache sharing is the "correct" top-layer MC domain. For one system the
>> cluster might be appropriate, for another it might be the highest caching
>> level within a socket, for another it might be something in between, or a
>> group of clusters or LLCs.
>>
>> Hence the effort to standardize/guarantee a PPTT node that exactly matches
>> an SRAT domain. With that, each SoC/system provider has a clearly defined
>> method for communicating where they want the proximity domain information
>> to begin.
>>
>> Thanks,
>>
>>>
>>>
>>> Thanks,
>>> Xiongfeng
>>>
>>> On 2018/1/13 8:59, Jeremy Linton wrote:
>>>> Propagate the topology information from the PPTT tree to the
>>>> cpu_topology array. We can get the thread id, core_id and
>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>> to those concepts. The package_id is flagged in the tree and can be
>>>> found by calling find_acpi_cpu_topology_package() which terminates
>>>> its search when it finds an ACPI node flagged as the physical
>>>> package. If the tree doesn't contain enough levels to represent
>>>> all of the requested levels then the root node will be returned
>>>> for all subsequent levels.
>>>>
>>>> Cc: Juri Lelli <juri.lelli@arm.com>
>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>> ---
>>>>   arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
>>>>   1 file changed, 45 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>> index 7b06e263fdd1..ce8ec7fd6b32 100644
>>>> --- a/arch/arm64/kernel/topology.c
>>>> +++ b/arch/arm64/kernel/topology.c
>>>> @@ -11,6 +11,7 @@
>>>>    * for more details.
>>>>    */
>>>> +#include <linux/acpi.h>
>>>>   #include <linux/arch_topology.h>
>>>>   #include <linux/cpu.h>
>>>>   #include <linux/cpumask.h>
>>>> @@ -22,6 +23,7 @@
>>>>   #include <linux/sched.h>
>>>>   #include <linux/sched/topology.h>
>>>>   #include <linux/slab.h>
>>>> +#include <linux/smp.h>
>>>>   #include <linux/string.h>
>>>>   #include <asm/cpu.h>
>>>> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
>>>>   	}
>>>>   }
>>>> +#ifdef CONFIG_ACPI
>>>> +/*
>>>> + * Propagate the topology information of the processor_topology_node tree to the
>>>> + * cpu_topology array.
>>>> + */
>>>> +static int __init parse_acpi_topology(void)
>>>> +{
>>>> +	bool is_threaded;
>>>> +	int cpu, topology_id;
>>>> +
>>>> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>>>> +
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		topology_id = find_acpi_cpu_topology(cpu, 0);
>>>> +		if (topology_id < 0)
>>>> +			return topology_id;
>>>> +
>>>> +		if (is_threaded) {
>>>> +			cpu_topology[cpu].thread_id = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology(cpu, 1);
>>>> +			cpu_topology[cpu].core_id   = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology_package(cpu);
>>>> +			cpu_topology[cpu].package_id = topology_id;
>>>> +		} else {
>>>> +			cpu_topology[cpu].thread_id  = -1;
>>>> +			cpu_topology[cpu].core_id    = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology_package(cpu);
>>>> +			cpu_topology[cpu].package_id = topology_id;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +#else
>>>> +static inline int __init parse_acpi_topology(void)
>>>> +{
>>>> +	return -EINVAL;
>>>> +}
>>>> +#endif
>>>>   void __init init_cpu_topology(void)
>>>>   {
>>>> @@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
>>>>   	 * Discard anything that was parsed if we hit an error so we
>>>>   	 * don't use partial information.
>>>>   	 */
>>>> -	if (of_have_populated_dt() && parse_dt_topology())
>>>> +	if ((!acpi_disabled) && parse_acpi_topology())
>>>> +		reset_cpu_topology();
>>>> +	else if (of_have_populated_dt() && parse_dt_topology())
>>>>   		reset_cpu_topology();
>>>>   }
>>>>
>>>
>>

^ permalink raw reply	[flat|nested] 104+ messages in thread


* [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
@ 2018-02-24  4:37           ` Jeremy Linton
  0 siblings, 0 replies; 104+ messages in thread
From: Jeremy Linton @ 2018-02-24  4:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/23/2018 05:02 AM, Lorenzo Pieralisi wrote:
> On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
>> Hi,
>>
>> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
>>> Hi Jeremy,
>>>
>>> I have tested the patch with the newest UEFI. It prints the below error:
>>>
>>> [    4.017371] BUG: arch topology borken
>>> [    4.021069] BUG: arch topology borken
>>> [    4.024764] BUG: arch topology borken
>>> [    4.028460] BUG: arch topology borken
>>> [    4.032153] BUG: arch topology borken
>>> [    4.035849] BUG: arch topology borken
>>> [    4.039543] BUG: arch topology borken
>>> [    4.043239] BUG: arch topology borken
>>> [    4.046932] BUG: arch topology borken
>>> [    4.050629] BUG: arch topology borken
>>> [    4.054322] BUG: arch topology borken
>>>
>>> I checked the code and found that the newest UEFI set PPTT physical_package_flag on a physical package node and
>>> the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board is core->cluster->die->package).
>>
>> I commented about that on the EDK2 mailing list. While the current spec
>> doesn't explicitly ban having the flag set multiple times between the leaf
>> and the root I consider it a "bug" and there is an effort to clarify the
>> spec and the use of that flag.
>>>
>>> When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
>>> and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
>>> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.
>>
>> Right. I've mentioned this problem a couple of times.
>>
>> At at the moment, the spec isn't clear about how the proximity domain is
>> detected/located within the PPTT topology (a node with a 1:1 correspondence
>> isn't even required). As you can see from this patch set, we are making the
>> general assumption that the proximity domains are at the same level as the
>> physical socket. This isn't ideal for NUMA topologies, like the D05, that
>> don't align with the physical socket.
>>
>> There are efforts underway to clarify and expand upon the specification to
>> deal with this general problem. The simple solution is another flag (say
>> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
>> find nodes with 1:1 correspondence. At that point we could add a fairly
>> trivial patch to correct just the scheduler topology without affecting the
>> rest of the system topology code.
> 
> I think Morten asked already but isn't this the same end result we end
> up having if we remove the DIE level if NUMA-within-package is detected
> (instead of using the default_topology[]) and we create our own ARM64
> domain hierarchy (with DIE level removed) through set_sched_topology()
> accordingly ?

I'm not sure what removing the die level does for you, but its not 
really the problem AFAIK, the problem is because MC layer is larger than 
the NUMA domains.

> 
> Put it differently: do we really need to rely on another PPTT flag to
> collect this information ?

Strictly no, and I have a partial patch around here i've been meaning to 
flush out which uses the early node information to detect if there are 
nodes smaller than the package. Initially I've been claiming i was going 
to stay away from making scheduler topology changes in this patch set, 
but it seems that at least providing a patch which does the minimal bits 
is in the cards. The PXN flag was is more of a shortcut to finding the 
cache levels at or below the numa domains, rather than any hard 
requirement. Similarly, to the request someone else was making for a 
leaf node flag (or node ordering) to avoid multiple passes in the table. 
That request would simplify the posted code a bit but it works without it.


> I can't merge code that breaks a platform with legitimate firmware
> bindings.

Breaks in this case is a BUG warning that shows up right before it 
"corrects" a scheduler domain.

Basically, as i've mentioned a few times, this patch set corrects the 
existing topology problems, in doing so it uncovers issues with the way 
we are mapping that topology for the scheduler. That is actually not 
difficult thing to fix, my assumption originally is that we would 
already be at the point of discussion the finer points of the scheduler 
changes but we are still here.

Anyway, I was planning on posting a v7 this week, but time flys... I 
will include a further scheduler tweak to work around the inverted numa 
domain problem in that set early next week.

Thanks,

> 
> Thanks,
> Lorenzo
> 
>>
>>>
>>> If we modify the UEFI to make NUMA sched_domain start from the layer of package, then all the topology information
>>> within the package will be discarded. I think we need to build the multi-core sched_domain using the cores within
>>> the cluster instead of the cores within the package. I think that's what 'multi-core' means. Multi cores form a cluster. I guess.
>>> If we build the multi-core sched_domain using the cores within a cluster, I think we need to add fields in struct cpu_topology
>>> to record which cores are in each cluster.
>>
>> The problem is that there isn't a generic way to identify which level of
>> cache sharing is the "correct" top layer MC domain. For one system cluster
>> might be appropriate, for another it might be the highest caching level
>> within a socket, for another it might be something in between or a group
>> of clusters or LLCs..
>>
>> Hence the effort to standardize/guarantee a PPTT node that exactly matches a
>> SRAT domain. With that, each SOC/system provider has a clearly defined method
>> for communicating where they want the proximity domain information to begin.
>>
>> Thanks,
>>
>>>
>>>
>>> Thanks,
>>> Xiongfeng
>>>
>>> On 2018/1/13 8:59, Jeremy Linton wrote:
>>>> Propagate the topology information from the PPTT tree to the
>>>> cpu_topology array. We can get the thread id, core_id and
>>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>>> to those concepts. The package_id is flagged in the tree and can be
>>>> found by calling find_acpi_cpu_topology_package() which terminates
>>>> its search when it finds an ACPI node flagged as the physical
>>>> package. If the tree doesn't contain enough levels to represent
>>>> all of the requested levels then the root node will be returned
>>>> for all subsequent levels.
>>>>
>>>> Cc: Juri Lelli <juri.lelli@arm.com>
>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>> ---
>>>>   arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
>>>>   1 file changed, 45 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>> index 7b06e263fdd1..ce8ec7fd6b32 100644
>>>> --- a/arch/arm64/kernel/topology.c
>>>> +++ b/arch/arm64/kernel/topology.c
>>>> @@ -11,6 +11,7 @@
>>>>    * for more details.
>>>>    */
>>>> +#include <linux/acpi.h>
>>>>   #include <linux/arch_topology.h>
>>>>   #include <linux/cpu.h>
>>>>   #include <linux/cpumask.h>
>>>> @@ -22,6 +23,7 @@
>>>>   #include <linux/sched.h>
>>>>   #include <linux/sched/topology.h>
>>>>   #include <linux/slab.h>
>>>> +#include <linux/smp.h>
>>>>   #include <linux/string.h>
>>>>   #include <asm/cpu.h>
>>>> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
>>>>   	}
>>>>   }
>>>> +#ifdef CONFIG_ACPI
>>>> +/*
>>>> + * Propagate the topology information of the processor_topology_node tree to the
>>>> + * cpu_topology array.
>>>> + */
>>>> +static int __init parse_acpi_topology(void)
>>>> +{
>>>> +	bool is_threaded;
>>>> +	int cpu, topology_id;
>>>> +
>>>> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>>>> +
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		topology_id = find_acpi_cpu_topology(cpu, 0);
>>>> +		if (topology_id < 0)
>>>> +			return topology_id;
>>>> +
>>>> +		if (is_threaded) {
>>>> +			cpu_topology[cpu].thread_id = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology(cpu, 1);
>>>> +			cpu_topology[cpu].core_id   = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology_package(cpu);
>>>> +			cpu_topology[cpu].package_id = topology_id;
>>>> +		} else {
>>>> +			cpu_topology[cpu].thread_id  = -1;
>>>> +			cpu_topology[cpu].core_id    = topology_id;
>>>> +			topology_id = find_acpi_cpu_topology_package(cpu);
>>>> +			cpu_topology[cpu].package_id = topology_id;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +#else
>>>> +static inline int __init parse_acpi_topology(void)
>>>> +{
>>>> +	return -EINVAL;
>>>> +}
>>>> +#endif
>>>>   void __init init_cpu_topology(void)
>>>>   {
>>>> @@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
>>>>   	 * Discard anything that was parsed if we hit an error so we
>>>>   	 * don't use partial information.
>>>>   	 */
>>>> -	if (of_have_populated_dt() && parse_dt_topology())
>>>> +	if ((!acpi_disabled) && parse_acpi_topology())
>>>> +		reset_cpu_topology();
>>>> +	else if (of_have_populated_dt() && parse_dt_topology())
>>>>   		reset_cpu_topology();
>>>>   }
>>>>
>>>
>>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-02-24  3:05           ` Xiongfeng Wang
  (?)
@ 2018-02-25  6:17             ` vkilari
  -1 siblings, 0 replies; 104+ messages in thread
From: vkilari @ 2018-02-25  6:17 UTC (permalink / raw)
  To: 'Xiongfeng Wang', 'Lorenzo Pieralisi',
	'Jeremy Linton'
  Cc: mark.rutland, Jonathan.Zhang, Jayachandran.Nair, austinwc,
	'Juri Lelli',
	vikrams, linux-pm, jhugo, gregkh, sudeep.holla, rjw,
	linux-kernel, will.deacon, ahs3, linux-acpi, viresh.kumar,
	hanjun.guo, catalin.marinas, morten.rasmussen, linux-arm-kernel,
	lenb

Hi,

> -----Original Message-----
> From: linux-arm-kernel
[mailto:linux-arm-kernel-bounces@lists.infradead.org]
> On Behalf Of Xiongfeng Wang
> Sent: Saturday, February 24, 2018 8:36 AM
> To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>; Jeremy Linton
> <jeremy.linton@arm.com>
> Cc: mark.rutland@arm.com; Jonathan.Zhang@cavium.com;
> Jayachandran.Nair@cavium.com; catalin.marinas@arm.com; Juri Lelli
> <juri.lelli@arm.com>; gregkh@linuxfoundation.org; jhugo@codeaurora.org;
> rjw@rjwysocki.net; linux-pm@vger.kernel.org; will.deacon@arm.com; linux-
> kernel@vger.kernel.org; morten.rasmussen@arm.com; linux-
> acpi@vger.kernel.org; viresh.kumar@linaro.org; hanjun.guo@linaro.org;
> sudeep.holla@arm.com; austinwc@codeaurora.org; vkilari@codeaurora.org;
> ahs3@redhat.com; linux-arm-kernel@lists.infradead.org; lenb@kernel.org
> Subject: Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU
> topology
> 
> 
> Hi,
> On 2018/2/23 19:02, Lorenzo Pieralisi wrote:
> > On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> >> Hi,
> >>
> >> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >>> Hi Jeremy,
> >>>
> >>> I have tested the patch with the newest UEFI. It prints the below
> >>> error:
> >>>
> >>> [    4.017371] BUG: arch topology borken
> >>> [    4.021069] BUG: arch topology borken
> >>> [    4.024764] BUG: arch topology borken
> >>> [    4.028460] BUG: arch topology borken
> >>> [    4.032153] BUG: arch topology borken
> >>> [    4.035849] BUG: arch topology borken
> >>> [    4.039543] BUG: arch topology borken
> >>> [    4.043239] BUG: arch topology borken
> >>> [    4.046932] BUG: arch topology borken
> >>> [    4.050629] BUG: arch topology borken
> >>> [    4.054322] BUG: arch topology borken
> >>>
> >>> I checked the code and found that the newest UEFI set PPTT
> >>> physical_package_flag on a physical package node and the NUMA domain
> >>> (SRAT domains) starts from the layer of DIE. (The topology of our board
> >>> is core->cluster->die->package).
> >>
> >> I commented about that on the EDK2 mailing list. While the current
> >> spec doesn't explicitly ban having the flag set multiple times
> >> between the leaf and the root I consider it a "bug" and there is an
> >> effort to clarify the spec and the use of that flag.
> >>>
> >>> When the kernel starts to build sched_domain, the multi-core
> >>> sched_domain contains all the cores within a package, and the lowest
> >>> NUMA sched_domain contains all the cores within a die. But the kernel
> requires that the multi-core sched_domain should be a subset of the lowest
> NUMA sched_domain, so the BUG info is printed.
> >>
> >> Right. I've mentioned this problem a couple of times.
> >>
> >> At the moment, the spec isn't clear about how the proximity domain
> >> is detected/located within the PPTT topology (a node with a 1:1
> >> correspondence isn't even required). As you can see from this patch
> >> set, we are making the general assumption that the proximity domains
> >> are at the same level as the physical socket. This isn't ideal for
> >> NUMA topologies, like the D05, that don't align with the physical
> >> socket.
> >>
> >> There are efforts underway to clarify and expand upon the
> >> specification to deal with this general problem. The simple solution
> >> is another flag (say PPTT_PROXIMITY_DOMAIN which would map to the D05
> >> die) which could be used to find nodes with 1:1 correspondence. At
> >> that point we could add a fairly trivial patch to correct just the
> >> scheduler topology without affecting the rest of the system topology
code.
> >
> > I think Morten asked already but isn't this the same end result we end
> > up having if we remove the DIE level if NUMA-within-package is
> > detected (instead of using the default_topology[]) and we create our
> > own ARM64 domain hierarchy (with DIE level removed) through
> > set_sched_topology() accordingly ?

To overcome this, the DIE level is removed on x86 as well when
NUMA-within-package is detected, by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/smpboot.c?h=v4.16-rc2&id=8f37961cf22304fb286c7604d3a7f6104dcc1283

Solving this with PPTT would be a cleaner approach than overriding
default_topology[].

> >
> > Put it differently: do we really need to rely on another PPTT flag to
> > collect this information ?
> >
> > I can't merge code that breaks a platform with legitimate firmware
> > bindings.
> 
> I think we really need another PPTT flag, from which we can get
> information about how to build a multi-core sched_domain. I think
> cache-sharing information alone is not enough to tell us how to build a
> multi-core sched_domain.
> 
> How about this? We assume the layer above the lowest layer to be the
> multi-core layer. After that flag is added into the ACPI specs, we add
> another patch to adapt to the change.
> 
> Thanks,
> Xiongfeng
> 
> >
> > Thanks,
> > Lorenzo
> >
> >>
> >>>
> >>> If we modify the UEFI to make NUMA sched_domain start from the layer
> >>> of package, then all the topology information within the package
> >>> will be discarded. I think we need to build the multi-core sched_domain
> >>> using the cores within the cluster instead of the cores within the
> >>> package. I think that's what 'multi-core' means. Multi cores form a
> >>> cluster, I guess.
> >>> If we build the multi-core sched_domain using the cores within a
> >>> cluster, I think we need to add fields in struct cpu_topology to
> >>> record which cores are in each cluster.
> >>
> >> The problem is that there isn't a generic way to identify which level
> >> of cache sharing is the "correct" top layer MC domain. For one system
> >> cluster might be appropriate, for another it might be the highest
> >> caching level within a socket, for another it might be something in
> >> between or a group of clusters or LLCs..
> >>
> >> Hence the effort to standardize/guarantee a PPTT node that exactly
> >> matches a SRAT domain. With that, each SOC/system provider has
> >> clearly defined method for communicating where they want the proximity
> domain information to begin.
> >>
> >> Thanks,
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Xiongfeng
> >>>
> >>> On 2018/1/13 8:59, Jeremy Linton wrote:
> >>>> Propagate the topology information from the PPTT tree to the
> >>>> cpu_topology array. We can get the thread id, core_id and
> >>>> cluster_id by assuming certain levels of the PPTT tree correspond
> >>>> to those concepts. The package_id is flagged in the tree and can be
> >>>> found by calling find_acpi_cpu_topology_package() which terminates
> >>>> its search when it finds an ACPI node flagged as the physical
> >>>> package. If the tree doesn't contain enough levels to represent all
> >>>> of the requested levels then the root node will be returned for all
> >>>> subsequent levels.
> >>>>
> >>>> Cc: Juri Lelli <juri.lelli@arm.com>
> >>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> >>>> ---
> >>>>  arch/arm64/kernel/topology.c | 46
> >>>> +++++++++++++++++++++++++++++++++++++++++++-
> >>>>  1 file changed, 45 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/arm64/kernel/topology.c
> >>>> b/arch/arm64/kernel/topology.c index 7b06e263fdd1..ce8ec7fd6b32
> >>>> 100644
> >>>> --- a/arch/arm64/kernel/topology.c
> >>>> +++ b/arch/arm64/kernel/topology.c
> >>>> @@ -11,6 +11,7 @@
> >>>>   * for more details.
> >>>>   */
> >>>> +#include <linux/acpi.h>
> >>>>  #include <linux/arch_topology.h>
> >>>>  #include <linux/cpu.h>
> >>>>  #include <linux/cpumask.h>
> >>>> @@ -22,6 +23,7 @@
> >>>>  #include <linux/sched.h>
> >>>>  #include <linux/sched/topology.h>
> >>>>  #include <linux/slab.h>
> >>>> +#include <linux/smp.h>
> >>>>  #include <linux/string.h>
> >>>>  #include <asm/cpu.h>
> >>>> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
> >>>>  	}
> >>>>  }
> >>>> +#ifdef CONFIG_ACPI
> >>>> +/*
> >>>> + * Propagate the topology information of the
> >>>> +processor_topology_node tree to the
> >>>> + * cpu_topology array.
> >>>> + */
> >>>> +static int __init parse_acpi_topology(void) {
> >>>> +	bool is_threaded;
> >>>> +	int cpu, topology_id;
> >>>> +
> >>>> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> >>>> +
> >>>> +	for_each_possible_cpu(cpu) {
> >>>> +		topology_id = find_acpi_cpu_topology(cpu, 0);
> >>>> +		if (topology_id < 0)
> >>>> +			return topology_id;
> >>>> +
> >>>> +		if (is_threaded) {
> >>>> +			cpu_topology[cpu].thread_id = topology_id;
> >>>> +			topology_id = find_acpi_cpu_topology(cpu,
1);
> >>>> +			cpu_topology[cpu].core_id   = topology_id;
> >>>> +			topology_id =
find_acpi_cpu_topology_package(cpu);
> >>>> +			cpu_topology[cpu].package_id = topology_id;
> >>>> +		} else {
> >>>> +			cpu_topology[cpu].thread_id  = -1;
> >>>> +			cpu_topology[cpu].core_id    = topology_id;
> >>>> +			topology_id =
find_acpi_cpu_topology_package(cpu);
> >>>> +			cpu_topology[cpu].package_id = topology_id;
> >>>> +		}
> >>>> +	}
> >>>> +
> >>>> +	return 0;
> >>>> +}
> >>>> +
> >>>> +#else
> >>>> +static inline int __init parse_acpi_topology(void) {
> >>>> +	return -EINVAL;
> >>>> +}
> >>>> +#endif
> >>>>  void __init init_cpu_topology(void)  { @@ -309,6 +351,8 @@ void
> >>>> __init init_cpu_topology(void)
> >>>>  	 * Discard anything that was parsed if we hit an error so we
> >>>>  	 * don't use partial information.
> >>>>  	 */
> >>>> -	if (of_have_populated_dt() && parse_dt_topology())
> >>>> +	if ((!acpi_disabled) && parse_acpi_topology())
> >>>> +		reset_cpu_topology();
> >>>> +	else if (of_have_populated_dt() && parse_dt_topology())
> >>>>  		reset_cpu_topology();
> >>>>  }
> >>>>
> >>>
> >>
> >
> > .
> >
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 104+ messages in thread


* [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
@ 2018-02-25  6:17             ` vkilari
  0 siblings, 0 replies; 104+ messages in thread
From: vkilari at codeaurora.org @ 2018-02-25  6:17 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

> -----Original Message-----
> From: linux-arm-kernel
[mailto:linux-arm-kernel-bounces at lists.infradead.org]
> On Behalf Of Xiongfeng Wang
> Sent: Saturday, February 24, 2018 8:36 AM
> To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>; Jeremy Linton
> <jeremy.linton@arm.com>
> Cc: mark.rutland at arm.com; Jonathan.Zhang at cavium.com;
> Jayachandran.Nair at cavium.com; catalin.marinas at arm.com; Juri Lelli
> <juri.lelli@arm.com>; gregkh at linuxfoundation.org; jhugo at codeaurora.org;
> rjw at rjwysocki.net; linux-pm at vger.kernel.org; will.deacon at arm.com; linux-
> kernel at vger.kernel.org; morten.rasmussen at arm.com; linux-
> acpi at vger.kernel.org; viresh.kumar at linaro.org; hanjun.guo at linaro.org;
> sudeep.holla at arm.com; austinwc at codeaurora.org; vkilari at codeaurora.org;
> ahs3 at redhat.com; linux-arm-kernel at lists.infradead.org; lenb at kernel.org
> Subject: Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU
> topology
> 
> 
> Hi,
> On 2018/2/23 19:02, Lorenzo Pieralisi wrote:
> > On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> >> Hi,
> >>
> >> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >>> Hi Jeremy,
> >>>
> >>> I have tested the patch with the newest UEFI. It prints the below
error:
> >>>
> >>> [    4.017371] BUG: arch topology borken
> >>> [    4.021069] BUG: arch topology borken
> >>> [    4.024764] BUG: arch topology borken
> >>> [    4.028460] BUG: arch topology borken
> >>> [    4.032153] BUG: arch topology borken
> >>> [    4.035849] BUG: arch topology borken
> >>> [    4.039543] BUG: arch topology borken
> >>> [    4.043239] BUG: arch topology borken
> >>> [    4.046932] BUG: arch topology borken
> >>> [    4.050629] BUG: arch topology borken
> >>> [    4.054322] BUG: arch topology borken
> >>>
> >>> I checked the code and found that the newest UEFI set PPTT
> >>> physical_package_flag on a physical package node and the NUMA domain
> (SRAT domains) starts from the layer of DIE. (The topology of our board is
core-
> >cluster->die->package).
> >>
> >> I commented about that on the EDK2 mailing list. While the current
> >> spec doesn't explicitly ban having the flag set multiple times
> >> between the leaf and the root I consider it a "bug" and there is an
> >> effort to clarify the spec and the use of that flag.
> >>>
> >>> When the kernel starts to build sched_domain, the multi-core
> >>> sched_domain contains all the cores within a package, and the lowest
> >>> NUMA sched_domain contains all the cores within a die. But the kernel
> requires that the multi-core sched_domain should be a subset of the lowest
> NUMA sched_domain, so the BUG info is printed.
> >>
> >> Right. I've mentioned this problem a couple of times.
> >>
> >> At at the moment, the spec isn't clear about how the proximity domain
> >> is detected/located within the PPTT topology (a node with a 1:1
> >> correspondence isn't even required). As you can see from this patch
> >> set, we are making the general assumption that the proximity domains
> >> are at the same level as the physical socket. This isn't ideal for
> >> NUMA topologies, like the D05, that don't align with the physical
socket.
> >>
> >> There are efforts underway to clarify and expand upon the
> >> specification to deal with this general problem. The simple solution
> >> is another flag (say PPTT_PROXIMITY_DOMAIN which would map to the D05
> >> die) which could be used to find nodes with 1:1 correspondence. At
> >> that point we could add a fairly trivial patch to correct just the
> >> scheduler topology without affecting the rest of the system topology
code.
> >
> > I think Morten asked already but isn't this the same end result we end
> > up having if we remove the DIE level if NUMA-within-package is
> > detected (instead of using the default_topology[]) and we create our
> > own ARM64 domain hierarchy (with DIE level removed) through
> > set_sched_topology() accordingly ?

To overcome this, on x86 as well the DIE level is removed when
NUMA-within-package is detected with this patch
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/ar
ch/x86/kernel/smpboot.c?h=v4.16-rc2&id=8f37961cf22304fb286c7604d3a7f6104dcc1
283

Solving with PPTT would be clean approach instead of overriding
default_topology[]

> >
> > Put it differently: do we really need to rely on another PPTT flag to
> > collect this information ?
> >
> > I can't merge code that breaks a platform with legitimate firmware
> > bindings.
> 
> I think we really need another PPTT flag, from which we can get
information
> about how to build a multi-core sched_domain. I think only cache-sharing
> information is not enough to get information about how to build a
multi-core
> shced_domain.
> 
> How about this? We assume the upper layer of the lowest layer to be multi-
> core layer.
> After that flag is added into ACPI specs, we add another patch to adapt to
the
> change.
> 
> Thanks,
> Xiongfeng
> 
> >
> > Thanks,
> > Lorenzo
> >
> >>
> >>>
> >>> If we modify the UEFI to make NUMA sched_domain start from the layer
> >>> of package, then all the topology information within the package
> >>> will be discarded. I think we need to build the multi-core
sched_domain
> using the cores within the cluster instead of the cores within the
package. I
> think that's what 'multi-core' means. Multi cores form a cluster. I guess.
> >>> If we build the multi-core sched_domain using the cores within a
> >>> cluster, I think we need to add fields in struct cpu_topology to
record which
> cores are in each cluster.
> >>
> >> The problem is that there isn't a generic way to identify which level
> >> of cache sharing is the "correct" top layer MC domain. For one system
> >> cluster might be appropriate, for another it might be the highest
> >> caching level within a socket, for another is might be a something in
> >> between or a group of clusters or LLCs..
> >>
> >> Hence the effort to standardize/guarantee a PPTT node that exactly
> >> matches a SRAT domain. With that, each SOC/system provider has
> >> clearly defined method for communicating where they want the proximity
> domain information to begin.
> >>
> >> Thanks,
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Xiongfeng
> >>>
> >>> On 2018/1/13 8:59, Jeremy Linton wrote:
> >>>> Propagate the topology information from the PPTT tree to the
> >>>> cpu_topology array. We can get the thread id, core_id and
> >>>> cluster_id by assuming certain levels of the PPTT tree correspond
> >>>> to those concepts. The package_id is flagged in the tree and can be
> >>>> found by calling find_acpi_cpu_topology_package() which terminates
> >>>> its search when it finds an ACPI node flagged as the physical
> >>>> package. If the tree doesn't contain enough levels to represent all
> >>>> of the requested levels then the root node will be returned for all
> >>>> subsequent levels.
> >>>>
> >>>> Cc: Juri Lelli <juri.lelli@arm.com>
> >>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> >>>> ---
> >>>>  arch/arm64/kernel/topology.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
> >>>>  1 file changed, 45 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> >>>> index 7b06e263fdd1..ce8ec7fd6b32 100644
> >>>> --- a/arch/arm64/kernel/topology.c
> >>>> +++ b/arch/arm64/kernel/topology.c
> >>>> @@ -11,6 +11,7 @@
> >>>>   * for more details.
> >>>>   */
> >>>> +#include <linux/acpi.h>
> >>>>  #include <linux/arch_topology.h>
> >>>>  #include <linux/cpu.h>
> >>>>  #include <linux/cpumask.h>
> >>>> @@ -22,6 +23,7 @@
> >>>>  #include <linux/sched.h>
> >>>>  #include <linux/sched/topology.h>
> >>>>  #include <linux/slab.h>
> >>>> +#include <linux/smp.h>
> >>>>  #include <linux/string.h>
> >>>>  #include <asm/cpu.h>
> >>>> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
> >>>>  	}
> >>>>  }
> >>>> +#ifdef CONFIG_ACPI
> >>>> +/*
> >>>> + * Propagate the topology information of the processor_topology_node
> >>>> + * tree to the cpu_topology array.
> >>>> + */
> >>>> +static int __init parse_acpi_topology(void)
> >>>> +{
> >>>> +	bool is_threaded;
> >>>> +	int cpu, topology_id;
> >>>> +
> >>>> +	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> >>>> +
> >>>> +	for_each_possible_cpu(cpu) {
> >>>> +		topology_id = find_acpi_cpu_topology(cpu, 0);
> >>>> +		if (topology_id < 0)
> >>>> +			return topology_id;
> >>>> +
> >>>> +		if (is_threaded) {
> >>>> +			cpu_topology[cpu].thread_id = topology_id;
> >>>> +			topology_id = find_acpi_cpu_topology(cpu, 1);
> >>>> +			cpu_topology[cpu].core_id   = topology_id;
> >>>> +			topology_id = find_acpi_cpu_topology_package(cpu);
> >>>> +			cpu_topology[cpu].package_id = topology_id;
> >>>> +		} else {
> >>>> +			cpu_topology[cpu].thread_id  = -1;
> >>>> +			cpu_topology[cpu].core_id    = topology_id;
> >>>> +			topology_id = find_acpi_cpu_topology_package(cpu);
> >>>> +			cpu_topology[cpu].package_id = topology_id;
> >>>> +		}
> >>>> +	}
> >>>> +
> >>>> +	return 0;
> >>>> +}
> >>>> +
> >>>> +#else
> >>>> +static inline int __init parse_acpi_topology(void)
> >>>> +{
> >>>> +	return -EINVAL;
> >>>> +}
> >>>> +#endif
> >>>>
> >>>>  void __init init_cpu_topology(void)
> >>>>  {
> >>>> @@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
> >>>>  	 * Discard anything that was parsed if we hit an error so we
> >>>>  	 * don't use partial information.
> >>>>  	 */
> >>>> -	if (of_have_populated_dt() && parse_dt_topology())
> >>>> +	if ((!acpi_disabled) && parse_acpi_topology())
> >>>> +		reset_cpu_topology();
> >>>> +	else if (of_have_populated_dt() && parse_dt_topology())
> >>>>  		reset_cpu_topology();
> >>>>  }
> >>>>
> >>>
> >>
> >
> > .
> >
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-02-24  4:37           ` Jeremy Linton
@ 2018-03-01 11:51             ` Morten Rasmussen
  -1 siblings, 0 replies; 104+ messages in thread
From: Morten Rasmussen @ 2018-03-01 11:51 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Lorenzo Pieralisi, Xiongfeng Wang, linux-acpi, linux-arm-kernel,
	sudeep.holla, hanjun.guo, rjw, will.deacon, catalin.marinas,
	gregkh, viresh.kumar, mark.rutland, linux-kernel, linux-pm,
	jhugo, Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb,
	vkilari, Juri Lelli

On Fri, Feb 23, 2018 at 10:37:33PM -0600, Jeremy Linton wrote:
> On 02/23/2018 05:02 AM, Lorenzo Pieralisi wrote:
> >On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> >>Hi,
> >>
> >>On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >>>Hi Jeremy,
> >>>
> >>>I have tested the patch with the newest UEFI. It prints the below error:
> >>>
> >>>[    4.017371] BUG: arch topology borken
> >>>[    4.021069] BUG: arch topology borken
> >>>[    4.024764] BUG: arch topology borken
> >>>[    4.028460] BUG: arch topology borken
> >>>[    4.032153] BUG: arch topology borken
> >>>[    4.035849] BUG: arch topology borken
> >>>[    4.039543] BUG: arch topology borken
> >>>[    4.043239] BUG: arch topology borken
> >>>[    4.046932] BUG: arch topology borken
> >>>[    4.050629] BUG: arch topology borken
> >>>[    4.054322] BUG: arch topology borken
> >>>
> >>>I checked the code and found that the newest UEFI set PPTT physical_package_flag on a physical package node and
> >>>the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board is core->cluster->die->package).
> >>
> >>I commented about that on the EDK2 mailing list. While the current spec
> >>doesn't explicitly ban having the flag set multiple times between the leaf
> >>and the root I consider it a "bug" and there is an effort to clarify the
> >>spec and the use of that flag.
> >>>
> >>>When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
> >>>and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
> >>>sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.
> >>
> >>Right. I've mentioned this problem a couple of times.
> >>
> >>At the moment, the spec isn't clear about how the proximity domain is
> >>detected/located within the PPTT topology (a node with a 1:1 correspondence
> >>isn't even required). As you can see from this patch set, we are making the
> >>general assumption that the proximity domains are at the same level as the
> >>physical socket. This isn't ideal for NUMA topologies, like the D05, that
> >>don't align with the physical socket.
> >>
> >>There are efforts underway to clarify and expand upon the specification to
> >>deal with this general problem. The simple solution is another flag (say
> >>PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
> >>find nodes with 1:1 correspondence. At that point we could add a fairly
> >>trivial patch to correct just the scheduler topology without affecting the
> >>rest of the system topology code.
> >
> >I think Morten asked already but isn't this the same end result we end
> >up having if we remove the DIE level if NUMA-within-package is detected
> >(instead of using the default_topology[]) and we create our own ARM64
> >domain hierarchy (with DIE level removed) through set_sched_topology()
> >accordingly ?
> 
> I'm not sure what removing the DIE level does for you, but it's not really
> the problem AFAIK; the problem is that the MC layer is larger than the NUMA
> domains.

Do you mean MC domains are larger than NUMA domains because that
reflects the hardware topology, i.e. you have caches shared across NUMA
nodes, or do you mean the problem is that the current code generates too
large MC domains?

If it is the first, then you have to make a choice whether you want
multi-core scheduling or NUMA-scheduling at that level in the topology.
You can't have both. If you don't want NUMA scheduling at that level you
should define your NUMA nodes to be larger than (or equal to?) the MC
domains, or not define NUMA nodes at all. If you do want NUMA
scheduling at that level, we have to ignore any cache sharing between
NUMA nodes and reduce the size of the MC domains accordingly.

We should be able to reduce the size of the MC domains based on the
information already in the ACPI spec. SRAT defines the NUMA domains; if
the PPTT package level is larger than the NUMA nodes, we should claim it
is NUMA-in-package, drop the DIE level, and reduce the size of the MC
domain to equal the NUMA node size, ignoring any PPTT topology
information above the NUMA node level.

AFAICT, x86 doesn't have this problem as they don't use PPTT, and the
last-level cache is always inside the NUMA node, even for
numa-in-package. For numa-in-package they seem to let SRAT define the
NUMA nodes, have a special topology table for the non-NUMA levels only
containing SMT and MC, and guarantee the MC isn't larger than the NUMA
node.

Can't we just follow the same approach with the addition that we have to
resize the MC domains if necessary?

> >Put it differently: do we really need to rely on another PPTT flag to
> >collect this information ?
> 
> Strictly no, and I have a partial patch around here I've been meaning to
> flesh out which uses the early node information to detect if there are nodes
> smaller than the package. Initially I've been claiming I was going to stay
> away from making scheduler topology changes in this patch set, but it seems
> that at least providing a patch which does the minimal bits is in the cards.
> The PXN flag is more of a shortcut to finding the cache levels at or
> below the NUMA domains, rather than any hard requirement. Similarly for the
> request someone else was making for a leaf node flag (or node ordering) to
> avoid multiple passes over the table. That request would simplify the posted
> code a bit, but it works without it.

I don't see how a flag defining the proximity domains in PPTT makes this
a lot easier. PPTT and setting this flag would have to be mandatory for
NUMA in package systems for the flag to make any difference.

Morten

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology
  2018-02-24  3:05           ` Xiongfeng Wang
@ 2018-03-01 14:19             ` Morten Rasmussen
  -1 siblings, 0 replies; 104+ messages in thread
From: Morten Rasmussen @ 2018-03-01 14:19 UTC (permalink / raw)
  To: Xiongfeng Wang
  Cc: Lorenzo Pieralisi, Jeremy Linton, linux-acpi, linux-arm-kernel,
	sudeep.holla, hanjun.guo, rjw, will.deacon, catalin.marinas,
	gregkh, viresh.kumar, mark.rutland, linux-kernel, linux-pm,
	jhugo, Jonathan.Zhang, ahs3, Jayachandran.Nair, austinwc, lenb,
	vkilari, Juri Lelli

On Sat, Feb 24, 2018 at 11:05:53AM +0800, Xiongfeng Wang wrote:
> 
> Hi,
> On 2018/2/23 19:02, Lorenzo Pieralisi wrote:
> > On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> >> Hi,
> >>
> >> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >>> Hi Jeremy,
> >>>
> >>> I have tested the patch with the newest UEFI. It prints the below error:
> >>>
> >>> [    4.017371] BUG: arch topology borken
> >>> [    4.021069] BUG: arch topology borken
> >>> [    4.024764] BUG: arch topology borken
> >>> [    4.028460] BUG: arch topology borken
> >>> [    4.032153] BUG: arch topology borken
> >>> [    4.035849] BUG: arch topology borken
> >>> [    4.039543] BUG: arch topology borken
> >>> [    4.043239] BUG: arch topology borken
> >>> [    4.046932] BUG: arch topology borken
> >>> [    4.050629] BUG: arch topology borken
> >>> [    4.054322] BUG: arch topology borken
> >>>
> >>> I checked the code and found that the newest UEFI set PPTT physical_package_flag on a physical package node and
> >>> the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board is core->cluster->die->package).
> >>
> >> I commented about that on the EDK2 mailing list. While the current spec
> >> doesn't explicitly ban having the flag set multiple times between the leaf
> >> and the root I consider it a "bug" and there is an effort to clarify the
> >> spec and the use of that flag.
> >>>
> >>> When the kernel starts to build sched_domain, the multi-core sched_domain contains all the cores within a package,
> >>> and the lowest NUMA sched_domain contains all the cores within a die. But the kernel requires that the multi-core
> >>> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG info is printed.
> >>
> >> Right. I've mentioned this problem a couple of times.
> >>
> >> At the moment, the spec isn't clear about how the proximity domain is
> >> detected/located within the PPTT topology (a node with a 1:1 correspondence
> >> isn't even required). As you can see from this patch set, we are making the
> >> general assumption that the proximity domains are at the same level as the
> >> physical socket. This isn't ideal for NUMA topologies, like the D05, that
> >> don't align with the physical socket.
> >>
> >> There are efforts underway to clarify and expand upon the specification to
> >> deal with this general problem. The simple solution is another flag (say
> >> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
> >> find nodes with 1:1 correspondence. At that point we could add a fairly
> >> trivial patch to correct just the scheduler topology without affecting the
> >> rest of the system topology code.
> > 
> > I think Morten asked already but isn't this the same end result we end
> > up having if we remove the DIE level if NUMA-within-package is detected
> > (instead of using the default_topology[]) and we create our own ARM64
> > domain hierarchy (with DIE level removed) through set_sched_topology()
> > accordingly ?
> > 
> > Put it differently: do we really need to rely on another PPTT flag to
> > collect this information ?
> > 
> > I can't merge code that breaks a platform with legitimate firmware
> > bindings.
> 
> I think we really need another PPTT flag, from which we can get information
> about how to build a multi-core sched_domain. I think cache-sharing information
> alone is not enough to tell us how to build a multi-core sched_domain.
> 
> How about this? We assume the upper layer of the lowest layer to be the multi-core layer.
> After that flag is added into the ACPI specs, we add another patch to adapt to the change.

I'm not sure what you mean by upper layers of the lowest layer.

As I see it, for a non-numa-in-package system, the PPTT physical package
flag should define the MC domains, any levels above it should be
represented in the DIE level, and any level below it should be ignored,
except the lowest level if we have SMT. If we have SMT, the lowest level
in PPTT should define the SMT domains.

For numa-in-package, the MC domains should be shrunk to match the NUMA
nodes and DIE is ignored.

^ permalink raw reply	[flat|nested] 104+ messages in thread

end of thread, other threads:[~2018-03-01 14:19 UTC | newest]

Thread overview: 104+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-13  0:59 [PATCH v6 00/12] Support PPTT for ARM64 Jeremy Linton
2018-01-13  0:59 ` Jeremy Linton
2018-01-13  0:59 ` [PATCH v6 01/12] drivers: base: cacheinfo: move cache_setup_of_node() Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-15 12:23   ` Sudeep Holla
2018-01-15 12:23     ` Sudeep Holla
2018-01-13  0:59 ` [PATCH v6 02/12] drivers: base: cacheinfo: setup DT cache properties early Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-15 12:33   ` Sudeep Holla
2018-01-15 12:33     ` Sudeep Holla
2018-01-15 16:07     ` Palmer Dabbelt
2018-01-15 16:07       ` Palmer Dabbelt
2018-01-15 16:07       ` Palmer Dabbelt
2018-01-16 21:26       ` Jeremy Linton
2018-01-17 18:08       ` Sudeep Holla
2018-01-17 18:08         ` Sudeep Holla
2018-01-18 17:36         ` Palmer Dabbelt
2018-01-18 17:36           ` Palmer Dabbelt
2018-01-18 17:36           ` Palmer Dabbelt
2018-01-16 21:07     ` Jeremy Linton
2018-01-17 18:20       ` Sudeep Holla
2018-01-17 18:20         ` Sudeep Holla
2018-01-17 18:51         ` Jeremy Linton
2018-01-17 18:51           ` Jeremy Linton
2018-01-18 10:14           ` Sudeep Holla
2018-01-18 10:14             ` Sudeep Holla
2018-01-19 23:27             ` Jeremy Linton
2018-01-19 23:27               ` Jeremy Linton
2018-01-13  0:59 ` [PATCH v6 03/12] cacheinfo: rename of_node to fw_unique Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-15 12:36   ` Sudeep Holla
2018-01-15 12:36     ` Sudeep Holla
2018-01-15 12:36     ` Sudeep Holla
2018-01-13  0:59 ` [PATCH v6 04/12] arm64/acpi: Create arch specific cpu to acpi id helper Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-15 13:46   ` Sudeep Holla
2018-01-15 13:46     ` Sudeep Holla
2018-01-13  0:59 ` [PATCH v6 05/12] ACPI/PPTT: Add Processor Properties Topology Table parsing Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-15 14:58   ` Sudeep Holla
2018-01-15 14:58     ` Sudeep Holla
2018-01-16 20:55     ` Jeremy Linton
2018-01-17 17:58       ` Sudeep Holla
2018-01-17 17:58         ` Sudeep Holla
2018-01-15 15:48   ` Sudeep Holla
2018-01-15 15:48     ` Sudeep Holla
2018-01-16 20:22     ` Jeremy Linton
2018-01-17 18:00       ` Sudeep Holla
2018-01-17 18:00         ` Sudeep Holla
2018-01-13  0:59 ` [PATCH v6 06/12] ACPI: Enable PPTT support on ARM64 Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-15 13:52   ` Sudeep Holla
2018-01-15 13:52     ` Sudeep Holla
2018-01-13  0:59 ` [PATCH v6 07/12] drivers: base cacheinfo: Add support for ACPI based firmware tables Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-15 15:06   ` Sudeep Holla
2018-01-15 15:06     ` Sudeep Holla
2018-01-22 15:50   ` Greg KH
2018-01-22 15:50     ` Greg KH
2018-01-22 21:14     ` Jeremy Linton
2018-01-22 21:14       ` Jeremy Linton
2018-01-23  0:11       ` Rafael J. Wysocki
2018-01-23  0:11         ` Rafael J. Wysocki
2018-01-23  0:11         ` Rafael J. Wysocki
2018-01-13  0:59 ` [PATCH v6 08/12] arm64: " Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-15 13:54   ` Sudeep Holla
2018-01-15 13:54     ` Sudeep Holla
2018-01-13  0:59 ` [PATCH v6 09/12] ACPI/PPTT: Add topology parsing code Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-13  0:59 ` [PATCH v6 10/12] arm64: topology: rename cluster_id Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-13  0:59 ` [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
2018-01-25 12:15   ` Xiongfeng Wang
2018-01-25 12:15     ` Xiongfeng Wang
2018-01-25 12:15     ` Xiongfeng Wang
2018-01-25 15:56     ` Jeremy Linton
2018-01-25 15:56       ` Jeremy Linton
2018-01-26  4:21       ` Xiongfeng Wang
2018-01-26  4:21         ` Xiongfeng Wang
2018-01-26  4:21         ` Xiongfeng Wang
2018-02-23 11:02       ` Lorenzo Pieralisi
2018-02-23 11:02         ` Lorenzo Pieralisi
2018-02-23 11:02         ` Lorenzo Pieralisi
2018-02-24  3:05         ` Xiongfeng Wang
2018-02-24  3:05           ` Xiongfeng Wang
2018-02-24  3:05           ` Xiongfeng Wang
2018-02-25  6:17           ` vkilari
2018-02-25  6:17             ` vkilari at codeaurora.org
2018-02-25  6:17             ` vkilari
2018-03-01 14:19           ` Morten Rasmussen
2018-03-01 14:19             ` Morten Rasmussen
2018-02-24  4:37         ` Jeremy Linton
2018-02-24  4:37           ` Jeremy Linton
2018-02-24  4:37           ` Jeremy Linton
2018-03-01 11:51           ` Morten Rasmussen
2018-03-01 11:51             ` Morten Rasmussen
2018-01-13  0:59 ` [PATCH v6 12/12] ACPI: Add PPTT to injectable table list Jeremy Linton
2018-01-13  0:59   ` Jeremy Linton
