* [PATCH v4 1/4] dt-bindings: EDAC: Add Amazon's Annapurna Labs L1 EDAC
2019-08-01 13:09 [PATCH v4 0/4] Add support for Amazon's Annapurna Labs EDAC for L1/L2 Hanna Hawa
@ 2019-08-01 13:09 ` Hanna Hawa
2019-08-01 13:09 ` [PATCH v4 2/4] edac: Add support for " Hanna Hawa
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Hanna Hawa @ 2019-08-01 13:09 UTC
To: robh+dt, mark.rutland, bp, mchehab, james.morse, davem, gregkh,
linus.walleij, Jonathan.Cameron, nicolas.ferre, paulmck
Cc: dwmw, benh, ronenk, talel, jonnyc, hanochu, devicetree,
linux-kernel, linux-edac, hhhawa
Document Amazon's Annapurna Labs L1 EDAC SoC binding.
Signed-off-by: Hanna Hawa <hhhawa@amazon.com>
---
.../devicetree/bindings/edac/amazon,al-l1-edac.txt | 14 ++++++++++++++
1 file changed, 14 insertions(+)
create mode 100644 Documentation/devicetree/bindings/edac/amazon,al-l1-edac.txt
diff --git a/Documentation/devicetree/bindings/edac/amazon,al-l1-edac.txt b/Documentation/devicetree/bindings/edac/amazon,al-l1-edac.txt
new file mode 100644
index 000000000000..2ae8370216bc
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/amazon,al-l1-edac.txt
@@ -0,0 +1,14 @@
+* Amazon's Annapurna Labs L1 EDAC
+
+Amazon's Annapurna Labs SoCs support single-bit error correction and
+double-bit error detection for the L1 caches, based on the ARM implementation.
+
+Required properties:
+- compatible:
+ should be "amazon,al-l1-edac".
+
+Example:
+
+ al_l1_edac {
+ compatible = "amazon,al-l1-edac";
+ };
--
2.17.1
* [PATCH v4 2/4] edac: Add support for Amazon's Annapurna Labs L1 EDAC
2019-08-01 13:09 [PATCH v4 0/4] Add support for Amazon's Annapurna Labs EDAC for L1/L2 Hanna Hawa
2019-08-01 13:09 ` [PATCH v4 1/4] dt-bindings: EDAC: Add Amazon's Annapurna Labs L1 EDAC Hanna Hawa
@ 2019-08-01 13:09 ` Hanna Hawa
2019-08-01 13:09 ` [PATCH v4 3/4] dt-bindings: EDAC: Add Amazon's Annapurna Labs L2 EDAC Hanna Hawa
2019-08-01 13:09 ` [PATCH v4 4/4] edac: Add support for " Hanna Hawa
3 siblings, 0 replies; 7+ messages in thread
From: Hanna Hawa @ 2019-08-01 13:09 UTC
To: robh+dt, mark.rutland, bp, mchehab, james.morse, davem, gregkh,
linus.walleij, Jonathan.Cameron, nicolas.ferre, paulmck
Cc: dwmw, benh, ronenk, talel, jonnyc, hanochu, devicetree,
linux-kernel, linux-edac, hhhawa
Add support for Amazon's Annapurna Labs L1 EDAC driver, which detects and
reports L1 cache errors.
Signed-off-by: Hanna Hawa <hhhawa@amazon.com>
Reviewed-by: James Morse <james.morse@arm.com>
---
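For anyone who wants to check the register decoding outside the kernel, here is a minimal userspace sketch of the field extraction that the driver does with FIELD_GET(). The bit positions are the ones defined in al_l1_edac.c in this patch (the L2 driver in patch 4/4 defines the same positions for L2MERRSR_EL1). The FIELD() macro, struct merrsr_fields and merrsr_decode() are hypothetical names used only for this illustration; they are not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's GENMASK_ULL()/FIELD_GET(). */
#define FIELD(val, hi, lo) \
	(((val) >> (lo)) & ((UINT64_C(1) << ((hi) - (lo) + 1)) - 1))

/* Hypothetical container for the decoded CPUMERRSR_EL1 fields. */
struct merrsr_fields {
	bool valid;      /* bit 31: register holds a recorded error */
	uint32_t ramid;  /* bits 30:24: which RAM reported the error */
	uint32_t repeat; /* bits 39:32: repeat error count */
	uint32_t other;  /* bits 47:40: other error count */
	bool fatal;      /* bit 63: uncorrectable error */
};

/* Split a raw 64-bit register value into its fields. */
static struct merrsr_fields merrsr_decode(uint64_t val)
{
	struct merrsr_fields f = {
		.valid  = FIELD(val, 31, 31) != 0,
		.ramid  = (uint32_t)FIELD(val, 30, 24),
		.repeat = (uint32_t)FIELD(val, 39, 32),
		.other  = (uint32_t)FIELD(val, 47, 40),
		.fatal  = FIELD(val, 63, 63) != 0,
	};

	return f;
}
```

In this sketch a value with bit 31 set and RAMID 0x09 decodes as a valid error in the L1-D Data RAM, which is the case the driver labels 'L1-D Data RAM'.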
MAINTAINERS | 6 ++
drivers/edac/Kconfig | 8 ++
drivers/edac/Makefile | 1 +
drivers/edac/al_l1_edac.c | 158 ++++++++++++++++++++++++++++++++++++++
4 files changed, 173 insertions(+)
create mode 100644 drivers/edac/al_l1_edac.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 77eae44bf5de..fd29ea62ba29 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -743,6 +743,12 @@ F: drivers/tty/serial/altera_jtaguart.c
F: include/linux/altera_uart.h
F: include/linux/altera_jtaguart.h
+AMAZON ANNAPURNA LABS L1 EDAC
+M: Hanna Hawa <hhhawa@amazon.com>
+S: Maintained
+F: drivers/edac/al_l1_edac.c
+F: Documentation/devicetree/bindings/edac/amazon,al-l1-edac.txt
+
AMAZON ANNAPURNA LABS THERMAL MMIO DRIVER
M: Talel Shenhar <talel@amazon.com>
S: Maintained
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 200c04ce5b0e..58b92bcb39ce 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -74,6 +74,14 @@ config EDAC_GHES
In doubt, say 'Y'.
+config EDAC_AL_L1
+ bool "Amazon's Annapurna Labs L1 EDAC"
+ depends on ARCH_ALPINE
+ help
+ Support for L1 error detection and correction
+ for Amazon's Annapurna Labs SoCs.
+ This driver detects errors of L1 caches.
+
config EDAC_AMD64
tristate "AMD64 (Opteron, Athlon64)"
depends on AMD_NB && EDAC_DECODE_MCE
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 165ca65e1a3a..caa2dc91e8a0 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_EDAC_GHES) += ghes_edac.o
edac_mce_amd-y := mce_amd.o
obj-$(CONFIG_EDAC_DECODE_MCE) += edac_mce_amd.o
+obj-$(CONFIG_EDAC_AL_L1) += al_l1_edac.o
obj-$(CONFIG_EDAC_AMD76X) += amd76x_edac.o
obj-$(CONFIG_EDAC_CPC925) += cpc925_edac.o
obj-$(CONFIG_EDAC_I5000) += i5000_edac.o
diff --git a/drivers/edac/al_l1_edac.c b/drivers/edac/al_l1_edac.c
new file mode 100644
index 000000000000..9404a2fcaa58
--- /dev/null
+++ b/drivers/edac/al_l1_edac.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#include <asm/sysreg.h>
+#include <linux/bitfield.h>
+#include <linux/smp.h>
+
+#include "edac_device.h"
+#include "edac_module.h"
+
+#define DRV_NAME "al_l1_edac"
+
+/* Same bit assignments of CPUMERRSR_EL1 in ARM CA57/CA72 */
+#define ARM_CA57_CPUMERRSR_EL1 sys_reg(3, 1, 15, 2, 2)
+#define ARM_CA57_CPUMERRSR_RAM_ID GENMASK(30, 24)
+#define ARM_CA57_L1_I_TAG_RAM 0x00
+#define ARM_CA57_L1_I_DATA_RAM 0x01
+#define ARM_CA57_L1_D_TAG_RAM 0x08
+#define ARM_CA57_L1_D_DATA_RAM 0x09
+#define ARM_CA57_L2_TLB_RAM 0x18
+#define ARM_CA57_CPUMERRSR_VALID BIT(31)
+#define ARM_CA57_CPUMERRSR_REPEAT GENMASK_ULL(39, 32)
+#define ARM_CA57_CPUMERRSR_OTHER GENMASK_ULL(47, 40)
+#define ARM_CA57_CPUMERRSR_FATAL BIT_ULL(63)
+
+#define AL_L1_EDAC_MSG_MAX 256
+
+static void al_l1_edac_cpumerrsr(void *arg)
+{
+ struct edac_device_ctl_info *edac_dev = arg;
+ int cpu, i;
+ u32 ramid, repeat, other, fatal;
+ u64 val = read_sysreg_s(ARM_CA57_CPUMERRSR_EL1);
+ char msg[AL_L1_EDAC_MSG_MAX];
+ int space, count;
+ char *p;
+
+ if (!(FIELD_GET(ARM_CA57_CPUMERRSR_VALID, val)))
+ return;
+
+ write_sysreg_s(0, ARM_CA57_CPUMERRSR_EL1);
+
+ cpu = smp_processor_id();
+ ramid = FIELD_GET(ARM_CA57_CPUMERRSR_RAM_ID, val);
+ repeat = FIELD_GET(ARM_CA57_CPUMERRSR_REPEAT, val);
+ other = FIELD_GET(ARM_CA57_CPUMERRSR_OTHER, val);
+ fatal = FIELD_GET(ARM_CA57_CPUMERRSR_FATAL, val);
+
+ space = sizeof(msg);
+ p = msg;
+ count = scnprintf(p, space, "CPU%d L1 %serror detected", cpu,
+ (fatal) ? "Fatal " : "");
+ p += count;
+ space -= count;
+
+ switch (ramid) {
+ case ARM_CA57_L1_I_TAG_RAM:
+ count = scnprintf(p, space, " RAMID='L1-I Tag RAM'");
+ break;
+ case ARM_CA57_L1_I_DATA_RAM:
+ count = scnprintf(p, space, " RAMID='L1-I Data RAM'");
+ break;
+ case ARM_CA57_L1_D_TAG_RAM:
+ count = scnprintf(p, space, " RAMID='L1-D Tag RAM'");
+ break;
+ case ARM_CA57_L1_D_DATA_RAM:
+ count = scnprintf(p, space, " RAMID='L1-D Data RAM'");
+ break;
+ case ARM_CA57_L2_TLB_RAM:
+ count = scnprintf(p, space, " RAMID='L2 TLB RAM'");
+ break;
+ default:
+ count = scnprintf(p, space, " RAMID='unknown'");
+ break;
+ }
+
+ p += count;
+ space -= count;
+ count = scnprintf(p, space,
+ " repeat=%d, other=%d (CPUMERRSR_EL1=0x%llx)",
+ repeat, other, val);
+
+ for (i = 0; i < repeat; i++) {
+ if (fatal)
+ edac_device_handle_ue(edac_dev, 0, 0, msg);
+ else
+ edac_device_handle_ce(edac_dev, 0, 0, msg);
+ }
+}
+
+static void al_l1_edac_check(struct edac_device_ctl_info *edac_dev)
+{
+ on_each_cpu(al_l1_edac_cpumerrsr, edac_dev, 1);
+}
+
+static int al_l1_edac_probe(struct platform_device *pdev)
+{
+ struct edac_device_ctl_info *edac_dev;
+ struct device *dev = &pdev->dev;
+ int ret;
+
+ edac_dev = edac_device_alloc_ctl_info(0, (char *)dev_name(dev), 1, "L",
+ 1, 1, NULL, 0,
+ edac_device_alloc_index());
+ if (IS_ERR_OR_NULL(edac_dev))
+ return -ENOMEM;
+
+ edac_dev->edac_check = al_l1_edac_check;
+ edac_dev->dev = dev;
+ edac_dev->mod_name = DRV_NAME;
+ edac_dev->dev_name = dev_name(dev);
+ edac_dev->ctl_name = "L1 cache";
+ platform_set_drvdata(pdev, edac_dev);
+
+ ret = edac_device_add_device(edac_dev);
+ if (ret) {
+ dev_err(dev, "Failed to add L1 edac device\n");
+ goto err;
+ }
+
+ return 0;
+err:
+ edac_device_free_ctl_info(edac_dev);
+
+ return ret;
+}
+
+static int al_l1_edac_remove(struct platform_device *pdev)
+{
+ struct edac_device_ctl_info *edac_dev = platform_get_drvdata(pdev);
+
+ edac_device_del_device(edac_dev->dev);
+ edac_device_free_ctl_info(edac_dev);
+
+ return 0;
+}
+
+static const struct of_device_id al_l1_edac_of_match[] = {
+ { .compatible = "amazon,al-l1-edac" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, al_l1_edac_of_match);
+
+static struct platform_driver al_l1_edac_driver = {
+ .probe = al_l1_edac_probe,
+ .remove = al_l1_edac_remove,
+ .driver = {
+ .name = DRV_NAME,
+ .of_match_table = al_l1_edac_of_match,
+ },
+};
+module_platform_driver(al_l1_edac_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Hanna Hawa <hhhawa@amazon.com>");
+MODULE_DESCRIPTION("Amazon's Annapurna Labs L1 EDAC Driver");
--
2.17.1
* [PATCH v4 3/4] dt-bindings: EDAC: Add Amazon's Annapurna Labs L2 EDAC
2019-08-01 13:09 [PATCH v4 0/4] Add support for Amazon's Annapurna Labs EDAC for L1/L2 Hanna Hawa
2019-08-01 13:09 ` [PATCH v4 1/4] dt-bindings: EDAC: Add Amazon's Annapurna Labs L1 EDAC Hanna Hawa
2019-08-01 13:09 ` [PATCH v4 2/4] edac: Add support for " Hanna Hawa
@ 2019-08-01 13:09 ` Hanna Hawa
2019-08-01 13:09 ` [PATCH v4 4/4] edac: Add support for " Hanna Hawa
3 siblings, 0 replies; 7+ messages in thread
From: Hanna Hawa @ 2019-08-01 13:09 UTC
To: robh+dt, mark.rutland, bp, mchehab, james.morse, davem, gregkh,
linus.walleij, Jonathan.Cameron, nicolas.ferre, paulmck
Cc: dwmw, benh, ronenk, talel, jonnyc, hanochu, devicetree,
linux-kernel, linux-edac, hhhawa
Document Amazon's Annapurna Labs L2 EDAC SoC binding.
Signed-off-by: Hanna Hawa <hhhawa@amazon.com>
---
.../bindings/edac/amazon,al-l2-edac.txt | 20 +++++++++++++++++++
1 file changed, 20 insertions(+)
create mode 100644 Documentation/devicetree/bindings/edac/amazon,al-l2-edac.txt
diff --git a/Documentation/devicetree/bindings/edac/amazon,al-l2-edac.txt b/Documentation/devicetree/bindings/edac/amazon,al-l2-edac.txt
new file mode 100644
index 000000000000..7b0b7347b711
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/amazon,al-l2-edac.txt
@@ -0,0 +1,20 @@
+* Amazon's Annapurna Labs L2 EDAC
+
+Amazon's Annapurna Labs SoCs support single-bit error correction and
+double-bit error detection for the L2 cache, based on the ARM implementation.
+
+Required properties:
+- compatible:
+ should be "amazon,al-l2-edac".
+- l2-cache:
+ Phandle to L2 cache handler.
+ This property is used to compare with the CPU node property
+ 'next-level-cache' to create cpu-mask with all CPUs that
+ share same L2 cache.
+
+Example:
+
+ al_l2_edac {
+ compatible = "amazon,al-l2-edac";
+ l2-cache = <&cluster0_l2>;
+ };
--
2.17.1
* [PATCH v4 4/4] edac: Add support for Amazon's Annapurna Labs L2 EDAC
2019-08-01 13:09 [PATCH v4 0/4] Add support for Amazon's Annapurna Labs EDAC for L1/L2 Hanna Hawa
` (2 preceding siblings ...)
2019-08-01 13:09 ` [PATCH v4 3/4] dt-bindings: EDAC: Add Amazon's Annapurna Labs L2 EDAC Hanna Hawa
@ 2019-08-01 13:09 ` Hanna Hawa
2019-08-02 15:11 ` James Morse
3 siblings, 1 reply; 7+ messages in thread
From: Hanna Hawa @ 2019-08-01 13:09 UTC
To: robh+dt, mark.rutland, bp, mchehab, james.morse, davem, gregkh,
linus.walleij, Jonathan.Cameron, nicolas.ferre, paulmck
Cc: dwmw, benh, ronenk, talel, jonnyc, hanochu, devicetree,
linux-kernel, linux-edac, hhhawa
Add support for Amazon's Annapurna Labs L2 EDAC driver, which detects and
reports L2 cache errors.
Signed-off-by: Hanna Hawa <hhhawa@amazon.com>
---
MAINTAINERS | 6 ++
drivers/edac/Kconfig | 8 ++
drivers/edac/Makefile | 1 +
drivers/edac/al_l2_edac.c | 189 ++++++++++++++++++++++++++++++++++++++
4 files changed, 204 insertions(+)
create mode 100644 drivers/edac/al_l2_edac.c
diff --git a/MAINTAINERS b/MAINTAINERS
index fd29ea62ba29..a6dcf3d8e12a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -749,6 +749,12 @@ S: Maintained
F: drivers/edac/al_l1_edac.c
F: Documentation/devicetree/bindings/edac/amazon,al-l1-edac.txt
+AMAZON ANNAPURNA LABS L2 EDAC
+M: Hanna Hawa <hhhawa@amazon.com>
+S: Maintained
+F: drivers/edac/al_l2_edac.c
+F: Documentation/devicetree/bindings/edac/amazon,al-l2-edac.txt
+
AMAZON ANNAPURNA LABS THERMAL MMIO DRIVER
M: Talel Shenhar <talel@amazon.com>
S: Maintained
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 58b92bcb39ce..8bbb745b84ed 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -82,6 +82,14 @@ config EDAC_AL_L1
for Amazon's Annapurna Labs SoCs.
This driver detects errors of L1 caches.
+config EDAC_AL_L2
+ bool "Amazon's Annapurna Labs L2 EDAC"
+ depends on ARCH_ALPINE
+ help
+ Support for L2 error detection and correction
+ for Amazon's Annapurna Labs SoCs.
+ This driver detects errors of L2 caches.
+
config EDAC_AMD64
tristate "AMD64 (Opteron, Athlon64)"
depends on AMD_NB && EDAC_DECODE_MCE
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index caa2dc91e8a0..60a6b8bbe2f8 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -23,6 +23,7 @@ edac_mce_amd-y := mce_amd.o
obj-$(CONFIG_EDAC_DECODE_MCE) += edac_mce_amd.o
obj-$(CONFIG_EDAC_AL_L1) += al_l1_edac.o
+obj-$(CONFIG_EDAC_AL_L2) += al_l2_edac.o
obj-$(CONFIG_EDAC_AMD76X) += amd76x_edac.o
obj-$(CONFIG_EDAC_CPC925) += cpc925_edac.o
obj-$(CONFIG_EDAC_I5000) += i5000_edac.o
diff --git a/drivers/edac/al_l2_edac.c b/drivers/edac/al_l2_edac.c
new file mode 100644
index 000000000000..6c6d37cf82ab
--- /dev/null
+++ b/drivers/edac/al_l2_edac.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#include <asm/sysreg.h>
+#include <linux/bitfield.h>
+#include <linux/of.h>
+#include <linux/smp.h>
+
+#include "edac_device.h"
+#include "edac_module.h"
+
+#define DRV_NAME "al_l2_edac"
+
+/* Same bit assignments of L2MERRSR_EL1 in ARM CA57/CA72 */
+#define ARM_CA57_L2MERRSR_EL1 sys_reg(3, 1, 15, 2, 3)
+#define ARM_CA57_L2MERRSR_RAMID GENMASK(30, 24)
+#define ARM_CA57_L2_TAG_RAM 0x10
+#define ARM_CA57_L2_DATA_RAM 0x11
+#define ARM_CA57_L2_SNOOP_RAM 0x12
+#define ARM_CA57_L2_DIRTY_RAM 0x14
+#define ARM_CA57_L2_INC_PF_RAM 0x18
+#define ARM_CA57_L2MERRSR_VALID BIT(31)
+#define ARM_CA57_L2MERRSR_REPEAT GENMASK_ULL(39, 32)
+#define ARM_CA57_L2MERRSR_OTHER GENMASK_ULL(47, 40)
+#define ARM_CA57_L2MERRSR_FATAL BIT_ULL(63)
+
+#define AL_L2_EDAC_MSG_MAX 256
+
+struct al_l2_edac {
+ cpumask_t cluster_cpus;
+};
+
+static void al_l2_edac_l2merrsr(void *arg)
+{
+ struct edac_device_ctl_info *edac_dev = arg;
+ int cpu, i;
+ u32 ramid, repeat, other, fatal;
+ u64 val = read_sysreg_s(ARM_CA57_L2MERRSR_EL1);
+ char msg[AL_L2_EDAC_MSG_MAX];
+ int space, count;
+ char *p;
+
+ if (!(FIELD_GET(ARM_CA57_L2MERRSR_VALID, val)))
+ return;
+
+ write_sysreg_s(0, ARM_CA57_L2MERRSR_EL1);
+
+ cpu = smp_processor_id();
+ ramid = FIELD_GET(ARM_CA57_L2MERRSR_RAMID, val);
+ repeat = FIELD_GET(ARM_CA57_L2MERRSR_REPEAT, val);
+ other = FIELD_GET(ARM_CA57_L2MERRSR_OTHER, val);
+ fatal = FIELD_GET(ARM_CA57_L2MERRSR_FATAL, val);
+
+ space = sizeof(msg);
+ p = msg;
+ count = scnprintf(p, space, "CPU%d L2 %serror detected", cpu,
+ (fatal) ? "Fatal " : "");
+ p += count;
+ space -= count;
+
+ switch (ramid) {
+ case ARM_CA57_L2_TAG_RAM:
+ count = scnprintf(p, space, " RAMID='L2 Tag RAM'");
+ break;
+ case ARM_CA57_L2_DATA_RAM:
+ count = scnprintf(p, space, " RAMID='L2 Data RAM'");
+ break;
+ case ARM_CA57_L2_SNOOP_RAM:
+ count = scnprintf(p, space, " RAMID='L2 Snoop RAM'");
+ break;
+ case ARM_CA57_L2_DIRTY_RAM:
+ count = scnprintf(p, space, " RAMID='L2 Dirty RAM'");
+ break;
+ case ARM_CA57_L2_INC_PF_RAM:
+ count = scnprintf(p, space, " RAMID='L2 internal metadat'");
+ break;
+ default:
+ count = scnprintf(p, space, " RAMID='unknown'");
+ break;
+ }
+
+ p += count;
+ space -= count;
+
+ count = scnprintf(p, space,
+ " repeat=%d, other=%d (L2MERRSR_EL1=0x%llx)",
+ repeat, other, val);
+
+ for (i = 0; i < repeat; i++) {
+ if (fatal)
+ edac_device_handle_ue(edac_dev, 0, 0, msg);
+ else
+ edac_device_handle_ce(edac_dev, 0, 0, msg);
+ }
+}
+
+static void al_l2_edac_check(struct edac_device_ctl_info *edac_dev)
+{
+ struct al_l2_edac *al_l2 = edac_dev->pvt_info;
+
+ smp_call_function_any(&al_l2->cluster_cpus, al_l2_edac_l2merrsr,
+ edac_dev, 1);
+}
+
+static int al_l2_edac_probe(struct platform_device *pdev)
+{
+ struct edac_device_ctl_info *edac_dev;
+ struct al_l2_edac *al_l2;
+ struct device *dev = &pdev->dev;
+ int ret, i;
+
+ edac_dev = edac_device_alloc_ctl_info(sizeof(*al_l2),
+ (char *)dev_name(dev), 1, "L", 1,
+ 2, NULL, 0,
+ edac_device_alloc_index());
+ if (IS_ERR_OR_NULL(edac_dev))
+ return -ENOMEM;
+
+ al_l2 = edac_dev->pvt_info;
+ edac_dev->edac_check = al_l2_edac_check;
+ edac_dev->dev = dev;
+ edac_dev->mod_name = DRV_NAME;
+ edac_dev->dev_name = dev_name(dev);
+ edac_dev->ctl_name = "L2 cache";
+ platform_set_drvdata(pdev, edac_dev);
+
+ for_each_online_cpu(i) {
+ struct device_node *cpu;
+ struct device_node *cpu_cache, *l2_cache;
+
+ cpu = of_get_cpu_node(i, NULL);
+ cpu_cache = of_find_next_cache_node(cpu);
+ l2_cache = of_parse_phandle(dev->of_node, "l2-cache", 0);
+
+ if (cpu_cache == l2_cache)
+ cpumask_set_cpu(i, &al_l2->cluster_cpus);
+ }
+
+ if (cpumask_empty(&al_l2->cluster_cpus)) {
+ dev_err(dev, "CPU mask is empty for this L2 cache\n");
+ ret = -EINVAL;
+ goto err;
+ }
+
+ ret = edac_device_add_device(edac_dev);
+ if (ret) {
+ dev_err(dev, "Failed to add L2 edac device\n");
+ goto err;
+ }
+
+ return 0;
+
+err:
+ edac_device_free_ctl_info(edac_dev);
+
+ return ret;
+}
+
+static int al_l2_edac_remove(struct platform_device *pdev)
+{
+ struct edac_device_ctl_info *edac_dev = platform_get_drvdata(pdev);
+
+ edac_device_del_device(edac_dev->dev);
+ edac_device_free_ctl_info(edac_dev);
+
+ return 0;
+}
+
+static const struct of_device_id al_l2_edac_of_match[] = {
+ { .compatible = "amazon,al-l2-edac" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, al_l2_edac_of_match);
+
+static struct platform_driver al_l2_edac_driver = {
+ .probe = al_l2_edac_probe,
+ .remove = al_l2_edac_remove,
+ .driver = {
+ .name = DRV_NAME,
+ .of_match_table = al_l2_edac_of_match,
+ },
+};
+module_platform_driver(al_l2_edac_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Hanna Hawa <hhhawa@amazon.com>");
+MODULE_DESCRIPTION("Amazon's Annapurna Labs L2 EDAC Driver");
--
2.17.1
* Re: [PATCH v4 4/4] edac: Add support for Amazon's Annapurna Labs L2 EDAC
2019-08-01 13:09 ` [PATCH v4 4/4] edac: Add support for " Hanna Hawa
@ 2019-08-02 15:11 ` James Morse
2019-08-05 13:14 ` Hawa, Hanna
0 siblings, 1 reply; 7+ messages in thread
From: James Morse @ 2019-08-02 15:11 UTC
To: Hanna Hawa
Cc: robh+dt, mark.rutland, bp, mchehab, davem, gregkh, linus.walleij,
Jonathan.Cameron, nicolas.ferre, paulmck, dwmw, benh, ronenk,
talel, jonnyc, hanochu, devicetree, linux-kernel, linux-edac
Hi Hanna,
On 01/08/2019 14:09, Hanna Hawa wrote:
> Add support for Amazon's Annapurna Labs L2 EDAC driver, which detects and
> reports L2 cache errors.
> diff --git a/drivers/edac/al_l2_edac.c b/drivers/edac/al_l2_edac.c
> new file mode 100644
> index 000000000000..6c6d37cf82ab
> --- /dev/null
> +++ b/drivers/edac/al_l2_edac.c
> @@ -0,0 +1,189 @@
> +#include <asm/sysreg.h>
> +#include <linux/bitfield.h>
#include <linux/cpumask.h> ?
> +#include <linux/of.h>
> +#include <linux/smp.h>
[...]
> +static void al_l2_edac_l2merrsr(void *arg)
> +{
> + struct edac_device_ctl_info *edac_dev = arg;
> + int cpu, i;
> + u32 ramid, repeat, other, fatal;
> + u64 val = read_sysreg_s(ARM_CA57_L2MERRSR_EL1);
> + char msg[AL_L2_EDAC_MSG_MAX];
> + int space, count;
> + char *p;
> +
> + if (!(FIELD_GET(ARM_CA57_L2MERRSR_VALID, val)))
> + return;
> +
> + write_sysreg_s(0, ARM_CA57_L2MERRSR_EL1);
> +
> + cpu = smp_processor_id();
> + ramid = FIELD_GET(ARM_CA57_L2MERRSR_RAMID, val);
> + repeat = FIELD_GET(ARM_CA57_L2MERRSR_REPEAT, val);
> + other = FIELD_GET(ARM_CA57_L2MERRSR_OTHER, val);
> + fatal = FIELD_GET(ARM_CA57_L2MERRSR_FATAL, val);
> +
> + space = sizeof(msg);
> + p = msg;
> + count = scnprintf(p, space, "CPU%d L2 %serror detected", cpu,
> + (fatal) ? "Fatal " : "");
> + p += count;
> + space -= count;
> +
> + switch (ramid) {
> + case ARM_CA57_L2_TAG_RAM:
> + count = scnprintf(p, space, " RAMID='L2 Tag RAM'");
> + break;
> + case ARM_CA57_L2_DATA_RAM:
> + count = scnprintf(p, space, " RAMID='L2 Data RAM'");
> + break;
> + case ARM_CA57_L2_SNOOP_RAM:
> + count = scnprintf(p, space, " RAMID='L2 Snoop RAM'");
Nit: The TRMs both call this 'L2 Snoop Tag RAM'. Could we include 'tag' in the
description? 'tag' implies it's some kind of metadata, so an uncorrected error here affects
a now-unknown location; it's more serious than a 'data RAM' error. v8.2 would term this kind
of error 'uncontained'.
> + break;
> + case ARM_CA57_L2_DIRTY_RAM:
> + count = scnprintf(p, space, " RAMID='L2 Dirty RAM'");
> + break;
> + case ARM_CA57_L2_INC_PF_RAM:
> + count = scnprintf(p, space, " RAMID='L2 internal metadat'");
Nit: metadata
> + break;
> + default:
> + count = scnprintf(p, space, " RAMID='unknown'");
> + break;
> + }
> +
> + p += count;
> + space -= count;
> +
> + count = scnprintf(p, space,
> + " repeat=%d, other=%d (L2MERRSR_EL1=0x%llx)",
> + repeat, other, val);
> +
> + for (i = 0; i < repeat; i++) {
> + if (fatal)
> + edac_device_handle_ue(edac_dev, 0, 0, msg);
> + else
> + edac_device_handle_ce(edac_dev, 0, 0, msg);
> + }
> +}
[...]
> +static int al_l2_edac_probe(struct platform_device *pdev)
> +{
> + struct edac_device_ctl_info *edac_dev;
> + struct al_l2_edac *al_l2;
> + struct device *dev = &pdev->dev;
> + int ret, i;
> +
> + edac_dev = edac_device_alloc_ctl_info(sizeof(*al_l2),
> + (char *)dev_name(dev), 1, "L", 1,
> + 2, NULL, 0,
> + edac_device_alloc_index());
> + if (IS_ERR_OR_NULL(edac_dev))
> + return -ENOMEM;
> +
> + al_l2 = edac_dev->pvt_info;
> + edac_dev->edac_check = al_l2_edac_check;
> + edac_dev->dev = dev;
> + edac_dev->mod_name = DRV_NAME;
> + edac_dev->dev_name = dev_name(dev);
> + edac_dev->ctl_name = "L2 cache";
> + platform_set_drvdata(pdev, edac_dev);
> + for_each_online_cpu(i) {
for_each_possible_cpu()?
If you boot with maxcpus= the driver's behaviour changes.
But you are only parsing information from the DT, so you don't really need the CPUs to be
online.
> + struct device_node *cpu;
> + struct device_node *cpu_cache, *l2_cache;
> +
> + cpu = of_get_cpu_node(i, NULL);
(of_get_cpu_node() can return NULL, but I don't think it can ever happen like this)
> + cpu_cache = of_find_next_cache_node(cpu);
> + l2_cache = of_parse_phandle(dev->of_node, "l2-cache", 0);
> +
> + if (cpu_cache == l2_cache)
> + cpumask_set_cpu(i, &al_l2->cluster_cpus);
You need to of_node_put() these device_node pointers once you're done with them.
> + }
> +
> + if (cpumask_empty(&al_l2->cluster_cpus)) {
> + dev_err(dev, "CPU mask is empty for this L2 cache\n");
> + ret = -EINVAL;
> + goto err;
> + }
> +
> + ret = edac_device_add_device(edac_dev);
> + if (ret) {
> + dev_err(dev, "Failed to add L2 edac device\n");
> + goto err;
> + }
> +
> + return 0;
> +
> +err:
> + edac_device_free_ctl_info(edac_dev);
> +
> + return ret;
> +}
With the of_node_put()ing:
Reviewed-by: James Morse <james.morse@arm.com>
Thanks,
James
* Re: [PATCH v4 4/4] edac: Add support for Amazon's Annapurna Labs L2 EDAC
2019-08-02 15:11 ` James Morse
@ 2019-08-05 13:14 ` Hawa, Hanna
0 siblings, 0 replies; 7+ messages in thread
From: Hawa, Hanna @ 2019-08-05 13:14 UTC
To: James Morse
Cc: robh+dt, mark.rutland, bp, mchehab, davem, gregkh, linus.walleij,
Jonathan.Cameron, nicolas.ferre, paulmck, dwmw, benh, ronenk,
talel, jonnyc, hanochu, devicetree, linux-kernel, linux-edac
On 8/2/2019 6:11 PM, James Morse wrote:
> Hi Hanna,
>
> On 01/08/2019 14:09, Hanna Hawa wrote:
>> Add support for Amazon's Annapurna Labs L2 EDAC driver, which detects and
>> reports L2 cache errors.
>> diff --git a/drivers/edac/al_l2_edac.c b/drivers/edac/al_l2_edac.c
>> new file mode 100644
>> index 000000000000..6c6d37cf82ab
>> --- /dev/null
>> +++ b/drivers/edac/al_l2_edac.c
>> @@ -0,0 +1,189 @@
>
>> +#include <asm/sysreg.h>
>> +#include <linux/bitfield.h>
>
> #include <linux/cpumask.h> ?
Will be added.
>
>> +#include <linux/of.h>
>> +#include <linux/smp.h>
>
> [...]
>
>> +static void al_l2_edac_l2merrsr(void *arg)
>> +{
>> + struct edac_device_ctl_info *edac_dev = arg;
>> + int cpu, i;
>> + u32 ramid, repeat, other, fatal;
>> + u64 val = read_sysreg_s(ARM_CA57_L2MERRSR_EL1);
>> + char msg[AL_L2_EDAC_MSG_MAX];
>> + int space, count;
>> + char *p;
>> +
>> + if (!(FIELD_GET(ARM_CA57_L2MERRSR_VALID, val)))
>> + return;
>> +
>> + write_sysreg_s(0, ARM_CA57_L2MERRSR_EL1);
>> +
>> + cpu = smp_processor_id();
>> + ramid = FIELD_GET(ARM_CA57_L2MERRSR_RAMID, val);
>> + repeat = FIELD_GET(ARM_CA57_L2MERRSR_REPEAT, val);
>> + other = FIELD_GET(ARM_CA57_L2MERRSR_OTHER, val);
>> + fatal = FIELD_GET(ARM_CA57_L2MERRSR_FATAL, val);
>> +
>> + space = sizeof(msg);
>> + p = msg;
>> + count = scnprintf(p, space, "CPU%d L2 %serror detected", cpu,
>> + (fatal) ? "Fatal " : "");
>> + p += count;
>> + space -= count;
>> +
>> + switch (ramid) {
>> + case ARM_CA57_L2_TAG_RAM:
>> + count = scnprintf(p, space, " RAMID='L2 Tag RAM'");
>> + break;
>> + case ARM_CA57_L2_DATA_RAM:
>> + count = scnprintf(p, space, " RAMID='L2 Data RAM'");
>> + break;
>> + case ARM_CA57_L2_SNOOP_RAM:
>> + count = scnprintf(p, space, " RAMID='L2 Snoop RAM'");
>
> Nit: The TRMs both call this 'L2 Snoop Tag RAM'. Could we include 'tag' in the
> description? 'tag' implies it's some kind of metadata, so an uncorrected error here affects
> a now-unknown location; it's more serious than a 'data RAM' error. v8.2 would term this kind
> of error 'uncontained'.
Ack, will be fixed.
>
>
>> + break;
>> + case ARM_CA57_L2_DIRTY_RAM:
>> + count = scnprintf(p, space, " RAMID='L2 Dirty RAM'");
>> + break;
>> + case ARM_CA57_L2_INC_PF_RAM:
>> + count = scnprintf(p, space, " RAMID='L2 internal metadat'");
>
> Nit: metadata
Ack, will be fixed.
>
>> + break;
>> + default:
>> + count = scnprintf(p, space, " RAMID='unknown'");
>> + break;
>> + }
>> +
>> + p += count;
>> + space -= count;
>> +
>> + count = scnprintf(p, space,
>> + " repeat=%d, other=%d (L2MERRSR_EL1=0x%llx)",
>> + repeat, other, val);
>> +
>> + for (i = 0; i < repeat; i++) {
>> + if (fatal)
>> + edac_device_handle_ue(edac_dev, 0, 0, msg);
>> + else
>> + edac_device_handle_ce(edac_dev, 0, 0, msg);
>> + }
>> +}
>
> [...]
>
>> +static int al_l2_edac_probe(struct platform_device *pdev)
>> +{
>> + struct edac_device_ctl_info *edac_dev;
>> + struct al_l2_edac *al_l2;
>> + struct device *dev = &pdev->dev;
>> + int ret, i;
>> +
>> + edac_dev = edac_device_alloc_ctl_info(sizeof(*al_l2),
>> + (char *)dev_name(dev), 1, "L", 1,
>> + 2, NULL, 0,
>> + edac_device_alloc_index());
>> + if (IS_ERR_OR_NULL(edac_dev))
>> + return -ENOMEM;
>> +
>> + al_l2 = edac_dev->pvt_info;
>> + edac_dev->edac_check = al_l2_edac_check;
>> + edac_dev->dev = dev;
>> + edac_dev->mod_name = DRV_NAME;
>> + edac_dev->dev_name = dev_name(dev);
>> + edac_dev->ctl_name = "L2 cache";
>> + platform_set_drvdata(pdev, edac_dev);
>
>> + for_each_online_cpu(i) {
>
> for_each_possible_cpu()?
>
> If you boot with maxcpus= the driver's behaviour changes.
> But you are only parsing information from the DT, so you don't really need the CPUs to be
> online.
Agreed. DT parsing doesn't need the CPUs to be online, and if
for_each_possible_cpu() is used, smp_call_function_any() will still run only
on the online CPUs in the mask.
Will change to for_each_possible_cpu().
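To spell out that reasoning, here is a small userspace model, not kernel code. It assumes one bit per CPU in place of cpumask_t, and cpu_mask_t and pick_target_cpu() are hypothetical names. It only models the fact that the call can land on a CPU that is both in the cluster mask and online; the real smp_call_function_any() also prefers the current CPU and nearby CPUs, which is left out here.

```c
#include <stdint.h>

/* Hypothetical model: one bit per CPU, standing in for the kernel's cpumask_t. */
typedef uint32_t cpu_mask_t;

/*
 * Model of the target selection: return the lowest-numbered CPU that is both
 * in the cluster mask and online, or -1 if there is none.
 */
static int pick_target_cpu(cpu_mask_t cluster, cpu_mask_t online)
{
	cpu_mask_t candidates = cluster & online;
	int cpu;

	for (cpu = 0; cpu < 32; cpu++)
		if (candidates & (UINT32_C(1) << cpu))
			return cpu;

	return -1; /* every CPU of this cluster is offline */
}
```

With the mask built from the possible CPUs, a cluster whose CPUs are all offline at probe time (for example when booting with maxcpus=) still gets a non-empty mask, so probe no longer hits the "CPU mask is empty" error path; in the model the check simply finds no target until one of those CPUs comes online.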
>
>
>> + struct device_node *cpu;
>> + struct device_node *cpu_cache, *l2_cache;
>> +
>> + cpu = of_get_cpu_node(i, NULL);
>
> (of_get_cpu_node() can return NULL, but I don't think it can ever happen like this)
>
>> + cpu_cache = of_find_next_cache_node(cpu);
>> + l2_cache = of_parse_phandle(dev->of_node, "l2-cache", 0);
>> +
>> + if (cpu_cache == l2_cache)
>> + cpumask_set_cpu(i, &al_l2->cluster_cpus);
>
> You need to of_node_put() these device_node pointers once you're done with them.
Will be added.
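A rough sketch of the get/put pairing I have in mind, written as a userspace model so it can be compiled on its own. struct node, node_get(), node_put() and cpu_shares_l2() are hypothetical stand-ins for struct device_node, the reference taken by the of_* lookup helpers, and of_node_put(); the real v5 code may look different.

```c
/* Hypothetical stand-in for struct device_node: only the refcount is modelled. */
struct node {
	int refcount;
};

/* Models a lookup helper that returns the node with a reference held. */
static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Models of_node_put(); like the kernel helper, it accepts NULL. */
static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/*
 * One loop iteration: take references to the CPU node, its next-level cache
 * and the l2-cache phandle target, compare the two cache nodes, then drop
 * all three references. Returns 1 if the CPU belongs to this L2 cache.
 */
static int cpu_shares_l2(struct node *cpu, struct node *cpu_next_cache,
			 struct node *l2)
{
	struct node *c = node_get(cpu);
	struct node *cache = node_get(cpu_next_cache);
	struct node *target = node_get(l2);
	int match = (cache == target);

	node_put(target);
	node_put(cache);
	node_put(c);

	return match;
}
```

The point of the model is that every reference taken inside the loop body is dropped before the next iteration, whether or not the CPU matched.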
>
>
>> + }
>> +
>> + if (cpumask_empty(&al_l2->cluster_cpus)) {
>> + dev_err(dev, "CPU mask is empty for this L2 cache\n");
>> + ret = -EINVAL;
>> + goto err;
>> + }
>> +
>> + ret = edac_device_add_device(edac_dev);
>> + if (ret) {
>> + dev_err(dev, "Failed to add L2 edac device\n");
>> + goto err;
>> + }
>> +
>> + return 0;
>> +
>> +err:
>> + edac_device_free_ctl_info(edac_dev);
>> +
>> + return ret;
>> +}
>
> With the of_node_put()ing:
> Reviewed-by: James Morse <james.morse@arm.com>
Thanks for the review; I will publish v5 with the above fixes.
Thanks,
Hanna
>
>
> Thanks,
>
> James
>