From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brijesh Singh
Subject: [PATCH v2] EDAC: Add ARM64 EDAC
Date: Wed, 21 Oct 2015 15:41:37 -0500
Message-ID: <1445460097-10260-1-git-send-email-brijeshkumar.singh@amd.com>
X-Mailer: git-send-email 1.9.1
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Add support for Cortex A57 and A53 EDAC driver.

Signed-off-by: Brijesh Singh
CC: robh+dt@kernel.org
CC: pawel.moll@arm.com
CC: mark.rutland@arm.com
CC: ijc+devicetree@hellion.org.uk
CC: galak@codeaurora.org
CC: dougthompson@xmission.com
CC: bp@alien8.de
CC: mchehab@osg.samsung.com
CC: devicetree@vger.kernel.org
CC: guohanjun@huawei.com
CC: andre.przywara@arm.com
CC: arnd@arndb.de
CC: linux-kernel@vger.kernel.org
CC: linux-edac@vger.kernel.org
---

v2:
* convert into generic arm64 edac driver
* remove AMD specific references from dt binding
* remove poll_msec property from dt binding
* add poll_msec as a module param, default is 100ms
* update copyright text
* define macro mnemonics for L1 and L2 RAMID
* check L2 error per-cluster instead of per core
* update function names
* use get_online_cpus() and put_online_cpus() to make L1 and L2 register
  read hotplug-safe
* add error check in probe routine

 .../devicetree/bindings/edac/armv8-edac.txt        |  15 +
 drivers/edac/Kconfig                               |   6 +
 drivers/edac/Makefile                              |   1 +
 drivers/edac/cortex_arm64_edac.c                   | 457 +++++++++++++++++++++
 4 files changed, 479 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/edac/armv8-edac.txt
 create mode 100644 drivers/edac/cortex_arm64_edac.c

diff --git a/Documentation/devicetree/bindings/edac/armv8-edac.txt b/Documentation/devicetree/bindings/edac/armv8-edac.txt
new file mode 100644
index 0000000..dfd128f
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/armv8-edac.txt
@@ -0,0 +1,15 @@
+* ARMv8 L1/L2 cache error reporting
+
+On ARMv8, CPU Memory Error Syndrome Register and L2 Memory Error Syndrome
+Register can be used for checking L1 and L2 memory errors.
+
+The following section describes the ARMv8 EDAC DT node binding.
+
+Required properties:
+- compatible: Should be "arm,armv8-edac"
+
+Example:
+	edac {
+		compatible = "arm,armv8-edac";
+	};
+
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index ef25000..dd7c195 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -390,4 +390,10 @@ config EDAC_XGENE
 	  Support for error detection and correction on the
 	  APM X-Gene family of SOCs.
 
+config EDAC_CORTEX_ARM64
+	tristate "ARM Cortex A57/A53"
+	depends on EDAC_MM_EDAC && ARM64
+	help
+	  Support for error detection and correction on the
+	  ARM Cortex A57 and A53.
 endif # EDAC
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index ae3c5f3..ac01660 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -68,3 +68,4 @@ obj-$(CONFIG_EDAC_OCTEON_PCI) += octeon_edac-pci.o
 obj-$(CONFIG_EDAC_ALTERA_MC)		+= altera_edac.o
 obj-$(CONFIG_EDAC_SYNOPSYS)		+= synopsys_edac.o
 obj-$(CONFIG_EDAC_XGENE)		+= xgene_edac.o
+obj-$(CONFIG_EDAC_CORTEX_ARM64)		+= cortex_arm64_edac.o
diff --git a/drivers/edac/cortex_arm64_edac.c b/drivers/edac/cortex_arm64_edac.c
new file mode 100644
index 0000000..c37bb94
--- /dev/null
+++ b/drivers/edac/cortex_arm64_edac.c
@@ -0,0 +1,457 @@
+/*
+ * Cortex ARM64 EDAC
+ *
+ * Copyright (c) 2015, Advanced Micro Devices
+ * Author: Brijesh Singh
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include
+#include
+#include
+
+#include "edac_core.h"
+
+#define EDAC_MOD_STR	"cortex_arm64_edac"
+
+#define A57_CPUMERRSR_EL1_INDEX(x)	((x) & 0x1ffff)
+#define A57_CPUMERRSR_EL1_BANK(x)	(((x) >> 18) & 0x1f)
+#define A57_CPUMERRSR_EL1_RAMID(x)	(((x) >> 24) & 0x7f)
+#define A57_CPUMERRSR_EL1_VALID(x)	((x) & (1UL << 31))
+#define A57_CPUMERRSR_EL1_REPEAT(x)	(((x) >> 32) & 0x7f)
+#define A57_CPUMERRSR_EL1_OTHER(x)	(((x) >> 40) & 0xff)
+#define A57_CPUMERRSR_EL1_FATAL(x)	(!!((x) & (1UL << 63)))
+#define A57_L1_I_TAG_RAM		0x00
+#define A57_L1_I_DATA_RAM		0x01
+#define A57_L1_D_TAG_RAM		0x08
+#define A57_L1_D_DATA_RAM		0x09
+#define A57_L1_TLB_RAM			0x18
+
+#define A57_L2MERRSR_EL1_INDEX(x)	((x) & 0x1ffff)
+#define A57_L2MERRSR_EL1_CPUID(x)	(((x) >> 18) & 0xf)
+#define A57_L2MERRSR_EL1_RAMID(x)	(((x) >> 24) & 0x7f)
+#define A57_L2MERRSR_EL1_VALID(x)	((x) & (1UL << 31))
+#define A57_L2MERRSR_EL1_REPEAT(x)	(((x) >> 32) & 0xff)
+#define A57_L2MERRSR_EL1_OTHER(x)	(((x) >> 40) & 0xff)
+#define A57_L2MERRSR_EL1_FATAL(x)	(!!((x) & (1UL << 63)))
+#define A57_L2_TAG_RAM			0x10
+#define A57_L2_DATA_RAM			0x11
+#define A57_L2_SNOOP_TAG_RAM		0x12
+#define A57_L2_DIRTY_RAM		0x14
+#define A57_L2_INCLUSION_PF_RAM		0x18
+
+#define A53_CPUMERRSR_EL1_ADDR(x)	((x) & 0xfff)
+#define A53_CPUMERRSR_EL1_CPUID(x)	(((x) >> 18) & 0x07)
+#define A53_CPUMERRSR_EL1_RAMID(x)	(((x) >> 24) & 0x7f)
+#define A53_CPUMERRSR_EL1_VALID(x)	((x) & (1UL << 31))
+#define A53_CPUMERRSR_EL1_REPEAT(x)	(((x) >> 32) & 0xff)
+#define A53_CPUMERRSR_EL1_OTHER(x)	(((x) >> 40) & 0xff)
+#define A53_CPUMERRSR_EL1_FATAL(x)	(!!((x) & (1UL << 63)))
+#define A53_L1_I_TAG_RAM		0x00
+#define A53_L1_I_DATA_RAM		0x01
+#define A53_L1_D_TAG_RAM		0x08
+#define A53_L1_D_DATA_RAM		0x09
+#define A53_L1_D_DIRT_RAM		0x0A
+#define A53_L1_TLB_RAM			0x18
+
+#define A53_L2MERRSR_EL1_INDEX(x)	(((x) >> 3) & 0x3fff)
+#define A53_L2MERRSR_EL1_CPUID(x)	(((x) >> 18) & 0x0f)
+#define A53_L2MERRSR_EL1_RAMID(x)	(((x) >> 24) & 0x7f)
+#define A53_L2MERRSR_EL1_VALID(x)	((x) & (1UL << 31))
+#define A53_L2MERRSR_EL1_REPEAT(x)	(((x) >> 32) & 0xff)
+#define A53_L2MERRSR_EL1_OTHER(x)	(((x) >> 40) & 0xff)
+#define A53_L2MERRSR_EL1_FATAL(x)	(!!((x) & (1UL << 63)))
+#define A53_L2_TAG_RAM			0x10
+#define A53_L2_DATA_RAM			0x11
+#define A53_L2_SNOOP_RAM		0x12
+
+#define L1_CACHE			0
+#define L2_CACHE			1
+
+static int poll_msec = 100;
+
+struct cortex_arm64_edac {
+	struct edac_device_ctl_info *edac_ctl;
+};
+
+static inline u64 read_cpumerrsr_el1(void)
+{
+	u64 val;
+
+	asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (val));
+	return val;
+}
+
+static inline void write_cpumerrsr_el1(u64 val)
+{
+	asm volatile("msr s3_1_c15_c2_2, %0" :: "r" (val));
+}
+
+static inline u64 read_l2merrsr_el1(void)
+{
+	u64 val;
+
+	asm volatile("mrs %0, s3_1_c15_c2_3" : "=r" (val));
+	return val;
+}
+
+static inline void write_l2merrsr_el1(u64 val)
+{
+	asm volatile("msr s3_1_c15_c2_3, %0" :: "r" (val));
+}
+
+static void a53_parse_l2merrsr(struct edac_device_ctl_info *edac_ctl)
+{
+	int fatal;
+	int repeat_err, other_err;
+	u64 val = read_l2merrsr_el1();
+
+	if (!A53_L2MERRSR_EL1_VALID(val))
+		return;
+
+	fatal = A53_L2MERRSR_EL1_FATAL(val);
+	repeat_err = A53_L2MERRSR_EL1_REPEAT(val);
+	other_err = A53_L2MERRSR_EL1_OTHER(val);
+
+	edac_printk(KERN_CRIT, EDAC_MOD_STR,
+		    "A53 CPU%d L2 %s error detected!\n", smp_processor_id(),
+		    fatal ? "fatal" : "non-fatal");
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2MERRSR_EL1=%#llx\n", val);
+
+	switch (A53_L2MERRSR_EL1_RAMID(val)) {
+	case A53_L2_TAG_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Tag RAM\n");
+		break;
+	case A53_L2_DATA_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Data RAM\n");
+		break;
+	case A53_L2_SNOOP_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Snoop filter RAM\n");
+		break;
+	default:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n");
+		break;
+	}
+
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d\n",
+		    repeat_err);
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n",
+		    other_err);
+	if (fatal)
+		edac_device_handle_ue(edac_ctl, smp_processor_id(), L2_CACHE,
+				      edac_ctl->name);
+	else
+		edac_device_handle_ce(edac_ctl, smp_processor_id(), L2_CACHE,
+				      edac_ctl->name);
+	write_l2merrsr_el1(0);
+}
+
+static void a57_parse_l2merrsr(struct edac_device_ctl_info *edac_ctl)
+{
+	int fatal;
+	int repeat_err, other_err;
+	u64 val = read_l2merrsr_el1();
+
+	if (!A57_L2MERRSR_EL1_VALID(val))
+		return;
+
+	fatal = A57_L2MERRSR_EL1_FATAL(val);
+	repeat_err = A57_L2MERRSR_EL1_REPEAT(val);
+	other_err = A57_L2MERRSR_EL1_OTHER(val);
+
+	edac_printk(KERN_CRIT, EDAC_MOD_STR,
+		    "A57 CPU%d L2 %s error detected!\n", smp_processor_id(),
+		    fatal ? "fatal" : "non-fatal");
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2MERRSR_EL1=%#llx\n", val);
+
+	switch (A57_L2MERRSR_EL1_RAMID(val)) {
+	case A57_L2_TAG_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Tag RAM\n");
+		break;
+	case A57_L2_DATA_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Data RAM\n");
+		break;
+	case A57_L2_SNOOP_TAG_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Snoop tag RAM\n");
+		break;
+	case A57_L2_DIRTY_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Dirty RAM\n");
+		break;
+	case A57_L2_INCLUSION_PF_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 inclusion PF RAM\n");
+		break;
+	default:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n");
+		break;
+	}
+
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d\n",
+		    repeat_err);
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n",
+		    other_err);
+	if (fatal)
+		edac_device_handle_ue(edac_ctl, smp_processor_id(), L2_CACHE,
+				      edac_ctl->name);
+	else
+		edac_device_handle_ce(edac_ctl, smp_processor_id(), L2_CACHE,
+				      edac_ctl->name);
+	write_l2merrsr_el1(0);
+}
+
+static void a57_parse_cpumerrsr(struct edac_device_ctl_info *edac_ctl)
+{
+	int fatal;
+	int repeat_err, other_err;
+	u64 val = read_cpumerrsr_el1();
+
+	if (!A57_CPUMERRSR_EL1_VALID(val))
+		return;
+
+	fatal = A57_CPUMERRSR_EL1_FATAL(val);
+	repeat_err = A57_CPUMERRSR_EL1_REPEAT(val);
+	other_err = A57_CPUMERRSR_EL1_OTHER(val);
+
+	edac_printk(KERN_CRIT, EDAC_MOD_STR,
+		    "A57 CPU%d L1 %s error detected!\n", smp_processor_id(),
+		    fatal ? "fatal" : "non-fatal");
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "CPUMERRSR_EL1=%#llx\n", val);
+
+	switch (A57_CPUMERRSR_EL1_RAMID(val)) {
+	case A57_L1_I_TAG_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Tag RAM\n");
+		break;
+	case A57_L1_I_DATA_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Data RAM\n");
+		break;
+	case A57_L1_D_TAG_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Tag RAM\n");
+		break;
+	case A57_L1_D_DATA_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Data RAM\n");
+		break;
+	case A57_L1_TLB_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 TLB RAM\n");
+		break;
+	default:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n");
+		break;
+	}
+
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d\n",
+		    repeat_err);
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n",
+		    other_err);
+
+	if (fatal)
+		edac_device_handle_ue(edac_ctl, smp_processor_id(), L1_CACHE,
+				      edac_ctl->name);
+	else
+		edac_device_handle_ce(edac_ctl, smp_processor_id(), L1_CACHE,
+				      edac_ctl->name);
+	write_cpumerrsr_el1(0);
+}
+
+static void a53_parse_cpumerrsr(struct edac_device_ctl_info *edac_ctl)
+{
+	int fatal;
+	int repeat_err, other_err;
+	u64 val = read_cpumerrsr_el1();
+
+	if (!A53_CPUMERRSR_EL1_VALID(val))
+		return;
+
+	fatal = A53_CPUMERRSR_EL1_FATAL(val);
+	repeat_err = A53_CPUMERRSR_EL1_REPEAT(val);
+	other_err = A53_CPUMERRSR_EL1_OTHER(val);
+
+	edac_printk(KERN_CRIT, EDAC_MOD_STR,
+		    "A53 CPU%d L1 %s error detected!\n", smp_processor_id(),
+		    fatal ? "fatal" : "non-fatal");
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "CPUMERRSR_EL1=%#llx\n", val);
+
+	switch (A53_CPUMERRSR_EL1_RAMID(val)) {
+	case A53_L1_I_TAG_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Tag RAM\n");
+		break;
+	case A53_L1_I_DATA_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Data RAM\n");
+		break;
+	case A53_L1_D_TAG_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Tag RAM\n");
+		break;
+	case A53_L1_D_DATA_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Data RAM\n");
+		break;
+	case A53_L1_TLB_RAM:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 TLB RAM\n");
+		break;
+	default:
+		edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n");
+		break;
+	}
+
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d\n",
+		    repeat_err);
+	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n",
+		    other_err);
+
+	if (fatal)
+		edac_device_handle_ue(edac_ctl, smp_processor_id(), L1_CACHE,
+				      edac_ctl->name);
+	else
+		edac_device_handle_ce(edac_ctl, smp_processor_id(), L1_CACHE,
+				      edac_ctl->name);
+	write_cpumerrsr_el1(0);
+}
+
+static void parse_cpumerrsr(void *args)
+{
+	struct edac_device_ctl_info *edac_ctl = args;
+	int partnum = read_cpuid_part_number();
+
+	switch (partnum) {
+	case ARM_CPU_PART_CORTEX_A57:
+		a57_parse_cpumerrsr(edac_ctl);
+		break;
+	case ARM_CPU_PART_CORTEX_A53:
+		a53_parse_cpumerrsr(edac_ctl);
+		break;
+	}
+}
+
+static void parse_l2merrsr(void *args)
+{
+	struct edac_device_ctl_info *edac_ctl = args;
+	int partnum = read_cpuid_part_number();
+
+	switch (partnum) {
+	case ARM_CPU_PART_CORTEX_A57:
+		a57_parse_l2merrsr(edac_ctl);
+		break;
+	case ARM_CPU_PART_CORTEX_A53:
+		a53_parse_l2merrsr(edac_ctl);
+		break;
+	}
+}
+
+static void arm64_monitor_cache_errors(struct edac_device_ctl_info *edev_ctl)
+{
+	int cpu;
+	struct cpumask cluster_mask, old_mask;
+
+	cpumask_clear(&cluster_mask);
+	cpumask_clear(&old_mask);
+
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, parse_cpumerrsr, edev_ctl, 0);
+		cpumask_copy(&cluster_mask, topology_core_cpumask(cpu));
+		if (cpumask_equal(&cluster_mask, &old_mask))
+			continue;
+		cpumask_copy(&old_mask, &cluster_mask);
+		smp_call_function_any(&cluster_mask, parse_l2merrsr,
+				      edev_ctl, 0);
+	}
+	put_online_cpus();
+}
+
+static int cortex_arm64_edac_probe(struct platform_device *pdev)
+{
+	int rc;
+	struct cortex_arm64_edac *drv;
+	struct device *dev = &pdev->dev;
+
+	drv = devm_kzalloc(dev, sizeof(*drv), GFP_KERNEL);
+	if (!drv)
+		return -ENOMEM;
+
+	drv->edac_ctl = edac_device_alloc_ctl_info(0, "cpu",
+						   num_possible_cpus(), "L", 2,
+						   1, NULL, 0,
+						   edac_device_alloc_index());
+	if (!drv->edac_ctl)
+		return -ENOMEM;
+
+	drv->edac_ctl->poll_msec = poll_msec;
+	drv->edac_ctl->edac_check = arm64_monitor_cache_errors;
+	drv->edac_ctl->dev = dev;
+	drv->edac_ctl->mod_name = dev_name(dev);
+	drv->edac_ctl->dev_name = dev_name(dev);
+	drv->edac_ctl->ctl_name = "cpu_err";
+	drv->edac_ctl->panic_on_ue = 1;
+	platform_set_drvdata(pdev, drv);
+
+	rc = edac_device_add_device(drv->edac_ctl);
+	if (rc)
+		goto edac_alloc_failed;
+
+	return 0;
+
+edac_alloc_failed:
+	edac_device_free_ctl_info(drv->edac_ctl);
+	return rc;
+}
+
+static int cortex_arm64_edac_remove(struct platform_device *pdev)
+{
+	struct cortex_arm64_edac *drv = dev_get_drvdata(&pdev->dev);
+	struct edac_device_ctl_info *edac_ctl = drv->edac_ctl;
+
+	edac_device_del_device(edac_ctl->dev);
+	edac_device_free_ctl_info(edac_ctl);
+
+	return 0;
+}
+
+static const struct of_device_id cortex_arm64_edac_of_match[] = {
+	{ .compatible = "arm,armv8-edac" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, cortex_arm64_edac_of_match);
+
+static struct platform_driver cortex_arm64_edac_driver = {
+	.probe = cortex_arm64_edac_probe,
+	.remove = cortex_arm64_edac_remove,
+	.driver = {
+		.name = "arm64-edac",
+		.owner = THIS_MODULE,
+		.of_match_table = cortex_arm64_edac_of_match,
+	},
+};
+
+static int __init cortex_arm64_edac_init(void)
+{
+	int rc;
+
+	/* Only POLL mode is supported so far */
+	edac_op_state = EDAC_OPSTATE_POLL;
+
+	rc = platform_driver_register(&cortex_arm64_edac_driver);
+	if (rc) {
+		edac_printk(KERN_ERR, EDAC_MOD_STR, "failed to register\n");
+		return rc;
+	}
+
+	return 0;
+}
+module_init(cortex_arm64_edac_init);
+
+static void __exit cortex_arm64_edac_exit(void)
+{
+	platform_driver_unregister(&cortex_arm64_edac_driver);
+}
+module_exit(cortex_arm64_edac_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Brijesh Singh ");
+MODULE_DESCRIPTION("Cortex A57 and A53 EDAC driver");
+module_param(poll_msec, int, 0444);
+MODULE_PARM_DESC(poll_msec, "EDAC monitor poll interval in msec");
-- 
1.9.1
cpumask_copy(&cluster_mask, topology_core_cpumask(cpu)); + if (cpumask_equal(&cluster_mask, &old_mask)) + continue; + cpumask_copy(&old_mask, &cluster_mask); + smp_call_function_any(&cluster_mask, parse_l2merrsr, + edev_ctl, 0); + } + put_online_cpus(); +} + +static int cortex_arm64_edac_probe(struct platform_device *pdev) +{ + int rc; + struct cortex_arm64_edac *drv; + struct device *dev = &pdev->dev; + + drv = devm_kzalloc(dev, sizeof(*drv), GFP_KERNEL); + if (!drv) + return -ENOMEM; + + drv->edac_ctl = edac_device_alloc_ctl_info(0, "cpu", + num_possible_cpus(), "L", 2, + 1, NULL, 0, + edac_device_alloc_index()); + if (IS_ERR(drv->edac_ctl)) + return -ENOMEM; + + drv->edac_ctl->poll_msec = poll_msec; + drv->edac_ctl->edac_check = arm64_monitor_cache_errors; + drv->edac_ctl->dev = dev; + drv->edac_ctl->mod_name = dev_name(dev); + drv->edac_ctl->dev_name = dev_name(dev); + drv->edac_ctl->ctl_name = "cpu_err"; + drv->edac_ctl->panic_on_ue = 1; + platform_set_drvdata(pdev, drv); + + rc = edac_device_add_device(drv->edac_ctl); + if (rc) + goto edac_alloc_failed; + + return 0; + +edac_alloc_failed: + edac_device_free_ctl_info(drv->edac_ctl); + return rc; +} + +static int cortex_arm64_edac_remove(struct platform_device *pdev) +{ + struct cortex_arm64_edac *drv = dev_get_drvdata(&pdev->dev); + struct edac_device_ctl_info *edac_ctl = drv->edac_ctl; + + edac_device_del_device(edac_ctl->dev); + edac_device_free_ctl_info(edac_ctl); + + return 0; +} + +static const struct of_device_id cortex_arm64_edac_of_match[] = { + { .compatible = "arm,armv8-edac" }, + {}, +}; +MODULE_DEVICE_TABLE(of, cortex_arm64_edac_of_match); + +static struct platform_driver cortex_arm64_edac_driver = { + .probe = cortex_arm64_edac_probe, + .remove = cortex_arm64_edac_remove, + .driver = { + .name = "arm64-edac", + .owner = THIS_MODULE, + .of_match_table = cortex_arm64_edac_of_match, + }, +}; + +static int __init cortex_arm64_edac_init(void) +{ + int rc; + + /* Only POLL mode is supported so far */ + 
edac_op_state = EDAC_OPSTATE_POLL;
+
+	rc = platform_driver_register(&cortex_arm64_edac_driver);
+	if (rc) {
+		edac_printk(KERN_ERR, EDAC_MOD_STR, "failed to register\n");
+		return rc;
+	}
+
+	return 0;
+}
+module_init(cortex_arm64_edac_init);
+
+static void __exit cortex_arm64_edac_exit(void)
+{
+	platform_driver_unregister(&cortex_arm64_edac_driver);
+}
+module_exit(cortex_arm64_edac_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Brijesh Singh ");
+MODULE_DESCRIPTION("Cortex A57 and A53 EDAC driver");
+module_param(poll_msec, int, 0444);
+MODULE_PARM_DESC(poll_msec, "EDAC monitor poll interval in msec");
--
1.9.1