From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v2] EDAC: Add ARM64 EDAC
To: Mauro Carvalho Chehab
References: <1445460097-10260-1-git-send-email-brijeshkumar.singh@amd.com> <20151021192536.2af0f8c5@recife.lan>
From: Brijesh Singh
Message-ID: <56292F52.2030407@amd.com>
Date: Thu, 22 Oct 2015 13:47:46 -0500
In-Reply-To: <20151021192536.2af0f8c5@recife.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Mauro,

On 10/21/2015 04:25 PM, Mauro Carvalho Chehab wrote:
> On Wed, 21 Oct 2015 15:41:37 -0500
> Brijesh Singh wrote:
>
>> Add support for Cortex A57 and A53 EDAC driver.
>>
>> Signed-off-by: Brijesh Singh
>> CC: robh+dt@kernel.org
>> CC: pawel.moll@arm.com
>> CC: mark.rutland@arm.com
>> CC: ijc+devicetree@hellion.org.uk
>> CC: galak@codeaurora.org
>> CC: dougthompson@xmission.com
>> CC: bp@alien8.de
>> CC: mchehab@osg.samsung.com
>> CC: devicetree@vger.kernel.org
>> CC: guohanjun@huawei.com
>> CC: andre.przywara@arm.com
>> CC: arnd@arndb.de
>> CC: linux-kernel@vger.kernel.org
>> CC: linux-edac@vger.kernel.org
>> ---
>>
>> v2:
>> * convert into generic arm64 edac driver
>> * remove AMD specific references from dt binding
>> * remove poll_msec property from dt binding
>> * add poll_msec as a module param, default is 100ms
>> * update copyright text
>> * define macro mnemonics for L1 and L2 RAMID
>> * check L2 error per-cluster instead of per core
>> * update function names
>> * use get_online_cpus() and put_online_cpus() to make L1 and L2 register
>>   read hotplug-safe
>> * add error check in probe routine
>>
>>  .../devicetree/bindings/edac/armv8-edac.txt | 15 +
>>  drivers/edac/Kconfig | 6 +
>>  drivers/edac/Makefile | 1 +
>>  drivers/edac/cortex_arm64_edac.c | 457 +++++++++++++++++++++
>>  4 files changed, 479 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/edac/armv8-edac.txt
>>  create mode 100644 drivers/edac/cortex_arm64_edac.c
>>
>> diff --git a/Documentation/devicetree/bindings/edac/armv8-edac.txt b/Documentation/devicetree/bindings/edac/armv8-edac.txt
>> new file mode 100644
>> index 0000000..dfd128f
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/edac/armv8-edac.txt
>> @@ -0,0 +1,15 @@
>> +* ARMv8 L1/L2 cache error reporting
>> +
>> +On ARMv8, CPU Memory Error Syndrome Register and L2 Memory Error Syndrome
>> +Register can be used for checking L1 and L2 memory errors.
>> +
>> +The following section describes the ARMv8 EDAC DT node binding.
>> +
>> +Required properties:
>> +- compatible: Should be "arm,armv8-edac"
>> +
>> +Example:
>> +	edac {
>> +		compatible = "arm,armv8-edac";
>> +	};
>> +
>> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
>> index ef25000..dd7c195 100644
>> --- a/drivers/edac/Kconfig
>> +++ b/drivers/edac/Kconfig
>> @@ -390,4 +390,10 @@ config EDAC_XGENE
>>  	  Support for error detection and correction on the
>>  	  APM X-Gene family of SOCs.
>>
>> +config EDAC_CORTEX_ARM64
>> +	tristate "ARM Cortex A57/A53"
>> +	depends on EDAC_MM_EDAC && ARM64
>
> It would be good to be able to compile it on non-ARM64 archs
> if COMPILE_TEST, e. g.:
>
> 	depends on EDAC_MM_EDAC && (ARM64 || COMPILE_TEST)
>
> That would allow testing tools like Coverity to test it. As far as
> I know, the public license we use only works on x86.
>
>> +	help
>> +	  Support for error detection and correction on the
>> +	  ARM Cortex A57 and A53.
>>  endif # EDAC
>> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
>> index ae3c5f3..ac01660 100644
>> --- a/drivers/edac/Makefile
>> +++ b/drivers/edac/Makefile
>> @@ -68,3 +68,4 @@ obj-$(CONFIG_EDAC_OCTEON_PCI) += octeon_edac-pci.o
>>  obj-$(CONFIG_EDAC_ALTERA_MC) += altera_edac.o
>>  obj-$(CONFIG_EDAC_SYNOPSYS) += synopsys_edac.o
>>  obj-$(CONFIG_EDAC_XGENE) += xgene_edac.o
>> +obj-$(CONFIG_EDAC_CORTEX_ARM64) += cortex_arm64_edac.o
>> diff --git a/drivers/edac/cortex_arm64_edac.c b/drivers/edac/cortex_arm64_edac.c
>> new file mode 100644
>> index 0000000..c37bb94
>> --- /dev/null
>> +++ b/drivers/edac/cortex_arm64_edac.c
>> @@ -0,0 +1,457 @@
>> +/*
>> + * Cortex ARM64 EDAC
>> + *
>> + * Copyright (c) 2015, Advanced Micro Devices
>> + * Author: Brijesh Singh
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + */
>> +
>> +#include
>> +#include
>> +#include
>> +
>> +#include "edac_core.h"
>> +
>> +#define EDAC_MOD_STR			"cortex_arm64_edac"
>> +
>> +#define A57_CPUMERRSR_EL1_INDEX(x)	((x) & 0x1ffff)
>> +#define A57_CPUMERRSR_EL1_BANK(x)	(((x) >> 18) & 0x1f)
>> +#define A57_CPUMERRSR_EL1_RAMID(x)	(((x) >> 24) & 0x7f)
>> +#define A57_CPUMERRSR_EL1_VALID(x)	((x) & (1 << 31))
>> +#define A57_CPUMERRSR_EL1_REPEAT(x)	(((x) >> 32) & 0x7f)
>> +#define A57_CPUMERRSR_EL1_OTHER(x)	(((x) >> 40) & 0xff)
>> +#define A57_CPUMERRSR_EL1_FATAL(x)	((x) & (1UL << 63))
>> +#define A57_L1_I_TAG_RAM		0x00
>> +#define A57_L1_I_DATA_RAM		0x01
>> +#define A57_L1_D_TAG_RAM		0x08
>> +#define A57_L1_D_DATA_RAM		0x09
>> +#define A57_L1_TLB_RAM			0x18
>> +
>> +#define A57_L2MERRSR_EL1_INDEX(x)	((x) & 0x1ffff)
>> +#define A57_L2MERRSR_EL1_CPUID(x)	(((x) >> 18) & 0xf)
>> +#define A57_L2MERRSR_EL1_RAMID(x)	(((x) >> 24) & 0x7f)
>> +#define A57_L2MERRSR_EL1_VALID(x)	((x) & (1 << 31))
>> +#define A57_L2MERRSR_EL1_REPEAT(x)	(((x) >> 32) & 0xff)
>> +#define A57_L2MERRSR_EL1_OTHER(x)	(((x) >> 40) & 0xff)
>> +#define A57_L2MERRSR_EL1_FATAL(x)	((x) & (1UL << 63))
>> +#define A57_L2_TAG_RAM			0x10
>> +#define A57_L2_DATA_RAM			0x11
>> +#define A57_L2_SNOOP_TAG_RAM		0x12
>> +#define A57_L2_DIRTY_RAM		0x14
>> +#define A57_L2_INCLUSION_PF_RAM		0x18
>> +
>> +#define A53_CPUMERRSR_EL1_ADDR(x)	((x) & 0xfff)
>> +#define A53_CPUMERRSR_EL1_CPUID(x)	(((x) >> 18) & 0x07)
>> +#define A53_CPUMERRSR_EL1_RAMID(x)	(((x) >> 24) & 0x7f)
>> +#define A53_CPUMERRSR_EL1_VALID(x)	((x) & (1 << 31))
>> +#define A53_CPUMERRSR_EL1_REPEAT(x)	(((x) >> 32) & 0xff)
>> +#define A53_CPUMERRSR_EL1_OTHER(x)	(((x) >> 40) & 0xff)
>> +#define A53_CPUMERRSR_EL1_FATAL(x)	((x) & (1UL << 63))
>> +#define A53_L1_I_TAG_RAM		0x00
>> +#define A53_L1_I_DATA_RAM		0x01
>> +#define A53_L1_D_TAG_RAM		0x08
>> +#define A53_L1_D_DATA_RAM		0x09
>> +#define A53_L1_D_DIRT_RAM		0x0A
>> +#define A53_L1_TLB_RAM			0x18
>> +
>> +#define A53_L2MERRSR_EL1_INDEX(x)	(((x) >> 3) & 0x3fff)
>> +#define A53_L2MERRSR_EL1_CPUID(x)	(((x) >> 18) & 0x0f)
>> +#define A53_L2MERRSR_EL1_RAMID(x)	(((x) >> 24) & 0x7f)
>> +#define A53_L2MERRSR_EL1_VALID(x)	((x) & (1 << 31))
>> +#define A53_L2MERRSR_EL1_REPEAT(x)	(((x) >> 32) & 0xff)
>> +#define A53_L2MERRSR_EL1_OTHER(x)	(((x) >> 40) & 0xff)
>> +#define A53_L2MERRSR_EL1_FATAL(x)	((x) & (1UL << 63))
>> +#define A53_L2_TAG_RAM			0x10
>> +#define A53_L2_DATA_RAM			0x11
>> +#define A53_L2_SNOOP_RAM		0x12
>> +
>> +#define L1_CACHE			0
>> +#define L2_CACHE			1
>> +
>> +int poll_msec = 100;
>> +
>> +struct cortex_arm64_edac {
>> +	struct edac_device_ctl_info *edac_ctl;
>> +};
>> +
>> +static inline u64 read_cpumerrsr_el1(void)
>> +{
>> +	u64 val;
>> +
>> +	asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (val));
>> +	return val;
>> +}
>> +
>> +static inline void write_cpumerrsr_el1(u64 val)
>> +{
>> +	asm volatile("msr s3_1_c15_c2_2, %0" :: "r" (val));
>> +}
>> +
>> +static inline u64 read_l2merrsr_el1(void)
>> +{
>> +	u64 val;
>> +
>> +	asm volatile("mrs %0, s3_1_c15_c2_3" : "=r" (val));
>> +	return val;
>> +}
>> +
>> +static inline void write_l2merrsr_el1(u64 val)
>> +{
>> +	asm volatile("msr s3_1_c15_c2_3, %0" :: "r" (val));
>> +}
>> +
>
> If we're willing to compile with COMPILE_TEST, we'll need to provide
> some stubs for the above functions that won't use asm.
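
A minimal sketch of what such stubs might look like for the L2 accessors, written so that it also builds as an ordinary user-space program. The guard is an assumption: in the driver it would presumably be CONFIG_ARM64 and the kernel's u64 type, and the helper l2_error_pending() is hypothetical, added only to show how a caller sees the stubbed register.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "syndrome registers are 64 bits wide");

/*
 * Assumption: the real driver would test CONFIG_ARM64 here. __KERNEL__ is
 * checked only so this sketch takes the stub path when built in user space,
 * even on an arm64 host.
 */
#if defined(__aarch64__) && defined(__KERNEL__)

static inline uint64_t read_l2merrsr_el1(void)
{
	uint64_t val;

	asm volatile("mrs %0, s3_1_c15_c2_3" : "=r" (val));
	return val;
}

static inline void write_l2merrsr_el1(uint64_t val)
{
	asm volatile("msr s3_1_c15_c2_3, %0" :: "r" (val));
}

#else /* COMPILE_TEST-style build: no system registers to access */

static inline uint64_t read_l2merrsr_el1(void)
{
	return 0;	/* VALID bit clear, so a parser would return early */
}

static inline void write_l2merrsr_el1(uint64_t val)
{
	(void)val;	/* nothing to clear */
}

#endif

/* Hypothetical helper: reports whether bit 31 (VALID) is set. */
static inline int l2_error_pending(void)
{
	return (int)((read_l2merrsr_el1() >> 31) & 1);
}
```

With stubs of this shape the parse functions compile unchanged on other architectures and simply never report an error there. The CPUMERRSR accessors could be stubbed the same way.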
>
>> +static void a53_parse_l2merrsr(struct edac_device_ctl_info *edac_ctl)
>> +{
>> +	int fatal;
>> +	int repeat_err, other_err;
>> +	u64 val = read_l2merrsr_el1();
>> +
>> +	if (!A53_L2MERRSR_EL1_VALID(val))
>> +		return;
>> +
>> +	fatal = A53_L2MERRSR_EL1_FATAL(val);
>> +	repeat_err = A53_L2MERRSR_EL1_REPEAT(val);
>> +	other_err = A53_L2MERRSR_EL1_OTHER(val);
>> +
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR,
>> +		    "A53 CPU%d L2 %s error detected!\n", smp_processor_id(),
>> +		    fatal ? "fatal" : "non-fatal");
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2MERRSR_EL1=%#llx\n", val);
>> +
>> +	switch (A53_L2MERRSR_EL1_RAMID(val)) {
>> +	case A53_L2_TAG_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Tag RAM\n");
>> +		break;
>> +	case A53_L2_DATA_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Data RAM\n");
>> +		break;
>> +	case A53_L2_SNOOP_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Snoop filter RAM\n");
>> +		break;
>> +	default:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n");
>> +		break;
>> +	}
>> +
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d",
>> +		    repeat_err);
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n",
>> +		    other_err);
>> +	if (fatal)
>> +		edac_device_handle_ue(edac_ctl, smp_processor_id(), L2_CACHE,
>> +				      edac_ctl->name);
>> +	else
>> +		edac_device_handle_ce(edac_ctl, smp_processor_id(), L2_CACHE,
>> +				      edac_ctl->name);
>> +	write_l2merrsr_el1(0);
>> +}
>> +
>> +static void a57_parse_l2merrsr(struct edac_device_ctl_info *edac_ctl)
>> +{
>> +	int fatal;
>> +	int repeat_err, other_err;
>> +	u64 val = read_l2merrsr_el1();
>> +
>> +	if (!A57_L2MERRSR_EL1_VALID(val))
>> +		return;
>> +
>> +	fatal = A57_L2MERRSR_EL1_FATAL(val);
>> +	repeat_err = A57_L2MERRSR_EL1_REPEAT(val);
>> +	other_err = A57_L2MERRSR_EL1_OTHER(val);
>> +
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR,
>> +		    "A57 CPU%d L2 %s error detected!\n", smp_processor_id(),
>> +		    fatal ? "fatal" : "non-fatal");
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2MERRSR_EL1=%#llx\n", val);
>> +
>> +	switch (A57_L2MERRSR_EL1_RAMID(val)) {
>> +	case A57_L2_TAG_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Tag RAM\n");
>> +		break;
>> +	case A57_L2_DATA_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Data RAM\n");
>> +		break;
>> +	case A57_L2_SNOOP_TAG_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Snoop tag RAM\n");
>> +		break;
>> +	case A57_L2_DIRTY_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Dirty RAM\n");
>> +		break;
>> +	case A57_L2_INCLUSION_PF_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 inclusion PF RAM\n");
>> +		break;
>> +	default:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n");
>> +		break;
>> +	}
>> +
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d",
>> +		    repeat_err);
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n",
>> +		    other_err);
>> +	if (fatal)
>> +		edac_device_handle_ue(edac_ctl, smp_processor_id(), L2_CACHE,
>> +				      edac_ctl->name);
>> +	else
>> +		edac_device_handle_ce(edac_ctl, smp_processor_id(), L2_CACHE,
>> +				      edac_ctl->name);
>> +	write_l2merrsr_el1(0);
>> +}
>> +
>> +static void a57_parse_cpumerrsr(struct edac_device_ctl_info *edac_ctl)
>> +{
>> +	int fatal;
>> +	int repeat_err, other_err;
>> +	u64 val = read_cpumerrsr_el1();
>> +
>> +	if (!A57_CPUMERRSR_EL1_VALID(val))
>> +		return;
>> +
>> +	fatal = A57_CPUMERRSR_EL1_FATAL(val);
>> +	repeat_err = A57_CPUMERRSR_EL1_REPEAT(val);
>> +	other_err = A57_CPUMERRSR_EL1_OTHER(val);
>> +
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR,
>> +		    "CPU%d L1 %s error detected!\n", smp_processor_id(),
>> +		    fatal ? "fatal" : "non-fatal");
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "CPUMERRSR_EL1=%#llx\n", val);
>> +
>> +	switch (A57_CPUMERRSR_EL1_RAMID(val)) {
>> +	case A57_L1_I_TAG_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Tag RAM\n");
>> +		break;
>> +	case A57_L1_I_DATA_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Data RAM\n");
>> +		break;
>> +	case A57_L1_D_TAG_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Tag RAM\n");
>> +		break;
>> +	case A57_L1_D_DATA_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Data RAM\n");
>> +		break;
>> +	case A57_L1_TLB_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 TLB RAM\n");
>> +		break;
>> +	default:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n");
>> +		break;
>> +	}
>> +
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d",
>> +		    repeat_err);
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n",
>> +		    other_err);
>> +
>> +	if (fatal)
>> +		edac_device_handle_ue(edac_ctl, smp_processor_id(), L1_CACHE,
>> +				      edac_ctl->name);
>> +	else
>> +		edac_device_handle_ce(edac_ctl, smp_processor_id(), L1_CACHE,
>> +				      edac_ctl->name);
>> +	write_cpumerrsr_el1(0);
>> +}
>> +
>> +static void a53_parse_cpumerrsr(struct edac_device_ctl_info *edac_ctl)
>> +{
>> +	int fatal;
>> +	int repeat_err, other_err;
>> +	u64 val = read_cpumerrsr_el1();
>> +
>> +	if (!A53_CPUMERRSR_EL1_VALID(val))
>> +		return;
>> +
>> +	fatal = A53_CPUMERRSR_EL1_FATAL(val);
>> +	repeat_err = A53_CPUMERRSR_EL1_REPEAT(val);
>> +	other_err = A53_CPUMERRSR_EL1_OTHER(val);
>> +
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR,
>> +		    "A53 CPU%d L1 %s error detected!\n", smp_processor_id(),
>> +		    fatal ? "fatal" : "non-fatal");
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "CPUMERRSR_EL1=%#llx\n", val);
>> +
>> +	switch (A53_CPUMERRSR_EL1_RAMID(val)) {
>> +	case A53_L1_I_TAG_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Tag RAM\n");
>> +		break;
>> +	case A53_L1_I_DATA_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Data RAM\n");
>> +		break;
>> +	case A53_L1_D_TAG_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Tag RAM\n");
>> +		break;
>> +	case A53_L1_D_DATA_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Data RAM\n");
>> +		break;
>> +	case A53_L1_TLB_RAM:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 TLB RAM\n");
>> +		break;
>> +	default:
>> +		edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n");
>> +		break;
>> +	}
>> +
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d",
>> +		    repeat_err);
>> +	edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n",
>> +		    other_err);
>> +
>> +	if (fatal)
>> +		edac_device_handle_ue(edac_ctl, smp_processor_id(), L1_CACHE,
>> +				      edac_ctl->name);
>> +	else
>> +		edac_device_handle_ce(edac_ctl, smp_processor_id(), L1_CACHE,
>> +				      edac_ctl->name);
>
> The above code doesn't look right to me. It should be, instead, calling
> one of the functions that output the errors also via trace or to call one
> of the trace functions directly (see the trace functions currently defined
> at include/ras/ras_event.h).
>
> Failing to do that would cause RAS tools (like rasdaemon) to not get
> the errors.
>

Noted. I will use trace_mc_event() to generate the event, but it seems that I
still need to call edac_device_handle_ce()/edac_device_handle_ue() to log the
error in the sysfs files. Also, in the UE case, I noticed that
edac_device_handle_ue() takes care of triggering the panic (the expected
behavior). So is it okay to use both the trace event and
edac_device_handle_xx()?
Something like this:

	if (L2MERRSR_EL1_FATAL(val)) {
		trace_mc_event(HW_EVENT_ERR_UNCORRECTED, "L2 fatal error", "",
			       repeat_err, 0, 0, 0, 0, index, 0, 0,
			       "cortex_arm64_edac");
		edac_device_handle_ue(edac_ctl, cpu, L2_CACHE, edac_ctl->name);
	} else {
		trace_mc_event(HW_EVENT_ERR_CORRECTED, "L2 non-fatal error", "",
			       repeat_err, 0, 0, 0, 0, index, 0, 0,
			       "cortex_arm64_edac");
		edac_device_handle_ce(edac_ctl, cpu, L2_CACHE, edac_ctl->name);
	}

>> +	write_cpumerrsr_el1(0);
>> +}
>> +
>> +static void parse_cpumerrsr(void *args)
>> +{
>> +	struct edac_device_ctl_info *edac_ctl = args;
>> +	int partnum = read_cpuid_part_number();
>> +
>> +	switch (partnum) {
>> +	case ARM_CPU_PART_CORTEX_A57:
>> +		a57_parse_cpumerrsr(edac_ctl);
>> +		break;
>> +	case ARM_CPU_PART_CORTEX_A53:
>> +		a53_parse_cpumerrsr(edac_ctl);
>> +		break;
>> +	}
>> +}
>> +
>> +static void parse_l2merrsr(void *args)
>> +{
>> +	struct edac_device_ctl_info *edac_ctl = args;
>> +	int partnum = read_cpuid_part_number();
>> +
>> +	switch (partnum) {
>> +	case ARM_CPU_PART_CORTEX_A57:
>> +		a57_parse_l2merrsr(edac_ctl);
>> +		break;
>> +	case ARM_CPU_PART_CORTEX_A53:
>> +		a53_parse_l2merrsr(edac_ctl);
>> +		break;
>> +	}
>> +}
>> +
>> +static void arm64_monitor_cache_errors(struct edac_device_ctl_info *edev_ctl)
>> +{
>> +	int cpu;
>> +	struct cpumask cluster_mask, old_mask;
>> +
>> +	cpumask_clear(&cluster_mask);
>> +	cpumask_clear(&old_mask);
>> +
>> +	get_online_cpus();
>> +	for_each_online_cpu(cpu) {
>> +		smp_call_function_single(cpu, parse_cpumerrsr, edev_ctl, 0);
>> +		cpumask_copy(&cluster_mask, topology_core_cpumask(cpu));
>> +		if (cpumask_equal(&cluster_mask, &old_mask))
>> +			continue;
>> +		cpumask_copy(&old_mask, &cluster_mask);
>> +		smp_call_function_any(&cluster_mask, parse_l2merrsr,
>> +				      edev_ctl, 0);
>> +	}
>> +	put_online_cpus();
>> +}
>> +
>> +static int cortex_arm64_edac_probe(struct platform_device *pdev)
>> +{
>> +	int rc;
>> +	struct cortex_arm64_edac *drv;
>> +	struct device *dev = &pdev->dev;
>> +
>> +	drv = devm_kzalloc(dev, sizeof(*drv), GFP_KERNEL);
>> +	if (!drv)
>> +		return -ENOMEM;
>> +
>> +	drv->edac_ctl = edac_device_alloc_ctl_info(0, "cpu",
>> +						   num_possible_cpus(), "L", 2,
>> +						   1, NULL, 0,
>> +						   edac_device_alloc_index());
>> +	if (IS_ERR(drv->edac_ctl))
>> +		return -ENOMEM;
>> +
>> +	drv->edac_ctl->poll_msec = poll_msec;
>> +	drv->edac_ctl->edac_check = arm64_monitor_cache_errors;
>> +	drv->edac_ctl->dev = dev;
>> +	drv->edac_ctl->mod_name = dev_name(dev);
>> +	drv->edac_ctl->dev_name = dev_name(dev);
>> +	drv->edac_ctl->ctl_name = "cpu_err";
>> +	drv->edac_ctl->panic_on_ue = 1;
>> +	platform_set_drvdata(pdev, drv);
>> +
>> +	rc = edac_device_add_device(drv->edac_ctl);
>> +	if (rc)
>> +		goto edac_alloc_failed;
>> +
>> +	return 0;
>> +
>> +edac_alloc_failed:
>> +	edac_device_free_ctl_info(drv->edac_ctl);
>> +	return rc;
>> +}
>> +
>> +static int cortex_arm64_edac_remove(struct platform_device *pdev)
>> +{
>> +	struct cortex_arm64_edac *drv = dev_get_drvdata(&pdev->dev);
>> +	struct edac_device_ctl_info *edac_ctl = drv->edac_ctl;
>> +
>> +	edac_device_del_device(edac_ctl->dev);
>> +	edac_device_free_ctl_info(edac_ctl);
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct of_device_id cortex_arm64_edac_of_match[] = {
>> +	{ .compatible = "arm,armv8-edac" },
>> +	{},
>> +};
>> +MODULE_DEVICE_TABLE(of, cortex_arm64_edac_of_match);
>> +
>> +static struct platform_driver cortex_arm64_edac_driver = {
>> +	.probe = cortex_arm64_edac_probe,
>> +	.remove = cortex_arm64_edac_remove,
>> +	.driver = {
>> +		.name = "arm64-edac",
>> +		.owner = THIS_MODULE,
>> +		.of_match_table = cortex_arm64_edac_of_match,
>> +	},
>> +};
>> +
>> +static int __init cortex_arm64_edac_init(void)
>> +{
>> +	int rc;
>> +
>> +	/* Only POLL mode is supported so far */
>> +	edac_op_state = EDAC_OPSTATE_POLL;
>> +
>> +	rc = platform_driver_register(&cortex_arm64_edac_driver);
>> +	if (rc) {
>> +		edac_printk(KERN_ERR, EDAC_MOD_STR, "failed to register\n");
>> +		return rc;
>> +	}
>> +
>> +	return 0;
>> +}
>> +module_init(cortex_arm64_edac_init);
>> +
>> +static void __exit cortex_arm64_edac_exit(void)
>> +{
>> +	platform_driver_unregister(&cortex_arm64_edac_driver);
>> +}
>> +module_exit(cortex_arm64_edac_exit);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("Brijesh Singh ");
>> +MODULE_DESCRIPTION("Cortex A57 and A53 EDAC driver");
>> +module_param(poll_msec, int, 0444);
>> +MODULE_PARM_DESC(poll_msec, "EDAC monitor poll interval in msec");
"fatal" : "non-fatal"); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2MERRSR_EL1=%#llx\n", val); >> + >> + switch (A57_L2MERRSR_EL1_RAMID(val)) { >> + case A57_L2_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Tag RAM\n"); >> + break; >> + case A57_L2_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Data RAM\n"); >> + break; >> + case A57_L2_SNOOP_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Snoop tag RAM\n"); >> + break; >> + case A57_L2_DIRTY_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Dirty RAM\n"); >> + break; >> + case A57_L2_INCLUSION_PF_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 inclusion PF RAM\n"); >> + break; >> + default: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n"); >> + break; >> + } >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d", >> + repeat_err); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n", >> + other_err); >> + if (fatal) >> + edac_device_handle_ue(edac_ctl, smp_processor_id(), L2_CACHE, >> + edac_ctl->name); >> + else >> + edac_device_handle_ce(edac_ctl, smp_processor_id(), L2_CACHE, >> + edac_ctl->name); >> + write_l2merrsr_el1(0); >> +} >> + >> +static void a57_parse_cpumerrsr(struct edac_device_ctl_info *edac_ctl) >> +{ >> + int fatal; >> + int repeat_err, other_err; >> + u64 val = read_cpumerrsr_el1(); >> + >> + if (!A57_CPUMERRSR_EL1_VALID(val)) >> + return; >> + >> + fatal = A57_CPUMERRSR_EL1_FATAL(val); >> + repeat_err = A57_CPUMERRSR_EL1_REPEAT(val); >> + other_err = A57_CPUMERRSR_EL1_OTHER(val); >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, >> + "CPU%d L1 %s error detected!\n", smp_processor_id(), >> + fatal ? 
"fatal" : "non-fatal"); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "CPUMERRSR_EL1=%#llx\n", val); >> + >> + switch (A57_CPUMERRSR_EL1_RAMID(val)) { >> + case A57_L1_I_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Tag RAM\n"); >> + break; >> + case A57_L1_I_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Data RAM\n"); >> + break; >> + case A57_L1_D_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Tag RAM\n"); >> + break; >> + case A57_L1_D_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Data RAM\n"); >> + break; >> + case A57_L1_TLB_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 TLB RAM\n"); >> + break; >> + default: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n"); >> + break; >> + } >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d", >> + repeat_err); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n", >> + other_err); >> + >> + if (fatal) >> + edac_device_handle_ue(edac_ctl, smp_processor_id(), L1_CACHE, >> + edac_ctl->name); >> + else >> + edac_device_handle_ce(edac_ctl, smp_processor_id(), L1_CACHE, >> + edac_ctl->name); >> + write_cpumerrsr_el1(0); >> +} >> + >> +static void a53_parse_cpumerrsr(struct edac_device_ctl_info *edac_ctl) >> +{ >> + int fatal; >> + int repeat_err, other_err; >> + u64 val = read_cpumerrsr_el1(); >> + >> + if (!A53_CPUMERRSR_EL1_VALID(val)) >> + return; >> + >> + fatal = A53_CPUMERRSR_EL1_FATAL(val); >> + repeat_err = A53_CPUMERRSR_EL1_REPEAT(val); >> + other_err = A53_CPUMERRSR_EL1_OTHER(val); >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, >> + "A53 CPU%d L1 %s error detected!\n", smp_processor_id(), >> + fatal ? 
"fatal" : "non-fatal"); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "CPUMERRSR_EL1=%#llx\n", val); >> + >> + switch (A53_CPUMERRSR_EL1_RAMID(val)) { >> + case A53_L1_I_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Tag RAM\n"); >> + break; >> + case A53_L1_I_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Data RAM\n"); >> + break; >> + case A53_L1_D_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Tag RAM\n"); >> + break; >> + case A53_L1_D_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Data RAM\n"); >> + break; >> + case A53_L1_TLB_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 TLB RAM\n"); >> + break; >> + default: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n"); >> + break; >> + } >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d", >> + repeat_err); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n", >> + other_err); >> + >> + if (fatal) >> + edac_device_handle_ue(edac_ctl, smp_processor_id(), L1_CACHE, >> + edac_ctl->name); >> + else >> + edac_device_handle_ce(edac_ctl, smp_processor_id(), L1_CACHE, >> + edac_ctl->name); > > The above code doesn't look right to me. It should be, instead, calling > one of the functions that output the errors also via trace or to call one > of the trace functions directly (see the trace functions currently defined > at include/ras/ras_event.h). > > Failing to do that would cause RAS tools (like rasdaemon) to not get > the errors. > Noted. I will use trace_mc_event() to generate event but it seems that I still need to call edac_device_handle_ce/ue () to log the error in sysfs files. Also in case of UE I noticed that edac_device_handle_ue() takes care of causing panic (the expected behavior). So is it okay to use both trace event as well as edac_device_handle_xx. 
Something like this:

if (L2MERRSR_EL1_FATAL(val)) {
	trace_mc_event(HW_EVENT_ERR_UNCORRECTED, "L2 fatal error", "",
		       repeat_err, 0, 0, 0, 0, index, 0, 0,
		       "cortex_arm64_edac");
	edac_device_handle_ue(edac_ctl, cpu, L2_CACHE, edac_ctl->name);
} else {
	trace_mc_event(HW_EVENT_ERR_CORRECTED, "L2 non-fatal error", "",
		       repeat_err, 0, 0, 0, 0, index, 0, 0,
		       "cortex_arm64_edac");
	edac_device_handle_ce(edac_ctl, cpu, L2_CACHE, edac_ctl->name);
}

>> + write_cpumerrsr_el1(0); >> +} >> + >> +static void parse_cpumerrsr(void *args) >> +{ >> + struct edac_device_ctl_info *edac_ctl = args; >> + int partnum = read_cpuid_part_number(); >> + >> + switch (partnum) { >> + case ARM_CPU_PART_CORTEX_A57: >> + a57_parse_cpumerrsr(edac_ctl); >> + break; >> + case ARM_CPU_PART_CORTEX_A53: >> + a53_parse_cpumerrsr(edac_ctl); >> + break; >> + } >> +} >> + >> +static void parse_l2merrsr(void *args) >> +{ >> + struct edac_device_ctl_info *edac_ctl = args; >> + int partnum = read_cpuid_part_number(); >> + >> + switch (partnum) { >> + case ARM_CPU_PART_CORTEX_A57: >> + a57_parse_l2merrsr(edac_ctl); >> + break; >> + case ARM_CPU_PART_CORTEX_A53: >> + a53_parse_l2merrsr(edac_ctl); >> + break; >> + } >> +} >> + >> +static void arm64_monitor_cache_errors(struct edac_device_ctl_info *edev_ctl) >> +{ >> + int cpu; >> + struct cpumask cluster_mask, old_mask; >> + >> + cpumask_clear(&cluster_mask); >> + cpumask_clear(&old_mask); >> + >> + get_online_cpus(); >> + for_each_online_cpu(cpu) { >> + smp_call_function_single(cpu, parse_cpumerrsr, edev_ctl, 0); >> + cpumask_copy(&cluster_mask, topology_core_cpumask(cpu)); >> + if (cpumask_equal(&cluster_mask, &old_mask)) >> + continue; >> + cpumask_copy(&old_mask, &cluster_mask); >> + smp_call_function_any(&cluster_mask, parse_l2merrsr, >> + edev_ctl, 0); >> + } >> + put_online_cpus(); >> +} >> + >> +static int cortex_arm64_edac_probe(struct platform_device *pdev) >> +{ >> + int rc; >> + struct cortex_arm64_edac *drv; >> + struct device *dev = &pdev->dev; >> + >> + 
drv = devm_kzalloc(dev, sizeof(*drv), GFP_KERNEL); >> + if (!drv) >> + return -ENOMEM; >> + >> + drv->edac_ctl = edac_device_alloc_ctl_info(0, "cpu", >> + num_possible_cpus(), "L", 2, >> + 1, NULL, 0, >> + edac_device_alloc_index()); >> + if (IS_ERR(drv->edac_ctl)) >> + return -ENOMEM; >> + >> + drv->edac_ctl->poll_msec = poll_msec; >> + drv->edac_ctl->edac_check = arm64_monitor_cache_errors; >> + drv->edac_ctl->dev = dev; >> + drv->edac_ctl->mod_name = dev_name(dev); >> + drv->edac_ctl->dev_name = dev_name(dev); >> + drv->edac_ctl->ctl_name = "cpu_err"; >> + drv->edac_ctl->panic_on_ue = 1; >> + platform_set_drvdata(pdev, drv); >> + >> + rc = edac_device_add_device(drv->edac_ctl); >> + if (rc) >> + goto edac_alloc_failed; >> + >> + return 0; >> + >> +edac_alloc_failed: >> + edac_device_free_ctl_info(drv->edac_ctl); >> + return rc; >> +} >> + >> +static int cortex_arm64_edac_remove(struct platform_device *pdev) >> +{ >> + struct cortex_arm64_edac *drv = dev_get_drvdata(&pdev->dev); >> + struct edac_device_ctl_info *edac_ctl = drv->edac_ctl; >> + >> + edac_device_del_device(edac_ctl->dev); >> + edac_device_free_ctl_info(edac_ctl); >> + >> + return 0; >> +} >> + >> +static const struct of_device_id cortex_arm64_edac_of_match[] = { >> + { .compatible = "arm,armv8-edac" }, >> + {}, >> +}; >> +MODULE_DEVICE_TABLE(of, cortex_arm64_edac_of_match); >> + >> +static struct platform_driver cortex_arm64_edac_driver = { >> + .probe = cortex_arm64_edac_probe, >> + .remove = cortex_arm64_edac_remove, >> + .driver = { >> + .name = "arm64-edac", >> + .owner = THIS_MODULE, >> + .of_match_table = cortex_arm64_edac_of_match, >> + }, >> +}; >> + >> +static int __init cortex_arm64_edac_init(void) >> +{ >> + int rc; >> + >> + /* Only POLL mode is supported so far */ >> + edac_op_state = EDAC_OPSTATE_POLL; >> + >> + rc = platform_driver_register(&cortex_arm64_edac_driver); >> + if (rc) { >> + edac_printk(KERN_ERR, EDAC_MOD_STR, "failed to register\n"); >> + return rc; >> + } >> + >> + 
return 0; >> +} >> +module_init(cortex_arm64_edac_init); >> + >> +static void __exit cortex_arm64_edac_exit(void) >> +{ >> + platform_driver_unregister(&cortex_arm64_edac_driver); >> +} >> +module_exit(cortex_arm64_edac_exit); >> + >> +MODULE_LICENSE("GPL"); >> +MODULE_AUTHOR("Brijesh Singh "); >> +MODULE_DESCRIPTION("Cortex A57 and A53 EDAC driver"); >> +module_param(poll_msec, int, 0444); >> +MODULE_PARM_DESC(poll_msec, "EDAC monitor poll interval in msec");

From mboxrd@z Thu Jan 1 00:00:00 1970
From: brijeshkumar.singh@amd.com (Brijesh Singh)
Date: Thu, 22 Oct 2015 13:47:46 -0500
Subject: [PATCH v2] EDAC: Add ARM64 EDAC
In-Reply-To: <20151021192536.2af0f8c5@recife.lan>
References: <1445460097-10260-1-git-send-email-brijeshkumar.singh@amd.com> <20151021192536.2af0f8c5@recife.lan>
Message-ID: <56292F52.2030407@amd.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Hi Mauro,

On 10/21/2015 04:25 PM, Mauro Carvalho Chehab wrote:
> On Wed, 21 Oct 2015 15:41:37 -0500
> Brijesh Singh wrote:
>
>> Add support for Cortex A57 and A53 EDAC driver.
>> >> Signed-off-by: Brijesh Singh >> CC: robh+dt at kernel.org >> CC: pawel.moll at arm.com >> CC: mark.rutland at arm.com >> CC: ijc+devicetree at hellion.org.uk >> CC: galak at codeaurora.org >> CC: dougthompson at xmission.com >> CC: bp at alien8.de >> CC: mchehab at osg.samsung.com >> CC: devicetree at vger.kernel.org >> CC: guohanjun at huawei.com >> CC: andre.przywara at arm.com >> CC: arnd at arndb.de >> CC: linux-kernel at vger.kernel.org >> CC: linux-edac at vger.kernel.org >> --- >> >> v2: >> * convert into generic arm64 edac driver >> * remove AMD specific references from dt binding >> * remove poll_msec property from dt binding >> * add poll_msec as a module param, default is 100ms >> * update copyright text >> * define macro mnemonics for L1 and L2 RAMID >> * check L2 error per-cluster instead of per core >> * update function names >> * use get_online_cpus() and put_online_cpus() to make L1 and L2 register >> read hotplug-safe >> * add error check in probe routine >> >> .../devicetree/bindings/edac/armv8-edac.txt | 15 + >> drivers/edac/Kconfig | 6 + >> drivers/edac/Makefile | 1 + >> drivers/edac/cortex_arm64_edac.c | 457 +++++++++++++++++++++ >> 4 files changed, 479 insertions(+) >> create mode 100644 Documentation/devicetree/bindings/edac/armv8-edac.txt >> create mode 100644 drivers/edac/cortex_arm64_edac.c >> >> diff --git a/Documentation/devicetree/bindings/edac/armv8-edac.txt b/Documentation/devicetree/bindings/edac/armv8-edac.txt >> new file mode 100644 >> index 0000000..dfd128f >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/edac/armv8-edac.txt >> @@ -0,0 +1,15 @@ >> +* ARMv8 L1/L2 cache error reporting >> + >> +On ARMv8, CPU Memory Error Syndrome Register and L2 Memory Error Syndrome >> +Register can be used for checking L1 and L2 memory errors. >> + >> +The following section describes the ARMv8 EDAC DT node binding. 
>> + >> +Required properties: >> +- compatible: Should be "arm,armv8-edac" >> + >> +Example: >> + edac { >> + compatible = "arm,armv8-edac"; >> + }; >> + >> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig >> index ef25000..dd7c195 100644 >> --- a/drivers/edac/Kconfig >> +++ b/drivers/edac/Kconfig >> @@ -390,4 +390,10 @@ config EDAC_XGENE >> Support for error detection and correction on the >> APM X-Gene family of SOCs. >> >> +config EDAC_CORTEX_ARM64 >> + tristate "ARM Cortex A57/A53" >> + depends on EDAC_MM_EDAC && ARM64 > > It would be good to be able to compile it on non-ARM64 archs > if COMPILE_TEST, e. g.: > > depends on EDAC_MM_EDAC && (ARM64 || COMPILE_TEST) > > That would allow testing tools like Coverity to test it. As far as > I know, the public license we use only works on x86. > >> + help >> + Support for error detection and correction on the >> + ARM Cortex A57 and A53. >> endif # EDAC >> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile >> index ae3c5f3..ac01660 100644 >> --- a/drivers/edac/Makefile >> +++ b/drivers/edac/Makefile >> @@ -68,3 +68,4 @@ obj-$(CONFIG_EDAC_OCTEON_PCI) += octeon_edac-pci.o >> obj-$(CONFIG_EDAC_ALTERA_MC) += altera_edac.o >> obj-$(CONFIG_EDAC_SYNOPSYS) += synopsys_edac.o >> obj-$(CONFIG_EDAC_XGENE) += xgene_edac.o >> +obj-$(CONFIG_EDAC_CORTEX_ARM64) += cortex_arm64_edac.o >> diff --git a/drivers/edac/cortex_arm64_edac.c b/drivers/edac/cortex_arm64_edac.c >> new file mode 100644 >> index 0000000..c37bb94 >> --- /dev/null >> +++ b/drivers/edac/cortex_arm64_edac.c >> @@ -0,0 +1,457 @@ >> +/* >> + * Cortex ARM64 EDAC >> + * >> + * Copyright (c) 2015, Advanced Micro Devices >> + * Author: Brijesh Singh >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License as published by >> + * the Free Software Foundation; either version 2 of the License. 
>> + * >> + * This program is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + * GNU General Public License for more details. >> + */ >> + >> +#include >> +#include >> +#include >> + >> +#include "edac_core.h" >> + >> +#define EDAC_MOD_STR "cortex_arm64_edac" >> + >> +#define A57_CPUMERRSR_EL1_INDEX(x) ((x) & 0x1ffff) >> +#define A57_CPUMERRSR_EL1_BANK(x) (((x) >> 18) & 0x1f) >> +#define A57_CPUMERRSR_EL1_RAMID(x) (((x) >> 24) & 0x7f) >> +#define A57_CPUMERRSR_EL1_VALID(x) ((x) & (1 << 31)) >> +#define A57_CPUMERRSR_EL1_REPEAT(x) (((x) >> 32) & 0x7f) >> +#define A57_CPUMERRSR_EL1_OTHER(x) (((x) >> 40) & 0xff) >> +#define A57_CPUMERRSR_EL1_FATAL(x) ((x) & (1UL << 63)) >> +#define A57_L1_I_TAG_RAM 0x00 >> +#define A57_L1_I_DATA_RAM 0x01 >> +#define A57_L1_D_TAG_RAM 0x08 >> +#define A57_L1_D_DATA_RAM 0x09 >> +#define A57_L1_TLB_RAM 0x18 >> + >> +#define A57_L2MERRSR_EL1_INDEX(x) ((x) & 0x1ffff) >> +#define A57_L2MERRSR_EL1_CPUID(x) (((x) >> 18) & 0xf) >> +#define A57_L2MERRSR_EL1_RAMID(x) (((x) >> 24) & 0x7f) >> +#define A57_L2MERRSR_EL1_VALID(x) ((x) & (1 << 31)) >> +#define A57_L2MERRSR_EL1_REPEAT(x) (((x) >> 32) & 0xff) >> +#define A57_L2MERRSR_EL1_OTHER(x) (((x) >> 40) & 0xff) >> +#define A57_L2MERRSR_EL1_FATAL(x) ((x) & (1UL << 63)) >> +#define A57_L2_TAG_RAM 0x10 >> +#define A57_L2_DATA_RAM 0x11 >> +#define A57_L2_SNOOP_TAG_RAM 0x12 >> +#define A57_L2_DIRTY_RAM 0x14 >> +#define A57_L2_INCLUSION_PF_RAM 0x18 >> + >> +#define A53_CPUMERRSR_EL1_ADDR(x) ((x) & 0xfff) >> +#define A53_CPUMERRSR_EL1_CPUID(x) (((x) >> 18) & 0x07) >> +#define A53_CPUMERRSR_EL1_RAMID(x) (((x) >> 24) & 0x7f) >> +#define A53_CPUMERRSR_EL1_VALID(x) ((x) & (1 << 31)) >> +#define A53_CPUMERRSR_EL1_REPEAT(x) (((x) >> 32) & 0xff) >> +#define A53_CPUMERRSR_EL1_OTHER(x) (((x) >> 40) & 0xff) >> +#define A53_CPUMERRSR_EL1_FATAL(x) ((x) & (1UL << 63)) >> 
+#define A53_L1_I_TAG_RAM 0x00 >> +#define A53_L1_I_DATA_RAM 0x01 >> +#define A53_L1_D_TAG_RAM 0x08 >> +#define A53_L1_D_DATA_RAM 0x09 >> +#define A53_L1_D_DIRT_RAM 0x0A >> +#define A53_L1_TLB_RAM 0x18 >> + >> +#define A53_L2MERRSR_EL1_INDEX(x) (((x) >> 3) & 0x3fff) >> +#define A53_L2MERRSR_EL1_CPUID(x) (((x) >> 18) & 0x0f) >> +#define A53_L2MERRSR_EL1_RAMID(x) (((x) >> 24) & 0x7f) >> +#define A53_L2MERRSR_EL1_VALID(x) ((x) & (1 << 31)) >> +#define A53_L2MERRSR_EL1_REPEAT(x) (((x) >> 32) & 0xff) >> +#define A53_L2MERRSR_EL1_OTHER(x) (((x) >> 40) & 0xff) >> +#define A53_L2MERRSR_EL1_FATAL(x) ((x) & (1UL << 63)) >> +#define A53_L2_TAG_RAM 0x10 >> +#define A53_L2_DATA_RAM 0x11 >> +#define A53_L2_SNOOP_RAM 0x12 >> + >> +#define L1_CACHE 0 >> +#define L2_CACHE 1 >> + >> +int poll_msec = 100; >> + >> +struct cortex_arm64_edac { >> + struct edac_device_ctl_info *edac_ctl; >> +}; >> + >> +static inline u64 read_cpumerrsr_el1(void) >> +{ >> + u64 val; >> + >> + asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (val)); >> + return val; >> +} >> + >> +static inline void write_cpumerrsr_el1(u64 val) >> +{ >> + asm volatile("msr s3_1_c15_c2_2, %0" :: "r" (val)); >> +} >> + >> +static inline u64 read_l2merrsr_el1(void) >> +{ >> + u64 val; >> + >> + asm volatile("mrs %0, s3_1_c15_c2_3" : "=r" (val)); >> + return val; >> +} >> + >> +static inline void write_l2merrsr_el1(u64 val) >> +{ >> + asm volatile("msr s3_1_c15_c2_3, %0" :: "r" (val)); >> +} >> + > > If we're willing to compile with COMPILE_TEST, we'll need to provide > some stubs for the above functions that won't use asm. 
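A minimal sketch of the stubs suggested above, assuming the accessors are guarded by CONFIG_ARM64. The choice of guard and the zero return value are assumptions, not something settled in this thread. A stub that reads as 0 leaves the Valid bit clear, so the parse routines would return early on non-ARM64 builds. The u64 typedef is only there so the fragment compiles outside the kernel tree. The driver's use of read_cpuid_part_number() and the ARM_CPU_PART_* constants would likely need similar treatment.

```c
#include <stdint.h>

/* Stand-in for the kernel's u64 so this fragment builds in userspace. */
typedef uint64_t u64;

#ifdef CONFIG_ARM64
/* Real accessors, as quoted from the patch. */
static inline u64 read_cpumerrsr_el1(void)
{
	u64 val;

	asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (val));
	return val;
}

static inline void write_cpumerrsr_el1(u64 val)
{
	asm volatile("msr s3_1_c15_c2_2, %0" :: "r" (val));
}

static inline u64 read_l2merrsr_el1(void)
{
	u64 val;

	asm volatile("mrs %0, s3_1_c15_c2_3" : "=r" (val));
	return val;
}

static inline void write_l2merrsr_el1(u64 val)
{
	asm volatile("msr s3_1_c15_c2_3, %0" :: "r" (val));
}
#else
/*
 * Assumed COMPILE_TEST stubs: no asm, reads return 0 (Valid bit clear),
 * writes are no-ops.
 */
static inline u64 read_cpumerrsr_el1(void)
{
	return 0;
}

static inline void write_cpumerrsr_el1(u64 val)
{
	(void)val;
}

static inline u64 read_l2merrsr_el1(void)
{
	return 0;
}

static inline void write_l2merrsr_el1(u64 val)
{
	(void)val;
}
#endif
```

With this shape the callers need no #ifdef of their own; only the four accessors differ between the real build and the compile-test build.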
> >> +static void a53_parse_l2merrsr(struct edac_device_ctl_info *edac_ctl) >> +{ >> + int fatal; >> + int repeat_err, other_err; >> + u64 val = read_l2merrsr_el1(); >> + >> + if (!A53_L2MERRSR_EL1_VALID(val)) >> + return; >> + >> + fatal = A53_L2MERRSR_EL1_FATAL(val); >> + repeat_err = A53_L2MERRSR_EL1_REPEAT(val); >> + other_err = A53_L2MERRSR_EL1_OTHER(val); >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, >> + "A53 CPU%d L2 %s error detected!\n", smp_processor_id(), >> + fatal ? "fatal" : "non-fatal"); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2MERRSR_EL1=%#llx\n", val); >> + >> + switch (A53_L2MERRSR_EL1_RAMID(val)) { >> + case A53_L2_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Tag RAM\n"); >> + break; >> + case A53_L2_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Data RAM\n"); >> + break; >> + case A53_L2_SNOOP_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Snoop filter RAM\n"); >> + break; >> + default: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n"); >> + break; >> + } >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d", >> + repeat_err); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n", >> + other_err); >> + if (fatal) >> + edac_device_handle_ue(edac_ctl, smp_processor_id(), L2_CACHE, >> + edac_ctl->name); >> + else >> + edac_device_handle_ce(edac_ctl, smp_processor_id(), L2_CACHE, >> + edac_ctl->name); >> + write_l2merrsr_el1(0); >> +} >> + >> +static void a57_parse_l2merrsr(struct edac_device_ctl_info *edac_ctl) >> +{ >> + int fatal; >> + int repeat_err, other_err; >> + u64 val = read_l2merrsr_el1(); >> + >> + if (!A57_L2MERRSR_EL1_VALID(val)) >> + return; >> + >> + fatal = A57_L2MERRSR_EL1_FATAL(val); >> + repeat_err = A57_L2MERRSR_EL1_REPEAT(val); >> + other_err = A57_L2MERRSR_EL1_OTHER(val); >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, >> + "A57 CPU%d L2 %s error detected!\n", smp_processor_id(), >> + fatal ? 
"fatal" : "non-fatal"); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2MERRSR_EL1=%#llx\n", val); >> + >> + switch (A57_L2MERRSR_EL1_RAMID(val)) { >> + case A57_L2_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Tag RAM\n"); >> + break; >> + case A57_L2_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Data RAM\n"); >> + break; >> + case A57_L2_SNOOP_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Snoop tag RAM\n"); >> + break; >> + case A57_L2_DIRTY_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 Dirty RAM\n"); >> + break; >> + case A57_L2_INCLUSION_PF_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 inclusion PF RAM\n"); >> + break; >> + default: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n"); >> + break; >> + } >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d", >> + repeat_err); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n", >> + other_err); >> + if (fatal) >> + edac_device_handle_ue(edac_ctl, smp_processor_id(), L2_CACHE, >> + edac_ctl->name); >> + else >> + edac_device_handle_ce(edac_ctl, smp_processor_id(), L2_CACHE, >> + edac_ctl->name); >> + write_l2merrsr_el1(0); >> +} >> + >> +static void a57_parse_cpumerrsr(struct edac_device_ctl_info *edac_ctl) >> +{ >> + int fatal; >> + int repeat_err, other_err; >> + u64 val = read_cpumerrsr_el1(); >> + >> + if (!A57_CPUMERRSR_EL1_VALID(val)) >> + return; >> + >> + fatal = A57_CPUMERRSR_EL1_FATAL(val); >> + repeat_err = A57_CPUMERRSR_EL1_REPEAT(val); >> + other_err = A57_CPUMERRSR_EL1_OTHER(val); >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, >> + "CPU%d L1 %s error detected!\n", smp_processor_id(), >> + fatal ? 
"fatal" : "non-fatal"); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "CPUMERRSR_EL1=%#llx\n", val); >> + >> + switch (A57_CPUMERRSR_EL1_RAMID(val)) { >> + case A57_L1_I_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Tag RAM\n"); >> + break; >> + case A57_L1_I_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Data RAM\n"); >> + break; >> + case A57_L1_D_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Tag RAM\n"); >> + break; >> + case A57_L1_D_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Data RAM\n"); >> + break; >> + case A57_L1_TLB_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 TLB RAM\n"); >> + break; >> + default: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n"); >> + break; >> + } >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d", >> + repeat_err); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n", >> + other_err); >> + >> + if (fatal) >> + edac_device_handle_ue(edac_ctl, smp_processor_id(), L1_CACHE, >> + edac_ctl->name); >> + else >> + edac_device_handle_ce(edac_ctl, smp_processor_id(), L1_CACHE, >> + edac_ctl->name); >> + write_cpumerrsr_el1(0); >> +} >> + >> +static void a53_parse_cpumerrsr(struct edac_device_ctl_info *edac_ctl) >> +{ >> + int fatal; >> + int repeat_err, other_err; >> + u64 val = read_cpumerrsr_el1(); >> + >> + if (!A53_CPUMERRSR_EL1_VALID(val)) >> + return; >> + >> + fatal = A53_CPUMERRSR_EL1_FATAL(val); >> + repeat_err = A53_CPUMERRSR_EL1_REPEAT(val); >> + other_err = A53_CPUMERRSR_EL1_OTHER(val); >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, >> + "A53 CPU%d L1 %s error detected!\n", smp_processor_id(), >> + fatal ? 
"fatal" : "non-fatal"); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "CPUMERRSR_EL1=%#llx\n", val); >> + >> + switch (A53_CPUMERRSR_EL1_RAMID(val)) { >> + case A53_L1_I_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Tag RAM\n"); >> + break; >> + case A53_L1_I_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-I Data RAM\n"); >> + break; >> + case A53_L1_D_TAG_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Tag RAM\n"); >> + break; >> + case A53_L1_D_DATA_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L1-D Data RAM\n"); >> + break; >> + case A53_L1_TLB_RAM: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "L2 TLB RAM\n"); >> + break; >> + default: >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "unknown RAMID\n"); >> + break; >> + } >> + >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Repeated error count=%d", >> + repeat_err); >> + edac_printk(KERN_CRIT, EDAC_MOD_STR, "Other error count=%d\n", >> + other_err); >> + >> + if (fatal) >> + edac_device_handle_ue(edac_ctl, smp_processor_id(), L1_CACHE, >> + edac_ctl->name); >> + else >> + edac_device_handle_ce(edac_ctl, smp_processor_id(), L1_CACHE, >> + edac_ctl->name); > > The above code doesn't look right to me. It should be, instead, calling > one of the functions that output the errors also via trace or to call one > of the trace functions directly (see the trace functions currently defined > at include/ras/ras_event.h). > > Failing to do that would cause RAS tools (like rasdaemon) to not get > the errors. > Noted. I will use trace_mc_event() to generate event but it seems that I still need to call edac_device_handle_ce/ue () to log the error in sysfs files. Also in case of UE I noticed that edac_device_handle_ue() takes care of causing panic (the expected behavior). So is it okay to use both trace event as well as edac_device_handle_xx. 
Something like this:

if (L2MERRSR_EL1_FATAL(val)) {
	trace_mc_event(HW_EVENT_ERR_UNCORRECTED, "L2 fatal error", "",
		       repeat_err, 0, 0, 0, 0, index, 0, 0,
		       "cortex_arm64_edac");
	edac_device_handle_ue(edac_ctl, cpu, L2_CACHE, edac_ctl->name);
} else {
	trace_mc_event(HW_EVENT_ERR_CORRECTED, "L2 non-fatal error", "",
		       repeat_err, 0, 0, 0, 0, index, 0, 0,
		       "cortex_arm64_edac");
	edac_device_handle_ce(edac_ctl, cpu, L2_CACHE, edac_ctl->name);
}

>> + write_cpumerrsr_el1(0); >> +} >> + >> +static void parse_cpumerrsr(void *args) >> +{ >> + struct edac_device_ctl_info *edac_ctl = args; >> + int partnum = read_cpuid_part_number(); >> + >> + switch (partnum) { >> + case ARM_CPU_PART_CORTEX_A57: >> + a57_parse_cpumerrsr(edac_ctl); >> + break; >> + case ARM_CPU_PART_CORTEX_A53: >> + a53_parse_cpumerrsr(edac_ctl); >> + break; >> + } >> +} >> + >> +static void parse_l2merrsr(void *args) >> +{ >> + struct edac_device_ctl_info *edac_ctl = args; >> + int partnum = read_cpuid_part_number(); >> + >> + switch (partnum) { >> + case ARM_CPU_PART_CORTEX_A57: >> + a57_parse_l2merrsr(edac_ctl); >> + break; >> + case ARM_CPU_PART_CORTEX_A53: >> + a53_parse_l2merrsr(edac_ctl); >> + break; >> + } >> +} >> + >> +static void arm64_monitor_cache_errors(struct edac_device_ctl_info *edev_ctl) >> +{ >> + int cpu; >> + struct cpumask cluster_mask, old_mask; >> + >> + cpumask_clear(&cluster_mask); >> + cpumask_clear(&old_mask); >> + >> + get_online_cpus(); >> + for_each_online_cpu(cpu) { >> + smp_call_function_single(cpu, parse_cpumerrsr, edev_ctl, 0); >> + cpumask_copy(&cluster_mask, topology_core_cpumask(cpu)); >> + if (cpumask_equal(&cluster_mask, &old_mask)) >> + continue; >> + cpumask_copy(&old_mask, &cluster_mask); >> + smp_call_function_any(&cluster_mask, parse_l2merrsr, >> + edev_ctl, 0); >> + } >> + put_online_cpus(); >> +} >> + >> +static int cortex_arm64_edac_probe(struct platform_device *pdev) >> +{ >> + int rc; >> + struct cortex_arm64_edac *drv; >> + struct device *dev = &pdev->dev; >> + >> + 
drv = devm_kzalloc(dev, sizeof(*drv), GFP_KERNEL); >> + if (!drv) >> + return -ENOMEM; >> + >> + drv->edac_ctl = edac_device_alloc_ctl_info(0, "cpu", >> + num_possible_cpus(), "L", 2, >> + 1, NULL, 0, >> + edac_device_alloc_index()); >> + if (IS_ERR(drv->edac_ctl)) >> + return -ENOMEM; >> + >> + drv->edac_ctl->poll_msec = poll_msec; >> + drv->edac_ctl->edac_check = arm64_monitor_cache_errors; >> + drv->edac_ctl->dev = dev; >> + drv->edac_ctl->mod_name = dev_name(dev); >> + drv->edac_ctl->dev_name = dev_name(dev); >> + drv->edac_ctl->ctl_name = "cpu_err"; >> + drv->edac_ctl->panic_on_ue = 1; >> + platform_set_drvdata(pdev, drv); >> + >> + rc = edac_device_add_device(drv->edac_ctl); >> + if (rc) >> + goto edac_alloc_failed; >> + >> + return 0; >> + >> +edac_alloc_failed: >> + edac_device_free_ctl_info(drv->edac_ctl); >> + return rc; >> +} >> + >> +static int cortex_arm64_edac_remove(struct platform_device *pdev) >> +{ >> + struct cortex_arm64_edac *drv = dev_get_drvdata(&pdev->dev); >> + struct edac_device_ctl_info *edac_ctl = drv->edac_ctl; >> + >> + edac_device_del_device(edac_ctl->dev); >> + edac_device_free_ctl_info(edac_ctl); >> + >> + return 0; >> +} >> + >> +static const struct of_device_id cortex_arm64_edac_of_match[] = { >> + { .compatible = "arm,armv8-edac" }, >> + {}, >> +}; >> +MODULE_DEVICE_TABLE(of, cortex_arm64_edac_of_match); >> + >> +static struct platform_driver cortex_arm64_edac_driver = { >> + .probe = cortex_arm64_edac_probe, >> + .remove = cortex_arm64_edac_remove, >> + .driver = { >> + .name = "arm64-edac", >> + .owner = THIS_MODULE, >> + .of_match_table = cortex_arm64_edac_of_match, >> + }, >> +}; >> + >> +static int __init cortex_arm64_edac_init(void) >> +{ >> + int rc; >> + >> + /* Only POLL mode is supported so far */ >> + edac_op_state = EDAC_OPSTATE_POLL; >> + >> + rc = platform_driver_register(&cortex_arm64_edac_driver); >> + if (rc) { >> + edac_printk(KERN_ERR, EDAC_MOD_STR, "failed to register\n"); >> + return rc; >> + } >> + >> + 
return 0; >> +} >> +module_init(cortex_arm64_edac_init); >> + >> +static void __exit cortex_arm64_edac_exit(void) >> +{ >> + platform_driver_unregister(&cortex_arm64_edac_driver); >> +} >> +module_exit(cortex_arm64_edac_exit); >> + >> +MODULE_LICENSE("GPL"); >> +MODULE_AUTHOR("Brijesh Singh "); >> +MODULE_DESCRIPTION("Cortex A57 and A53 EDAC driver"); >> +module_param(poll_msec, int, 0444); >> +MODULE_PARM_DESC(poll_msec, "EDAC monitor poll interval in msec");
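For reference, the field layout encoded by the A57_L2MERRSR_EL1_* macros quoted above can be exercised outside the kernel. The following is a sketch only, not part of the patch: the struct and the function name decode_l2merrsr are hypothetical. One deliberate difference from the quoted macros is the Valid test, which uses 1ULL << 31. With common compilers the int constant 1 << 31 is sign-extended when combined with a 64-bit value, so the quoted VALID macros would also test true when only higher bits (such as bit 63) are set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the A57 L2MERRSR_EL1 fields. */
struct l2merrsr_fields {
	uint32_t index;		/* bits [16:0]  */
	uint32_t cpuid;		/* bits [21:18] */
	uint32_t ramid;		/* bits [30:24] */
	bool valid;		/* bit 31       */
	uint32_t repeat;	/* bits [39:32] */
	uint32_t other;		/* bits [47:40] */
	bool fatal;		/* bit 63       */
};

/* Split a raw 64-bit register value using the masks from the patch. */
static struct l2merrsr_fields decode_l2merrsr(uint64_t val)
{
	struct l2merrsr_fields f;

	f.index = (uint32_t)(val & 0x1ffff);
	f.cpuid = (uint32_t)((val >> 18) & 0xf);
	f.ramid = (uint32_t)((val >> 24) & 0x7f);
	/* 64-bit constant, so only bit 31 is tested. */
	f.valid = (val & (1ULL << 31)) != 0;
	f.repeat = (uint32_t)((val >> 32) & 0xff);
	f.other = (uint32_t)((val >> 40) & 0xff);
	f.fatal = (val & (1ULL << 63)) != 0;
	return f;
}
```

A value with only bit 63 set decodes as fatal but not valid here, which is the case where the 64-bit constant matters.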