From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A0EAC2D0A3 for ; Fri, 6 Nov 2020 19:35:21 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E479E2151B for ; Fri, 6 Nov 2020 19:35:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="XbU7A3NU" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E479E2151B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:Date:Message-ID:From: References:To:Subject:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=mIwoI84wXWrot51wMgLVgJHnMNAeKV52/T71FXWXObQ=; b=XbU7A3NU2oKk7utCwlrjUsO7x s5wbSQ0dMoGjBIiaq6WnqVd4SK2dq0RaOYucbwPff+KEhvowXTdIb78NxVqshULPqqMiN1LnZc72C jJPVumjE15d8es12EdlmToI5F2/Tg/h7wC1hwD3ZAQFzQp1iujZ1x4QF4E2U2KDdd8C9IBk9H4dCP XMtCL6zXzNE7LEZfVTcPALdM94Q/89LoimJQhP6lhvKVyszSF/EQHY0BAJSabds+WaXWNiKvYwYMn IgfqVorGG2XOQNxaDS9iGM0/8g3Stl+UXWu0eeTBpFLzGvv/wimQ0bY1jkWFbHpsqITP9nMviIx2a 3Tbt+x6xg==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kb7V3-00015w-Gg; Fri, 06 Nov 2020 19:34:17 +0000 Received: from foss.arm.com ([217.140.110.172]) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kb7Uz-00014a-QE for linux-arm-kernel@lists.infradead.org; Fri, 06 Nov 2020 19:34:14 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DC30D1474; Fri, 6 Nov 2020 11:34:10 -0800 (PST) Received: from [172.16.1.113] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 538323F718; Fri, 6 Nov 2020 11:34:09 -0800 (PST) Subject: Re: [PATCH 2/3] drivers/edac: Add L1 and L2 error detection for A53 and A57 To: Sascha Hauer References: <20201013125033.4749-1-s.hauer@pengutronix.de> <20201013125033.4749-3-s.hauer@pengutronix.de> From: James Morse Message-ID: <6ea8a824-ba55-6c5b-993d-4c782e396f32@arm.com> Date: Fri, 6 Nov 2020 19:34:07 +0000 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: <20201013125033.4749-3-s.hauer@pengutronix.de> Content-Language: en-GB X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201106_143413_988963_1484AEC2 X-CRM114-Status: GOOD ( 30.56 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: devicetree@vger.kernel.org, Tony Luck , Rob Herring , Robert Richter , Borislav Petkov , kernel@pengutronix.de, York Sun , Mauro Carvalho Chehab , linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi Sascha, On 13/10/2020 13:50, Sascha Hauer wrote: > The Cortex A53 and A57 cores have error detection capabilities for the > L1/L2 Caches, this patch adds a driver for them. > > Unfortunately there is no robust way to inject errors into the caches, > so this driver doesn't contain any code to actually test it. It has > been tested though with code taken from an older version of this driver > found here: https://lkml.org/lkml/2018/3/14/1203. > For reasons stated > in this thread the error injection code is not suitable for mainline, > so it is removed from the driver. > diff --git a/drivers/edac/cortex_arm64_l1_l2.c b/drivers/edac/cortex_arm64_l1_l2.c > new file mode 100644 > index 000000000000..fb8386eb40ac > --- /dev/null > +++ b/drivers/edac/cortex_arm64_l1_l2.c > @@ -0,0 +1,208 @@ > +static void read_errors(void *data) > +{ > + struct edac_device_ctl_info *edac_ctl = data; > + int cpu = smp_processor_id(); > + char msg[MESSAGE_SIZE]; > + u64 cpumerr, l2merr; > + > + /* cpumerrsr_el1 */ > + asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (cpumerr)); > + asm volatile("msr s3_1_c15_c2_2, %0" :: "r" (0)); I think you've seen earlier comments on using the sys_reg macros for this. There were versions of binutils out there that choke on this. [...] > +} > + > +static void cortex_arm64_edac_check(struct edac_device_ctl_info *edac_ctl) > +{ > + struct arm64_pvt *pvt = edac_ctl->pvt_info; > + call_single_data_t *csd; > + int cpu; > + > + get_online_cpus(); > + for_each_cpu_and(cpu, cpu_online_mask, &pvt->compat_mask) { > + csd = per_cpu_ptr(pvt->csd_check, cpu); > + csd->func = read_errors; > + csd->info = edac_ctl; > + csd->flags = 0; > + /* Read CPU L1/L2 errors */ > + smp_call_function_single_async(cpu, csd); > + /* Wait until flags cleared */ > + smp_cond_load_acquire(&csd->flags, !VAL); Hmm. We end up waiting for each CPU to schedule something else. I can't see any reason we can't sleep here. Can't we use smp_call_function_many() here? It already considers cpu_online_mask, you'd just need to deal with read_errors() being called in parallel with itself. (concurrent calls into edac are one problem, but two CPUs read/writing the same L2 register could lead to double counting) > + } > + put_online_cpus(); > +} > +static int cortex_arm64_edac_probe(struct platform_device *pdev) > +{ > + struct device_node *np, *dn = pdev->dev.of_node; > + struct edac_device_ctl_info *edac_ctl; > + struct device *dev = &pdev->dev; > + struct of_phandle_iterator it; > + struct arm64_pvt *pvt; > + int rc, cpu; > + > + edac_ctl = edac_device_alloc_ctl_info(sizeof(*pvt), "cpu_cache", > + 1, "L", 2, 1, NULL, 0, > + edac_device_alloc_index()); I used this series to test on Juno to poke the user-space interface: This chokes on a big-little system as it can't register "cpu_cache" a second time. I think we should try to make the topology look like the one in edac_device.h. This means calling it 'cpu', and registering all of them up front. On a big/little system the second probe() call would need to be careful. I can have a go at this if you don't have a platform to hand. (The 'L2-cache' thing in edac_device.h turns out to be impossible and the 'Lx' you've done here is the most popular option. I'll post a patch to change the documentation to what people are doing) [...] > +} Thanks, James _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel