From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752710AbbG1SgZ (ORCPT ); Tue, 28 Jul 2015 14:36:25 -0400
Received: from mail-wi0-f173.google.com ([209.85.212.173]:34184 "EHLO mail-wi0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751355AbbG1SgX (ORCPT ); Tue, 28 Jul 2015 14:36:23 -0400
MIME-Version: 1.0
In-Reply-To:
References: <20150724224258.GA23990@google.com> <20150727113622.GE29945@e104818-lin.cambridge.arm.com>
From: Bjorn Helgaas
Date: Tue, 28 Jul 2015 13:36:02 -0500
Message-ID:
Subject: Re: X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32
To: Duc Dang
Cc: Catalin Marinas, "linux-pci@vger.kernel.org", Tanmay Inamdar, linux-arm, Linux Kernel Mailing List
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 28, 2015 at 12:39 PM, Duc Dang wrote:
> On Mon, Jul 27, 2015 at 4:36 AM, Catalin Marinas wrote:
>> On Fri, Jul 24, 2015 at 05:05:19PM -0700, Duc Dang wrote:
>>> On Fri, Jul 24, 2015 at 3:42 PM, Bjorn Helgaas wrote:
>>> > I regularly see faults like this on an APM X-Gene:
>>> >
>>> > U-Boot 2013.04-mustang_sw_1.14.14 (Dec 16 2014 - 15:59:33)
>>> > CPU0: APM ARM 64-bit Potenza Rev B0 2400MHz PCP 2400MHz
>>> > 32 KB ICACHE, 32 KB DCACHE
>>> > SOC 2000MHz IOBAXI 400MHz AXI 250MHz AHB 200MHz GFC 125MHz
>>> > ...
>>> > Unhandled fault: synchronous external abort (0x96000010) at 0xffffff8000110034
>>
>> That's generated by an external device (PCIe root complex, card etc.)
>> and some mis-configured CPU setting.
>>
>>> > Internal error: : 96000010 [#1] SMP
>>> > Modules linked in:
>>> > CPU: 0 PID: 3723 Comm: ... 4.1.0-smp-DEV #3
>>> > Hardware name: APM X-Gene Mustang board (DT)
>>> > task: ffffffc7dc1a4140 ti: ffffffc7dc118000 task.ti: ffffffc7dc118000
>>> > PC is at pci_generic_config_read32+0x4c/0xb8
>>> > LR is at pci_generic_config_read32+0x40/0xb8
>>> > pc : [] lr : [] pstate: 600001c5
>>> > ...
>>> > Call trace:
>>> > [] pci_generic_config_read32+0x4c/0xb8
>>> > [] pci_user_read_config_byte+0x60/0xc4
>>> > [] pci_read_config+0x15c/0x238
>>> > [] sysfs_kf_bin_read+0x68/0xa0
>>> > [] kernfs_fop_read+0x9c/0x1ac
>>> > [] __vfs_read+0x44/0x128
>>> > [] vfs_read+0x84/0x144
>>> > [] SyS_read+0x50/0xb0
>>>
>>> The log shows kernel gets an exception when trying to access Mellanox
>>> card configuration space. This is usually due to suboptimal PCIe
>>> SerDes parameters are using in your board, which will cause bad link
>>> quality.
>>
>> I would have hoped that "suboptimal" means that it still works, albeit
>> not fully optimal ;).
>
> Yes, it should still work, but you may see crashes occasionally due to
> link quality.

A crash seems like a too-severe response to a link quality issue.
Isn't there some way to retry the access or return an error, so we
don't have to crash the whole system?
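
To make the "retry the access or return an error" idea concrete, here is a
rough user-space sketch of the policy only. It assumes some lower layer can
report a failed config read as a return value instead of an abort, which is
exactly the part that is in question on this hardware. Every name in it
(cfg_read32_retry, cfg_read_fn, flaky_read, the cfg_status codes) is made up
for illustration and is not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch (not kernel definitions). */
enum cfg_status { CFG_OK = 0, CFG_IO_ERROR = -1 };

/*
 * Hypothetical low-level accessor: returns 0 and fills *val on success,
 * nonzero if the access failed. It stands in for whatever mechanism
 * would catch the failed access; it is not a real kernel API.
 */
typedef int (*cfg_read_fn)(void *ctx, unsigned int where, uint32_t *val);

/*
 * Try a 32-bit config read up to max_tries times. If every attempt
 * fails, hand back all-ones data and an error code to the caller.
 */
static enum cfg_status cfg_read32_retry(cfg_read_fn read_hw, void *ctx,
                                        unsigned int where, int max_tries,
                                        uint32_t *val)
{
    assert(read_hw != NULL && val != NULL);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        if (read_hw(ctx, where, val) == 0)
            return CFG_OK;
    }

    /* All ones is what a config read from an absent device returns. */
    *val = UINT32_MAX;
    return CFG_IO_ERROR;
}

/*
 * Simulated flaky link, for demonstration only: ctx points to an int
 * holding how many upcoming reads should fail before one succeeds.
 */
static int flaky_read(void *ctx, unsigned int where, uint32_t *val)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    *val = 0xabcd0000u | where;  /* arbitrary test pattern */
    return 0;
}
```

With a link that fails twice and then recovers, three tries are enough and
the caller never notices; with a link that keeps failing, the caller gets
CFG_IO_ERROR and all-ones data, and can decide what to do about it.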