From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752439AbbG1RqA (ORCPT ); Tue, 28 Jul 2015 13:46:00 -0400
Received: from mail-oi0-f53.google.com ([209.85.218.53]:35894
	"EHLO mail-oi0-f53.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751159AbbG1Rp6 (ORCPT );
	Tue, 28 Jul 2015 13:45:58 -0400
MIME-Version: 1.0
In-Reply-To:
References: <20150724224258.GA23990@google.com>
From: Duc Dang
Date: Tue, 28 Jul 2015 10:45:26 -0700
Message-ID:
Subject: Re: X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32
To: Bjorn Helgaas
Cc: Tanmay Inamdar, "linux-pci@vger.kernel.org", linux-arm,
	"linux-kernel@vger.kernel.org"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 28, 2015 at 9:43 AM, Bjorn Helgaas wrote:
> On Fri, Jul 24, 2015 at 7:05 PM, Duc Dang wrote:
>> Hi Bjorn,
>>
>> On Fri, Jul 24, 2015 at 3:42 PM, Bjorn Helgaas wrote:
>>>
>>> I regularly see faults like this on an APM X-Gene:
>>>
>>> U-Boot 2013.04-mustang_sw_1.14.14 (Dec 16 2014 - 15:59:33)
>>> CPU0: APM ARM 64-bit Potenza Rev B0 2400MHz PCP 2400MHz
>>> 32 KB ICACHE, 32 KB DCACHE
>>> SOC 2000MHz IOBAXI 400MHz AXI 250MHz AHB 200MHz GFC 125MHz
>>> ...
>>> Unhandled fault: synchronous external abort (0x96000010) at 0xffffff8000110034
>>> Internal error: : 96000010 [#1] SMP
>>> Modules linked in:
>>> CPU: 0 PID: 3723 Comm: ... 4.1.0-smp-DEV #3
>>> Hardware name: APM X-Gene Mustang board (DT)
>>> task: ffffffc7dc1a4140 ti: ffffffc7dc118000 task.ti: ffffffc7dc118000
>>> PC is at pci_generic_config_read32+0x4c/0xb8
>>> LR is at pci_generic_config_read32+0x40/0xb8
>>> pc : [] lr : [] pstate: 600001c5
>>> ...
>>> Call trace:
>>> [] pci_generic_config_read32+0x4c/0xb8
>>> [] pci_user_read_config_byte+0x60/0xc4
>>> [] pci_read_config+0x15c/0x238
>>> [] sysfs_kf_bin_read+0x68/0xa0
>>> [] kernfs_fop_read+0x9c/0x1ac
>>> [] __vfs_read+0x44/0x128
>>> [] vfs_read+0x84/0x144
>>> [] SyS_read+0x50/0xb0
>>
>> The log shows the kernel gets an exception when trying to access the
>> Mellanox card's configuration space. This is usually due to suboptimal
>> PCIe SerDes parameters being used on your board, which cause bad link
>> quality.
>> The PCIe SerDes programming is done in U-Boot, so I suggest you
>> upgrade to our latest X-Gene U-Boot release.
>
> I installed U-Boot 1.15.12, which I thought was the latest. I'm still
> seeing this issue regularly, approx once/hour.

Our latest U-Boot is 1.15.15, but U-Boot 1.15.12 is already a good
version to use.

Are you running any PCIe traffic test when the error happens? I will
try to reproduce the issue with my Mustang board as well.

It would also be useful if you could share your "lspci -vvv" output
captured while the board is running, so we can check whether any error
status is reported.

--
Regards,
Duc Dang.
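
For readers of the archive: the value 0x96000010 in the quoted fault line is an AArch64 exception syndrome (ESR), and it can be split into its fields by hand. The sketch below is not from the thread. It assumes the standard ARMv8-A layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with the fault status code in the low six ISS bits for data aborts). The function name `decode_esr` and the two lookup tables are illustrative and deliberately partial.

```python
# Sketch: split an AArch64 ESR_ELx value into its architectural fields.
# Field positions follow the ARMv8-A layout; the name tables below are
# illustrative and cover only the codes relevant to the quoted fault.

EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort taken without a change in exception level",
}

DFSC_NAMES = {
    0x10: "synchronous external abort, not on translation table walk",
}


def decode_esr(esr):
    """Return the EC, IL and ISS fields of a 32-bit ESR value as a dict."""
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    il = (esr >> 25) & 0x1       # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits 24:0
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this partial table"),
        "il": il,
        "iss": iss,
    }
    # The low six ISS bits are a data fault status code only for data aborts.
    if ec in (0x24, 0x25):
        dfsc = iss & 0x3F
        fields["dfsc"] = dfsc
        fields["dfsc_name"] = DFSC_NAMES.get(dfsc, "not in this partial table")
    return fields


if __name__ == "__main__":
    # Value taken from the "Unhandled fault" line quoted in the message.
    for key, value in decode_esr(0x96000010).items():
        print(key, "=", hex(value) if isinstance(value, int) else value)
```

Decoded this way, the value is a data abort taken at the current exception level with a synchronous external abort status, which matches the text the kernel printed. It does not by itself say why the config read was aborted.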