From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1030680AbcDMJ6W (ORCPT ); Wed, 13 Apr 2016 05:58:22 -0400
Received: from mail-ig0-f195.google.com ([209.85.213.195]:36390 "EHLO
        mail-ig0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1030370AbcDMJ6U (ORCPT );
        Wed, 13 Apr 2016 05:58:20 -0400
MIME-Version: 1.0
In-Reply-To: <20150728212944.GA12958@google.com>
References: <20150724224258.GA23990@google.com>
        <20150728212944.GA12958@google.com>
Date: Wed, 13 Apr 2016 10:58:18 +0100
X-Google-Sender-Auth: XNGGK3IgmOChl3KPZD4m6EF8FyA
Message-ID:
Subject: Re: X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32
From: Sudeep Holla
To: Bjorn Helgaas
Cc: Duc Dang, Tanmay Inamdar, "linux-pci@vger.kernel.org", linux-arm,
        "linux-kernel@vger.kernel.org", Sudeep Holla
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

(sorry for replying on the old thread, but I found it could be related
to the issue I have now)

On Tue, Jul 28, 2015 at 10:29 PM, Bjorn Helgaas wrote:
> On Tue, Jul 28, 2015 at 10:45:26AM -0700, Duc Dang wrote:
>> On Tue, Jul 28, 2015 at 9:43 AM, Bjorn Helgaas wrote:
>> > On Fri, Jul 24, 2015 at 7:05 PM, Duc Dang wrote:
>> >> Hi Bjorn,
>> >>
>> >> On Fri, Jul 24, 2015 at 3:42 PM, Bjorn Helgaas wrote:
>> >>>
>> >>> I regularly see faults like this on an APM X-Gene:
>> >>>
>> >>> U-Boot 2013.04-mustang_sw_1.14.14 (Dec 16 2014 - 15:59:33)
>> >>> CPU0: APM ARM 64-bit Potenza Rev B0 2400MHz PCP 2400MHz
>> >>> 32 KB ICACHE, 32 KB DCACHE
>> >>> SOC 2000MHz IOBAXI 400MHz AXI 250MHz AHB 200MHz GFC 125MHz
>> >>> ...
>> >>> Unhandled fault: synchronous external abort (0x96000010) at 0xffffff8000110034
>> >>> Internal error: : 96000010 [#1] SMP
>> >>> Modules linked in:
>> >>> CPU: 0 PID: 3723 Comm: ... 4.1.0-smp-DEV #3
>> >>> Hardware name: APM X-Gene Mustang board (DT)
>> >>> task: ffffffc7dc1a4140 ti: ffffffc7dc118000 task.ti: ffffffc7dc118000
>> >>> PC is at pci_generic_config_read32+0x4c/0xb8
>> >>> LR is at pci_generic_config_read32+0x40/0xb8
>> >>> pc : [] lr : [] pstate: 600001c5
>> >>> ...
>> >>> Call trace:
>> >>> [] pci_generic_config_read32+0x4c/0xb8
>> >>> [] pci_user_read_config_byte+0x60/0xc4
>> >>> [] pci_read_config+0x15c/0x238
>> >>> [] sysfs_kf_bin_read+0x68/0xa0
>> >>> [] kernfs_fop_read+0x9c/0x1ac
>> >>> [] __vfs_read+0x44/0x128
>> >>> [] vfs_read+0x84/0x144
>> >>> [] SyS_read+0x50/0xb0
>> >>
>> >> The log shows kernel gets an exception when trying to access Mellanox
>> >> card configuration space. This is usually due to suboptimal PCIe
>> >> SerDes parameters are using in your board, which will cause bad link
>> >> quality.
>> >> The PCIe SerDes programming is done in U-Boot, so I suggest you do a
>> >> U-Boot upgrade to our latest X-Gene U-Boot release.
>> >
>> > I installed U-Boot 1.15.12, which I thought was the latest. I'm still
>> > seeing this issue regularly, approx once/hour.
>>
>> Our latest U-Boot is 1.15.15, but U-Boot 1.15.12 is already a good
>> version to use. Are you running any PCIe traffic test when the error
>> happens?
>
> Nope, the machine was either idle or running a reboot test; no PCIe stress
> test or anything.
>

Was there any conclusion on this? I am having a similar issue [1] on my
Juno with the sky2 PCIe driver during reboot.

Regards,
Sudeep

[1] http://marc.info/?l=linux-netdev&m=146046999701956&w=2
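
Editorial note, not part of the original message: the fault code 0x96000010 in the quoted oops is an AArch64 ESR_ELx syndrome value, and splitting it into its fields may help when comparing this report with the Juno one. The sketch below is only an illustration; the helper name decode_esr is hypothetical, and the field layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with the low six ISS bits holding the DFSC for data aborts) is taken from the Arm architecture, not from this thread.

```python
# Decode an AArch64 ESR_ELx syndrome value such as the 0x96000010 quoted
# in the oops. decode_esr is a hypothetical helper written for this note.

def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR_ELx value into its architectural fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,    # instruction length bit, bit 25
        "iss": esr & 0x1FFFFFF,     # instruction-specific syndrome, bits [24:0]
        "dfsc": esr & 0x3F,         # data fault status code (data aborts only)
    }

fields = decode_esr(0x96000010)
print({name: hex(value) for name, value in fields.items()})
```

For this value EC is 0x25 (a data abort taken without a change in exception level, i.e. from kernel mode) and DFSC is 0x10 (synchronous external abort, not on a translation table walk), which agrees with the "synchronous external abort" text the kernel printed.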