From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bjorn Helgaas
Date: Mon, 10 Aug 2015 11:18:23 -0500
Subject: Re: X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32
To: Duc Dang
Cc: Tanmay Inamdar, "linux-pci@vger.kernel.org", linux-arm, "linux-kernel@vger.kernel.org"
References: <20150724224258.GA23990@google.com> <20150728212944.GA12958@google.com> <20150729012255.GA18606@google.com> <20150729155509.GA31170@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 31, 2015 at 12:00 PM, Duc Dang wrote:
> On Wed, Jul 29, 2015 at 8:55 AM, Bjorn Helgaas wrote:
>> On Tue, Jul 28, 2015 at 08:22:55PM -0500, Bjorn Helgaas wrote:
>>> On Tue, Jul 28, 2015 at 02:50:39PM -0700, Duc Dang wrote:
>>
>>> > Do you have another PCIe card to try on the same reboot test on this board?
>>>
>>> I've seen this on at least two Mellanox cards. I'm running similar tests
>>> on a different type of card now.
>>
>> FWIW, reboot tests on two machines with Mellanox cards failed, while the
>> same test on a machine with a different proprietary card succeeded.
>
> Thanks, Bjorn.
>
> I don't have the same Mellanox card as yours, but I will also run
> similar reboot test to see if I hit the same issue with my card.

Any more hints on this? Nothing has changed on my end, so of course I'm
still seeing this, always on machines with Mellanox, and never on other
machines.

Could this be a hardware issue like a signal integrity or margin issue?
I don't know where to go from here because I'm not a hardware person, and
I don't know anything to do in software.

Bjorn