Subject: Re: [PATCH] PCI/ERR: Fix run error recovery callbacks for all affected devices
From: Sinan Kaya
To: Keith Busch, Dongdong Liu
Cc: "helgaas@kernel.org", "linux-pci@vger.kernel.org", "linuxarm@huawei.com", Bjorn Helgaas, tanxiaofei
References: <1548337810-69892-1-git-send-email-liudongdong3@huawei.com>
 <20190124213701.GA9882@localhost.localdomain>
 <5d58ea17-115f-139d-93db-fe6e9ce573cb@huawei.com>
 <20190125171713.GB11210@localhost.localdomain>
Date: Fri, 25 Jan 2019 12:46:32 -0500
X-Mailing-List: linux-pci@vger.kernel.org

On 1/25/2019 12:37 PM, Sinan Kaya wrote:
> On 1/25/2019 12:17 PM, Keith Busch wrote:
>> On Fri, Jan 25, 2019 at 06:28:03AM -0800, Dongdong Liu wrote:
>>> I want to fix 2 points by the patch.
>>>
>>> 1. For EP devices (such as multi-function EP device) under the same bus,
>>> when one of the EP devices met non-fatal error, should report non-fatal
>>> error only to the error endpoint device, no need to broadcast all of them.
>>> That is the patch (PCI/AER: Report non-fatal errors only to the affected
>>> endpoint #4.15)
>>> have done, but current code PATCH [1] broken this.
>>
>> How do you know a non-fatal affects only the reporting end point? These can
>> certainly be bus errors, and it's not the first to detect may be affected.
>>
>> In any case, what harm does the broadcast cause?
>>
>
> What is the PCIe spec rule about AER errors for multi-function devices?
>
> Does it say it needs to be propagated to all functions or each function has
> its own unique AER error handler?
>

Thinking more...

I think there is value in probing all devices for errors, as we do today,
because multiple error bits can be set. Since the root port's AER register
only captures the first error, the OS has to poll each device to find the
rest of the errors and see what is going on.

In this case the AER error status of the other functions should not report
any outstanding event. Please verify this. Otherwise, you are looking at a
device quirk.
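
To make the polling idea concrete, here is a rough userspace sketch, not
kernel code. struct fn_model and collect_nonfatal() are hypothetical names
made up for illustration. The two fields stand in for the AER Uncorrectable
Error Status and Severity registers of each function, and a status bit whose
severity bit is clear is treated as non-fatal. Masking is ignored to keep it
short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PCIe function; not a kernel structure. */
struct fn_model {
	unsigned int devfn;       /* device/function number on the bus */
	uint32_t uncor_status;    /* stand-in for AER Uncorrectable Error Status */
	uint32_t uncor_severity;  /* stand-in for AER Uncorrectable Error Severity
	                           * (bit set = fatal, bit clear = non-fatal) */
};

/*
 * Poll every function instead of trusting only the first reporter.
 * Stores the devfn of each function with a pending non-fatal error in
 * out[] (up to out_len entries) and returns how many were found in total.
 */
size_t collect_nonfatal(const struct fn_model *fns, size_t n,
			unsigned int *out, size_t out_len)
{
	size_t found = 0;

	assert(fns != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Keep only the bits that are set and not marked fatal. */
		uint32_t nonfatal = fns[i].uncor_status & ~fns[i].uncor_severity;

		if (!nonfatal)
			continue;

		if (out != NULL && found < out_len)
			out[found] = fns[i].devfn;
		found++;
	}

	return found;
}
```

If only one function of a multi-function device really hit the error, a walk
like this should come back with exactly that one function. If the siblings
show status bits as well, that is the case worth checking against the spec
before calling it a quirk.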