From: Zhuo Chen <chenzhuo.1@bytedance.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: allenbh@gmail.com, dave.jiang@intel.com,
	linux-scsi@vger.kernel.org, martin.petersen@oracle.com,
	linux-pci@vger.kernel.org, jejb@linux.ibm.com, jdmason@kudzu.us,
	james.smart@broadcom.com, fancer.lancer@gmail.com,
	linux-kernel@vger.kernel.org, ntb@lists.linux.dev,
	oohall@gmail.com, bhelgaas@google.com, dick.kennedy@broadcom.com,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [External] Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()
Date: Tue, 27 Sep 2022 21:41:09 +0800
Message-ID: <97ac6c82-81a0-2f63-7d8f-e56d702bc874@bytedance.com>
In-Reply-To: <20220926180906.GA1609498@bhelgaas>



On 9/27/22 2:09 AM, Bjorn Helgaas wrote:
> On Mon, Sep 26, 2022 at 10:01:55PM +0800, Zhuo Chen wrote:
>> On 9/23/22 5:08 AM, Bjorn Helgaas wrote:
>>> On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:
>>>> When state is pci_channel_io_frozen in pcie_do_recovery(),
>>>> the severity is fatal and fatal status should be cleared.
>>>> So we add pci_aer_clear_fatal_status().
>>>
>>> Seems sensible to me.  Did you find this by code inspection or by
>>> debugging a problem?  If the latter, it would be nice to mention the
>>> symptoms of the problem in the commit log.
>>
>> I found this by code inspection so I may not enumerate what kind of problems
>> this code will cause.
>>>
>>>> Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
>>>> and pci_aer_clear_nonfatal_status() contains the function of
>>>> 'if (host->native_aer || pcie_ports_native)', so we move them
>>>> out of it.
>>>
>>> Wrap commit log to fill 75 columns.
>>>
>>>> Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
>>>> ---
>>>>    drivers/pci/pcie/err.c | 8 ++++++--
>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>> index 0c5a143025af..e0a8ade4c3fe 100644
>>>> --- a/drivers/pci/pcie/err.c
>>>> +++ b/drivers/pci/pcie/err.c
>>>> @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>>    	 * it is responsible for clearing this status.  In that case, the
>>>>    	 * signaling device may not even be visible to the OS.
>>>>    	 */
>>>> -	if (host->native_aer || pcie_ports_native) {
>>>> +	if (host->native_aer || pcie_ports_native)
>>>>    		pcie_clear_device_status(dev);
>>>
>>> pcie_clear_device_status() doesn't check for pcie_aer_is_native()
>>> internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
>>> errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
>>> status only when we control AER"), both callers check before calling
>>> it.
>>>
>>> I think we should move the check inside pcie_clear_device_status().
>>> That could be a separate preliminary patch.
>>>
>>> There are a couple other places (aer_root_reset() and
>>> get_port_device_capability()) that do the same check and could be
>>> changed to use pcie_aer_is_native() instead.  That could be another
>>> preliminary patch.
>>>
>> Good suggestion. But I have only one doubt. In aer_root_reset(), if we use
>> "if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap
>> is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return
>> false. It's different from just using "(host->native_aer ||
>> pcie_ports_native)".
>> Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL
>> pointer check should be added in pcie_aer_is_native() because root may be
>> NULL.
> 
> Good point.  In aer_root_reset(), we're updating Root Port registers,
> so I think they should look like:
> 
>    if (pcie_aer_is_native(root) && aer) {
>      ...
>    }
> 
> Does that seem safe and equivalent to you?
> 
> Bjorn

I think 'if (aer && pcie_aer_is_native(root))' might be safer,
because when root is NULL, 'aer' will be NULL as well, so the
condition short-circuits to false without ever calling
pcie_aer_is_native(root).
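
To illustrate the ordering point, here is a minimal user-space sketch,
not the kernel code. struct fake_pci_dev, fake_aer_is_native() and
should_touch_root_regs() are hypothetical stand-ins, and the sketch
assumes that 'aer' is derived from 'root' so that it is 0 whenever root
is NULL. It only shows that with 'aer' tested first, the helper is never
entered with a NULL pointer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct fake_pci_dev {
	int aer_cap;		/* AER capability offset, 0 if absent */
	bool native_aer;	/* models host->native_aer || pcie_ports_native */
};

static int native_calls;	/* how many times the helper was entered */

/*
 * Simplified model of pcie_aer_is_native(): it dereferences dev
 * without a NULL check, so it must never be called with dev == NULL.
 */
static bool fake_aer_is_native(struct fake_pci_dev *dev)
{
	native_calls++;
	if (!dev->aer_cap)
		return false;
	return dev->native_aer;
}

/*
 * Models the proposed check. Assumption: aer is computed from root,
 * so root == NULL implies aer == 0, and && stops before the call.
 */
static bool should_touch_root_regs(struct fake_pci_dev *root)
{
	int aer = root ? root->aer_cap : 0;

	return aer && fake_aer_is_native(root);
}
```

Calling should_touch_root_regs(NULL) returns false and leaves
native_calls at 0. With the operands swapped, the same call would
dereference NULL inside the helper unless it grew its own check.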


-- 
Thanks,
Zhuo Chen
