From: Sinan Kaya <okaya@codeaurora.org>
To: Borislav Petkov <bp@suse.de>
Cc: "Baicar, Tyler" <tbaicar@codeaurora.org>,
	Tony Luck <tony.luck@intel.com>,
	rjw@rjwysocki.net, lenb@kernel.org, will.deacon@arm.com,
	james.morse@arm.com, prarit@redhat.com, punit.agrawal@arm.com,
	shiju.jose@huawei.com, andriy.shevchenko@linux.intel.com,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux PCI <linux-pci@vger.kernel.org>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH] acpi: apei: call into AER handling regardless of severity
Date: Wed, 30 Aug 2017 11:31:06 -0400
Message-ID: <af5dc902-faca-a7a1-6781-95ff82d5d8fd@codeaurora.org>
In-Reply-To: <20170830151601.ro5qt5272e2msevp@pd.tnic>

On 8/30/2017 11:16 AM, Borislav Petkov wrote:
> On Wed, Aug 30, 2017 at 10:05:44AM -0400, Sinan Kaya wrote:
>> Link reset is not the only recovery mechanism. In the case of nonfatal
>> errors, it is assumed that the endpoint CSR is still reachable.
>> The error is propagated to the PCIe endpoint driver. The endpoint driver
>> does a re-initialization, and we are back in business.
> 
> I'm assuming that's broadcast_error_message()'s job.
> 

That's right. Each driver provides an err_handler hook, and the broadcast
function calls these hooks.

static struct pci_driver e1000_driver = {
	...
	.err_handler = &e1000_err_handler,
};

struct pci_error_handlers {
	...
	/* Called by the broadcast code when an error is detected */
	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
					   enum pci_channel_state error);
};
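
To make the hook mechanism concrete, here is a minimal userspace sketch of
the idea, not the kernel code. Every name prefixed with mock_ or MOCK_ is
hypothetical, the types are simplified stand-ins for struct pci_dev and
struct pci_error_handlers, and the "keep the most severe vote" merge rule
and the skipping of devices without a hook are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel enums; values are illustrative. */
enum mock_chan_state { MOCK_IO_NORMAL, MOCK_IO_FROZEN };
enum mock_ers_result {
	MOCK_ERS_NONE,
	MOCK_ERS_CAN_RECOVER,
	MOCK_ERS_NEED_RESET,
	MOCK_ERS_DISCONNECT,
};

struct mock_dev;

/* Stand-in for struct pci_error_handlers: one hook only. */
struct mock_err_handlers {
	enum mock_ers_result (*error_detected)(struct mock_dev *dev,
					       enum mock_chan_state state);
};

/* Stand-in for a device bound to a driver. */
struct mock_dev {
	const char *name;
	const struct mock_err_handlers *err_handler; /* NULL: no hook */
};

/* Assumed merge rule: the most severe vote wins. */
static enum mock_ers_result mock_merge(enum mock_ers_result a,
				       enum mock_ers_result b)
{
	return b > a ? b : a;
}

/* Example driver hook: ask for a reset only if the channel is frozen. */
static enum mock_ers_result mock_nic_error_detected(struct mock_dev *dev,
						    enum mock_chan_state state)
{
	(void)dev;
	return state == MOCK_IO_FROZEN ? MOCK_ERS_NEED_RESET
				       : MOCK_ERS_CAN_RECOVER;
}

/* Call every driver's error_detected hook and merge the votes. */
static enum mock_ers_result
mock_broadcast_error_detected(struct mock_dev *devs, size_t n,
			      enum mock_chan_state state)
{
	enum mock_ers_result result = MOCK_ERS_CAN_RECOVER;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct mock_err_handlers *h = devs[i].err_handler;

		if (!h || !h->error_detected)
			continue; /* assumption: devices without a hook are skipped */
		result = mock_merge(result, h->error_detected(&devs[i], state));
	}
	return result;
}
```

In this sketch, a nonfatal error (channel still normal) leaves the merged
vote at "can recover", so the driver can re-initialize the endpoint, while a
frozen channel makes the example hook vote for a reset.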


>> That's not true. The GHES code is changing the severity here before posting
>> to the AER driver in ghes_do_proc().
>>
>> 	if (gdata->flags & CPER_SEC_RESET)
>> 		aer_severity = AER_FATAL;
> 
> You're missing the point that we would walk into that if branch *only* for
> 
>                         if (sev == GHES_SEV_RECOVERABLE &&
>                             sec_sev == GHES_SEV_RECOVERABLE
> 
> severities. So if you have an AER_FATAL error but ghes severities are
> not GHES_SEV_RECOVERABLE, nothing happens.

I see. We should probably call into AER handling only if the GHES severity
is GHES_SEV_CORRECTED or GHES_SEV_RECOVERABLE.

If somebody wants to crash the system with GHES_SEV_PANIC, there is no point
in doing additional work.

> 
>> No, AER ISR is not set up if firmware first is enabled.
> 
> So then this is a major suckage. We do AER recovery on FF systems only
> for GHES_SEV_RECOVERABLE severity.
> 
>> The behavior should match non firmware-first case ideally.
>>
>> 1. Print all correctable errors.
>> 2. Go to do_recovery for all uncorrectable errors including fatal and
>> non-fatal. 
>>
>> This is also what AER driver does in the absence of firmware first via
>> handle_error_source().
> 
> Yes, that makes sense.
> 
> Which would mean that we'd call aer_recover_queue() regardless of GHES
> severity but we'd do recovery only if GHES_SEV_RECOVERABLE is set
> or CPER_SEC_RESET. I.e., we can communicate all that by setting the
> correct AER severity before calling aer_recover_queue(). And then call
> do_recovery() based on AER severity.
> 
> Hmmm?
> 

Sounds good. Do you still want to do PCIe recovery in the case of
GHES_SEV_PANIC or if some FW returns GHES_SEV_NO?
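
For discussion, one possible reading of the above as a standalone sketch.
This is not the ghes.c code: the enums are local stand-ins with illustrative
values, MOCK_SEC_RESET stands in for CPER_SEC_RESET, and both helper
functions are hypothetical. It assumes that GHES_SEV_PANIC and GHES_SEV_NO
are not queued at all, which is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the GHES and AER severities named in this thread. */
enum mock_ghes_sev {
	MOCK_GHES_SEV_NO,
	MOCK_GHES_SEV_CORRECTED,
	MOCK_GHES_SEV_RECOVERABLE,
	MOCK_GHES_SEV_PANIC,
};

enum mock_aer_sev {
	MOCK_AER_NONFATAL,
	MOCK_AER_FATAL,
	MOCK_AER_CORRECTABLE,
};

/* Stand-in for the CPER_SEC_RESET section flag; the value is illustrative. */
#define MOCK_SEC_RESET 0x4u

/*
 * Pick the AER severity to hand to the AER code.
 * Returns false if the error should not be queued at all
 * (assumption: PANIC and NO severities are left alone).
 */
static bool mock_pick_aer_severity(enum mock_ghes_sev sev,
				   unsigned int sec_flags,
				   enum mock_aer_sev *out)
{
	assert(out != NULL);

	switch (sev) {
	case MOCK_GHES_SEV_CORRECTED:
		*out = MOCK_AER_CORRECTABLE;
		return true;
	case MOCK_GHES_SEV_RECOVERABLE:
		/* Firmware asking for a reset is treated as fatal. */
		*out = (sec_flags & MOCK_SEC_RESET) ? MOCK_AER_FATAL
						    : MOCK_AER_NONFATAL;
		return true;
	default:
		return false;
	}
}

/* Correctable errors are only printed; uncorrectable ones go to recovery. */
static bool mock_needs_recovery(enum mock_aer_sev sev)
{
	return sev != MOCK_AER_CORRECTABLE;
}
```

With this split, the queueing decision depends only on the GHES severity,
and the decision to run recovery depends only on the AER severity that was
chosen before queueing.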

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
