From: Borislav Petkov <bp@suse.de>
To: Sinan Kaya <okaya@codeaurora.org>
Cc: "Baicar, Tyler" <tbaicar@codeaurora.org>,
	Tony Luck <tony.luck@intel.com>,
	rjw@rjwysocki.net, lenb@kernel.org, will.deacon@arm.com,
	james.morse@arm.com, prarit@redhat.com, punit.agrawal@arm.com,
	shiju.jose@huawei.com, andriy.shevchenko@linux.intel.com,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux PCI <linux-pci@vger.kernel.org>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH] acpi: apei: call into AER handling regardless of severity
Date: Wed, 30 Aug 2017 12:16:17 +0200
Message-ID: <20170830101617.3m266q7xuew6ctxl@pd.tnic>
In-Reply-To: <0fb1fe1b-207a-93fe-4ac6-b886451e488e@codeaurora.org>

On Tue, Aug 29, 2017 at 06:34:49PM -0400, Sinan Kaya wrote:
> The do_recovery function needs to be called for both uncorrectable error
> categories. (#2 and #3 above)

Care to share why exactly that needs to happen?

Because I'm reading this in pcieaer-howto.txt:

"If an error message indicates a non-fatal error, performing link reset
at upstream is not required."

and

"If an error message indicates a fatal error, kernel will broadcast
error_detected(dev, pci_channel_io_frozen) to all drivers within a
hierarchy in question. Then, performing link reset at upstream is
necessary."

Now, pci-error-recovery.txt has link reset as step 3, so I'm assuming
recovery means link reset. Thus, non-fatal AER errors do not require
recovery, but fatal ones do.

> How these map to GHES error categories is out of know-how.

        case CPER_SEV_INFORMATIONAL:
                return GHES_SEV_NO;
        case CPER_SEV_CORRECTED:
                return GHES_SEV_CORRECTED;
        case CPER_SEV_RECOVERABLE:
                return GHES_SEV_RECOVERABLE;
        case CPER_SEV_FATAL:
                return GHES_SEV_PANIC;

and

        case CPER_SEV_RECOVERABLE:
                return AER_NONFATAL;
        case CPER_SEV_FATAL:
                return AER_FATAL;
        default:
                return AER_CORRECTABLE;

So I see GHES_SEV_RECOVERABLE -> CPER_SEV_RECOVERABLE -> AER_NONFATAL.

Which means we've never done error recovery for AER_FATAL errors. Which
we should've been doing in the first place! Unless...

... Error recovery for those fatal errors has been happening down the
other, PCI path:

aer_isr->aer_isr_one_error->...->do_recovery()

Which then makes me look at this contraption in the APEI Kconfig:

config ACPI_APEI_PCIEAER
        bool "APEI PCIe AER logging/recovering support"
        depends on ACPI_APEI && PCIEAER
        help
          PCIe AER errors may be reported via APEI firmware first mode.
          Turn on this option to enable the corresponding support.

So this says "may be" reported.

Now the question is, what kind of errors are being reported through here
and what exactly are we expected to do about them? Print them? Or do
more?

Hmmm.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 
