linux-acpi.vger.kernel.org archive mirror
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Borislav Petkov <bp@alien8.de>, Tony Luck <tony.luck@intel.com>,
	James Morse <james.morse@arm.com>, Len Brown <lenb@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Bjorn Helgaas <bhelgaas@google.com>,
	luanshi <zhangliguang@linux.alibaba.com>,
	zhuo.song@linux.alibaba.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v5] ACPI: Move sdei_init and ghes_init ahead to handle platform errors earlier
Date: Thu, 20 Jan 2022 17:24:24 +0100
Message-ID: <CAJZ5v0j34en1bCZamSGA9XgC-VZRgLgiA89o060_FAWWf21Byw@mail.gmail.com>
In-Reply-To: <20220119204259.GA962224@bhelgaas>

On Wed, Jan 19, 2022 at 9:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jan 19, 2022 at 02:40:11PM +0800, Shuai Xue wrote:
> > [+to Rafael, question about HEST/GHES/SDEI init]
> >
> > Hi, Bjorn,
> >
> > Thank you for your comments and quick reply.
> >
> > On 2022/1/19 at 6:49 AM, Bjorn Helgaas wrote:
> > > On Sun, Jan 16, 2022 at 04:43:10PM +0800, Shuai Xue wrote:
> > >> On an ACPI system, ACPI is initialised very early from a
> > >> subsys_initcall(), while SDEI is not ready until a
> > >> subsys_initcall_sync(). This patch is to reduce the time before GHES
> > >> initialization.
> > >>
> > >> The SDEI driver provides functions (e.g. apei_sdei_register_ghes(),
> > >> apei_sdei_unregister_ghes()) to register or unregister event callbacks
> > >> for the dispatcher in firmware. When the GHES driver probes, it registers
> > >> the corresponding callback according to the notification type specified
> > >> by GHES. If the GHES notification type is SDEI, the GHES driver will
> > >> call apei_sdei_register_ghes() to register the event callback.
> > >>
> > >> When the firmware emits an event, it migrates the handling of the event
> > >> into the kernel at the registered entry-point __sdei_asm_handler. And
> > >> finally, the kernel will call the registered event callback and return
> > >> status_code to indicate the status of event handling. SDEI_EV_FAILED
> > >> indicates that the kernel failed to handle the event.
> > >>
> > >> Consequently, when an error occurs during kernel boot, the kernel is
> > >> unable to handle and report it until the GHES driver is initialized
> > >> by device_initcall(), which is where the event callback is registered.
> > >> For example, while the kernel is booting on our platform, the firmware
> > >> logs the following to the console many times before the GHES driver is
> > >> initialized:
> > >>
> > >>    Trip in MM PCIe RAS handle(Intr:910)
> > >>    Clean PE[1.1.1] ERR_STS:0x4000100 -> 0 INT_STS:F0000000
> > >>    Find RP(98:1.0)
> > >>    --Walk dev(98:1.0) CE:0 UCE:4000
> > >>    ...
> > >>    ERROR:   sdei_dispatch_event(32a) ret:-1
> > >>    --handler(910) end
> > >
> > > If I understand correctly, the firmware noticed an error, tried to
> > > report it to the kernel, and is complaining because the kernel isn't
> > > ready to handle it yet.  And the reason for this patch is to reduce
> > > these complaints from the firmware.
> >
> > My thoughts exactly :)
> >
> > > That doesn't seem like a very good reason for this patch.  There is
> > > *always* a window before the kernel is ready to handle events from the
> > > firmware.
> >
> > Yes, there is always a window. But if the kernel can do better and
> > reduce the window by 90% (from 33 seconds to 3 seconds), why not?
> >
> > > Why is the firmware noticing these errors in the first place?  If
> > > you're seeing these complaints regularly, my guess is that either you
> > > have some terrible hardware or (more likely) the firmware isn't
> > > clearing some expected error condition correctly.  For example, maybe
> > > the Unsupported Request errors that happen while enumerating PCIe
> > > devices are being reported.
> > >
> > > If you register the callback function, the kernel will now start
> > > seeing these error reports.  What happens then?  Does the kernel log
> > > the errors somewhere?  Is that better than the current situation where
> > > the firmware logs them?
> >
> > Yep, it is a hardware issue. The firmware only logs to the console
> > (ttyAMA0), so we cannot see the errors on the kernel side. Once the
> > kernel starts seeing these error reports, we can see detailed EDAC/ghes
> > and efi/cper logs in dmesg. We did not notice the problem until we
> > checked the console log, which inspired us to reduce the window during
> > kernel startup so that we can see the messages clearly and properly.
> > I think the intuitive place to check is dmesg, not the console.
>
> > > However, I DO think that:
> > >
> > >   - Removing acpi_hest_init() from acpi_pci_root_init(), and
> > >
> > >   - Converting ghes_init() and sdei_init() from initcalls to explicit
> > >     calls
> > >
> > > are very good reasons to do something like this patch because HEST is
> > > not PCI-specific, and IMO, explicit calls are better than initcalls
> > > because initcall ordering is implicit and not well-defined within a
> > > level.
> >
> > Haha, if the above reasons still don't convince you, I would like to
> > accept yours :) Should we do it in one patch or separate it into two
> > patches?
>
> IMO, this can be done in one patch, but this would probably go via
> Rafael.

Yes, that would make sense IMO.
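
To make the ordering point above concrete, here is a minimal user-space
sketch in C, not kernel code. Every demo_* name is hypothetical and only
stands in for the HEST, SDEI and GHES init steps discussed in the thread;
the event number 0x32a is taken from the firmware log quoted above. It
assumes the firmware reports failure whenever no callback is registered,
and shows how one explicit call sequence makes the order visible where
separate initcalls within a level would not.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these are the kernel's real symbols. */
#define DEMO_EV_HANDLED 0
#define DEMO_EV_FAILED  (-1)

typedef int (*demo_event_cb)(unsigned int event);

static demo_event_cb demo_registered_cb; /* NULL until the "GHES" step runs */
static int demo_hest_ready;
static int demo_sdei_ready;

/* "Firmware" side: dispatch fails while no callback is registered. */
static int demo_dispatch_event(unsigned int event)
{
    if (!demo_registered_cb)
        return DEMO_EV_FAILED;
    return demo_registered_cb(event);
}

/* "Kernel" side: the callback that reports the error once registered. */
static int demo_ghes_callback(unsigned int event)
{
    /* It can only have been registered after both prerequisites ran. */
    assert(demo_hest_ready && demo_sdei_ready);
    printf("kernel handled event 0x%x\n", event);
    return DEMO_EV_HANDLED;
}

static void demo_hest_init(void)
{
    demo_hest_ready = 1;
}

static int demo_sdei_init(void)
{
    demo_sdei_ready = 1;
    return 0;
}

/* Registers the callback; refuses to run before its prerequisites. */
static int demo_ghes_init(void)
{
    if (!demo_hest_ready || !demo_sdei_ready)
        return -1;
    demo_registered_cb = demo_ghes_callback;
    return 0;
}

/* Explicit calls: the dependency order is spelled out in one place. */
static int demo_acpi_init(void)
{
    demo_hest_init();
    if (demo_sdei_init())
        return -1;
    return demo_ghes_init();
}
```

Calling demo_dispatch_event(0x32a) before demo_acpi_init() returns
DEMO_EV_FAILED, mirroring the ret:-1 lines in the firmware log; after
demo_acpi_init() succeeds it returns DEMO_EV_HANDLED.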

Thread overview: 30+ messages
2021-11-26  7:04 [RFC PATCH v4] ACPI: Move sdei_init and ghes_init ahead to handle platform errors earlier Shuai Xue
2021-12-02 12:50 ` Shuai Xue
2021-12-16 13:34 ` [RESEND " Shuai Xue
2021-12-17 18:17   ` Rafael J. Wysocki
2021-12-19  4:04     ` Shuai Xue
2021-12-21 23:17   ` Bjorn Helgaas
2021-12-23  8:11     ` Shuai Xue
2021-12-24  0:17       ` Bjorn Helgaas
2021-12-24  8:55         ` Shuai Xue
2022-01-13 13:38           ` Shuai Xue
2022-01-16  8:43 ` [PATCH v5] " Shuai Xue
2022-01-18 22:49   ` Bjorn Helgaas
2022-01-19  6:40     ` Shuai Xue
2022-01-19 20:42       ` Bjorn Helgaas
2022-01-20  2:40         ` Shuai Xue
2022-01-20 16:24         ` Rafael J. Wysocki [this message]
2022-01-20  5:05 ` [PATCH v6] ACPI: explicit init HEST, SDEI and GHES in apci_init Shuai Xue
2022-01-20 16:22   ` Bjorn Helgaas
2022-01-21  3:43     ` Shuai Xue
2022-01-21 19:46       ` Bjorn Helgaas
2022-01-22  5:26 ` [PATCH v7 1/2] ACPI: APEI: explicit init HEST " Shuai Xue
2022-02-10  9:39   ` Shuai Xue
2022-02-14 18:51     ` Rafael J. Wysocki
2022-02-21 18:18   ` Nathan Chancellor
2022-02-21 18:25     ` Rafael J. Wysocki
2022-02-22  6:03     ` Shuai Xue
2022-01-22  5:26 ` [PATCH v7 2/2] ACPI: APEI: rename ghes_init with an "acpi_" prefix Shuai Xue
2022-02-27 12:25 ` [PATCH v8 1/2] ACPI: APEI: explicit init HEST and GHES in apci_init Shuai Xue
2022-03-03 19:27   ` Rafael J. Wysocki
2022-02-27 12:25 ` [PATCH v8 2/2] ACPI: APEI: rename ghes_init with an "acpi_" prefix Shuai Xue
