linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shuai Xue <xueshuai@linux.alibaba.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	bp@alien8.de, tony.luck@intel.com, james.morse@arm.com,
	lenb@kernel.org, rjw@rjwysocki.net, bhelgaas@google.com,
	zhangliguang@linux.alibaba.com, zhuo.song@linux.alibaba.com,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v5] ACPI: Move sdei_init and ghes_init ahead to handle platform errors earlier
Date: Thu, 20 Jan 2022 10:40:30 +0800	[thread overview]
Message-ID: <28d484ac-23e2-698b-762b-cd0c8b220570@linux.alibaba.com> (raw)
In-Reply-To: <20220119204259.GA962224@bhelgaas>

Hi, Bjorn,

Thank you for your valuable comments.

在 2022/1/20 AM4:42, Bjorn Helgaas 写道:
> On Wed, Jan 19, 2022 at 02:40:11PM +0800, Shuai Xue wrote:
>> [+to Rafael, question about HEST/GHES/SDEI init]
>>
>> Hi, Bjorn,
>>
>> Thank you for your comments and quick reply.
>>
>> 在 2022/1/19 AM6:49, Bjorn Helgaas 写道:
>>> On Sun, Jan 16, 2022 at 04:43:10PM +0800, Shuai Xue wrote:
>>>> On an ACPI system, ACPI is initialised very early from a
>>>> subsys_initcall(), while SDEI is not ready until a
>>>> subsys_initcall_sync(). This patch is to reduce the time before GHES
>>>> initialization.
>>>>
>>>> The SDEI driver provides functions (e.g. apei_sdei_register_ghes(),
>>>> apei_sdei_unregister_ghes()) to register or unregister event callback
>>>> for dispatcher in firmware. When the GHES driver probing, it registers
>>>> the corresponding callback according to the notification type specified
>>>> by GHES. If the GHES notification type is SDEI, the GHES driver will
>>>> call apei_sdei_register_ghes() to register event call.
>>>>
>>>> When the firmware emits an event, it migrates the handling of the event
>>>> into the kernel at the registered entry-point __sdei_asm_handler. And
>>>> finally, the kernel will call the registered event callback and return
>>>> status_code to indicate the status of event handling. SDEI_EV_FAILED
>>>> indicates that the kernel failed to handle the event.
>>>>
>>>> Consequently, when an error occurs during kernel booting, the kernel is
>>>> unable to handle and report errors until the GHES driver is initialized
>>>> by device_initcall(), in which the event callback is registered.  For
>>>> example, when the kernel booting, the console logs many times from
>>>> firmware before GHES drivers init in our platform:
>>>>
>>>> 	Trip in MM PCIe RAS handle(Intr:910)
>>>>   	Clean PE[1.1.1] ERR_STS:0x4000100 -> 0 INT_STS:F0000000
>>>> 	Find RP(98:1.0)
>>>> 	--Walk dev(98:1.0) CE:0 UCE:4000
>>>> 	...
>>>> 	ERROR:   sdei_dispatch_event(32a) ret:-1
>>>> 	--handler(910) end
>>>
>>> If I understand correctly, the firmware noticed an error, tried to
>>> report it to the kernel, and is complaining because the kernel isn't
>>> ready to handle it yet.  And the reason for this patch is to reduce
>>> these complaints from the firmware.
>>
>> My thoughts exactly :)
>>
>>> That doesn't seem like a very good reason for this patch.  There is
>>> *always* a window before the kernel is ready to handle events from the
>>> firmware.
>>
>> Yes, there is always a window. But if we could do better in kernel that
>> reduces the window by 90% (from 33 seconds to 3 second), why not?
>>
>>> Why is the firmware noticing these errors in the first place?  If
>>> you're seeing these complaints regularly, my guess is that either you
>>> have some terrible hardware or (more likely) the firmware isn't
>>> clearing some expected error condition correctly.  For example, maybe
>>> the Unsupported Request errors that happen while enumerating PCIe
>>> devices are being reported.
>>>
>>> If you register the callback function, the kernel will now start
>>> seeing these error reports.  What happens then?  Does the kernel log
>>> the errors somewhere?  Is that better than the current situation where
>>> the firmware logs them?
>>
>> Yep, it is a hardware issue. The firmware only logs in console
>> (ttyAMA0) and we can not see it in kernel side. After the kernel
>> starts seeing these error reports, we could see EDAC/ghes and
>> efi/cper detailed logs in dmesg. We did not notice the problem until
>> we check the console log, which inspired us to reduce the window
>> when kernel startup, so that we can see the message clearly and
>> properly. I think the intuition is to check the log of dmesg, not
>> the console.
> 
>>> However, I DO think that:
>>>
>>>   - Removing acpi_hest_init() from acpi_pci_root_init(), and
>>>
>>>   - Converting ghes_init() and sdei_init() from initcalls to explicit
>>>     calls
>>>
>>> are very good reasons to do something like this patch because HEST is
>>> not PCI-specific, and IMO, explicit calls are better than initcalls
>>> because initcall ordering is implicit and not well-defined within a
>>> level.
>>
>> Haha, if the above reasons still don't convince you, I would like to
>> accept yours :) Should we do it in one patch or separate it into two
>> patches?
> 
> IMO, this can be done in one patch, but this would probably go via
> Rafael.

Got it, I will send a new patch and cc to Rafael.

>>>> -static int __init ghes_init(void)
>>>> +void __init ghes_init(void)
>>>>  {
>>>>  	int rc;
>>>>  
>>>>  	if (acpi_disabled)
>>>> -		return -ENODEV;
>>>> +		return;
>>>>  
>>>>  	switch (hest_disable) {
>>>>  	case HEST_NOT_FOUND:
>>>> -		return -ENODEV;
>>>> +		pr_info(GHES_PFX "HEST is not found!\n");
>>>
>>> I don't know whether this "HEST is not found" message is
>>> worthwhile or not.  I don't think lack of an HEST is an error, and
>>> users may be alarmed.  But this is an ACPI thing, so up to you and
>>> Rafael.
>>
>> If we explicit call ghes_init(), we can't tell if ghes is
>> initialized successfully based on the return value of initcall. So I
>> add a info message.
> 
> When ghes_init() is an initcall and you return -ENODEV for the
> HEST_NOT_FOUND case, I don't think we log any message about that, do
> we?  do_one_initcall() will capture and return the -ENODEV, but the
> caller (do_initcall_level()) just ignores it.

I see, thank you. I will delete it in next version.

>>>> @@ -1495,8 +1494,7 @@ static int __init ghes_init(void)
>>>>  	else
>>>>  		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
>>>>  
>>>> -	return 0;
>>>> +	return;
>>>>  err:
>>>> -	return rc;
>>>> +	ghes_disable = 1;
>>>
>>> Why do you set ghes_disable here?  As far as I can tell, we will never
>>> look at it again.  The places we do look at it are:
>>>
>>>   - ghes_init(): earlier in this function, so we've already done that,
>>>
>>>   - acpi_hest_init(): we've already called that, too, and
>>>
>>>   - acpi_bus_osc_negotiate_platform_control(): called from
>>>     acpi_bus_init(), which we've already called.
>>
>> I add it for future potential usage. Thank you for pointing it out.
>> If you think it is not necessary, I will delete it in next version.
> 
> I think it is not necessary to save information that will never be
> used.  If you need it in the future, you can add it and the reason
> will be obvious.
> 
> Bjorn

You are right. I will delete it.

Best Regards,
Shuai



  reply	other threads:[~2022-01-20  2:40 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-26  7:04 [RFC PATCH v4] ACPI: Move sdei_init and ghes_init ahead to handle platform errors earlier Shuai Xue
2021-12-02 12:50 ` Shuai Xue
2021-12-16 13:34 ` [RESEND " Shuai Xue
2021-12-17 18:17   ` Rafael J. Wysocki
2021-12-19  4:04     ` Shuai Xue
2021-12-21 23:17   ` Bjorn Helgaas
2021-12-23  8:11     ` Shuai Xue
2021-12-24  0:17       ` Bjorn Helgaas
2021-12-24  8:55         ` Shuai Xue
2022-01-13 13:38           ` Shuai Xue
2022-01-16  8:43 ` [PATCH v5] " Shuai Xue
2022-01-18 22:49   ` Bjorn Helgaas
2022-01-19  6:40     ` Shuai Xue
2022-01-19 20:42       ` Bjorn Helgaas
2022-01-20  2:40         ` Shuai Xue [this message]
2022-01-20 16:24         ` Rafael J. Wysocki
2022-01-20  5:05 ` [PATCH v6] ACPI: explicit init HEST, SDEI and GHES in apci_init Shuai Xue
2022-01-20 16:22   ` Bjorn Helgaas
2022-01-21  3:43     ` Shuai Xue
2022-01-21 19:46       ` Bjorn Helgaas
2022-01-22  5:26 ` [PATCH v7 1/2] ACPI: APEI: explicit init HEST " Shuai Xue
2022-02-10  9:39   ` Shuai Xue
2022-02-14 18:51     ` Rafael J. Wysocki
2022-02-21 18:18   ` Nathan Chancellor
2022-02-21 18:25     ` Rafael J. Wysocki
2022-02-22  6:03     ` Shuai Xue
2022-01-22  5:26 ` [PATCH v7 2/2] ACPI: APEI: rename ghes_init with an "acpi_" prefix Shuai Xue
2022-02-27 12:25 ` [PATCH v8 1/2] ACPI: APEI: explicit init HEST and GHES in apci_init Shuai Xue
2022-03-03 19:27   ` Rafael J. Wysocki
2022-02-27 12:25 ` [PATCH v8 2/2] ACPI: APEI: rename ghes_init with an "acpi_" prefix Shuai Xue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28d484ac-23e2-698b-762b-cd0c8b220570@linux.alibaba.com \
    --to=xueshuai@linux.alibaba.com \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=helgaas@kernel.org \
    --cc=james.morse@arm.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=tony.luck@intel.com \
    --cc=zhangliguang@linux.alibaba.com \
    --cc=zhuo.song@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).