All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shuai Xue <xueshuai@linux.alibaba.com>
To: Bjorn Helgaas <helgaas@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: bp@alien8.de, tony.luck@intel.com, james.morse@arm.com,
	lenb@kernel.org, bhelgaas@google.com,
	zhangliguang@linux.alibaba.com, zhuo.song@linux.alibaba.com,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org
Subject: Re: [RESEND PATCH v4] ACPI: Move sdei_init and ghes_init ahead to handle platform errors earlier
Date: Thu, 13 Jan 2022 21:38:30 +0800	[thread overview]
Message-ID: <8e6ee387-7dc1-6c8a-c13f-ec06621e3efa@linux.alibaba.com> (raw)
In-Reply-To: <75dedf3a-41c7-980e-a8a8-74d116ae6663@linux.alibaba.com>

Hi, Bjorn, Rafael,

Happy New Year :)

I am wondering that do you have any comment to this implementation? Any feedback
are welcomed.

Thank you.

Best regrads,
Shuai

在 2021/12/24 PM4:55, Shuai Xue 写道:
> Hi, Bjorn,
> 
> Thank you for your comments.
> 
> 在 2021/12/24 AM8:17, Bjorn Helgaas 写道:
>> [+to Rafael, question about HEST/GHES/SDEI init]
>>
>> On Thu, Dec 23, 2021 at 04:11:11PM +0800, Shuai Xue wrote:
>>> 在 2021/12/22 AM7:17, Bjorn Helgaas 写道:
>>>> On Thu, Dec 16, 2021 at 09:34:56PM +0800, Shuai Xue wrote:
>>>>> On an ACPI system, ACPI is initialised very early from a subsys_initcall(),
>>>>> while SDEI is not ready until a subsys_initcall_sync().
>>>>>
>>>>> The SDEI driver provides functions (e.g. apei_sdei_register_ghes,
>>>>> apei_sdei_unregister_ghes) to register or unregister event callback for
>>>>> dispatcher in firmware. When the GHES driver probing, it registers the
>>>>> corresponding callback according to the notification type specified by
>>>>> GHES. If the GHES notification type is SDEI, the GHES driver will call
>>>>> apei_sdei_register_ghes to register event call.
>>>>>
>>>>> When the firmware emits an event, it migrates the handling of the event
>>>>> into the kernel at the registered entry-point __sdei_asm_handler. And
>>>>> finally, the kernel will call the registered event callback and return
>>>>> status_code to indicate the status of event handling. SDEI_EV_FAILED
>>>>> indicates that the kernel failed to handle the event.
>>>>>
>>>>> Consequently, when an error occurs during kernel booting, the kernel is
>>>>> unable to handle and report errors until the GHES driver is initialized by
>>>>> device_initcall(), in which the event callback is registered. All errors
>>>>> that occurred before GHES initialization are missed and there is no chance
>>>>> to report and find them again.
>>>>>
>>>>> From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus
>>>>> memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
>>>>> the estatus memory pool. On the other hand, ghes_init() relies on
>>>>> sdei_init() to detect the SDEI version and the framework for registering
>>>>> and unregistering events.
>>>>
>>>>> By the way, I don't figure out why acpi_hest_init is called in
>>>>> acpi_pci_root_init, it don't rely on any other thing. May it could
>>>>> be moved further, following acpi_iort_init in acpi_init.
>>
>>>>> sdei_init() relies on ACPI table which is initialized
>>>>> subsys_initcall(): acpi_init(), acpi_bus_init(), acpi_load_tables(),
>>>>> acpi_tb_laod_namespace().  May it should be also moved further,
>>>>> after acpi_load_tables.
>>
>>>>> In this patch, move sdei_init and ghes_init as far ahead as
>>>>> possible, right after acpi_hest_init().
>>>>
>>>> I'm having a hard time figuring out the reason for this patch.
>>>>
>>>> Apparently the relevant parts are sdei_init() and ghes_init().
>>>> Today they are executed in that order:
>>>>
>>>>   subsys_initcall_sync(sdei_init);
>>>>   device_initcall(ghes_init);
>>>>
>>>> After this patch, they would be executed in the same order, but called
>>>> explicitly instead of as initcalls:
>>>>
>>>>   acpi_pci_root_init()
>>>>   {
>>>>     acpi_hest_init();
>>>>     sdei_init();
>>>>     ghes_init();
>>>>     ...
>>>>
>>>> Explicit calls are certainly better than initcalls, but that doesn't
>>>> seem to be the reason for this patch.
>>>>
>>>> Does this patch fix a bug?  If so, what is the bug?
>>>
>>> Yes. When the kernel booting, the console logs many times from firmware
>>> before GHES drivers init:
>>>
>>> 	Trip in MM PCIe RAS handle(Intr:910)
>>>   	Clean PE[1.1.1] ERR_STS:0x4000100 -> 0 INT_STS:F0000000
>>> 	Find RP(98:1.0)
>>> 	--Walk dev(98:1.0) CE:0 UCE:4000
>>> 	...
>>> 	ERROR:   sdei_dispatch_event(32a) ret:-1
>>> 	--handler(910) end
>>>
>>> This is because the callback function has not been registered yet.
>>> Previously reported errors will be overwritten by new ones. Therefore,
>>> all errors that occurred before GHES initialization are missed
>>> and there is no chance to report and find them again.
>>>
>>>> You say that currently "errors that occur before GHES initialization
>>>> are missed".  Isn't that still true after this patch?  Does this patch
>>>> merely reduce the time before GHES initialization?  If so, I'm
>>>> dubious, because we have to tolerate an arbitrary amount of time
>>>> there.
>>>
>>> After this patch, there are still errors missing. As you mentioned,
>>> we have to tolerate it until the software reporting capability is built.
>>>
>>> Yes, this patch merely reduce the time before GHES initialization.
>>
>> It's not a bug that errors that happen before the callback are lost.
>> At least, it's not a *Linux* bug.  It might be a poor design of the
>> firmware error reporting interface.
>>
>> If the only point of this patch is to reduce the time before GHES
>> initialization, the commit log should clearly say that.
> 
> Yep, it is a design defect of firmware. I will explicitly document
> the purpose of this patch in next version.
> 
>>> The boot dmesg before this patch:
>>>
>>> 	[    3.688586] HEST: Table parsing has been initialized.
>>> 	...
>>> 	[   33.204340] calling  sdei_init+0x0/0x120 @ 1
>>> 	[   33.208645] sdei: SDEIv1.0 (0x0) detected in firmware.
>>> 	...
>>> 	[   36.005390] calling  ghes_init+0x0/0x11c @ 1
>>> 	[   36.190021] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
>>>
>>>
>>> After this patch, the boot dmesg like bellow:
>>>
>>> 	[    3.688664] HEST: Table parsing has been initialized.
>>> 	[    3.688691] sdei: SDEIv1.0 (0x0) detected in firmware.
>>> 	[    3.694557] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
> 
> After this patch, we use explicit call to init GHES rather than initcall.
> The GHES message here is just to prove that GHES initialization is
> advanced to 3.694557 seconds.
> 
> 
>> [Tangent: I think this GHES message is confusing.  What "APEI bit"
>> does this refer to?  The only bits I remember are the Flags bits in
>> HEST Error Source Descriptor Entries, e.g., ACPI v6.3, sec 18.3.2.
> 
> From the log code behavior, the APEI bit refers to OSC_SB_APEI_SUPPORT.
> (BIT 4, ACPI v6.4, sec 6.2.11.2,  table Table 6.13 Platform-Wide _OSC
> Capabilities DWORD 2. [1])
> 
>     /* code in drivers/acpi/apei/ghes.c */
>     if (rc == 0 && osc_sb_apei_support_acked)
>         pr_info(GHES_PFX "APEI firmware first mode is enabled by APEI bit and WHEA _OSC.\n");
> 
>     /* code in drivers/acpi/bus.c */
>     osc_sb_apei_support_acked =
>         capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> 
> 
>> "WHEA _OSC" means nothing to me, and I didn't find anything useful
>> with grep, other than that "WHEA" might be an obsolete name for what
>> we now call "APEI".
> 
> Yep, WHEA is an obsolete name. I think apei_osc_setup() is added for compatibility
> in commit eccddd32ced0 ("ACPI, APEI, Add APEI bit support in generic _OSC call")'
> 
> The ACPI spec Platform-Wide OSPM Capabilities defines
> UUID "0811B06E-4A27-44F9-8D60-3CBBC22E7B48" is used in
> acpi_bus_osc_support(). The Microsoft WHEA defines
> UUID "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c" that is used
> in apei_osc_setup(). [2][3]
> 
> From the discussion[4], the GHES message is to distinguish between
> the two specs.
> 
>     To avoid confusion, can we change the message as follow.
> 
>     - generic _OSC succeeded, APEI _OSC failed: APEI firmware first mode is
>     enabled by APEI bit.
>     - generic _OSC failed, APEI _OSC succeeded: APEI firmware first mode is
>     enabled by WHEA _OSC.
>     - both succeeded: APEI firmware first mode is enabled by APEI bit and
>     WHEA _OSC.
>     - both failed: Failed to enable APEI firmware first mode!
> 
> 
>> I don't think there's anything in _OSC that mentions "firmware first."
>>
>> I don't remember anything in the spec about a way to *enable* Firmware
>> First Error Handling needing (I'm looking at ACPI v6.3, sec 18.4).
>>
>> I think the "firmware first" information is useless to the OS -- as
>> far as I can tell, the spec says nothing about anything the OS should
>> do based on the FIRMWARE_FIRST bits.]
> 
> GHES works in "Firmware First" mode to report platform hardware error
> and it depends on APEI interface. (drivers/acpi/apei/ghes.c)
> 
>    * Generic Hardware Error Source provides a way to report platform
>    * hardware errors (such as that from chipset). It works in so called
>    * "Firmware First" mode, that is, hardware errors are reported to
>    * firmware firstly, then reported to Linux by firmware.
> 
> Based on above, I think the GHES message just means that this booting
> kernel supports APEI or WHEA, and GHES report Firmware first errors on
> this machine. Is it fine to keep the message?
> 
> 
>>> As we can see, the initialization of GHES is advanced by 33 seconds.
>>> So, in my opinion, this patch is necessary, right?
>>> (It should be noted that the effect of optimization varies with the platform.)
>>
>>>>> -device_initcall(ghes_init);
>>>>
>>>>>  void __init acpi_pci_root_init(void)
>>>>>  {
>>>>>  	acpi_hest_init();
>>>>> +	sdei_init();
>>>>> +	ghes_init();
>>>>
>>>> What's the connection between PCI, SDEI, and GHES?  As far as I can
>>>> tell, SDEI and GHES are not PCI-specific, so it doesn't seem like they
>>>> should be initialized here in acpi_pci_root_init().
>>>
>>> The only reason is that acpi_hest_init() is initialized here.
>>>
>>> From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus
>>> memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
>>> the estatus memory pool. On the other hand, ghes_init() relies on
>>> sdei_init() to detect the SDEI version and the framework for registering
>>> and unregistering events. The dependencies are as follows
>>>
>>> 	ghes_init() => acpi_hest_init()
>>> 	ghes_init() => sdei_init()
>>>
>>> I don't figure out why acpi_hest_init() is called in
>>> acpi_pci_root_init(), it don't rely on any other thing.
>>> I am wondering that should we moved all of them further? e.g.
>>> following acpi_iort_init() in acpi_init().
>>
>> I don't know why acpi_hest_init() is called from acpi_pci_root_init().
>> It looks like HEST can support error sources other than PCI (IA-32
>> Machine Checks, NMIs, GHES, etc.)  It was added by 415e12b23792
>> ("PCI/ACPI: Request _OSC control once for each root bridge (v3)");
>> maybe Rafael remembers why.
>>
>> Seem like acpi_hest_init(), sdei_init(), and ghes_init() should all go
>> somewhere else, but I don't know where.  Maybe somewhere in
>> acpi_init()?
> 
> I tried to move them into acpi_init(), and it works well.
> 
> @@ -1251,6 +1251,9 @@ static int __init acpi_init(void)
> 
>         pci_mmcfg_late_init();
>         acpi_iort_init();
> +       acpi_hest_init();
> +       sdei_init();
> +       ghes_init();
>         acpi_scan_init();
>         acpi_ec_init();
> 
> What's your opinion?
> 
> Best Regards,
> 
> Shuai
> 
> 
> [1] https://uefi.org/specs/ACPI/6.4/06_Device_Configuration/Device_Configuration.html#platform-wide-osc-capabilities-dword-2
> [2] https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg4718503.html
> [3] http://www.guyreams.com/PDF/whea.pdf
> [4] https://patchwork.kernel.org/project/linux-acpi/patch/1308640587-24502-5-git-send-email-ying.huang@intel.com/#1978922

WARNING: multiple messages have this Message-ID (diff)
From: Shuai Xue <xueshuai@linux.alibaba.com>
To: Bjorn Helgaas <helgaas@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: bp@alien8.de, tony.luck@intel.com, james.morse@arm.com,
	lenb@kernel.org, bhelgaas@google.com,
	zhangliguang@linux.alibaba.com, zhuo.song@linux.alibaba.com,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org
Subject: Re: [RESEND PATCH v4] ACPI: Move sdei_init and ghes_init ahead to handle platform errors earlier
Date: Thu, 13 Jan 2022 21:38:30 +0800	[thread overview]
Message-ID: <8e6ee387-7dc1-6c8a-c13f-ec06621e3efa@linux.alibaba.com> (raw)
In-Reply-To: <75dedf3a-41c7-980e-a8a8-74d116ae6663@linux.alibaba.com>

Hi, Bjorn, Rafael,

Happy New Year :)

I am wondering that do you have any comment to this implementation? Any feedback
are welcomed.

Thank you.

Best regrads,
Shuai

在 2021/12/24 PM4:55, Shuai Xue 写道:
> Hi, Bjorn,
> 
> Thank you for your comments.
> 
> 在 2021/12/24 AM8:17, Bjorn Helgaas 写道:
>> [+to Rafael, question about HEST/GHES/SDEI init]
>>
>> On Thu, Dec 23, 2021 at 04:11:11PM +0800, Shuai Xue wrote:
>>> 在 2021/12/22 AM7:17, Bjorn Helgaas 写道:
>>>> On Thu, Dec 16, 2021 at 09:34:56PM +0800, Shuai Xue wrote:
>>>>> On an ACPI system, ACPI is initialised very early from a subsys_initcall(),
>>>>> while SDEI is not ready until a subsys_initcall_sync().
>>>>>
>>>>> The SDEI driver provides functions (e.g. apei_sdei_register_ghes,
>>>>> apei_sdei_unregister_ghes) to register or unregister event callback for
>>>>> dispatcher in firmware. When the GHES driver probing, it registers the
>>>>> corresponding callback according to the notification type specified by
>>>>> GHES. If the GHES notification type is SDEI, the GHES driver will call
>>>>> apei_sdei_register_ghes to register event call.
>>>>>
>>>>> When the firmware emits an event, it migrates the handling of the event
>>>>> into the kernel at the registered entry-point __sdei_asm_handler. And
>>>>> finally, the kernel will call the registered event callback and return
>>>>> status_code to indicate the status of event handling. SDEI_EV_FAILED
>>>>> indicates that the kernel failed to handle the event.
>>>>>
>>>>> Consequently, when an error occurs during kernel booting, the kernel is
>>>>> unable to handle and report errors until the GHES driver is initialized by
>>>>> device_initcall(), in which the event callback is registered. All errors
>>>>> that occurred before GHES initialization are missed and there is no chance
>>>>> to report and find them again.
>>>>>
>>>>> From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus
>>>>> memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
>>>>> the estatus memory pool. On the other hand, ghes_init() relies on
>>>>> sdei_init() to detect the SDEI version and the framework for registering
>>>>> and unregistering events.
>>>>
>>>>> By the way, I don't figure out why acpi_hest_init is called in
>>>>> acpi_pci_root_init, it don't rely on any other thing. May it could
>>>>> be moved further, following acpi_iort_init in acpi_init.
>>
>>>>> sdei_init() relies on ACPI table which is initialized
>>>>> subsys_initcall(): acpi_init(), acpi_bus_init(), acpi_load_tables(),
>>>>> acpi_tb_laod_namespace().  May it should be also moved further,
>>>>> after acpi_load_tables.
>>
>>>>> In this patch, move sdei_init and ghes_init as far ahead as
>>>>> possible, right after acpi_hest_init().
>>>>
>>>> I'm having a hard time figuring out the reason for this patch.
>>>>
>>>> Apparently the relevant parts are sdei_init() and ghes_init().
>>>> Today they are executed in that order:
>>>>
>>>>   subsys_initcall_sync(sdei_init);
>>>>   device_initcall(ghes_init);
>>>>
>>>> After this patch, they would be executed in the same order, but called
>>>> explicitly instead of as initcalls:
>>>>
>>>>   acpi_pci_root_init()
>>>>   {
>>>>     acpi_hest_init();
>>>>     sdei_init();
>>>>     ghes_init();
>>>>     ...
>>>>
>>>> Explicit calls are certainly better than initcalls, but that doesn't
>>>> seem to be the reason for this patch.
>>>>
>>>> Does this patch fix a bug?  If so, what is the bug?
>>>
>>> Yes. When the kernel booting, the console logs many times from firmware
>>> before GHES drivers init:
>>>
>>> 	Trip in MM PCIe RAS handle(Intr:910)
>>>   	Clean PE[1.1.1] ERR_STS:0x4000100 -> 0 INT_STS:F0000000
>>> 	Find RP(98:1.0)
>>> 	--Walk dev(98:1.0) CE:0 UCE:4000
>>> 	...
>>> 	ERROR:   sdei_dispatch_event(32a) ret:-1
>>> 	--handler(910) end
>>>
>>> This is because the callback function has not been registered yet.
>>> Previously reported errors will be overwritten by new ones. Therefore,
>>> all errors that occurred before GHES initialization are missed
>>> and there is no chance to report and find them again.
>>>
>>>> You say that currently "errors that occur before GHES initialization
>>>> are missed".  Isn't that still true after this patch?  Does this patch
>>>> merely reduce the time before GHES initialization?  If so, I'm
>>>> dubious, because we have to tolerate an arbitrary amount of time
>>>> there.
>>>
>>> After this patch, there are still errors missing. As you mentioned,
>>> we have to tolerate it until the software reporting capability is built.
>>>
>>> Yes, this patch merely reduce the time before GHES initialization.
>>
>> It's not a bug that errors that happen before the callback are lost.
>> At least, it's not a *Linux* bug.  It might be a poor design of the
>> firmware error reporting interface.
>>
>> If the only point of this patch is to reduce the time before GHES
>> initialization, the commit log should clearly say that.
> 
> Yep, it is a design defect of firmware. I will explicitly document
> the purpose of this patch in next version.
> 
>>> The boot dmesg before this patch:
>>>
>>> 	[    3.688586] HEST: Table parsing has been initialized.
>>> 	...
>>> 	[   33.204340] calling  sdei_init+0x0/0x120 @ 1
>>> 	[   33.208645] sdei: SDEIv1.0 (0x0) detected in firmware.
>>> 	...
>>> 	[   36.005390] calling  ghes_init+0x0/0x11c @ 1
>>> 	[   36.190021] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
>>>
>>>
>>> After this patch, the boot dmesg like bellow:
>>>
>>> 	[    3.688664] HEST: Table parsing has been initialized.
>>> 	[    3.688691] sdei: SDEIv1.0 (0x0) detected in firmware.
>>> 	[    3.694557] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
> 
> After this patch, we use explicit call to init GHES rather than initcall.
> The GHES message here is just to prove that GHES initialization is
> advanced to 3.694557 seconds.
> 
> 
>> [Tangent: I think this GHES message is confusing.  What "APEI bit"
>> does this refer to?  The only bits I remember are the Flags bits in
>> HEST Error Source Descriptor Entries, e.g., ACPI v6.3, sec 18.3.2.
> 
> From the log code behavior, the APEI bit refers to OSC_SB_APEI_SUPPORT.
> (BIT 4, ACPI v6.4, sec 6.2.11.2,  table Table 6.13 Platform-Wide _OSC
> Capabilities DWORD 2. [1])
> 
>     /* code in drivers/acpi/apei/ghes.c */
>     if (rc == 0 && osc_sb_apei_support_acked)
>         pr_info(GHES_PFX "APEI firmware first mode is enabled by APEI bit and WHEA _OSC.\n");
> 
>     /* code in drivers/acpi/bus.c */
>     osc_sb_apei_support_acked =
>         capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> 
> 
>> "WHEA _OSC" means nothing to me, and I didn't find anything useful
>> with grep, other than that "WHEA" might be an obsolete name for what
>> we now call "APEI".
> 
> Yep, WHEA is an obsolete name. I think apei_osc_setup() is added for compatibility
> in commit eccddd32ced0 ("ACPI, APEI, Add APEI bit support in generic _OSC call")'
> 
> The ACPI spec Platform-Wide OSPM Capabilities defines
> UUID "0811B06E-4A27-44F9-8D60-3CBBC22E7B48" is used in
> acpi_bus_osc_support(). The Microsoft WHEA defines
> UUID "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c" that is used
> in apei_osc_setup(). [2][3]
> 
> From the discussion[4], the GHES message is to distinguish between
> the two specs.
> 
>     To avoid confusion, can we change the message as follow.
> 
>     - generic _OSC succeeded, APEI _OSC failed: APEI firmware first mode is
>     enabled by APEI bit.
>     - generic _OSC failed, APEI _OSC succeeded: APEI firmware first mode is
>     enabled by WHEA _OSC.
>     - both succeeded: APEI firmware first mode is enabled by APEI bit and
>     WHEA _OSC.
>     - both failed: Failed to enable APEI firmware first mode!
> 
> 
>> I don't think there's anything in _OSC that mentions "firmware first."
>>
>> I don't remember anything in the spec about a way to *enable* Firmware
>> First Error Handling needing (I'm looking at ACPI v6.3, sec 18.4).
>>
>> I think the "firmware first" information is useless to the OS -- as
>> far as I can tell, the spec says nothing about anything the OS should
>> do based on the FIRMWARE_FIRST bits.]
> 
> GHES works in "Firmware First" mode to report platform hardware error
> and it depends on APEI interface. (drivers/acpi/apei/ghes.c)
> 
>    * Generic Hardware Error Source provides a way to report platform
>    * hardware errors (such as that from chipset). It works in so called
>    * "Firmware First" mode, that is, hardware errors are reported to
>    * firmware firstly, then reported to Linux by firmware.
> 
> Based on above, I think the GHES message just means that this booting
> kernel supports APEI or WHEA, and GHES report Firmware first errors on
> this machine. Is it fine to keep the message?
> 
> 
>>> As we can see, the initialization of GHES is advanced by 33 seconds.
>>> So, in my opinion, this patch is necessary, right?
>>> (It should be noted that the effect of optimization varies with the platform.)
>>
>>>>> -device_initcall(ghes_init);
>>>>
>>>>>  void __init acpi_pci_root_init(void)
>>>>>  {
>>>>>  	acpi_hest_init();
>>>>> +	sdei_init();
>>>>> +	ghes_init();
>>>>
>>>> What's the connection between PCI, SDEI, and GHES?  As far as I can
>>>> tell, SDEI and GHES are not PCI-specific, so it doesn't seem like they
>>>> should be initialized here in acpi_pci_root_init().
>>>
>>> The only reason is that acpi_hest_init() is initialized here.
>>>
>>> From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus
>>> memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
>>> the estatus memory pool. On the other hand, ghes_init() relies on
>>> sdei_init() to detect the SDEI version and the framework for registering
>>> and unregistering events. The dependencies are as follows
>>>
>>> 	ghes_init() => acpi_hest_init()
>>> 	ghes_init() => sdei_init()
>>>
>>> I don't figure out why acpi_hest_init() is called in
>>> acpi_pci_root_init(), it don't rely on any other thing.
>>> I am wondering that should we moved all of them further? e.g.
>>> following acpi_iort_init() in acpi_init().
>>
>> I don't know why acpi_hest_init() is called from acpi_pci_root_init().
>> It looks like HEST can support error sources other than PCI (IA-32
>> Machine Checks, NMIs, GHES, etc.)  It was added by 415e12b23792
>> ("PCI/ACPI: Request _OSC control once for each root bridge (v3)");
>> maybe Rafael remembers why.
>>
>> Seem like acpi_hest_init(), sdei_init(), and ghes_init() should all go
>> somewhere else, but I don't know where.  Maybe somewhere in
>> acpi_init()?
> 
> I tried to move them into acpi_init(), and it works well.
> 
> @@ -1251,6 +1251,9 @@ static int __init acpi_init(void)
> 
>         pci_mmcfg_late_init();
>         acpi_iort_init();
> +       acpi_hest_init();
> +       sdei_init();
> +       ghes_init();
>         acpi_scan_init();
>         acpi_ec_init();
> 
> What's your opinion?
> 
> Best Regards,
> 
> Shuai
> 
> 
> [1] https://uefi.org/specs/ACPI/6.4/06_Device_Configuration/Device_Configuration.html#platform-wide-osc-capabilities-dword-2
> [2] https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg4718503.html
> [3] http://www.guyreams.com/PDF/whea.pdf
> [4] https://patchwork.kernel.org/project/linux-acpi/patch/1308640587-24502-5-git-send-email-ying.huang@intel.com/#1978922

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-01-13 13:38 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-26  7:04 [RFC PATCH v4] ACPI: Move sdei_init and ghes_init ahead to handle platform errors earlier Shuai Xue
2021-11-26  7:04 ` Shuai Xue
2021-12-02 12:50 ` Shuai Xue
2021-12-02 12:50   ` Shuai Xue
2021-12-16 13:34 ` [RESEND " Shuai Xue
2021-12-16 13:34   ` Shuai Xue
2021-12-17 18:17   ` Rafael J. Wysocki
2021-12-17 18:17     ` Rafael J. Wysocki
2021-12-19  4:04     ` Shuai Xue
2021-12-19  4:04       ` Shuai Xue
2021-12-21 23:17   ` Bjorn Helgaas
2021-12-21 23:17     ` Bjorn Helgaas
2021-12-23  8:11     ` Shuai Xue
2021-12-23  8:11       ` Shuai Xue
2021-12-24  0:17       ` Bjorn Helgaas
2021-12-24  0:17         ` Bjorn Helgaas
2021-12-24  8:55         ` Shuai Xue
2021-12-24  8:55           ` Shuai Xue
2022-01-13 13:38           ` Shuai Xue [this message]
2022-01-13 13:38             ` Shuai Xue
2022-01-16  8:43 ` [PATCH v5] " Shuai Xue
2022-01-16  8:43   ` Shuai Xue
2022-01-18 22:49   ` Bjorn Helgaas
2022-01-18 22:49     ` Bjorn Helgaas
2022-01-19  6:40     ` Shuai Xue
2022-01-19  6:40       ` Shuai Xue
2022-01-19 20:42       ` Bjorn Helgaas
2022-01-19 20:42         ` Bjorn Helgaas
2022-01-20  2:40         ` Shuai Xue
2022-01-20  2:40           ` Shuai Xue
2022-01-20 16:24         ` Rafael J. Wysocki
2022-01-20 16:24           ` Rafael J. Wysocki
2022-01-20  5:05 ` [PATCH v6] ACPI: explicit init HEST, SDEI and GHES in apci_init Shuai Xue
2022-01-20  5:05   ` Shuai Xue
2022-01-20 16:22   ` Bjorn Helgaas
2022-01-20 16:22     ` Bjorn Helgaas
2022-01-21  3:43     ` Shuai Xue
2022-01-21  3:43       ` Shuai Xue
2022-01-21 19:46       ` Bjorn Helgaas
2022-01-21 19:46         ` Bjorn Helgaas
2022-01-22  5:26 ` [PATCH v7 1/2] ACPI: APEI: explicit init HEST " Shuai Xue
2022-01-22  5:26   ` Shuai Xue
2022-02-10  9:39   ` Shuai Xue
2022-02-10  9:39     ` Shuai Xue
2022-02-14 18:51     ` Rafael J. Wysocki
2022-02-14 18:51       ` Rafael J. Wysocki
2022-02-21 18:18   ` Nathan Chancellor
2022-02-21 18:18     ` Nathan Chancellor
2022-02-21 18:25     ` Rafael J. Wysocki
2022-02-21 18:25       ` Rafael J. Wysocki
2022-02-22  6:03     ` Shuai Xue
2022-02-22  6:03       ` Shuai Xue
2022-01-22  5:26 ` [PATCH v7 2/2] ACPI: APEI: rename ghes_init with an "acpi_" prefix Shuai Xue
2022-01-22  5:26   ` Shuai Xue
2022-02-27 12:25 ` [PATCH v8 1/2] ACPI: APEI: explicit init HEST and GHES in apci_init Shuai Xue
2022-02-27 12:25   ` Shuai Xue
2022-03-03 19:27   ` Rafael J. Wysocki
2022-03-03 19:27     ` Rafael J. Wysocki
2022-02-27 12:25 ` [PATCH v8 2/2] ACPI: APEI: rename ghes_init with an "acpi_" prefix Shuai Xue
2022-02-27 12:25   ` Shuai Xue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8e6ee387-7dc1-6c8a-c13f-ec06621e3efa@linux.alibaba.com \
    --to=xueshuai@linux.alibaba.com \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=helgaas@kernel.org \
    --cc=james.morse@arm.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=tony.luck@intel.com \
    --cc=zhangliguang@linux.alibaba.com \
    --cc=zhuo.song@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.