* [BUG] kernel side can NOT trigger memory error with einj
From: Shuai Xue @ 2022-03-08  5:19 UTC
  To: rjw, lenb, james.morse, Luck, Tony, bp, linux-acpi, linux-kernel,
	graeme.gregory, will.deacon, myron.stowe, len.brown, ying.huang

Hi folks,

If we inject a memory error at a physical memory address, e.g. 0x92f033038,
that is used by a user space process:

	echo 0x92f033038 > /sys/kernel/debug/apei/einj/param1
	echo 0xfffffffffffff000 > /sys/kernel/debug/apei/einj/param2
	echo 0x1 > /sys/kernel/debug/apei/einj/flags
	echo 0x8 > /sys/kernel/debug/apei/einj/error_type
	echo 1 > /sys/kernel/debug/apei/einj/error_inject

Then the following error will be reported in dmesg:

    ACPI: [Firmware Bug]: requested region covers kernel memory @ 0x000000092f033038

After digging into the einj trigger interface, I think it's a kernel bug.

On our platform, firmware relies on the kernel to trigger an injected error.
Specifically, it populates trigger_tab with the injected physical memory
address, which is set in param1. The kernel is then expected to map that RAM
address and run a read action. The execution path is as follows:

    __einj_error_trigger
        => apei_resources_request
            => apei_exec_pre_map_gars
                => apei_exec_run

The root cause is as follows:

1. Commit fdea163d8c17 ("ACPI, APEI, EINJ, Fix resource conflict on some
machine") removes the injected memory address range, which conflicts with
regular memory, from the trigger table resources. That makes sense when
calling apei_resources_request(). **However, the actual mapping is done in
apei_exec_pre_map_gars() using trigger_ctx, and the conflicting physical
address is still in trigger_ctx.**

2. apei_exec_pre_map_gars() then ends up calling acpi_os_ioremap().
The injected physical memory address is EFI_CONVENTIONAL_MEMORY and
memblock_is_map_memory() returns true (arch/arm64/kernel/acpi.c), which is
why we see the printed message.

        case EFI_CONVENTIONAL_MEMORY:
        case EFI_PERSISTENT_MEMORY:
            if (memblock_is_map_memory(phys) ||
                !memblock_is_region_memory(phys, size)) {
                pr_warn(FW_BUG "requested region covers kernel memory @ %pa\n", &phys);
                return NULL;
            }

3. On the other hand, commit ba242d5b1a84 ("ACPI, APEI: Add RAM mapping support to ACPI")
added RAM support with kmap. But since commit aafc65c731fe ("ACPI: add arm64 to the
platforms that use ioremap"), ioremap is used to map memory, and the
ioremap implementation (arch/arm64/mm/ioremap.c) does not allow RAM to be
mapped at all.

    /*
     * Don't allow RAM to be mapped.
     */
    if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
        return NULL;

**As a result, the error cannot be triggered, which is not what we expect when
we want to inject an error into a physical page used by a process.**
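
The interaction between points 1 to 3 can be modelled in user space. The
sketch below is only an illustration: every name in it (toy_entry,
toy_request_resources, toy_pre_map) is hypothetical and none of it is kernel
code. It assumes that the resource request skips entries subtracted as RAM,
while the pre-map step still walks every entry left in the context.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one trigger-table entry (hypothetical, not kernel code). */
struct toy_entry {
	uint64_t paddr;
	bool is_ram;	/* true if the address is regular memory */
};

/* Models the resource request: RAM entries were subtracted, so they are skipped. */
static bool toy_request_resources(const struct toy_entry *tab, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].is_ram)
			continue;	/* already removed from the resource list */
		/* a real request for a non-RAM region would happen here */
	}
	return true;
}

/* Models the pre-map step: it walks every entry in the context, RAM included. */
static bool toy_pre_map(const struct toy_entry *tab, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].is_ram)
			return false;	/* an ioremap-style mapping refuses RAM */
	}
	return true;
}
```

With a table that holds one register-like entry and the injected RAM address,
the request step succeeds but the pre-map step fails, which matches the
behaviour described above.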

A normal workflow maps a Generic Address Register (GAR) with acpi_os_ioremap()
and adds its virtual address to acpi_ioremaps. The execution path is as
follows:

    apei_exec_pre_map_gars
        => pre_map_gar_callback
            => apei_map_generic_address
                => acpi_os_map_generic_address
                    => acpi_os_map_iomem    /* add mapped VA into acpi_ioremaps */
                        => acpi_map
                            => acpi_os_ioremap

Then a read or write action is taken. It first checks whether the physical
address is already mapped in acpi_ioremaps. If so, the value is accessed
directly. Otherwise, the physical address is mapped with acpi_os_ioremap()
first. The execution path is as follows:

    __apei_exec_run
        => apei_exec_read_register
            => apei_read
                => acpi_os_read_memory
                    => acpi_map_vaddr_lookup    /* look up VA of PA in acpi_ioremaps */
                    => acpi_os_ioremap

This works well for reserved memory, but not for the common case in which we
want to inject an error into normal memory.
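
The lookup step in that path can be sketched in user space as below. This is
a simplified model only: toy_map and toy_vaddr_lookup are hypothetical names,
loosely modelled on the list of pre-mapped regions, and the real kernel
structures and locking are not reproduced.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy record of one pre-mapped region (hypothetical, not the kernel struct). */
struct toy_map {
	uint64_t phys;		/* physical start of the mapping */
	uint64_t size;		/* length of the mapping in bytes */
	unsigned char *virt;	/* virtual address the region was mapped at */
};

/* Return the VA for [phys, phys + len) if a recorded mapping covers it. */
static unsigned char *toy_vaddr_lookup(const struct toy_map *maps, size_t n,
				       uint64_t phys, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (maps[i].phys <= phys &&
		    phys + len <= maps[i].phys + maps[i].size)
			return maps[i].virt + (phys - maps[i].phys);
	}
	return NULL;	/* caller has to fall back to mapping on demand */
}
```

An address that was never pre-mapped, such as the injected RAM address, falls
through to the on-demand mapping, and that is the step that refuses RAM.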


A hacky way to address this issue is to map the RAM with kmap instead of
apei_exec_pre_map_gars(), and to read it directly instead of calling
apei_exec_run():

-       rc = apei_exec_pre_map_gars(&trigger_ctx);
-       if (rc)
-               goto out_release;
+       volatile char *ptr;
+       long tmp;
+       unsigned long pfn;
+       pfn = param1 >> PAGE_SHIFT;

-       rc = apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR);
+       ptr = kmap(pfn_to_page(pfn));
+       /* byte offset within the page, so ptr must be a byte pointer */
+       tmp = *(volatile long *)(ptr + (param1 & ~PAGE_MASK));
+       kunmap(pfn_to_page(pfn));

-       apei_exec_post_unmap_gars(&trigger_ctx);
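
The address arithmetic in this hack can be checked in user space. The sketch
below assumes 4 KiB pages, and all TOY_* and toy_* names are hypothetical. It
also shows why the in-page offset has to be added to a byte pointer: added to
a long pointer, it would be scaled by sizeof(long).

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define TOY_PAGE_SIZE  (UINT64_C(1) << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* Page frame number of a physical address. */
static uint64_t toy_pfn(uint64_t paddr)
{
	return paddr >> TOY_PAGE_SHIFT;
}

/* Byte offset of a physical address within its page. */
static uint64_t toy_page_offset(uint64_t paddr)
{
	return paddr & ~TOY_PAGE_MASK;
}

/* Read a long at a byte offset inside an already mapped page. */
static long toy_read_at(const void *page, uint64_t byte_off)
{
	const volatile unsigned char *p = page;

	/* byte_off is in bytes, so it is applied to a byte pointer */
	return *(const volatile long *)(p + byte_off);
}
```

For the example address 0x92f033038 this gives page frame 0x92f033 and an
in-page offset of 0x38.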

I am wondering whether we should use kmap to map RAM in acpi_map, or add
another path to address this issue. Any comments are welcome.

Best Regards,
Shuai


