From: Song Shuai <suagrfillet@gmail.com>
To: alexghiti@rivosinc.com, robh@kernel.org,
	Andrew Jones <ajones@ventanamicro.com>,
	anup@brainfault.org, palmer@rivosinc.com,
	jeeheng.sia@starfivetech.com, leyfoon.tan@starfivetech.com,
	mason.huo@starfivetech.com,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Conor Dooley <conor.dooley@microchip.com>,
	Guo Ren <guoren@kernel.org>
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Bug report: kernel paniced when system hibernates
Date: Tue, 16 May 2023 09:24:07 +0000
Message-ID: <CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@mail.gmail.com>

Description of problem:

The recently added hibernation support[1] for RISC-V Linux triggers a kernel panic.
The full log is posted at https://termbin.com/sphl .

How reproducible:

You can reproduce it with the following steps:

1. Prepare the environment with
- QEMU virt v8.0.0 (with OpenSBI v1.2)
- Linux v6.4-rc1

2. Start the QEMU virt machine
```sh
$ cat ~/8_riscv/start_latest.sh
#!/bin/bash
/home/song/8_riscv/3_acpi/qemu/ooo/usr/local/bin/qemu-system-riscv64 \
-smp 2 -m 4G -nographic -machine virt \
-kernel /home/song/9_linux/linux/00_rv_test/arch/riscv/boot/Image \
-append "root=/dev/vda ro earlycon=uart8250,mmio,0x10000000 early_ioremap_debug console=ttyS0 loglevel=8 memblock=debug no_console_suspend audit=0 3" \
-drive file=/home/song/8_riscv/fedora/stage4-disk.img,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0 \
-drive file=/home/song/8_riscv/fedora/adisk.qcow2,format=qcow2,id=hd1 \
-device virtio-blk-device,drive=hd1 \
-gdb tcp::1236 #-S
```
3. Trigger hibernation

```sh
swapon /dev/vdb2 # this is my swap disk

echo disk > /sys/power/state
```

4. The kernel then panics, as recorded in the log linked above.


Other Information:

From my initial, incomplete investigation, commit 3335068f8721
("riscv: Use PUD/P4D/PGD pages for the linear mapping")[2]
is closely related to this panic. This commit uses a redefined
`MIN_MEMBLOCK_ADDR` to discover the entire system memory
and moves the base of `va_pa_offset` from `kernel_map.phys_addr` to
`phys_ram_base` for the linear memory mapping.
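
To make the effect of the `va_pa_offset` change concrete, here is a small
sketch of the linear-map address arithmetic, written in Python for brevity.
The three addresses are read from the ptdump output in this report; the
helper names are hypothetical, and the sketch models only the arithmetic,
not the kernel code itself.

```python
PAGE_OFFSET = 0xFF60000000000000   # start of the linear mapping (from ptdump)
PHYS_RAM_BASE = 0x80000000         # start of RAM, where the firmware resides
KERNEL_PHYS_ADDR = 0x80200000      # physical load address of the kernel image


def linear_va(pa, base_pa):
    """Virtual address of `pa` when the linear map is based at `base_pa`."""
    va_pa_offset = PAGE_OFFSET - base_pa
    return pa + va_pa_offset


def linear_pa(va, base_pa):
    """Physical address behind `va` when the linear map is based at `base_pa`."""
    va_pa_offset = PAGE_OFFSET - base_pa
    return va - va_pa_offset


# Offset based on kernel_map.phys_addr: PAGE_OFFSET maps the kernel image.
print(hex(linear_pa(PAGE_OFFSET, KERNEL_PHYS_ADDR)))
# Offset based on phys_ram_base: PAGE_OFFSET maps the start of RAM.
print(hex(linear_pa(PAGE_OFFSET, PHYS_RAM_BASE)))
```

With the offset based on `kernel_map.phys_addr`, the firmware's physical
address 0x80000000 would land below `PAGE_OFFSET`, outside the linear
mapping; with `phys_ram_base` it lands exactly at `PAGE_OFFSET`.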

If the firmware describes its own memory region (for example, a
PMP-protected region in OpenSBI) without the "no-map" property,
this commit results in the firmware memory being directly mapped by
`create_linear_mapping_page_table()`.
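
The role of "no-map" can be illustrated with a simplified model, written in
Python for brevity. This is not the kernel's memblock implementation: the
`Region` type and the `linear_map_ranges()` helper are hypothetical, the RAM
size comes from the `-m 4G` option used here, and the 2M firmware size is
read from the first ptdump entry in this report. The model only shows that a
region flagged no-map is left out of the linear mapping, while an unflagged
one is mapped like ordinary RAM.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int
    end: int            # exclusive
    nomap: bool = False


def linear_map_ranges(memory, reserved):
    """Return the (start, end) ranges of `memory` that would be mapped,
    skipping every reserved region flagged no-map."""
    holes = sorted((r.start, r.end) for r in reserved if r.nomap)
    ranges = []
    cursor = memory.start
    for hole_start, hole_end in holes:
        # Clamp the hole to the memory region and ignore empty overlaps.
        hole_start = max(hole_start, memory.start)
        hole_end = min(hole_end, memory.end)
        if hole_start >= hole_end:
            continue
        if cursor < hole_start:
            ranges.append((cursor, hole_start))
        cursor = max(cursor, hole_end)
    if cursor < memory.end:
        ranges.append((cursor, memory.end))
    return ranges


ram = Region(0x80000000, 0x180000000)                       # 4G of RAM
fw_plain = Region(0x80000000, 0x80200000)                   # no flag
fw_nomap = Region(0x80000000, 0x80200000, nomap=True)       # flagged no-map

for fw in (fw_plain, fw_nomap):
    print([(hex(s), hex(e)) for s, e in linear_map_ranges(ram, [fw])])
```

In the unflagged case the whole of RAM, firmware included, ends up in the
mapped range, which matches the first linear-mapping entry in the dump.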

The mapping is visible via ptdump:
```text
---[ Linear mapping ]---
0xff60000000000000-0xff60000000200000 0x0000000080000000 2M PMD D A G . . W R V ------------- the firmware memory
0xff60000000200000-0xff60000000c00000 0x0000000080200000 10M PMD D A G . . . R V
0xff60000000c00000-0xff60000001000000 0x0000000080c00000 4M PMD D A G . . W R V
0xff60000001000000-0xff60000001600000 0x0000000081000000 6M PMD D A G . . . R V
0xff60000001600000-0xff60000040000000 0x0000000081600000 1002M PMD D A G . . W R V
0xff60000040000000-0xff60000100000000 0x00000000c0000000 3G PUD D A G . . W R V
---[ Modules/BPF mapping ]---
---[ Kernel mapping ]---
0xffffffff80000000-0xffffffff80a00000 0x0000000080200000 10M PMD D A G . X . R V
0xffffffff80a00000-0xffffffff80c00000 0x0000000080c00000 2M PMD D A G . . . R V
0xffffffff80c00000-0xffffffff80e00000 0x0000000080e00000 2M PMD D A G . . W R V
0xffffffff80e00000-0xffffffff81400000 0x0000000081000000 6M PMD D A G . . . R V
0xffffffff81400000-0xffffffff81800000 0x0000000081600000 4M PMD
```
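
To check such a dump mechanically, the following Python sketch parses
ptdump-style lines and reports which entry covers a given physical address.
The `parse_entry()` and `find_mapping()` helpers are hypothetical, and the
sample lines are the first two linear-mapping entries from this report, each
joined onto a single line.

```python
def parse_entry(line):
    """Split one ptdump line into its VA range, physical base, and flags."""
    fields = line.split()
    va_start, va_end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "va_start": va_start,
        "va_end": va_end,
        "pa_start": int(fields[1], 16),
        "level": fields[3],
        # A "." marks a permission bit that is not set.
        "flags": [f for f in fields[4:] if f != "."],
    }


def find_mapping(entries, pa):
    """Return the entry whose physical range contains `pa`, or None."""
    for entry in entries:
        size = entry["va_end"] - entry["va_start"]
        if entry["pa_start"] <= pa < entry["pa_start"] + size:
            return entry
    return None


dump = [
    "0xff60000000000000-0xff60000000200000 0x0000000080000000 2M PMD D A G . . W R V",
    "0xff60000000200000-0xff60000000c00000 0x0000000080200000 10M PMD D A G . . . R V",
]
entries = [parse_entry(line) for line in dump]
hit = find_mapping(entries, 0x80000000)
print(hex(hit["va_start"]), hit["level"], hit["flags"])
```

For physical address 0x80000000 the lookup returns the entry starting at
`PAGE_OFFSET`, with both R and W set.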

During hibernation, `swsusp_save()` calls
`copy_data_pages(&copy_bm, &orig_bm)` to copy the pages marked in
`orig_bm` to the pages marked in `copy_bm`.
The Oops (load access fault) occurred while copying the page at
`PAGE_OFFSET`, which maps the firmware memory.

I also ran two other tests:

Test 1:

Hibernation works with commit 3335068f8721 reverted, at least in the
current environment.

Test 2:

I built a simple kernel module that reads the value at the
`PAGE_OFFSET` address, and the same load access fault panic occurred.
So hibernation does not seem to be the only way to trigger this panic.

Finally, should the firmware memory always be marked with the
`MEMBLOCK_NOMAP` flag, through changes in either Linux or OpenSBI (at
least in the current environment)? Or are there other suggestions?

Please correct me if I'm wrong.

[1]: https://lore.kernel.org/r/20230330064321.1008373-5-jeeheng.sia@starfivetech.com
[2]: https://lore.kernel.org/r/20230324155421.271544-4-alexghiti@rivosinc.com

-- 
Thanks,
Song
