* [PATCH v3 0/4] RISC-V Hibernation Support
From: Sia Jee Heng @ 2023-01-27  9:10 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, jeeheng.sia, leyfoon.tan, mason.huo

This series adds RISC-V hibernation (suspend-to-disk) support.
Low-level arch functions are added to support hibernation.
swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
the CPU state onto the stack, then calls swsusp_save() to save the memory
image.

An arch-specific hibernation header is implemented and is used by the
arch_hibernation_header_restore() and arch_hibernation_header_save()
functions. The arch-specific hibernation header consists of satp, hartid,
and the cpu_resume address. The kernel build version also needs to be
saved in the hibernation image header to make sure that only the same
kernel is restored on resume.
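
As a rough illustration of the header handling described above, the
following user-space sketch models the save and restore checks. It is not
the code from this series: the struct name, field names, the 65-byte
release buffer and the demo_* functions are assumptions made up for the
example; only the three saved values (satp, hartid, cpu_resume) and the
"same kernel only" rule come from the description.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical model of an arch hibernation header (names are assumptions). */
struct demo_hibern_hdr {
	char release[65];         /* kernel build identification (assumed size) */
	unsigned long satp;       /* page-table root of the hibernated kernel */
	unsigned long hartid;     /* hart that performed the hibernation */
	unsigned long cpu_resume; /* address to jump to once the image is back */
};

/* Fill the header; fail if the caller's buffer cannot hold it. */
int demo_header_save(void *addr, unsigned int max_size, const char *release,
		     unsigned long satp, unsigned long hartid,
		     unsigned long cpu_resume)
{
	struct demo_hibern_hdr *hdr = addr;

	if (max_size < sizeof(*hdr))
		return -EOVERFLOW;

	memset(hdr, 0, sizeof(*hdr));
	/* The buffer was zeroed, so the copy below stays NUL-terminated. */
	strncpy(hdr->release, release, sizeof(hdr->release) - 1);
	hdr->satp = satp;
	hdr->hartid = hartid;
	hdr->cpu_resume = cpu_resume;
	return 0;
}

/* Accept the image only if it was written by the same kernel build. */
int demo_header_restore(const void *addr, const char *running_release)
{
	const struct demo_hibern_hdr *hdr = addr;

	assert(hdr != NULL);
	if (strncmp(hdr->release, running_release, sizeof(hdr->release)) != 0)
		return -EINVAL;
	return 0;
}
```

The point of the string comparison is that an image written by a different
kernel build is rejected before any of the saved addresses are trusted.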

swsusp_arch_resume() creates a temporary page table that covers only
the linear map. It copies the restore code to a 'safe' page, then starts to
restore the memory image. Once completed, it restores the original
kernel's page table. It then calls into __hibernate_cpu_resume()
to restore the CPU context. Finally, it follows the normal hibernation
path back to the hibernation core.
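
The image-restore step can be pictured as a walk over a linked list of
page backup entries, copying each saved page back to its original
location. The sketch below is a user-space model only, under the
assumption that each entry carries the three fields the series exports to
assembly (address, orig_address, next); the demo_* names and the 4 KiB
page size constant are made up for the example, and the real copy runs
from assembly on a 'safe' page rather than through memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* Hypothetical model of one page backup entry. */
struct demo_pbe {
	void *address;         /* where the saved copy currently lives */
	void *orig_address;    /* where the page has to be restored to */
	struct demo_pbe *next; /* next entry, NULL at the end of the list */
};

/* Copy every saved page back in place; returns the number of pages copied. */
size_t demo_restore_image(const struct demo_pbe *list)
{
	size_t copied = 0;

	for (const struct demo_pbe *p = list; p != NULL; p = p->next) {
		/* A page restored onto itself would indicate a corrupt list. */
		assert(p->address != p->orig_address);
		memcpy(p->orig_address, p->address, DEMO_PAGE_SIZE);
		copied++;
	}
	return copied;
}
```

Unlike this model, the restore code cannot call into the kernel while it
overwrites kernel memory, which is why it is first relocated to a page
that is not part of the image.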

To enable hibernation/suspend-to-disk on RISC-V, the config options
below need to be enabled:
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE

At a high level, this series includes the following changes:
1) Make suspend_save_csrs() and suspend_restore_csrs() public
   functions, as they are common to suspend and hibernation. (patch 1)
2) Factor out the code shared by the __cpu_resume_enter() and
   __hibernate_cpu_resume() functions. The common code is used by
   both hibernation and suspend. (patch 2)
3) Enhance the kernel_page_present() function to support huge pages.
   (patch 3)
4) Add arch/riscv low-level functions to support
   hibernation/suspend-to-disk. (patch 4)
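
To show why the huge page check in patch 3 matters, here is a toy
three-level table walk in user-space C. It is only a model: the
demo_entry layout, the level count and the table size are assumptions for
the example and do not reflect the RISC-V page table format. The idea it
demonstrates is the same, though: a present entry at an upper level may
already be a leaf, and the walk has to stop there instead of treating it
as a pointer to the next table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_LEVELS  3
#define DEMO_ENTRIES 4

/* Hypothetical page-table entry used only by this model. */
struct demo_entry {
	bool present;
	bool leaf;                /* maps memory directly (huge page if not last level) */
	struct demo_entry *table; /* next-level table when the entry is not a leaf */
};

/* Walk the table; idx[] holds the index to use at each level. */
bool demo_page_present(const struct demo_entry *root,
		       const unsigned int idx[DEMO_LEVELS])
{
	const struct demo_entry *table = root;

	for (int level = 0; level < DEMO_LEVELS; level++) {
		assert(idx[level] < DEMO_ENTRIES);
		const struct demo_entry *e = &table[idx[level]];

		if (!e->present)
			return false;
		/* Stop at a huge page: there is no lower table to descend into. */
		if (e->leaf || level == DEMO_LEVELS - 1)
			return true;
		table = e->table;
	}
	return false;
}
```

Without the leaf test, the walk would descend through an entry that has
no next-level table and report a result based on unrelated memory.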

The above patches are based on kernel v6.2-rc5 and were tested on the
StarFive VF2 SBC board and QEMU.
ACPI platform mode is not supported in this series.

Changes since v2:
- Rebased to kernel v6.2-rc5
- Refactor the common code used by hibernation and suspend
- Create copy_page macro
- Addressed other comments from Andrew and Conor

Changes since v1:
- Rebased to kernel v6.2-rc3
- Fixed the compilation error reported by the bot

Sia Jee Heng (4):
  RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public
    function
  RISC-V: Factor out common code of __cpu_resume_enter()
  RISC-V: mm: Enable huge page support to kernel_page_present() function
  RISC-V: Add arch functions to support hibernation/suspend-to-disk

 arch/riscv/Kconfig                 |   7 +
 arch/riscv/include/asm/assembler.h |  82 +++++++
 arch/riscv/include/asm/suspend.h   |  24 ++
 arch/riscv/kernel/Makefile         |   1 +
 arch/riscv/kernel/asm-offsets.c    |   5 +
 arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
 arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
 arch/riscv/kernel/suspend.c        |   4 +-
 arch/riscv/kernel/suspend_entry.S  |  34 +--
 arch/riscv/mm/pageattr.c           |   6 +
 10 files changed, 579 insertions(+), 33 deletions(-)
 create mode 100644 arch/riscv/include/asm/assembler.h
 create mode 100644 arch/riscv/kernel/hibernate-asm.S
 create mode 100644 arch/riscv/kernel/hibernate.c


base-commit: 7c46948a6e9cf47ed03b0d489fde894ad46f1437
-- 
2.34.1


* [PATCH v3 1/4] RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function
From: Sia Jee Heng @ 2023-01-27  9:10 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, jeeheng.sia, leyfoon.tan, mason.huo

Currently, the suspend_save_csrs() and suspend_restore_csrs() functions
are defined as static in suspend.c. Make them public so that they can
be used by hibernation as well.

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
---
 arch/riscv/include/asm/suspend.h | 3 +++
 arch/riscv/kernel/suspend.c      | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
index 8be391c2aecb..75419c5ca272 100644
--- a/arch/riscv/include/asm/suspend.h
+++ b/arch/riscv/include/asm/suspend.h
@@ -33,4 +33,7 @@ int cpu_suspend(unsigned long arg,
 /* Low-level CPU resume entry function */
 int __cpu_resume_enter(unsigned long hartid, unsigned long context);
 
+/* Used to save and restore the csr */
+void suspend_save_csrs(struct suspend_context *context);
+void suspend_restore_csrs(struct suspend_context *context);
 #endif
diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
index 9ba24fb8cc93..3c89b8ec69c4 100644
--- a/arch/riscv/kernel/suspend.c
+++ b/arch/riscv/kernel/suspend.c
@@ -8,7 +8,7 @@
 #include <asm/csr.h>
 #include <asm/suspend.h>
 
-static void suspend_save_csrs(struct suspend_context *context)
+void suspend_save_csrs(struct suspend_context *context)
 {
 	context->scratch = csr_read(CSR_SCRATCH);
 	context->tvec = csr_read(CSR_TVEC);
@@ -29,7 +29,7 @@ static void suspend_save_csrs(struct suspend_context *context)
 #endif
 }
 
-static void suspend_restore_csrs(struct suspend_context *context)
+void suspend_restore_csrs(struct suspend_context *context)
 {
 	csr_write(CSR_SCRATCH, context->scratch);
 	csr_write(CSR_TVEC, context->tvec);
-- 
2.34.1


* [PATCH v3 2/4] RISC-V: Factor out common code of __cpu_resume_enter()
From: Sia Jee Heng @ 2023-01-27  9:10 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, jeeheng.sia, leyfoon.tan, mason.huo

The cpu_resume() function is very similar for the suspend-to-disk and
suspend-to-RAM cases. Factor out the common code into the restore_csr
and restore_reg macros.

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
---
 arch/riscv/include/asm/assembler.h | 62 ++++++++++++++++++++++++++++++
 arch/riscv/kernel/suspend_entry.S  | 34 ++--------------
 2 files changed, 65 insertions(+), 31 deletions(-)
 create mode 100644 arch/riscv/include/asm/assembler.h

diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
new file mode 100644
index 000000000000..ef1283d04b70
--- /dev/null
+++ b/arch/riscv/include/asm/assembler.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 StarFive Technology Co., Ltd.
+ *
+ * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
+ */
+
+#ifndef __ASSEMBLY__
+#error "Only include this from assembly code"
+#endif
+
+#ifndef __ASM_ASSEMBLER_H
+#define __ASM_ASSEMBLER_H
+
+#include <asm/asm.h>
+#include <asm/csr.h>
+#include <asm/asm-offsets.h>
+
+/**
+ * restore_csr - restore hart's CSR value
+ */
+	.macro restore_csr
+		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
+		csrw	CSR_EPC, t0
+		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
+		csrw	CSR_STATUS, t0
+		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
+		csrw	CSR_TVAL, t0
+		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
+		csrw	CSR_CAUSE, t0
+	.endm
+
+/**
+ * restore_reg - Restore registers (except A0 and T0-T6)
+ */
+	.macro restore_reg
+		REG_L	ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
+		REG_L	sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
+		REG_L	gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
+		REG_L	tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
+		REG_L	s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
+		REG_L	s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
+		REG_L	a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
+		REG_L	a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
+		REG_L	a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
+		REG_L	a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
+		REG_L	a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
+		REG_L	a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
+		REG_L	a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
+		REG_L	s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
+		REG_L	s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
+		REG_L	s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
+		REG_L	s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
+		REG_L	s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
+		REG_L	s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
+		REG_L	s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
+		REG_L	s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
+		REG_L	s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
+		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
+	.endm
+
+#endif	/* __ASM_ASSEMBLER_H */
diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
index aafcca58c19d..74a8fab8e0f6 100644
--- a/arch/riscv/kernel/suspend_entry.S
+++ b/arch/riscv/kernel/suspend_entry.S
@@ -7,6 +7,7 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm/asm-offsets.h>
+#include <asm/assembler.h>
 #include <asm/csr.h>
 #include <asm/xip_fixup.h>
 
@@ -83,39 +84,10 @@ ENTRY(__cpu_resume_enter)
 	add	a0, a1, zero
 
 	/* Restore CSRs */
-	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
-	csrw	CSR_EPC, t0
-	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
-	csrw	CSR_STATUS, t0
-	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
-	csrw	CSR_TVAL, t0
-	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
-	csrw	CSR_CAUSE, t0
+	restore_csr
 
 	/* Restore registers (except A0 and T0-T6) */
-	REG_L	ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
-	REG_L	sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
-	REG_L	gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
-	REG_L	tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
-	REG_L	s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
-	REG_L	s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
-	REG_L	a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
-	REG_L	a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
-	REG_L	a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
-	REG_L	a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
-	REG_L	a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
-	REG_L	a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
-	REG_L	a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
-	REG_L	s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
-	REG_L	s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
-	REG_L	s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
-	REG_L	s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
-	REG_L	s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
-	REG_L	s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
-	REG_L	s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
-	REG_L	s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
-	REG_L	s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
-	REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
+	restore_reg
 
 	/* Return zero value */
 	add	a0, zero, zero
-- 
2.34.1


* [PATCH v3 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function
From: Sia Jee Heng @ 2023-01-27  9:10 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, jeeheng.sia, leyfoon.tan, mason.huo

Currently, the kernel_page_present() function does not detect huge
pages, which causes it to mistakenly return false to the hibernation
core.

Add huge page detection to the function to solve the problem.

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
---
 arch/riscv/mm/pageattr.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 86c56616e5de..792b8d10cdfc 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -221,14 +221,20 @@ bool kernel_page_present(struct page *page)
 	p4d = p4d_offset(pgd, addr);
 	if (!p4d_present(*p4d))
 		return false;
+	if (p4d_leaf(*p4d))
+		return true;
 
 	pud = pud_offset(p4d, addr);
 	if (!pud_present(*pud))
 		return false;
+	if (pud_leaf(*pud))
+		return true;
 
 	pmd = pmd_offset(pud, addr);
 	if (!pmd_present(*pmd))
 		return false;
+	if (pmd_leaf(*pmd))
+		return true;
 
 	pte = pte_offset_kernel(pmd, addr);
 	return pte_present(*pte);
-- 
2.34.1


* [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
From: Sia Jee Heng @ 2023-01-27  9:10 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, jeeheng.sia, leyfoon.tan, mason.huo

Low-level arch functions are added to support hibernation.
swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
the CPU state onto the stack, then calls swsusp_save() to save the memory
image.

An arch-specific hibernation header is implemented and is used by the
arch_hibernation_header_restore() and arch_hibernation_header_save()
functions. The arch-specific hibernation header consists of satp, hartid,
and the cpu_resume address. The kernel build version also needs to be
saved in the hibernation image header to make sure that only the same
kernel is restored on resume.

swsusp_arch_resume() creates a temporary page table that covers only
the linear map. It copies the restore code to a 'safe' page, then starts
to restore the memory image. Once completed, it restores the original
kernel's page table. It then calls into __hibernate_cpu_resume()
to restore the CPU context. Finally, it follows the normal hibernation
path back to the hibernation core.

To enable hibernation/suspend-to-disk on RISC-V, the config options
below need to be enabled:
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
---
 arch/riscv/Kconfig                 |   7 +
 arch/riscv/include/asm/assembler.h |  20 ++
 arch/riscv/include/asm/suspend.h   |  21 ++
 arch/riscv/kernel/Makefile         |   1 +
 arch/riscv/kernel/asm-offsets.c    |   5 +
 arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
 arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
 7 files changed, 503 insertions(+)
 create mode 100644 arch/riscv/kernel/hibernate-asm.S
 create mode 100644 arch/riscv/kernel/hibernate.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..4555848a817f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -690,6 +690,13 @@ menu "Power management options"
 
 source "kernel/power/Kconfig"
 
+config ARCH_HIBERNATION_POSSIBLE
+	def_bool y
+
+config ARCH_HIBERNATION_HEADER
+	def_bool y
+	depends on HIBERNATION
+
 endmenu # "Power management options"
 
 menu "CPU Power Management"
diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
index ef1283d04b70..3de70d3e6ceb 100644
--- a/arch/riscv/include/asm/assembler.h
+++ b/arch/riscv/include/asm/assembler.h
@@ -59,4 +59,24 @@
 		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
 	.endm
 
+/**
+ * copy_page - copy 1 page (4KB) of data from source to destination
+ * @a0 - destination
+ * @a1 - source
+ */
+	.macro	copy_page a0, a1
+		lui	a2, 0x1
+		add	a2, a2, a0
+.1 :
+		REG_L	t0, 0(a1)
+		REG_L	t1, SZREG(a1)
+
+		REG_S	t0, 0(a0)
+		REG_S	t1, SZREG(a0)
+
+		addi	a0, a0, 2 * SZREG
+		addi	a1, a1, 2 * SZREG
+		bne	a2, a0, .1
+	.endm
+
 #endif	/* __ASM_ASSEMBLER_H */
diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
index 75419c5ca272..db40ae433aa9 100644
--- a/arch/riscv/include/asm/suspend.h
+++ b/arch/riscv/include/asm/suspend.h
@@ -21,6 +21,12 @@ struct suspend_context {
 #endif
 };
 
+/*
+ * This parameter will be assigned to 0 during resume and will be used by
+ * hibernation core for the subsequent resume sequence
+ */
+extern int in_suspend;
+
 /* Low-level CPU suspend entry function */
 int __cpu_suspend_enter(struct suspend_context *context);
 
@@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
 /* Used to save and restore the csr */
 void suspend_save_csrs(struct suspend_context *context);
 void suspend_restore_csrs(struct suspend_context *context);
+
+/* Low-level API to support hibernation */
+int swsusp_arch_suspend(void);
+int swsusp_arch_resume(void);
+int arch_hibernation_header_save(void *addr, unsigned int max_size);
+int arch_hibernation_header_restore(void *addr);
+int __hibernate_cpu_resume(void);
+
+/* Used to resume on the CPU we hibernated on */
+int hibernate_resume_nonboot_cpu_disable(void);
+
+/* Used to restore the hibernated image */
+asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
+				unsigned long cpu_resume);
+asmlinkage int core_restore_code(void);
 #endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 4cf303a779ab..daab341d55e4 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
 obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
 
 obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
+obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
 
 obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index df9444397908..d6a75aac1d27 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -9,6 +9,7 @@
 #include <linux/kbuild.h>
 #include <linux/mm.h>
 #include <linux/sched.h>
+#include <linux/suspend.h>
 #include <asm/kvm_host.h>
 #include <asm/thread_info.h>
 #include <asm/ptrace.h>
@@ -116,6 +117,10 @@ void asm_offsets(void)
 
 	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
 
+	OFFSET(HIBERN_PBE_ADDR, pbe, address);
+	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
+	OFFSET(HIBERN_PBE_NEXT, pbe, next);
+
 	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
 	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
 	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
new file mode 100644
index 000000000000..a83d534b89bd
--- /dev/null
+++ b/arch/riscv/kernel/hibernate-asm.S
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Hibernation support specific for RISCV
+ *
+ * Copyright (C) 2023 StarFive Technology Co., Ltd.
+ *
+ * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
+ */
+
+#include <asm/asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/assembler.h>
+#include <asm/csr.h>
+
+#include <linux/linkage.h>
+
+/*
+ * This code is executed when resuming from hibernation.
+ *
+ * It begins by loading the temporary page table, then restores the memory image.
+ * Finally, it branches to __hibernate_cpu_resume() to restore the state saved by
+ * swsusp_arch_suspend().
+ */
+
+/*
+ * int __hibernate_cpu_resume(void)
+ * Switch back to the hibernated image's page table prior to restoring the CPU
+ * context.
+ *
+ * Always returns 0 to the C code.
+ */
+ENTRY(__hibernate_cpu_resume)
+	/* switch to hibernated image's page table */
+	csrw CSR_SATP, s0
+	sfence.vma
+
+	REG_L	a0, hibernate_cpu_context
+
+	/* Restore CSRs */
+	restore_csr
+
+	/* Restore registers (except A0 and T0-T6) */
+	restore_reg
+
+	/* Return zero value */
+	add	a0, zero, zero
+
+	/* Return to C code */
+	ret
+END(__hibernate_cpu_resume)
+
+/*
+ * Prepare to restore the image.
+ * a0: satp of saved page tables
+ * a1: satp of temporary page tables
+ * a2: cpu_resume
+ */
+ENTRY(restore_image)
+	mv	s0, a0
+	mv	s1, a1
+	mv	s2, a2
+	REG_L	s4, restore_pblist
+	REG_L	a1, relocated_restore_code
+
+	jalr	a1
+END(restore_image)
+
+/*
+ * The below code will be executed from a 'safe' page.
+ * It first switches to the temporary page table, then copies the pages back to
+ * the original memory location. Finally, it jumps to __hibernate_cpu_resume()
+ * to restore the CPU context.
+ */
+ENTRY(core_restore_code)
+	/* switch to temp page table */
+	csrw satp, s1
+	sfence.vma
+.Lcopy:
+	/* The below code will restore the hibernated image. */
+	REG_L	a1, HIBERN_PBE_ADDR(s4)
+	REG_L	a0, HIBERN_PBE_ORIG(s4)
+
+	copy_page a0, a1
+
+	REG_L	s4, HIBERN_PBE_NEXT(s4)
+	bnez	s4, .Lcopy
+
+	jalr	s2
+END(core_restore_code)
diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
new file mode 100644
index 000000000000..bf7f3c781820
--- /dev/null
+++ b/arch/riscv/kernel/hibernate.c
@@ -0,0 +1,360 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hibernation support specific to RISC-V
+ *
+ * Copyright (C) 2023 StarFive Technology Co., Ltd.
+ *
+ * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
+ */
+
+#include <asm/barrier.h>
+#include <asm/cacheflush.h>
+#include <asm/mmu_context.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+#include <asm/sections.h>
+#include <asm/set_memory.h>
+#include <asm/smp.h>
+#include <asm/suspend.h>
+
+#include <linux/cpu.h>
+#include <linux/memblock.h>
+#include <linux/pm.h>
+#include <linux/sched.h>
+#include <linux/suspend.h>
+#include <linux/utsname.h>
+
+/* The logical cpu number we should resume on, initialised to a non-cpu number */
+static int sleep_cpu = -EINVAL;
+
+/* CPU context to be saved */
+struct suspend_context *hibernate_cpu_context;
+
+unsigned long relocated_restore_code;
+
+/* Pointer to the temporary resume page table */
+pgd_t *resume_pg_dir;
+
+/**
+ * struct arch_hibernate_hdr_invariants - container to store kernel build version
+ * @uts_version: to save the build number and date so that we do not resume with
+ *		a different kernel
+ */
+struct arch_hibernate_hdr_invariants {
+	char		uts_version[__NEW_UTS_LEN + 1];
+};
+
+/**
+ * struct arch_hibernate_hdr - helper parameters that help us to restore the image
+ * @invariants: container to store kernel build version
+ * @hartid: to make sure the same boot CPU executes the hibernate/restore code.
+ * @saved_satp: original page table used by the hibernated image.
+ * @restore_cpu_addr: address in the kernel image of the code that restores the CPU context.
+ */
+static struct arch_hibernate_hdr {
+	struct arch_hibernate_hdr_invariants invariants;
+	unsigned long	hartid;
+	unsigned long	saved_satp;
+	unsigned long	restore_cpu_addr;
+} resume_hdr;
+
+static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
+{
+	memset(i, 0, sizeof(*i));
+	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
+}
+
+/*
+ * Check if the given pfn is in the 'nosave' section.
+ */
+int pfn_is_nosave(unsigned long pfn)
+{
+	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
+	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
+
+	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
+}
+
+void notrace save_processor_state(void)
+{
+	WARN_ON(num_online_cpus() != 1);
+}
+
+void notrace restore_processor_state(void)
+{
+}
+
+/*
+ * Save the helper parameters to the hibernation image header.
+ */
+int arch_hibernation_header_save(void *addr, unsigned int max_size)
+{
+	struct arch_hibernate_hdr *hdr = addr;
+
+	if (max_size < sizeof(*hdr))
+		return -EOVERFLOW;
+
+	arch_hdr_invariants(&hdr->invariants);
+
+	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
+	hdr->saved_satp = csr_read(CSR_SATP);
+	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
+
+	return 0;
+}
+EXPORT_SYMBOL(arch_hibernation_header_save);
+
+/*
+ * Retrieve the helper parameters from the hibernation image header
+ */
+int arch_hibernation_header_restore(void *addr)
+{
+	struct arch_hibernate_hdr_invariants invariants;
+	struct arch_hibernate_hdr *hdr = addr;
+	int ret = 0;
+
+	arch_hdr_invariants(&invariants);
+
+	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
+		pr_crit("Hibernate image not generated by this kernel!\n");
+		return -EINVAL;
+	}
+
+	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
+	if (sleep_cpu < 0) {
+		pr_crit("Hibernated on a CPU not known to this kernel!\n");
+		sleep_cpu = -EINVAL;
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SMP
+	ret = bringup_hibernate_cpu(sleep_cpu);
+	if (ret) {
+		sleep_cpu = -EINVAL;
+		return ret;
+	}
+#endif
+	resume_hdr = *hdr;
+
+	return ret;
+}
+EXPORT_SYMBOL(arch_hibernation_header_restore);
+
+int swsusp_arch_suspend(void)
+{
+	int ret = 0;
+
+	if (__cpu_suspend_enter(hibernate_cpu_context)) {
+		sleep_cpu = smp_processor_id();
+		suspend_save_csrs(hibernate_cpu_context);
+		ret = swsusp_save();
+	} else {
+		suspend_restore_csrs(hibernate_cpu_context);
+		flush_tlb_all();
+
+		/* Invalidate the Icache */
+		flush_icache_all();
+
+		/*
+		 * Tell the hibernation core that we've just restored
+		 * the memory
+		 */
+		in_suspend = 0;
+		sleep_cpu = -EINVAL;
+	}
+
+	return ret;
+}
+
+static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
+{
+	uintptr_t pte_idx = pte_index(vaddr);
+
+	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
+
+	return 0;
+}
+
+#ifndef __PAGETABLE_PMD_FOLDED
+#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
+		(pgtable_l5_enabled ?					\
+		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
+		(pgtable_l4_enabled ?					\
+		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
+		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
+
+static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
+{
+	uintptr_t pmd_idx = pmd_index(vaddr);
+	pte_t *ptep;
+
+	if (pmd_none(pmdp[pmd_idx])) {
+		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
+		if (!ptep)
+			return -ENOMEM;
+
+		memset(ptep, 0, PAGE_SIZE);
+		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
+	} else {
+		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
+	}
+
+	return temp_pgtable_map_pte(ptep, vaddr, prot);
+}
+
+static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
+{
+	uintptr_t pud_index = pud_index(vaddr);
+	pmd_t *pmdp;
+
+	if (pud_val(pudp[pud_index]) == 0) {
+		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
+		if (!pmdp)
+			return -ENOMEM;
+
+		memset(pmdp, 0, PAGE_SIZE);
+		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
+	} else {
+		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
+	}
+
+	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
+}
+
+static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
+{
+	uintptr_t p4d_index = p4d_index(vaddr);
+	pud_t *pudp;
+
+	if (p4d_val(p4dp[p4d_index]) == 0) {
+		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
+		if (!pudp)
+			return -ENOMEM;
+
+		memset(pudp, 0, PAGE_SIZE);
+		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
+	} else {
+		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
+	}
+
+	return temp_pgtable_map_pud(pudp, vaddr, prot);
+}
+
+#else
+#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
+	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
+#endif /* __PAGETABLE_PMD_FOLDED */
+
+static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
+{
+	uintptr_t pgd_idx = pgd_index(vaddr);
+	void *nextp;
+
+	if (pgd_val(pgdp[pgd_idx]) == 0) {
+		nextp = (void *)get_safe_page(GFP_ATOMIC);
+		if (!nextp)
+			return -ENOMEM;
+
+		memset(nextp, 0, PAGE_SIZE);
+		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
+	} else {
+		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
+	}
+
+	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
+}
+
+static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
+{
+	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
+}
+
+static unsigned long relocate_restore_code(void)
+{
+	unsigned long ret;
+	void *page = (void *)get_safe_page(GFP_ATOMIC);
+
+	if (!page)
+		return -ENOMEM;
+
+	copy_page(page, core_restore_code);
+
+	/* Make the page containing the relocated code executable */
+	set_memory_x((unsigned long)page, 1);
+
+	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
+	if (ret)
+		return ret;
+
+	return (unsigned long)page;
+}
+
+int swsusp_arch_resume(void)
+{
+	unsigned long addr = PAGE_OFFSET;
+	unsigned long ret;
+
+	/*
+	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core;
+	 * we don't need to free it here.
+	 */
+	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
+	if (!resume_pg_dir)
+		return -ENOMEM;
+
+	/*
+	 * The pages need to be writable when restoring the image.
+	 * Create a second copy of the page table just for the linear map, and use this when
+	 * restoring.
+	 */
+	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
+		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
+		if (ret)
+			return (int)ret;
+	}
+
+	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
+	relocated_restore_code = relocate_restore_code();
+	if (relocated_restore_code == -ENOMEM)
+		return -ENOMEM;
+
+	/*
+	 * Map the __hibernate_cpu_resume() address in the temporary page table so that the
+	 * restore code can jump to it after it has finished restoring the image. This way
+	 * the code does not find itself in a different address space after switching over
+	 * to the original page table used by the hibernated image.
+	 */
+	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
+					PAGE_KERNEL_READ_EXEC);
+	if (ret)
+		return ret;
+
+	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
+			resume_hdr.restore_cpu_addr);
+
+	return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP_SMP
+int hibernate_resume_nonboot_cpu_disable(void)
+{
+	if (sleep_cpu < 0) {
+		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
+		return -ENODEV;
+	}
+
+	return freeze_secondary_cpus(sleep_cpu);
+}
+#endif
+
+static int __init riscv_hibernate_init(void)
+{
+	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
+
+	if (WARN_ON(!hibernate_cpu_context))
+		return -ENOMEM;
+
+	return 0;
+}
+
+early_initcall(riscv_hibernate_init);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/4] RISC-V: Factor out common code of __cpu_resume_enter()
  2023-01-27  9:10   ` Sia Jee Heng
@ 2023-01-30 21:49     ` Conor Dooley
  -1 siblings, 0 replies; 52+ messages in thread
From: Conor Dooley @ 2023-01-30 21:49 UTC (permalink / raw)
  To: Sia Jee Heng
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	leyfoon.tan, mason.huo

On Fri, Jan 27, 2023 at 05:10:49PM +0800, Sia Jee Heng wrote:
> The cpu_resume() function is very similar for the suspend to disk and
> suspend to ram cases. Factor out the common code into restore_csr macro
> and restore_reg macro.
> 
> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> ---
>  arch/riscv/include/asm/assembler.h | 62 ++++++++++++++++++++++++++++++
>  arch/riscv/kernel/suspend_entry.S  | 34 ++--------------
>  2 files changed, 65 insertions(+), 31 deletions(-)
>  create mode 100644 arch/riscv/include/asm/assembler.h
> 
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> new file mode 100644
> index 000000000000..ef1283d04b70
> --- /dev/null
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#ifndef __ASSEMBLY__
> +#error "Only include this from assembly code"
> +#endif
> +
> +#ifndef __ASM_ASSEMBLER_H
> +#define __ASM_ASSEMBLER_H
> +
> +#include <asm/asm.h>
> +#include <asm/csr.h>
> +#include <asm/asm-offsets.h>
> +
> +/**
> + * restore_csr - restore hart's CSR value
> + */
> +	.macro restore_csr
> +		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> +		csrw	CSR_EPC, t0
> +		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> +		csrw	CSR_STATUS, t0
> +		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> +		csrw	CSR_TVAL, t0
> +		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> +		csrw	CSR_CAUSE, t0
> +	.endm
> +
> +/**
> + * restore_reg - Restore registers (except A0 and T0-T6)

arch/riscv/include/asm/assembler.h:34: warning: Incorrect use of kernel-doc format:  * restore_reg - Restore registers (except A0 and T0-T6)

Otherwise, LGTM.

> + */
> +	.macro restore_reg
> +		REG_L	ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> +		REG_L	sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> +		REG_L	gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> +		REG_L	tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> +		REG_L	s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> +		REG_L	s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> +		REG_L	a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> +		REG_L	a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> +		REG_L	a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> +		REG_L	a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> +		REG_L	a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> +		REG_L	a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> +		REG_L	a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> +		REG_L	s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> +		REG_L	s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> +		REG_L	s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> +		REG_L	s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> +		REG_L	s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> +		REG_L	s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> +		REG_L	s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> +		REG_L	s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> +		REG_L	s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> +		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> +	.endm
> +
> +#endif	/* __ASM_ASSEMBLER_H */
> diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
> index aafcca58c19d..74a8fab8e0f6 100644
> --- a/arch/riscv/kernel/suspend_entry.S
> +++ b/arch/riscv/kernel/suspend_entry.S
> @@ -7,6 +7,7 @@
>  #include <linux/linkage.h>
>  #include <asm/asm.h>
>  #include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
>  #include <asm/csr.h>
>  #include <asm/xip_fixup.h>
>  
> @@ -83,39 +84,10 @@ ENTRY(__cpu_resume_enter)
>  	add	a0, a1, zero
>  
>  	/* Restore CSRs */
> -	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> -	csrw	CSR_EPC, t0
> -	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> -	csrw	CSR_STATUS, t0
> -	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> -	csrw	CSR_TVAL, t0
> -	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> -	csrw	CSR_CAUSE, t0
> +	restore_csr
>  
>  	/* Restore registers (except A0 and T0-T6) */
> -	REG_L	ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> -	REG_L	sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> -	REG_L	gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> -	REG_L	tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> -	REG_L	s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> -	REG_L	s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> -	REG_L	a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> -	REG_L	a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> -	REG_L	a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> -	REG_L	a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> -	REG_L	a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> -	REG_L	a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> -	REG_L	a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> -	REG_L	s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> -	REG_L	s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> -	REG_L	s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> -	REG_L	s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> -	REG_L	s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> -	REG_L	s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> -	REG_L	s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> -	REG_L	s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> -	REG_L	s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> -	REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> +	restore_reg
>  
>  	/* Return zero value */
>  	add	a0, zero, zero
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function
  2023-01-27  9:10   ` Sia Jee Heng
@ 2023-01-30 21:57     ` Conor Dooley
  -1 siblings, 0 replies; 52+ messages in thread
From: Conor Dooley @ 2023-01-30 21:57 UTC (permalink / raw)
  To: Sia Jee Heng, Alexandre Ghiti
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	leyfoon.tan, mason.huo

[-- Attachment #1: Type: text/plain, Size: 1603 bytes --]

+CC Alex

On Fri, Jan 27, 2023 at 05:10:50PM +0800, Sia Jee Heng wrote:
> Currently the kernel_page_present() function doesn't support huge page
> detection, which causes the function to mistakenly return false to the
> hibernation core.

This sounds like a bug and should have a Fixes: tag, no? I assume for
whatever commit enabled huge page support...
We don't support set_memory, which by the looks of things is the other
use case for this function, so it probably doesn't need backporting.

Alex, does this change look good to you?

> Add huge page detection to the function to solve the problem.
> 
> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> ---
>  arch/riscv/mm/pageattr.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> index 86c56616e5de..792b8d10cdfc 100644
> --- a/arch/riscv/mm/pageattr.c
> +++ b/arch/riscv/mm/pageattr.c
> @@ -221,14 +221,20 @@ bool kernel_page_present(struct page *page)
>  	p4d = p4d_offset(pgd, addr);
>  	if (!p4d_present(*p4d))
>  		return false;
> +	if (p4d_leaf(*p4d))
> +		return true;
>  
>  	pud = pud_offset(p4d, addr);
>  	if (!pud_present(*pud))
>  		return false;
> +	if (pud_leaf(*pud))
> +		return true;
>  
>  	pmd = pmd_offset(pud, addr);
>  	if (!pmd_present(*pmd))
>  		return false;
> +	if (pmd_leaf(*pmd))
> +		return true;
>  
>  	pte = pte_offset_kernel(pmd, addr);
>  	return pte_present(*pte);
> -- 
> 2.34.1
> 
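
For illustration, here is a minimal userspace sketch of why the walk needs
the leaf checks. It is a toy two-level model, not the kernel's page table
code: the toy_* names and the two flag bits are assumptions made up for the
example. A present entry that is itself a leaf maps memory directly, so
there is no lower-level table to descend into and the walk has to stop and
report the page as present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PRESENT 0x1u	/* hypothetical "entry is valid" bit */
#define TOY_LEAF    0x2u	/* hypothetical "entry maps memory itself" bit */

struct toy_pte { uint32_t flags; };
struct toy_pmd { uint32_t flags; struct toy_pte *table; };

/* Toy counterpart of the pmd/pte tail of kernel_page_present(). */
bool toy_page_present(const struct toy_pmd *pmd, size_t idx)
{
	assert(pmd != NULL);

	if (!(pmd->flags & TOY_PRESENT))
		return false;
	if (pmd->flags & TOY_LEAF)
		return true;	/* huge mapping: there is no lower table */

	return (pmd->table[idx].flags & TOY_PRESENT) != 0;
}
```

Without the TOY_LEAF test, the huge-mapping case would fall through and
dereference a table pointer that does not exist, which is the toy analogue
of the wrong answer described in the commit message.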

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-01-27  9:10   ` Sia Jee Heng
@ 2023-01-30 23:30     ` Conor Dooley
  -1 siblings, 0 replies; 52+ messages in thread
From: Conor Dooley @ 2023-01-30 23:30 UTC (permalink / raw)
  To: Sia Jee Heng, Alexandre Ghiti
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	leyfoon.tan, mason.huo

[-- Attachment #1: Type: text/plain, Size: 17073 bytes --]

+CC Alex

Alex, could you take a look at the page table bits here when you get a
chance please?

On Fri, Jan 27, 2023 at 05:10:51PM +0800, Sia Jee Heng wrote:
> Low-level arch functions are added to support hibernation.
> swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
> the CPU state onto the stack, then calls swsusp_save() to save the memory
> image.
> 
> An arch-specific hibernation header is implemented and is used by the
> arch_hibernation_header_restore() and arch_hibernation_header_save()
> functions. The arch-specific hibernation header consists of satp, hartid,
> and the cpu_resume address. The kernel build version also needs to be
> saved into the hibernation image header to make sure that only the same
> kernel is restored on resume.
> 
> swsusp_arch_resume() creates a temporary page table that covers only
> the linear map. It copies the restore code to a 'safe' page, then starts
> to restore the memory image. Once completed, it restores the original
> kernel's page table. It then calls into __hibernate_cpu_resume()
> to restore the CPU context. Finally, it follows the normal hibernation
> path back to the hibernation core.
> 
> To enable hibernation/suspend to disk on RISC-V, the config options below
> need to be enabled:
> - CONFIG_ARCH_HIBERNATION_HEADER
> - CONFIG_ARCH_HIBERNATION_POSSIBLE
> 
> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> ---
>  arch/riscv/Kconfig                 |   7 +
>  arch/riscv/include/asm/assembler.h |  20 ++
>  arch/riscv/include/asm/suspend.h   |  21 ++
>  arch/riscv/kernel/Makefile         |   1 +
>  arch/riscv/kernel/asm-offsets.c    |   5 +
>  arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>  arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>  7 files changed, 503 insertions(+)
>  create mode 100644 arch/riscv/kernel/hibernate-asm.S
>  create mode 100644 arch/riscv/kernel/hibernate.c
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..4555848a817f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -690,6 +690,13 @@ menu "Power management options"
>  
>  source "kernel/power/Kconfig"
>  
> +config ARCH_HIBERNATION_POSSIBLE
> +	def_bool y
> +
> +config ARCH_HIBERNATION_HEADER
> +	def_bool y
> +	depends on HIBERNATION
> +
>  endmenu # "Power management options"
>  
>  menu "CPU Power Management"
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> index ef1283d04b70..3de70d3e6ceb 100644
> --- a/arch/riscv/include/asm/assembler.h
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -59,4 +59,24 @@
>  		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>  	.endm
>  
> +/**
> + * copy_page - copy 1 page (4KB) of data from source to destination

arch/riscv/include/asm/assembler.h:64: warning: Incorrect use of kernel-doc format:  * copy_page - copy 1 page (4KB) of data from source to destination

> + * @a0 - destination
> + * @a1 - source
> + */
> +	.macro	copy_page a0, a1
> +		lui	a2, 0x1
> +		add	a2, a2, a0
> +.1 :
> +		REG_L	t0, 0(a1)
> +		REG_L	t1, SZREG(a1)
> +
> +		REG_S	t0, 0(a0)
> +		REG_S	t1, SZREG(a0)
> +
> +		addi	a0, a0, 2 * SZREG
> +		addi	a1, a1, 2 * SZREG
> +		bne	a2, a0, .1

allmodconfig, clang 15.0.4:

<instantiation>:3:1: error: unexpected token at start of statement
.1 :
^
/stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
 copy_page a0, a1
 ^
<instantiation>:12:15: error: unknown operand
  bne a2, a0, .1
              ^
/stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
 copy_page a0, a1
 ^
make[5]: *** [/stuff/linux/scripts/Makefile.build:384: arch/riscv/kernel/hibernate-asm.o] Error 1
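
The assembler is rejecting the `.1 :` label and the `.1` branch target. A
numeric local label (`1:`, referenced backwards as `1b`) is the usual
idiom for a loop inside a macro; that is a suggestion, not something
settled in this thread. Independently of the label spelling, the loop the
macro is meant to implement can be sketched in plain C. The toy_* names and
TOY_PAGE_SIZE are assumptions for the example; the two-word stride and the
4 KiB end address mirror the quoted assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u	/* what "lui a2, 0x1" loads: 0x1000 */

/* Copy one page, two register-sized words per iteration. */
void toy_copy_page(uintptr_t *dst, const uintptr_t *src)
{
	/* end = dst + 4 KiB, like "lui a2, 0x1; add a2, a2, a0" */
	const uintptr_t *end = dst + TOY_PAGE_SIZE / sizeof(*dst);

	assert(dst != NULL && src != NULL);

	do {
		uintptr_t t0 = src[0];	/* REG_L t0, 0(a1) */
		uintptr_t t1 = src[1];	/* REG_L t1, SZREG(a1) */

		dst[0] = t0;		/* REG_S t0, 0(a0) */
		dst[1] = t1;		/* REG_S t1, SZREG(a0) */

		dst += 2;		/* addi a0, a0, 2 * SZREG */
		src += 2;		/* addi a1, a1, 2 * SZREG */
	} while (dst != end);		/* bne a2, a0, <loop label> */
}
```

Note that the loop advances both pointers, so a caller that still needs
the original addresses afterwards has to keep its own copies; the same
holds for a0 and a1 in the macro.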

> +	.endm
> +
>  #endif	/* __ASM_ASSEMBLER_H */
> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> index 75419c5ca272..db40ae433aa9 100644
> --- a/arch/riscv/include/asm/suspend.h
> +++ b/arch/riscv/include/asm/suspend.h
> @@ -21,6 +21,12 @@ struct suspend_context {
>  #endif
>  };
>  
> +/*
> + * This parameter will be assigned to 0 during resume and will be used by
> + * hibernation core for the subsequent resume sequence

This isn't a parameter! I'm not sure that the comment really adds
anything to be honest, but "Used by the hibernation core and cleared
during the resume sequence" probably gets the point across equally well.

> + */
> +extern int in_suspend;
> +
>  /* Low-level CPU suspend entry function */
>  int __cpu_suspend_enter(struct suspend_context *context);
>  
> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>  /* Used to save and restore the csr */
>  void suspend_save_csrs(struct suspend_context *context);
>  void suspend_restore_csrs(struct suspend_context *context);
> +
> +/* Low-level API to support hibernation */
> +int swsusp_arch_suspend(void);
> +int swsusp_arch_resume(void);
> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> +int arch_hibernation_header_restore(void *addr);
> +int __hibernate_cpu_resume(void);
> +
> +/* Used to resume on the CPU we hibernated on */
> +int hibernate_resume_nonboot_cpu_disable(void);
> +
> +/* Used to restore the hibernated image */

I think this comment is kinda stating the obvious, no?

> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> +				unsigned long cpu_resume);
> +asmlinkage int core_restore_code(void);

How about dropping the comment and prepending hibernate_ to these function
names?

>  #endif
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 4cf303a779ab..daab341d55e4 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>  obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>  
>  obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>  
>  obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>  obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index df9444397908..d6a75aac1d27 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -9,6 +9,7 @@
>  #include <linux/kbuild.h>
>  #include <linux/mm.h>
>  #include <linux/sched.h>
> +#include <linux/suspend.h>
>  #include <asm/kvm_host.h>
>  #include <asm/thread_info.h>
>  #include <asm/ptrace.h>
> @@ -116,6 +117,10 @@ void asm_offsets(void)
>  
>  	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>  
> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> +
>  	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>  	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>  	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> new file mode 100644
> index 000000000000..a83d534b89bd
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate-asm.S
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Hibernation support specific for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/csr.h>
> +
> +#include <linux/linkage.h>
> +
> +/*
> + * This code is executed when resume from the hibernation.
> + *
> + * It begins with loading the temporary page table then restores the memory image.
> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> + * swsusp_arch_suspend().
> + */

This file looks to be confusingly ordered. You lead with a comment
describing a sequence but the file doesn't follow it.
I suggest removing this comment.

> +
> +/*
> + * int __hibernate_cpu_resume(void)
> + * Switch back to the hibernated image's page table prior to restore the CPU

nit: s/restore/restoring

> + * context.
> + *
> + * Always returns 0 to the C code.

s/to the C code//

> + */
> +ENTRY(__hibernate_cpu_resume)
> +	/* switch to hibernated image's page table */
> +	csrw CSR_SATP, s0
> +	sfence.vma
> +
> +	REG_L	a0, hibernate_cpu_context
> +
> +	/* Restore CSRs */

Stating the obvious again here, no?

> +	restore_csr
> +
> +	/* Restore registers (except A0 and T0-T6) */

Do we need to mention the (except A0 & T0-T6) here and elsewhere?
If they're lost across calls anyway, is it worth mentioning that they're
lost across hibernation?

> +	restore_reg
> +
> +	/* Return zero value */
> +	add	a0, zero, zero
> +
> +	/* Return to C code */

I'd drop this comment. I don't think the presumed caller of the function
needs to be mentioned here.

> +	ret
> +END(__hibernate_cpu_resume)
> +
> +/*
> + * Prepare to restore the image.
> + * a0: satp of saved page tables
> + * a1: satp of temporary page tables
> + * a2: cpu_resume
> + */
> +ENTRY(restore_image)
> +	mv	s0, a0
> +	mv	s1, a1
> +	mv	s2, a2
> +	REG_L	s4, restore_pblist
> +	REG_L	a1, relocated_restore_code
> +
> +	jalr	a1
> +END(restore_image)
> +
> +/*
> + * The below code will be executed from a 'safe' page.
> + * It first switches to the temporary page table, then start to copy the pages

nit: s/start/starts/

> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()

nit: s/jumps to the/jumps to/

> + * to restore the CPU context.
> + */
> +ENTRY(core_restore_code)
> +	/* switch to temp page table */
> +	csrw satp, s1
> +	sfence.vma
> +.Lcopy:
> +	/* The below code will restore the hibernated image. */

I think this should be moved to the top of the pre-function comment.

> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> +
> +	copy_page a0, a1
> +
> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> +	bnez	s4, .Lcopy
> +
> +	jalr	s2
> +END(core_restore_code)
> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> new file mode 100644
> index 000000000000..bf7f3c781820
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate.c
> @@ -0,0 +1,360 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hibernation support specific for RISCV

Well, it'd be odd if it was for another arch but sitting in arch/riscv!
;)

Thanks for your patches though, it'll be great to have hibernation
support going.

> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#include <asm/barrier.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu_context.h>
> +#include <asm/page.h>
> +#include <asm/pgtable.h>
> +#include <asm/sections.h>
> +#include <asm/set_memory.h>
> +#include <asm/smp.h>
> +#include <asm/suspend.h>
> +
> +#include <linux/cpu.h>
> +#include <linux/memblock.h>
> +#include <linux/pm.h>
> +#include <linux/sched.h>
> +#include <linux/suspend.h>
> +#include <linux/utsname.h>
> +
> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> +static int sleep_cpu = -EINVAL;
> +
> +/* CPU context to be saved */
> +struct suspend_context *hibernate_cpu_context;
> +
> +unsigned long relocated_restore_code;
> +
> +/* Pointer to the temporary resume page table */
> +pgd_t *resume_pg_dir;

sparse doesn't like what you've done here:
/stuff/linux/arch/riscv/kernel/hibernate.c:31:24: warning: symbol 'hibernate_cpu_context' was not declared. Should it be static?
/stuff/linux/arch/riscv/kernel/hibernate.c:33:15: warning: symbol 'relocated_restore_code' was not declared. Should it be static?
/stuff/linux/arch/riscv/kernel/hibernate.c:36:7: warning: symbol 'resume_pg_dir' was not declared. Should it be static?
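
A sketch of one way to quieten these, under some assumptions: the first two
symbols are loaded by name from hibernate-asm.S, so they need external
linkage and would want an extern declaration in a header (asm/suspend.h is
a guess), while resume_pg_dir could become static if nothing outside
hibernate.c refers to it. The struct and typedef bodies below are stand-ins
so that the fragment compiles on its own, and the accessor is hypothetical.

```c
/* Stand-ins so the sketch builds outside the kernel (assumptions). */
struct suspend_context { unsigned long regs[32]; };
typedef struct { unsigned long pgd; } pgd_t;

/* Would live in a shared header, e.g. asm/suspend.h (assumption). */
extern struct suspend_context *hibernate_cpu_context;
extern unsigned long relocated_restore_code;

/* Definitions as in hibernate.c, now preceded by declarations. */
struct suspend_context *hibernate_cpu_context;
unsigned long relocated_restore_code;

/* Only valid if no other file references it (assumption). */
static pgd_t *resume_pg_dir;

/* Hypothetical accessor, present only so the static symbol is used. */
int toy_resume_pg_dir_is_set(void)
{
	return resume_pg_dir != 0;
}
```
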
> +
> +/**
> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> + * @uts_version: to save the build number and date so that the we are not resume with

nit: "so that we do not resume"

> + *		a different kernel
> + */
> +struct arch_hibernate_hdr_invariants {
> +	char		uts_version[__NEW_UTS_LEN + 1];
> +};
> +
> +/**
> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> + * @invariants: container to store kernel build version
> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.

nit: s/executing/executes

Also, my OCD is triggered by the inconsistent full stops at EOL.

> + * @saved_satp: original page table used by the hibernated image.
> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> + */
> +static struct arch_hibernate_hdr {
> +	struct arch_hibernate_hdr_invariants invariants;
> +	unsigned long	hartid;
> +	unsigned long	saved_satp;
> +	unsigned long	restore_cpu_addr;
> +} resume_hdr;
> +
> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> +{
> +	memset(i, 0, sizeof(*i));
> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> +}
> +
> +/*
> + * Check if the given pfn is in the 'nosave' section.
> + */
> +int pfn_is_nosave(unsigned long pfn)
> +{
> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> +
> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> +}
> +
> +void notrace save_processor_state(void)
> +{
> +	WARN_ON(num_online_cpus() != 1);
> +}
> +
> +void notrace restore_processor_state(void)
> +{
> +}
> +
> +/*
> + * Helper parameters need to be saved to the hibernation image header.
> + */
> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> +{
> +	struct arch_hibernate_hdr *hdr = addr;
> +
> +	if (max_size < sizeof(*hdr))
> +		return -EOVERFLOW;
> +
> +	arch_hdr_invariants(&hdr->invariants);
> +
> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> +	hdr->saved_satp = csr_read(CSR_SATP);
> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(arch_hibernation_header_save);

EXPORT_SYMBOL_GPL(), no? Same below.

> +/*
> + * Retrieve the helper parameters from the hibernation image header
> + */
> +int arch_hibernation_header_restore(void *addr)
> +{
> +	struct arch_hibernate_hdr_invariants invariants;
> +	struct arch_hibernate_hdr *hdr = addr;
> +	int ret = 0;
> +
> +	arch_hdr_invariants(&invariants);
> +
> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> +		pr_crit("Hibernate image not generated by this kernel!\n");

Out of curiosity more than anything else, why pr_crit()? Copy-paste from
arm64?

> +		return -EINVAL;
> +	}
> +
> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> +	if (sleep_cpu < 0) {
> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> +		sleep_cpu = -EINVAL;
> +		return -EINVAL;
> +	}
> +
> +#ifdef CONFIG_SMP
> +	ret = bringup_hibernate_cpu(sleep_cpu);
> +	if (ret) {
> +		sleep_cpu = -EINVAL;
> +		return ret;
> +	}
> +#endif
> +	resume_hdr = *hdr;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(arch_hibernation_header_restore);
> +
> +int swsusp_arch_suspend(void)
> +{
> +	int ret = 0;
> +
> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> +		sleep_cpu = smp_processor_id();
> +		suspend_save_csrs(hibernate_cpu_context);
> +		ret = swsusp_save();
> +	} else {
> +		suspend_restore_csrs(hibernate_cpu_context);
> +		flush_tlb_all();
> +
> +		/* Invalidated Icache */

Think this comment can go, no?

> +		flush_icache_all();
> +
> +		/*
> +		 * Tell the hibernation core that we've just restored
> +		 * the memory

I noticed arm64 manipulates the crash kernel in this function too.
How come we don't?

> +		 */
> +		in_suspend = 0;
> +		sleep_cpu = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
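
As an aside on the control flow above: swsusp_arch_suspend() depends on
__cpu_suspend_enter() effectively returning twice, once when the context is
saved and once more after __hibernate_cpu_resume() restores ra and sp and
clears a0. A rough userspace analogy is setjmp()/longjmp(), sketched below.
The toy_* names are made up for the example, and the polarity is inverted
relative to the patch: setjmp() returns 0 on the first pass, whereas here
the zero return is the resume path.

```c
#include <setjmp.h>

static jmp_buf toy_saved_context;	/* stands in for hibernate_cpu_context */
static int toy_in_suspend = 1;		/* stands in for in_suspend */
static int toy_image_saved;		/* set once the "image" is written */

/* Stands in for __hibernate_cpu_resume(): restore state, "return again". */
static void toy_cpu_resume(void)
{
	longjmp(toy_saved_context, 1);
}

int toy_hibernate_round_trip(void)
{
	if (setjmp(toy_saved_context) == 0) {
		/* First return: context captured, so write the image... */
		toy_image_saved = 1;
		/* ...then pretend the machine powered off and booted again. */
		toy_cpu_resume();
	}

	/* Second return: running in the restored context. */
	toy_in_suspend = 0;
	return toy_image_saved;
}
```

The state shared across the two returns is kept in file-scope variables
because non-volatile locals modified between setjmp() and longjmp() have
indeterminate values afterwards.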

The page table stuff here is beyond me... Hopefully Alex can take a look!

I noticed that arm64's version of this is not gated. What is different
about RISC-V that requires it to be?

> +int hibernate_resume_nonboot_cpu_disable(void)
> +{
> +	if (sleep_cpu < 0) {
> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> +		return -ENODEV;
> +	}
> +
> +	return freeze_secondary_cpus(sleep_cpu);
> +}
> +#endif
> +
> +static int __init riscv_hibernate_init(void)
> +{
> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> +
> +	if (WARN_ON(!hibernate_cpu_context))
> +		return -ENOMEM;
> +
> +	return 0;
> +}

Thanks,
Conor.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
@ 2023-01-30 23:30     ` Conor Dooley
  0 siblings, 0 replies; 52+ messages in thread
From: Conor Dooley @ 2023-01-30 23:30 UTC (permalink / raw)
  To: Sia Jee Heng, Alexandre Ghiti
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	leyfoon.tan, mason.huo


[-- Attachment #1.1: Type: text/plain, Size: 17073 bytes --]

+CC Alex

Alex, could you take a look at the page table bits here when you get a
chance please?

On Fri, Jan 27, 2023 at 05:10:51PM +0800, Sia Jee Heng wrote:
> Low level Arch functions were created to support hibernation.
> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> cpu state onto the stack, then calling swsusp_save() to save the memory
> image.
> 
> Arch specific hibernation header is implemented and is utilized by the
> arch_hibernation_header_restore() and arch_hibernation_header_save()
> functions. The arch specific hibernation header consists of satp, hartid,
> and the cpu_resume address. The kernel built version is also need to be
> saved into the hibernation image header to making sure only the same
> kernel is restore when resume.
> 
> swsusp_arch_resume() creates a temporary page table that covering only
> the linear map. It copies the restore code to a 'safe' page, then start
> to restore the memory image. Once completed, it restores the original
> kernel's page table. It then calls into __hibernate_cpu_resume()
> to restore the CPU context. Finally, it follows the normal hibernation
> path back to the hibernation core.
> 
> To enable hibernation/suspend to disk into RISCV, the below config
> need to be enabled:
> - CONFIG_ARCH_HIBERNATION_HEADER
> - CONFIG_ARCH_HIBERNATION_POSSIBLE
> 
> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> ---
>  arch/riscv/Kconfig                 |   7 +
>  arch/riscv/include/asm/assembler.h |  20 ++
>  arch/riscv/include/asm/suspend.h   |  21 ++
>  arch/riscv/kernel/Makefile         |   1 +
>  arch/riscv/kernel/asm-offsets.c    |   5 +
>  arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>  arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>  7 files changed, 503 insertions(+)
>  create mode 100644 arch/riscv/kernel/hibernate-asm.S
>  create mode 100644 arch/riscv/kernel/hibernate.c
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..4555848a817f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -690,6 +690,13 @@ menu "Power management options"
>  
>  source "kernel/power/Kconfig"
>  
> +config ARCH_HIBERNATION_POSSIBLE
> +	def_bool y
> +
> +config ARCH_HIBERNATION_HEADER
> +	def_bool y
> +	depends on HIBERNATION
> +
>  endmenu # "Power management options"
>  
>  menu "CPU Power Management"
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> index ef1283d04b70..3de70d3e6ceb 100644
> --- a/arch/riscv/include/asm/assembler.h
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -59,4 +59,24 @@
>  		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>  	.endm
>  
> +/**
> + * copy_page - copy 1 page (4KB) of data from source to destination

arch/riscv/include/asm/assembler.h:64: warning: Incorrect use of kernel-doc format:  * copy_page - copy 1 page (4KB) of data from source to destination

> + * @a0 - destination
> + * @a1 - source
> + */
> +	.macro	copy_page a0, a1
> +		lui	a2, 0x1
> +		add	a2, a2, a0
> +.1 :
> +		REG_L	t0, 0(a1)
> +		REG_L	t1, SZREG(a1)
> +
> +		REG_S	t0, 0(a0)
> +		REG_S	t1, SZREG(a0)
> +
> +		addi	a0, a0, 2 * SZREG
> +		addi	a1, a1, 2 * SZREG
> +		bne	a2, a0, .1

allmodconfig, clang 15.0.4:

<instantiation>:3:1: error: unexpected token at start of statement
.1 :
^
/stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
 copy_page a0, a1
 ^
<instantiation>:12:15: error: unknown operand
  bne a2, a0, .1
              ^
/stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
 copy_page a0, a1
 ^
make[5]: *** [/stuff/linux/scripts/Makefile.build:384: arch/riscv/kernel/hibernate-asm.o] Error 1
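
If I read the error right, clang's integrated assembler does not take
`.1 :` as a label. A numeric local label (`1:` with `bne a2, a0, 1b`) is
what I would try, though I have not built that variant. The loop itself
is only a 4 KiB copy, two registers per iteration; below is a rough
userspace C model of it. model_copy_page() and MODEL_PAGE_SIZE are
made-up names for illustration, not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 4 KiB pages, matching the macro's "lui a2, 0x1". */
#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical C model of the copy_page macro: load two words, store two
 * words, advance both pointers, and stop once dst reaches the page end.
 */
static void model_copy_page(unsigned long *dst, const unsigned long *src)
{
	unsigned long *end = dst + MODEL_PAGE_SIZE / sizeof(*dst);

	do {
		unsigned long t0 = src[0];
		unsigned long t1 = src[1];

		dst[0] = t0;
		dst[1] = t1;

		dst += 2;
		src += 2;
	} while (dst != end);
}
```

Like the macro, the model advances both pointers as it goes, so a caller
must not expect them to still point at the start of the page afterwards.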

> +	.endm
> +
>  #endif	/* __ASM_ASSEMBLER_H */
> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> index 75419c5ca272..db40ae433aa9 100644
> --- a/arch/riscv/include/asm/suspend.h
> +++ b/arch/riscv/include/asm/suspend.h
> @@ -21,6 +21,12 @@ struct suspend_context {
>  #endif
>  };
>  
> +/*
> + * This parameter will be assigned to 0 during resume and will be used by
> + * hibernation core for the subsequent resume sequence

This isn't a parameter! I'm not sure that the comment really adds
anything to be honest, but "Used by the hibernation core and cleared
during the resume sequence" probably gets the point across equally well.

> + */
> +extern int in_suspend;
> +
>  /* Low-level CPU suspend entry function */
>  int __cpu_suspend_enter(struct suspend_context *context);
>  
> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>  /* Used to save and restore the csr */
>  void suspend_save_csrs(struct suspend_context *context);
>  void suspend_restore_csrs(struct suspend_context *context);
> +
> +/* Low-level API to support hibernation */
> +int swsusp_arch_suspend(void);
> +int swsusp_arch_resume(void);
> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> +int arch_hibernation_header_restore(void *addr);
> +int __hibernate_cpu_resume(void);
> +
> +/* Used to resume on the CPU we hibernated on */
> +int hibernate_resume_nonboot_cpu_disable(void);
> +
> +/* Used to restore the hibernated image */

I think this comment is kinda stating the obvious, no?

> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> +				unsigned long cpu_resume);
> +asmlinkage int core_restore_code(void);

How about dropping the comment and prepending hibernate_ to these function
names?

>  #endif
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 4cf303a779ab..daab341d55e4 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>  obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>  
>  obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>  
>  obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>  obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index df9444397908..d6a75aac1d27 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -9,6 +9,7 @@
>  #include <linux/kbuild.h>
>  #include <linux/mm.h>
>  #include <linux/sched.h>
> +#include <linux/suspend.h>
>  #include <asm/kvm_host.h>
>  #include <asm/thread_info.h>
>  #include <asm/ptrace.h>
> @@ -116,6 +117,10 @@ void asm_offsets(void)
>  
>  	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>  
> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> +
>  	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>  	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>  	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> new file mode 100644
> index 000000000000..a83d534b89bd
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate-asm.S
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Hibernation support specific for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/csr.h>
> +
> +#include <linux/linkage.h>
> +
> +/*
> + * This code is executed when resume from the hibernation.
> + *
> + * It begins with loading the temporary page table then restores the memory image.
> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> + * swsusp_arch_suspend().
> + */

This file looks to be confusingly ordered. You lead with a comment
describing a sequence but the file doesn't follow it.
I suggest removing this comment.

> +
> +/*
> + * int __hibernate_cpu_resume(void)
> + * Switch back to the hibernated image's page table prior to restore the CPU

nit: s/restore/restoring

> + * context.
> + *
> + * Always returns 0 to the C code.

s/to the C code//

> + */
> +ENTRY(__hibernate_cpu_resume)
> +	/* switch to hibernated image's page table */
> +	csrw CSR_SATP, s0
> +	sfence.vma
> +
> +	REG_L	a0, hibernate_cpu_context
> +
> +	/* Restore CSRs */

Stating the obvious again here, no?

> +	restore_csr
> +
> +	/* Restore registers (except A0 and T0-T6) */

Do we need to mention the (except A0 & T0-T6) here and elsewhere?
If they're lost across calls anyway, is it worth mentioning that they're
lost across hibernation?

> +	restore_reg
> +
> +	/* Return zero value */
> +	add	a0, zero, zero
> +
> +	/* Return to C code */

I'd drop this comment. I don't think the presumed caller of the function
needs to be mentioned here.

> +	ret
> +END(__hibernate_cpu_resume)
> +
> +/*
> + * Prepare to restore the image.
> + * a0: satp of saved page tables
> + * a1: satp of temporary page tables
> + * a2: cpu_resume
> + */
> +ENTRY(restore_image)
> +	mv	s0, a0
> +	mv	s1, a1
> +	mv	s2, a2
> +	REG_L	s4, restore_pblist
> +	REG_L	a1, relocated_restore_code
> +
> +	jalr	a1
> +END(restore_image)
> +
> +/*
> + * The below code will be executed from a 'safe' page.
> + * It first switches to the temporary page table, then start to copy the pages

nit: s/start/starts/

> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()

nit: s/jumps to the/jumps to/

> + * to restore the CPU context.
> + */
> +ENTRY(core_restore_code)
> +	/* switch to temp page table */
> +	csrw satp, s1
> +	sfence.vma
> +.Lcopy:
> +	/* The below code will restore the hibernated image. */

I think this should be moved to the top of the pre-function comment.

> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> +
> +	copy_page a0, a1
> +
> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> +	bnez	s4, .Lcopy
> +
> +	jalr	s2
> +END(core_restore_code)
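
For anyone following along, my reading of the .Lcopy loop is that it
walks restore_pblist and copies each saved page back to its original
location. A small userspace C sketch of that walk follows; struct
model_pbe only mirrors the three fields the asm-offsets entries refer to
(address, orig_address, next), and the function name is invented for
the example.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096u

/* Illustrative stand-in for the page backup entry used by the asm. */
struct model_pbe {
	void *address;       /* where the saved copy of the page lives now */
	void *orig_address;  /* where the page has to be copied back to */
	struct model_pbe *next;
};

/* Copy every saved page back in list order; returns the number of pages. */
static size_t model_restore_image(const struct model_pbe *pbe)
{
	size_t restored = 0;

	for (; pbe; pbe = pbe->next) {
		memcpy(pbe->orig_address, pbe->address, MODEL_PAGE_SIZE);
		restored++;
	}

	return restored;
}
```

One difference worth noting: the sketch tolerates an empty list, whereas
the assembly loads from s4 before testing it and so appears to assume at
least one entry.
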
> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> new file mode 100644
> index 000000000000..bf7f3c781820
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate.c
> @@ -0,0 +1,360 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hibernation support specific for RISCV

Well, it'd be odd if it was for another arch but sitting in arch/riscv!
;)

Thanks for your patches though, it'll be great to have hibernation
support going.

> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#include <asm/barrier.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu_context.h>
> +#include <asm/page.h>
> +#include <asm/pgtable.h>
> +#include <asm/sections.h>
> +#include <asm/set_memory.h>
> +#include <asm/smp.h>
> +#include <asm/suspend.h>
> +
> +#include <linux/cpu.h>
> +#include <linux/memblock.h>
> +#include <linux/pm.h>
> +#include <linux/sched.h>
> +#include <linux/suspend.h>
> +#include <linux/utsname.h>
> +
> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> +static int sleep_cpu = -EINVAL;
> +
> +/* CPU context to be saved */
> +struct suspend_context *hibernate_cpu_context;
> +
> +unsigned long relocated_restore_code;
> +
> +/* Pointer to the temporary resume page table */
> +pgd_t *resume_pg_dir;

sparse doesn't like what you've done here:
/stuff/linux/arch/riscv/kernel/hibernate.c:31:24: warning: symbol 'hibernate_cpu_context' was not declared. Should it be static?
/stuff/linux/arch/riscv/kernel/hibernate.c:33:15: warning: symbol 'relocated_restore_code' was not declared. Should it be static?
/stuff/linux/arch/riscv/kernel/hibernate.c:36:7: warning: symbol 'resume_pg_dir' was not declared. Should it be static?
> +
> +/**
> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> + * @uts_version: to save the build number and date so that the we are not resume with

nit: "so that we do not resume"

> + *		a different kernel
> + */
> +struct arch_hibernate_hdr_invariants {
> +	char		uts_version[__NEW_UTS_LEN + 1];
> +};
> +
> +/**
> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> + * @invariants: container to store kernel build version
> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.

nit: s/executing/executes

Also, my OCD is triggered by the inconsistent full stops at EOL.

> + * @saved_satp: original page table used by the hibernated image.
> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> + */
> +static struct arch_hibernate_hdr {
> +	struct arch_hibernate_hdr_invariants invariants;
> +	unsigned long	hartid;
> +	unsigned long	saved_satp;
> +	unsigned long	restore_cpu_addr;
> +} resume_hdr;
> +
> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> +{
> +	memset(i, 0, sizeof(*i));
> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> +}
> +
> +/*
> + * Check if the given pfn is in the 'nosave' section.
> + */
> +int pfn_is_nosave(unsigned long pfn)
> +{
> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> +
> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> +}
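
Just to check my understanding of the bounds here: the range is
inclusive on both ends because the end pfn is taken from the last byte
of the section rather than from __nosave_end itself. A userspace sketch
of the same comparison, with plain addresses standing in for the linker
symbols (the names and MODEL_PAGE_SHIFT are assumptions for the
example):

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12

/*
 * begin_addr is the first byte of the nosave region, end_addr is one past
 * its last byte, mirroring __nosave_begin and __nosave_end.
 */
static bool model_pfn_is_nosave(unsigned long pfn, unsigned long begin_addr,
				unsigned long end_addr)
{
	unsigned long begin_pfn = begin_addr >> MODEL_PAGE_SHIFT;
	unsigned long end_pfn = (end_addr - 1) >> MODEL_PAGE_SHIFT;

	return pfn >= begin_pfn && pfn <= end_pfn;
}
```

With a region of exactly one page, only that page's pfn matches and the
pfn immediately after it does not.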
> +
> +void notrace save_processor_state(void)
> +{
> +	WARN_ON(num_online_cpus() != 1);
> +}
> +
> +void notrace restore_processor_state(void)
> +{
> +}
> +
> +/*
> + * Helper parameters need to be saved to the hibernation image header.
> + */
> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> +{
> +	struct arch_hibernate_hdr *hdr = addr;
> +
> +	if (max_size < sizeof(*hdr))
> +		return -EOVERFLOW;
> +
> +	arch_hdr_invariants(&hdr->invariants);
> +
> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> +	hdr->saved_satp = csr_read(CSR_SATP);
> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(arch_hibernation_header_save);

EXPORT_SYMBOL_GPL(), no? Same below.

> +/*
> + * Retrieve the helper parameters from the hibernation image header
> + */
> +int arch_hibernation_header_restore(void *addr)
> +{
> +	struct arch_hibernate_hdr_invariants invariants;
> +	struct arch_hibernate_hdr *hdr = addr;
> +	int ret = 0;
> +
> +	arch_hdr_invariants(&invariants);
> +
> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> +		pr_crit("Hibernate image not generated by this kernel!\n");

Out of curiosity more than anything else, why pr_crit()? Copy-paste from
arm64?

> +		return -EINVAL;
> +	}
> +
> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> +	if (sleep_cpu < 0) {
> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> +		sleep_cpu = -EINVAL;
> +		return -EINVAL;
> +	}
> +
> +#ifdef CONFIG_SMP
> +	ret = bringup_hibernate_cpu(sleep_cpu);
> +	if (ret) {
> +		sleep_cpu = -EINVAL;
> +		return ret;
> +	}
> +#endif
> +	resume_hdr = *hdr;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(arch_hibernation_header_restore);
> +
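
As I understand the invariants check, the header stores a zero-padded
copy of the UTS version string and resume refuses any image whose copy
differs byte for byte. A minimal userspace model is below; the struct
and function names are invented, and MODEL_UTS_LEN is assumed to match
__NEW_UTS_LEN (64).

```c
#include <string.h>

/* Assumption: same length as the kernel's __NEW_UTS_LEN. */
#define MODEL_UTS_LEN 64

struct model_hdr_invariants {
	char uts_version[MODEL_UTS_LEN + 1];
};

/* Zero the whole struct first so that padding bytes compare equal. */
static void model_fill_invariants(struct model_hdr_invariants *i,
				  const char *version)
{
	memset(i, 0, sizeof(*i));
	strncpy(i->uts_version, version, MODEL_UTS_LEN);
}

/* Returns 1 if the saved header was produced by the running kernel. */
static int model_same_kernel(const struct model_hdr_invariants *saved,
			     const char *running_version)
{
	struct model_hdr_invariants now;

	model_fill_invariants(&now, running_version);
	return memcmp(saved, &now, sizeof(now)) == 0;
}
```

Since the version string carries the build number and date, even a
rebuild of the same source would fail this comparison.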
> +int swsusp_arch_suspend(void)
> +{
> +	int ret = 0;
> +
> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> +		sleep_cpu = smp_processor_id();
> +		suspend_save_csrs(hibernate_cpu_context);
> +		ret = swsusp_save();
> +	} else {
> +		suspend_restore_csrs(hibernate_cpu_context);
> +		flush_tlb_all();
> +
> +		/* Invalidated Icache */

Think this comment can go, no?

> +		flush_icache_all();
> +
> +		/*
> +		 * Tell the hibernation core that we've just restored
> +		 * the memory

I noticed arm64 manipulates the crash kernel in this function too.
How come we don't?

> +		 */
> +		in_suspend = 0;
> +		sleep_cpu = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +

The page table stuff here is beyond me... Hopefully Alex can take a look!

I noticed that arm64's version of this function is not gated; what is
different about RISC-V that requires it to be?

> +int hibernate_resume_nonboot_cpu_disable(void)
> +{
> +	if (sleep_cpu < 0) {
> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> +		return -ENODEV;
> +	}
> +
> +	return freeze_secondary_cpus(sleep_cpu);
> +}
> +#endif
> +
> +static int __init riscv_hibernate_init(void)
> +{
> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> +
> +	if (WARN_ON(!hibernate_cpu_context))
> +		return -ENOMEM;
> +
> +	return 0;
> +}

Thanks,
Conor.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 1/4] RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function
  2023-01-27  9:10   ` Sia Jee Heng
@ 2023-01-30 23:31     ` Conor Dooley
  -1 siblings, 0 replies; 52+ messages in thread
From: Conor Dooley @ 2023-01-30 23:31 UTC (permalink / raw)
  To: Sia Jee Heng
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	leyfoon.tan, mason.huo

On Fri, Jan 27, 2023 at 05:10:48PM +0800, Sia Jee Heng wrote:
> Currently suspend_save_csrs() and suspend_restore_csrs() functions are
> statically defined in the suspend.c. Change the function's attribute
> to public so that the functions can be used by hibernation as well.

Seems reasonable to me!
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>

Thanks,
Conor.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function
  2023-01-30 21:57     ` Conor Dooley
@ 2023-01-31  8:19       ` Alexandre Ghiti
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-01-31  8:19 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Sia Jee Heng, paul.walmsley, palmer, aou, linux-riscv,
	linux-kernel, leyfoon.tan, mason.huo

Hi,

On Mon, Jan 30, 2023 at 10:57 PM Conor Dooley <conor@kernel.org> wrote:
>
> +CC Alex
>
> On Fri, Jan 27, 2023 at 05:10:50PM +0800, Sia Jee Heng wrote:
> > Currently kernel_page_present() function doesn't support huge page
> > detection causes the function to mistakenly return false to the
> > hibernation core.
>
> This sounds like a bug & should have a fixes tag, no? I assume for
> whatever commit enabled huge page support...
> We don't support set_memory, which by the looks of things is the other
> usecase for this function, so probably doesn't need backporting.

Maybe reference this commit in the Fixes tag: 9e953cda5cdf ("riscv:
Introduce huge page support for 32/64bit kernel").

>
> Alex, does this change look good to you?

Yes, just one thing though: what about a pgd_leaf() test? Even if very
unlikely (I see x86 does not even test it), the privileged spec states
it is possible to have a 256TB page.
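
To make the point concrete, here is a toy userspace model of the walk
with the leaf test applied at every level, the top one included. The
flag values and function names are invented for the example and do not
match the kernel's pgd/p4d/pud/pmd helpers; the span calculation assumes
Sv57-style 9-bit indices over 4 KiB base pages.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PRESENT 0x1u
#define MODEL_LEAF    0x2u  /* stands in for "maps a page at this level" */

/*
 * entries[0] is the top (pgd) level and entries[levels - 1] is the pte
 * level. A missing entry ends the walk with false; a leaf at any level
 * ends it with true, which is the check the patch adds below the pgd.
 */
static bool model_page_present(const unsigned int *entries, size_t levels)
{
	for (size_t i = 0; i < levels; i++) {
		if (!(entries[i] & MODEL_PRESENT))
			return false;
		if (entries[i] & MODEL_LEAF)
			return true;
	}

	/* Reaching the end means the pte-level entry itself was present. */
	return levels > 0;
}

/* Bytes mapped by a leaf found `level` steps above the pte level. */
static unsigned long long model_leaf_span(unsigned int level)
{
	return 1ULL << (12 + 9 * level);
}
```

With five levels, a leaf at the top is four steps above the pte level,
which gives 2^48 bytes, i.e. the 256TB page mentioned above.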

Thanks,

Alex

>
> > Add huge page detection to the function to solve the problem.
> >
> > Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> > Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> > Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> > ---
> >  arch/riscv/mm/pageattr.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> > index 86c56616e5de..792b8d10cdfc 100644
> > --- a/arch/riscv/mm/pageattr.c
> > +++ b/arch/riscv/mm/pageattr.c
> > @@ -221,14 +221,20 @@ bool kernel_page_present(struct page *page)
> >       p4d = p4d_offset(pgd, addr);
> >       if (!p4d_present(*p4d))
> >               return false;
> > +     if (p4d_leaf(*p4d))
> > +             return true;
> >
> >       pud = pud_offset(p4d, addr);
> >       if (!pud_present(*pud))
> >               return false;
> > +     if (pud_leaf(*pud))
> > +             return true;
> >
> >       pmd = pmd_offset(pud, addr);
> >       if (!pmd_present(*pmd))
> >               return false;
> > +     if (pmd_leaf(*pmd))
> > +             return true;
> >
> >       pte = pte_offset_kernel(pmd, addr);
> >       return pte_present(*pte);
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-01-30 23:30     ` Conor Dooley
@ 2023-01-31  9:59       ` Alexandre Ghiti
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-01-31  9:59 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Sia Jee Heng, paul.walmsley, palmer, aou, linux-riscv,
	linux-kernel, leyfoon.tan, mason.huo

Hi,

On Tue, Jan 31, 2023 at 12:30 AM Conor Dooley <conor@kernel.org> wrote:
>
> +CC Alex
>
> Alex, could you take a look at the page table bits here when you get a
> chance please?

Yes, I'll do that soon.

Thanks,

Alex

>
> On Fri, Jan 27, 2023 at 05:10:51PM +0800, Sia Jee Heng wrote:
> > Low level Arch functions were created to support hibernation.
> > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > cpu state onto the stack, then calling swsusp_save() to save the memory
> > image.
> >
> > Arch specific hibernation header is implemented and is utilized by the
> > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > functions. The arch specific hibernation header consists of satp, hartid,
> > and the cpu_resume address. The kernel built version is also need to be
> > saved into the hibernation image header to making sure only the same
> > kernel is restore when resume.
> >
> > swsusp_arch_resume() creates a temporary page table that covering only
> > the linear map. It copies the restore code to a 'safe' page, then start
> > to restore the memory image. Once completed, it restores the original
> > kernel's page table. It then calls into __hibernate_cpu_resume()
> > to restore the CPU context. Finally, it follows the normal hibernation
> > path back to the hibernation core.
> >
> > To enable hibernation/suspend to disk into RISCV, the below config
> > need to be enabled:
> > - CONFIG_ARCH_HIBERNATION_HEADER
> > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >
> > Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> > Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> > Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> > ---
> >  arch/riscv/Kconfig                 |   7 +
> >  arch/riscv/include/asm/assembler.h |  20 ++
> >  arch/riscv/include/asm/suspend.h   |  21 ++
> >  arch/riscv/kernel/Makefile         |   1 +
> >  arch/riscv/kernel/asm-offsets.c    |   5 +
> >  arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
> >  arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
> >  7 files changed, 503 insertions(+)
> >  create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >  create mode 100644 arch/riscv/kernel/hibernate.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..4555848a817f 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -690,6 +690,13 @@ menu "Power management options"
> >
> >  source "kernel/power/Kconfig"
> >
> > +config ARCH_HIBERNATION_POSSIBLE
> > +     def_bool y
> > +
> > +config ARCH_HIBERNATION_HEADER
> > +     def_bool y
> > +     depends on HIBERNATION
> > +
> >  endmenu # "Power management options"
> >
> >  menu "CPU Power Management"
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > index ef1283d04b70..3de70d3e6ceb 100644
> > --- a/arch/riscv/include/asm/assembler.h
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -59,4 +59,24 @@
> >               REG_L   s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >       .endm
> >
> > +/**
> > + * copy_page - copy 1 page (4KB) of data from source to destination
>
> arch/riscv/include/asm/assembler.h:64: warning: Incorrect use of kernel-doc format:  * copy_page - copy 1 page (4KB) of data from source to destination
>
> > + * @a0 - destination
> > + * @a1 - source
> > + */
> > +     .macro  copy_page a0, a1
> > +             lui     a2, 0x1
> > +             add     a2, a2, a0
> > +.1 :
> > +             REG_L   t0, 0(a1)
> > +             REG_L   t1, SZREG(a1)
> > +
> > +             REG_S   t0, 0(a0)
> > +             REG_S   t1, SZREG(a0)
> > +
> > +             addi    a0, a0, 2 * SZREG
> > +             addi    a1, a1, 2 * SZREG
> > +             bne     a2, a0, .1
>
> allmodconfig, clang 15.0.4:
>
> <instantiation>:3:1: error: unexpected token at start of statement
> .1 :
> ^
> /stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
>  copy_page a0, a1
>  ^
> <instantiation>:12:15: error: unknown operand
>   bne a2, a0, .1
>               ^
> /stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
>  copy_page a0, a1
>  ^
> make[5]: *** [/stuff/linux/scripts/Makefile.build:384: arch/riscv/kernel/hibernate-asm.o] Error 1
>
> > +     .endm
> > +
> >  #endif       /* __ASM_ASSEMBLER_H */
> > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > index 75419c5ca272..db40ae433aa9 100644
> > --- a/arch/riscv/include/asm/suspend.h
> > +++ b/arch/riscv/include/asm/suspend.h
> > @@ -21,6 +21,12 @@ struct suspend_context {
> >  #endif
> >  };
> >
> > +/*
> > + * This parameter will be assigned to 0 during resume and will be used by
> > + * hibernation core for the subsequent resume sequence
>
> This isn't a parameter! I'm not sure that the comment really adds
> anything to be honest, but "Used by the hibernation core and cleared
> during the resume sequence" probably gets the point across equally well.
>
> > + */
> > +extern int in_suspend;
> > +
> >  /* Low-level CPU suspend entry function */
> >  int __cpu_suspend_enter(struct suspend_context *context);
> >
> > @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >  /* Used to save and restore the csr */
> >  void suspend_save_csrs(struct suspend_context *context);
> >  void suspend_restore_csrs(struct suspend_context *context);
> > +
> > +/* Low-level API to support hibernation */
> > +int swsusp_arch_suspend(void);
> > +int swsusp_arch_resume(void);
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > +int arch_hibernation_header_restore(void *addr);
> > +int __hibernate_cpu_resume(void);
> > +
> > +/* Used to resume on the CPU we hibernated on */
> > +int hibernate_resume_nonboot_cpu_disable(void);
> > +
> > +/* Used to restore the hibernated image */
>
> I think this comment is kinda stating the obvious, no?
>
> > +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > +                             unsigned long cpu_resume);
> > +asmlinkage int core_restore_code(void);
>
> How about dropping the comment and prepending hibernate_ to these function
> names?
>
> >  #endif
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 4cf303a779ab..daab341d55e4 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)               += module.o
> >  obj-$(CONFIG_MODULE_SECTIONS)        += module-sections.o
> >
> >  obj-$(CONFIG_CPU_PM)         += suspend_entry.o suspend.o
> > +obj-$(CONFIG_HIBERNATION)    += hibernate.o hibernate-asm.o
> >
> >  obj-$(CONFIG_FUNCTION_TRACER)        += mcount.o ftrace.o
> >  obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > index df9444397908..d6a75aac1d27 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -9,6 +9,7 @@
> >  #include <linux/kbuild.h>
> >  #include <linux/mm.h>
> >  #include <linux/sched.h>
> > +#include <linux/suspend.h>
> >  #include <asm/kvm_host.h>
> >  #include <asm/thread_info.h>
> >  #include <asm/ptrace.h>
> > @@ -116,6 +117,10 @@ void asm_offsets(void)
> >
> >       OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >
> > +     OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > +     OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > +     OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > +
> >       OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >       OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >       OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > new file mode 100644
> > index 000000000000..a83d534b89bd
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate-asm.S
> > @@ -0,0 +1,89 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Hibernation support specific for RISCV
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > +#include <asm/csr.h>
> > +
> > +#include <linux/linkage.h>
> > +
> > +/*
> > + * This code is executed when resume from the hibernation.
> > + *
> > + * It begins with loading the temporary page table then restores the memory image.
> > + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> > + * swsusp_arch_suspend().
> > + */
>
> This file looks to be confusingly ordered. You lead with a comment
> describing a sequence but the file doesn't follow it.
> I suggest removing this comment.
>
> > +
> > +/*
> > + * int __hibernate_cpu_resume(void)
> > + * Switch back to the hibernated image's page table prior to restore the CPU
>
> nit: s/restore/restoring
>
> > + * context.
> > + *
> > + * Always returns 0 to the C code.
>
> s/to the C code//
>
> > + */
> > +ENTRY(__hibernate_cpu_resume)
> > +     /* switch to hibernated image's page table */
> > +     csrw CSR_SATP, s0
> > +     sfence.vma
> > +
> > +     REG_L   a0, hibernate_cpu_context
> > +
> > +     /* Restore CSRs */
>
> Stating the obvious again here, no?
>
> > +     restore_csr
> > +
> > +     /* Restore registers (except A0 and T0-T6) */
>
> Do we need to mention the (except A0 & T0-T6) here and elsewhere?
> If they're lost across calls anyway, is it worth mentioning that they're
> lost across hibernation?
>
> > +     restore_reg
> > +
> > +     /* Return zero value */
> > +     add     a0, zero, zero
> > +
> > +     /* Return to C code */
>
> I'd drop this comment. I don't think the presumed caller of the function
> needs to be mentioned here.
>
> > +     ret
> > +END(__hibernate_cpu_resume)
> > +
> > +/*
> > + * Prepare to restore the image.
> > + * a0: satp of saved page tables
> > + * a1: satp of temporary page tables
> > + * a2: cpu_resume
> > + */
> > +ENTRY(restore_image)
> > +     mv      s0, a0
> > +     mv      s1, a1
> > +     mv      s2, a2
> > +     REG_L   s4, restore_pblist
> > +     REG_L   a1, relocated_restore_code
> > +
> > +     jalr    a1
> > +END(restore_image)
> > +
> > +/*
> > + * The below code will be executed from a 'safe' page.
> > + * It first switches to the temporary page table, then start to copy the pages
>
> nit: s/start/starts/
>
> > + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
>
> nit: s/jumps to the/jumps to/
>
> > + * to restore the CPU context.
> > + */
> > +ENTRY(core_restore_code)
> > +     /* switch to temp page table */
> > +     csrw satp, s1
> > +     sfence.vma
> > +.Lcopy:
> > +     /* The below code will restore the hibernated image. */
>
> I think this should be moved to the top of the pre-function comment.
>
> > +     REG_L   a1, HIBERN_PBE_ADDR(s4)
> > +     REG_L   a0, HIBERN_PBE_ORIG(s4)
> > +
> > +     copy_page a0, a1
> > +
> > +     REG_L   s4, HIBERN_PBE_NEXT(s4)
> > +     bnez    s4, .Lcopy
> > +
> > +     jalr    s2
> > +END(core_restore_code)
> > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> > new file mode 100644
> > index 000000000000..bf7f3c781820
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate.c
> > @@ -0,0 +1,360 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Hibernation support specific for RISCV
>
> Well, it'd be odd if it was for another arch but sitting in arch/riscv!
> ;)
>
> Thanks for your patches though, it'll be great to have hibernation
> support going.
>
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#include <asm/barrier.h>
> > +#include <asm/cacheflush.h>
> > +#include <asm/mmu_context.h>
> > +#include <asm/page.h>
> > +#include <asm/pgtable.h>
> > +#include <asm/sections.h>
> > +#include <asm/set_memory.h>
> > +#include <asm/smp.h>
> > +#include <asm/suspend.h>
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/memblock.h>
> > +#include <linux/pm.h>
> > +#include <linux/sched.h>
> > +#include <linux/suspend.h>
> > +#include <linux/utsname.h>
> > +
> > +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> > +static int sleep_cpu = -EINVAL;
> > +
> > +/* CPU context to be saved */
> > +struct suspend_context *hibernate_cpu_context;
> > +
> > +unsigned long relocated_restore_code;
> > +
> > +/* Pointer to the temporary resume page table */
> > +pgd_t *resume_pg_dir;
>
> sparse doesn't like what you've done here:
> /stuff/linux/arch/riscv/kernel/hibernate.c:31:24: warning: symbol 'hibernate_cpu_context' was not declared. Should it be static?
> /stuff/linux/arch/riscv/kernel/hibernate.c:33:15: warning: symbol 'relocated_restore_code' was not declared. Should it be static?
> /stuff/linux/arch/riscv/kernel/hibernate.c:36:7: warning: symbol 'resume_pg_dir' was not declared. Should it be static?
> > +
> > +/**
> > + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> > + * @uts_version: to save the build number and date so that the we are not resume with
>
> nit: "so that we do not resume"
>
> > + *           a different kernel
> > + */
> > +struct arch_hibernate_hdr_invariants {
> > +     char            uts_version[__NEW_UTS_LEN + 1];
> > +};
> > +
> > +/**
> > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> > + * @invariants: container to store kernel build version
> > + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
>
> nit: s/executing/executes
>
> Also, my OCD is triggered by the inconsistent full stops at EOL.
>
> > + * @saved_satp: original page table used by the hibernated image.
> > + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> > + */
> > +static struct arch_hibernate_hdr {
> > +     struct arch_hibernate_hdr_invariants invariants;
> > +     unsigned long   hartid;
> > +     unsigned long   saved_satp;
> > +     unsigned long   restore_cpu_addr;
> > +} resume_hdr;
> > +
> > +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> > +{
> > +     memset(i, 0, sizeof(*i));
> > +     memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> > +}
> > +
> > +/*
> > + * Check if the given pfn is in the 'nosave' section.
> > + */
> > +int pfn_is_nosave(unsigned long pfn)
> > +{
> > +     unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> > +     unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> > +
> > +     return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> > +}
> > +
> > +void notrace save_processor_state(void)
> > +{
> > +     WARN_ON(num_online_cpus() != 1);
> > +}
> > +
> > +void notrace restore_processor_state(void)
> > +{
> > +}
> > +
> > +/*
> > + * Helper parameters need to be saved to the hibernation image header.
> > + */
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> > +{
> > +     struct arch_hibernate_hdr *hdr = addr;
> > +
> > +     if (max_size < sizeof(*hdr))
> > +             return -EOVERFLOW;
> > +
> > +     arch_hdr_invariants(&hdr->invariants);
> > +
> > +     hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> > +     hdr->saved_satp = csr_read(CSR_SATP);
> > +     hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> > +
> > +     return 0;
> > +}
> > +EXPORT_SYMBOL(arch_hibernation_header_save);
>
> EXPORT_SYMBOL_GPL(), no? Same below.
>
> > +/*
> > + * Retrieve the helper parameters from the hibernation image header
> > + */
> > +int arch_hibernation_header_restore(void *addr)
> > +{
> > +     struct arch_hibernate_hdr_invariants invariants;
> > +     struct arch_hibernate_hdr *hdr = addr;
> > +     int ret = 0;
> > +
> > +     arch_hdr_invariants(&invariants);
> > +
> > +     if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> > +             pr_crit("Hibernate image not generated by this kernel!\n");
>
> Out of curiosity more than anything else, why pr_crit()? Copy-paste from
> arm64?
>
> > +             return -EINVAL;
> > +     }
> > +
> > +     sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> > +     if (sleep_cpu < 0) {
> > +             pr_crit("Hibernated on a CPU not known to this kernel!\n");
> > +             sleep_cpu = -EINVAL;
> > +             return -EINVAL;
> > +     }
> > +
> > +#ifdef CONFIG_SMP
> > +     ret = bringup_hibernate_cpu(sleep_cpu);
> > +     if (ret) {
> > +             sleep_cpu = -EINVAL;
> > +             return ret;
> > +     }
> > +#endif
> > +     resume_hdr = *hdr;
> > +
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL(arch_hibernation_header_restore);
> > +
> > +int swsusp_arch_suspend(void)
> > +{
> > +     int ret = 0;
> > +
> > +     if (__cpu_suspend_enter(hibernate_cpu_context)) {
> > +             sleep_cpu = smp_processor_id();
> > +             suspend_save_csrs(hibernate_cpu_context);
> > +             ret = swsusp_save();
> > +     } else {
> > +             suspend_restore_csrs(hibernate_cpu_context);
> > +             flush_tlb_all();
> > +
> > +             /* Invalidated Icache */
>
> Think this comment can go, no?
>
> > +             flush_icache_all();
> > +
> > +             /*
> > +              * Tell the hibernation core that we've just restored
> > +              * the memory
>
> I noticed arm64 manipulates the crash kernel in this function too.
> How come we don't?
>
> > +              */
> > +             in_suspend = 0;
> > +             sleep_cpu = -EINVAL;
> > +     }
> > +
> > +     return ret;
> > +}
> > +
>
> The page table stuff here is beyond me... Hopefully Alex can take a look!
>
> I noticed arm64's one of these is not gated, what is different about
> RISC-V that requires it to be?
>
> > +int hibernate_resume_nonboot_cpu_disable(void)
> > +{
> > +     if (sleep_cpu < 0) {
> > +             pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> > +             return -ENODEV;
> > +     }
> > +
> > +     return freeze_secondary_cpus(sleep_cpu);
> > +}
> > +#endif
> > +
> > +static int __init riscv_hibernate_init(void)
> > +{
> > +     hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> > +
> > +     if (WARN_ON(!hibernate_cpu_context))
> > +             return -ENOMEM;
> > +
> > +     return 0;
> > +}
>
> Thanks,
> Conor.
>
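
For readers following the review, the copy loop in core_restore_code quoted above can be modelled in plain C. This is only a userspace sketch, not the patch's code: struct sketch_pbe, SKETCH_PAGE_SIZE and sketch_restore_pages() are hypothetical names, the field names mirror the OFFSET(HIBERN_PBE_*) entries in the patch, and memcpy() stands in for the copy_page macro. Unlike the assembly, which copies first and tests the next pointer afterwards, this version also tolerates an empty list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed 4 KiB page, matching the "lui a2, 0x1" in the quoted macro. */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical userspace stand-in for the kernel's struct pbe. */
struct sketch_pbe {
	void *address;           /* where the saved copy currently lives */
	void *orig_address;      /* where the page has to end up */
	struct sketch_pbe *next; /* NULL terminates the list */
};

/*
 * Walk the list and copy every saved page back to its original
 * location. Returns the number of pages copied.
 */
static size_t sketch_restore_pages(const struct sketch_pbe *pbe)
{
	size_t copied = 0;

	for (; pbe; pbe = pbe->next) {
		memcpy(pbe->orig_address, pbe->address, SKETCH_PAGE_SIZE);
		copied++;
	}

	return copied;
}
```

In the real resume path this loop has to run from a 'safe' page under the temporary page table, because the pages being overwritten may include the running kernel's own text and data; the sketch ignores that constraint entirely.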

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function
  2023-01-31  8:19       ` Alexandre Ghiti
@ 2023-02-01  5:48         ` JeeHeng Sia
  -1 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-01  5:48 UTC (permalink / raw)
  To: Alexandre Ghiti, Conor Dooley
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	Leyfoon Tan, Mason Huo



> -----Original Message-----
> From: Alexandre Ghiti <alexghiti@rivosinc.com>
> Sent: Tuesday, 31 January, 2023 4:19 PM
> To: Conor Dooley <conor@kernel.org>
> Cc: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu;
> linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>;
> Mason Huo <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function
> 
> Hi,
> 
> On Mon, Jan 30, 2023 at 10:57 PM Conor Dooley <conor@kernel.org> wrote:
> >
> > +CC Alex
> >
> > On Fri, Jan 27, 2023 at 05:10:50PM +0800, Sia Jee Heng wrote:
> > > Currently kernel_page_present() function doesn't support huge page
> > > detection causes the function to mistakenly return false to the
> > > hibernation core.
> >
> > This sounds like a bug & should have a fixes tag, no? I assume for
> > whatever commit enabled huge page support...
> > We don't support set_memory, which by the looks of things is the other
> > usecase for this function, so probably doesn't need backporting.
> 
> Maybe add this patch in the Fixes tag: commit 9e953cda5cdf ("riscv:
> Introduce huge page support for 32/64bit kernel").
Sure, will add the fixes tag.
> 
> >
> > Alex, does this change look good to you?
> 
> Yes, just one thing though: what about a pgd_leaf() test? Even if very
> unlikely (I see x86 does not even test it), the privileged spec states
> it is possible to have a 256TB page.
I can add it in, although, as you are probably aware, x86 and ARM don't even test for it. Thanks.
> 
> Thanks,
> 
> Alex
> 
> >
> > > Add huge page detection to the function to solve the problem.
> > >
> > > Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> > > Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> > > Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> > > ---
> > >  arch/riscv/mm/pageattr.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> > > index 86c56616e5de..792b8d10cdfc 100644
> > > --- a/arch/riscv/mm/pageattr.c
> > > +++ b/arch/riscv/mm/pageattr.c
> > > @@ -221,14 +221,20 @@ bool kernel_page_present(struct page *page)
> > >       p4d = p4d_offset(pgd, addr);
> > >       if (!p4d_present(*p4d))
> > >               return false;
> > > +     if (p4d_leaf(*p4d))
> > > +             return true;
> > >
> > >       pud = pud_offset(p4d, addr);
> > >       if (!pud_present(*pud))
> > >               return false;
> > > +     if (pud_leaf(*pud))
> > > +             return true;
> > >
> > >       pmd = pmd_offset(pud, addr);
> > >       if (!pmd_present(*pmd))
> > >               return false;
> > > +     if (pmd_leaf(*pmd))
> > > +             return true;
> > >
> > >       pte = pte_offset_kernel(pmd, addr);
> > >       return pte_present(*pte);
> > > --
> > > 2.34.1
> > >
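
For reference, a rough user-space model of the walk discussed above, including the pgd-level check that Alex suggests. This is only a sketch: the PTE bit values, LEVELS, entry_present(), entry_leaf() and model_page_present() are illustrative names and not kernel API, and the kernel's real pte_present() is more involved. It shows why a leaf entry above the PTE level has to end the walk early: without the check, the walk treats the huge mapping as a pointer to a lower-level table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified entry bits modelled on the RISC-V layout:
 * V = valid, R/W/X = permissions. A valid entry with any of R/W/X set
 * is a leaf; a valid entry with none of them points to the next level.
 */
#define PTE_V 0x1u
#define PTE_R 0x2u
#define PTE_W 0x4u
#define PTE_X 0x8u

#define LEVELS 5	/* pgd, p4d, pud, pmd, pte */

static bool entry_present(uint64_t e)
{
	return (e & PTE_V) != 0;
}

static bool entry_leaf(uint64_t e)
{
	return entry_present(e) && (e & (PTE_R | PTE_W | PTE_X)) != 0;
}

/* entries[0] is the pgd entry for the address, entries[LEVELS - 1] the pte. */
static bool model_page_present(const uint64_t entries[LEVELS])
{
	assert(entries != NULL);

	for (int level = 0; level < LEVELS - 1; level++) {
		if (!entry_present(entries[level]))
			return false;
		if (entry_leaf(entries[level]))
			return true;	/* huge mapping: nothing below to walk */
	}
	return entry_present(entries[LEVELS - 1]);
}
```

With a leaf at the pmd level and an empty pte slot, model_page_present() returns true; dropping the entry_leaf() test would make the same input return false, which is the misreport the patch description mentions.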

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 2/4] RISC-V: Factor out common code of __cpu_resume_enter()
  2023-01-30 21:49     ` Conor Dooley
@ 2023-02-01  6:19       ` JeeHeng Sia
  -1 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-01  6:19 UTC (permalink / raw)
  To: Conor Dooley
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	Leyfoon Tan, Mason Huo



> -----Original Message-----
> From: Conor Dooley <conor@kernel.org>
> Sent: Tuesday, 31 January, 2023 5:49 AM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>
> Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-riscv@lists.infradead.org;
> linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 2/4] RISC-V: Factor out common code of __cpu_resume_enter()
> 
> On Fri, Jan 27, 2023 at 05:10:49PM +0800, Sia Jee Heng wrote:
> > The cpu_resume() function is very similar for the suspend to disk and
> > suspend to ram cases. Factor out the common code into restore_csr macro
> > and restore_reg macro.
> >
> > Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> > ---
> >  arch/riscv/include/asm/assembler.h | 62 ++++++++++++++++++++++++++++++
> >  arch/riscv/kernel/suspend_entry.S  | 34 ++--------------
> >  2 files changed, 65 insertions(+), 31 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/assembler.h
> >
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > new file mode 100644
> > index 000000000000..ef1283d04b70
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -0,0 +1,62 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#ifndef __ASSEMBLY__
> > +#error "Only include this from assembly code"
> > +#endif
> > +
> > +#ifndef __ASM_ASSEMBLER_H
> > +#define __ASM_ASSEMBLER_H
> > +
> > +#include <asm/asm.h>
> > +#include <asm/csr.h>
> > +#include <asm/asm-offsets.h>
> > +
> > +/**
> > + * restore_csr - restore hart's CSR value
> > + */
> > +	.macro restore_csr
> > +		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> > +		csrw	CSR_EPC, t0
> > +		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> > +		csrw	CSR_STATUS, t0
> > +		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> > +		csrw	CSR_TVAL, t0
> > +		REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> > +		csrw	CSR_CAUSE, t0
> > +	.endm
> > +
> > +/**
> > + * restore_reg - Restore registers (except A0 and T0-T6)
> 
> arch/riscv/include/asm/assembler.h:34: warning: Incorrect use of kernel-doc format:  * restore_reg - Restore registers (except A0 and T0-T6)
Ok, will use '/*' instead of '/**'
> 
> Otherwise, LGTM.
> 
> > + */
> > +	.macro restore_reg
> > +		REG_L	ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> > +		REG_L	sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> > +		REG_L	gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> > +		REG_L	tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> > +		REG_L	s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> > +		REG_L	s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> > +		REG_L	a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> > +		REG_L	a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> > +		REG_L	a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> > +		REG_L	a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> > +		REG_L	a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> > +		REG_L	a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> > +		REG_L	a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> > +		REG_L	s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> > +		REG_L	s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> > +		REG_L	s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> > +		REG_L	s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> > +		REG_L	s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> > +		REG_L	s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> > +		REG_L	s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> > +		REG_L	s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> > +		REG_L	s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> > +		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > +	.endm
> > +
> > +#endif	/* __ASM_ASSEMBLER_H */
> > diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
> > index aafcca58c19d..74a8fab8e0f6 100644
> > --- a/arch/riscv/kernel/suspend_entry.S
> > +++ b/arch/riscv/kernel/suspend_entry.S
> > @@ -7,6 +7,7 @@
> >  #include <linux/linkage.h>
> >  #include <asm/asm.h>
> >  #include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> >  #include <asm/csr.h>
> >  #include <asm/xip_fixup.h>
> >
> > @@ -83,39 +84,10 @@ ENTRY(__cpu_resume_enter)
> >  	add	a0, a1, zero
> >
> >  	/* Restore CSRs */
> > -	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> > -	csrw	CSR_EPC, t0
> > -	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> > -	csrw	CSR_STATUS, t0
> > -	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> > -	csrw	CSR_TVAL, t0
> > -	REG_L	t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> > -	csrw	CSR_CAUSE, t0
> > +	restore_csr
> >
> >  	/* Restore registers (except A0 and T0-T6) */
> > -	REG_L	ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> > -	REG_L	sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> > -	REG_L	gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> > -	REG_L	tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> > -	REG_L	s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> > -	REG_L	s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> > -	REG_L	a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> > -	REG_L	a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> > -	REG_L	a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> > -	REG_L	a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> > -	REG_L	a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> > -	REG_L	a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> > -	REG_L	a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> > -	REG_L	s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> > -	REG_L	s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> > -	REG_L	s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> > -	REG_L	s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> > -	REG_L	s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> > -	REG_L	s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> > -	REG_L	s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> > -	REG_L	s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> > -	REG_L	s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> > -	REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > +	restore_reg
> >
> >  	/* Return zero value */
> >  	add	a0, zero, zero
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-01-30 23:30     ` Conor Dooley
@ 2023-02-02  2:43       ` JeeHeng Sia
  -1 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-02  2:43 UTC (permalink / raw)
  To: Conor Dooley, Alexandre Ghiti
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	Leyfoon Tan, Mason Huo



> -----Original Message-----
> From: Conor Dooley <conor@kernel.org>
> Sent: Tuesday, 31 January, 2023 7:31 AM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; Alexandre Ghiti <alexghiti@rivosinc.com>
> Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-riscv@lists.infradead.org;
> linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> +CC Alex
> 
> Alex, could you take a look at the page table bits here when you get a
> chance please?
> 
> On Fri, Jan 27, 2023 at 05:10:51PM +0800, Sia Jee Heng wrote:
> > Low-level arch functions were created to support hibernation.
> > swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
> > the CPU state onto the stack, then calls swsusp_save() to save the memory
> > image.
> >
> > An arch-specific hibernation header is implemented and is used by the
> > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > functions. The arch-specific hibernation header consists of satp, hartid,
> > and the cpu_resume address. The kernel build version also needs to be
> > saved into the hibernation image header to make sure that only the same
> > kernel is restored on resume.
> >
> > swsusp_arch_resume() creates a temporary page table that covers only
> > the linear map. It copies the restore code to a 'safe' page, then starts
> > to restore the memory image. Once completed, it restores the original
> > kernel's page table. It then calls into __hibernate_cpu_resume()
> > to restore the CPU context. Finally, it follows the normal hibernation
> > path back to the hibernation core.
> >
> > To enable hibernation/suspend to disk on RISC-V, the config options below
> > need to be enabled:
> > - CONFIG_ARCH_HIBERNATION_HEADER
> > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >
> > Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> > Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> > Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> > ---
> >  arch/riscv/Kconfig                 |   7 +
> >  arch/riscv/include/asm/assembler.h |  20 ++
> >  arch/riscv/include/asm/suspend.h   |  21 ++
> >  arch/riscv/kernel/Makefile         |   1 +
> >  arch/riscv/kernel/asm-offsets.c    |   5 +
> >  arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
> >  arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
> >  7 files changed, 503 insertions(+)
> >  create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >  create mode 100644 arch/riscv/kernel/hibernate.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..4555848a817f 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -690,6 +690,13 @@ menu "Power management options"
> >
> >  source "kernel/power/Kconfig"
> >
> > +config ARCH_HIBERNATION_POSSIBLE
> > +	def_bool y
> > +
> > +config ARCH_HIBERNATION_HEADER
> > +	def_bool y
> > +	depends on HIBERNATION
> > +
> >  endmenu # "Power management options"
> >
> >  menu "CPU Power Management"
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > index ef1283d04b70..3de70d3e6ceb 100644
> > --- a/arch/riscv/include/asm/assembler.h
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -59,4 +59,24 @@
> >  		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >  	.endm
> >
> > +/**
> > + * copy_page - copy 1 page (4KB) of data from source to destination
> 
> arch/riscv/include/asm/assembler.h:64: warning: Incorrect use of kernel-doc format:  * copy_page - copy 1 page (4KB) of data from source to destination
Will replace '/**' with a plain '/*' comment.

> 
> > + * @a0 - destination
> > + * @a1 - source
> > + */
> > +	.macro	copy_page a0, a1
> > +		lui	a2, 0x1
> > +		add	a2, a2, a0
> > +.1 :
> > +		REG_L	t0, 0(a1)
> > +		REG_L	t1, SZREG(a1)
> > +
> > +		REG_S	t0, 0(a0)
> > +		REG_S	t1, SZREG(a0)
> > +
> > +		addi	a0, a0, 2 * SZREG
> > +		addi	a1, a1, 2 * SZREG
> > +		bne	a2, a0, .1
> 
> allmodconfig, clang 15.0.4:
> 
> <instantiation>:3:1: error: unexpected token at start of statement
> .1 :
> ^
> /stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
>  copy_page a0, a1
>  ^
> <instantiation>:12:15: error: unknown operand
>   bne a2, a0, .1
>               ^
> /stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
>  copy_page a0, a1
>  ^
> make[5]: *** [/stuff/linux/scripts/Makefile.build:384: arch/riscv/kernel/hibernate-asm.o] Error 1
> 
> > +	.endm
> > +
> >  #endif	/* __ASM_ASSEMBLER_H */
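
For reference, the copy_page loop quoted above can be modelled in C roughly as follows. This is a sketch under stated assumptions, not the patch's code: MODEL_PAGE_SIZE and model_copy_page() are made-up names, and unsigned long stands in for one SZREG-sized register. On the clang error itself, one common way to write a loop label inside a macro is a numeric local label ("1:" with "bne a2, a0, 1b"), which both GNU as and clang's integrated assembler accept; whether the next revision uses that form is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096	/* matches "lui a2, 0x1", i.e. 0x1 << 12 */

/* Copy one page, two machine words per iteration, like the macro's loop. */
static void model_copy_page(unsigned long *dst, const unsigned long *src)
{
	/* a2 = a0 + 4096: end of the destination page */
	unsigned long *end = dst + MODEL_PAGE_SIZE / sizeof(*dst);

	/* The exit test is "!=", so a page must hold an even number of words. */
	assert((MODEL_PAGE_SIZE / sizeof(*dst)) % 2 == 0);

	do {
		unsigned long t0 = src[0];	/* REG_L t0, 0(a1) */
		unsigned long t1 = src[1];	/* REG_L t1, SZREG(a1) */

		dst[0] = t0;			/* REG_S t0, 0(a0) */
		dst[1] = t1;			/* REG_S t1, SZREG(a0) */

		dst += 2;			/* addi a0, a0, 2 * SZREG */
		src += 2;			/* addi a1, a1, 2 * SZREG */
	} while (dst != end);			/* bne a2, a0, <loop label> */
}
```

Like the macro, the model advances both pointers as it goes, so the caller's original addresses are only preserved because C passes them by value; in the assembly, a0, a1, a2, t0 and t1 are all clobbered.
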
> > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > index 75419c5ca272..db40ae433aa9 100644
> > --- a/arch/riscv/include/asm/suspend.h
> > +++ b/arch/riscv/include/asm/suspend.h
> > @@ -21,6 +21,12 @@ struct suspend_context {
> >  #endif
> >  };
> >
> > +/*
> > + * This parameter will be assigned to 0 during resume and will be used by
> > + * hibernation core for the subsequent resume sequence
> 
> This isn't a parameter! I'm not sure that the comment really adds
> anything to be honest, but "Used by the hibernation core and cleared
> during the resume sequence" probably gets the point across equally well.
Sure, will update the comment
> 
> > + */
> > +extern int in_suspend;
> > +
> >  /* Low-level CPU suspend entry function */
> >  int __cpu_suspend_enter(struct suspend_context *context);
> >
> > @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >  /* Used to save and restore the csr */
> >  void suspend_save_csrs(struct suspend_context *context);
> >  void suspend_restore_csrs(struct suspend_context *context);
> > +
> > +/* Low-level API to support hibernation */
> > +int swsusp_arch_suspend(void);
> > +int swsusp_arch_resume(void);
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > +int arch_hibernation_header_restore(void *addr);
> > +int __hibernate_cpu_resume(void);
> > +
> > +/* Used to resume on the CPU we hibernated on */
> > +int hibernate_resume_nonboot_cpu_disable(void);
> > +
> > +/* Used to restore the hibernated image */
> 
> I think this comment is kinda stating the obvious, no?
> 
> > +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > +				unsigned long cpu_resume);
> > +asmlinkage int core_restore_code(void);
> 
> How about dropping the comment and prepending hibernate_ to these function
> names?
Good idea, will remove the comment and add hibernate_ prefix
> 
> >  #endif
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 4cf303a779ab..daab341d55e4 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
> >  obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
> >
> >  obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> > +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
> >
> >  obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> >  obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > index df9444397908..d6a75aac1d27 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -9,6 +9,7 @@
> >  #include <linux/kbuild.h>
> >  #include <linux/mm.h>
> >  #include <linux/sched.h>
> > +#include <linux/suspend.h>
> >  #include <asm/kvm_host.h>
> >  #include <asm/thread_info.h>
> >  #include <asm/ptrace.h>
> > @@ -116,6 +117,10 @@ void asm_offsets(void)
> >
> >  	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >
> > +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > +
> >  	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >  	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >  	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > new file mode 100644
> > index 000000000000..a83d534b89bd
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate-asm.S
> > @@ -0,0 +1,89 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Hibernation support specific for RISCV
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > +#include <asm/csr.h>
> > +
> > +#include <linux/linkage.h>
> > +
> > +/*
> > + * This code is executed when resume from the hibernation.
> > + *
> > + * It begins with loading the temporary page table then restores the memory image.
> > + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> > + * swsusp_arch_suspend().
> > + */
> 
> This file looks to be confusingly ordered. You lead with a comment
> describing a sequence but the file doesn't follow it.
> I suggest removing this comment.
Sure, will remove it
> 
> > +
> > +/*
> > + * int __hibernate_cpu_resume(void)
> > + * Switch back to the hibernated image's page table prior to restore the CPU
> 
> nit: s/restore/restoring
> 
> > + * context.
> > + *
> > + * Always returns 0 to the C code.
> 
> s/to the C code//
Ok

> 
> > + */
> > +ENTRY(__hibernate_cpu_resume)
> > +	/* switch to hibernated image's page table */
> > +	csrw CSR_SATP, s0
> > +	sfence.vma
> > +
> > +	REG_L	a0, hibernate_cpu_context
> > +
> > +	/* Restore CSRs */
> 
> Stating the obvious again here, no?
Will remove it. thanks
>
> > +	restore_csr
> > +
> > +	/* Restore registers (except A0 and T0-T6) */
> 
> Do we need to mention the (except A0 & T0-T6) here and elsewhere?
> If they're lost across calls anyway, is it worth mentioning that they're
> lost across hibernation?
I can remove the comment, but it is worth mentioning in the macro description.
> 
> > +	restore_reg
> > +
> > +	/* Return zero value */
> > +	add	a0, zero, zero
> > +
> > +	/* Return to C code */
> 
> I'd drop this comment. I don't think the presumed caller of the function
> needs to be mentioned here.
Sure. will remove it.
> 
> > +	ret
> > +END(__hibernate_cpu_resume)
> > +
> > +/*
> > + * Prepare to restore the image.
> > + * a0: satp of saved page tables
> > + * a1: satp of temporary page tables
> > + * a2: cpu_resume
> > + */
> > +ENTRY(restore_image)
> > +	mv	s0, a0
> > +	mv	s1, a1
> > +	mv	s2, a2
> > +	REG_L	s4, restore_pblist
> > +	REG_L	a1, relocated_restore_code
> > +
> > +	jalr	a1
> > +END(restore_image)
> > +
> > +/*
> > + * The below code will be executed from a 'safe' page.
> > + * It first switches to the temporary page table, then start to copy the pages
> 
> nit: s/start/starts/
ok
> 
> > + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> 
> nit: s/jumps to the/jumps to/
ok
> 
> > + * to restore the CPU context.
> > + */
> > +ENTRY(core_restore_code)
> > +	/* switch to temp page table */
> > +	csrw satp, s1
> > +	sfence.vma
> > +.Lcopy:
> > +	/* The below code will restore the hibernated image. */
> 
> I think this should be moved to the top of the pre-function comment.
The idea is that this comment makes the code easier for readers to understand.
> 
> > +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> > +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> > +
> > +	copy_page a0, a1
> > +
> > +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> > +	bnez	s4, .Lcopy
> > +
> > +	jalr	s2
> > +END(core_restore_code)
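
For reference, the .Lcopy loop in core_restore_code walks a linked list of page backup entries and copies each saved page back to its original location. A rough user-space model is sketched below. The names model_pbe, model_restore_pages() and MODEL_PAGE_SIZE are illustrative only; the three fields mirror the ones the patch exports through asm-offsets (address, orig_address, next). The page table switch and the final jump to the resume function are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Mirrors the three pbe fields referenced by HIBERN_PBE_ADDR/ORIG/NEXT. */
struct model_pbe {
	void *address;		/* where the saved copy currently lives */
	void *orig_address;	/* where the page has to end up */
	struct model_pbe *next;	/* NULL terminates the list */
};

/* Walk the list and copy every page back; returns the number of pages copied. */
static size_t model_restore_pages(struct model_pbe *list)
{
	size_t copied = 0;

	for (struct model_pbe *p = list; p != NULL; p = p->next) {
		assert(p->address != NULL && p->orig_address != NULL);
		memcpy(p->orig_address, p->address, MODEL_PAGE_SIZE);
		copied++;
	}
	return copied;
}
```

One difference from the assembly: the model checks for an empty list before the first copy, whereas the quoted loop loads from s4 first and only tests the next pointer afterwards, so it assumes restore_pblist is non-NULL on entry.
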
> > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> > new file mode 100644
> > index 000000000000..bf7f3c781820
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate.c
> > @@ -0,0 +1,360 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Hibernation support specific for RISCV
> 
> Well, it'd be odd if it was for another arch but sitting in arch/riscv!
> ;)
Sure.
> 
> Thanks for your patches though, it'll be great to have hibernation
> support going.
> 
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#include <asm/barrier.h>
> > +#include <asm/cacheflush.h>
> > +#include <asm/mmu_context.h>
> > +#include <asm/page.h>
> > +#include <asm/pgtable.h>
> > +#include <asm/sections.h>
> > +#include <asm/set_memory.h>
> > +#include <asm/smp.h>
> > +#include <asm/suspend.h>
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/memblock.h>
> > +#include <linux/pm.h>
> > +#include <linux/sched.h>
> > +#include <linux/suspend.h>
> > +#include <linux/utsname.h>
> > +
> > +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> > +static int sleep_cpu = -EINVAL;
> > +
> > +/* CPU context to be saved */
> > +struct suspend_context *hibernate_cpu_context;
> > +
> > +unsigned long relocated_restore_code;
> > +
> > +/* Pointer to the temporary resume page table */
> > +pgd_t *resume_pg_dir;
> 
> sparse doesn't like what you've done here:
> /stuff/linux/arch/riscv/kernel/hibernate.c:31:24: warning: symbol 'hibernate_cpu_context' was not declared. Should it be static?
> /stuff/linux/arch/riscv/kernel/hibernate.c:33:15: warning: symbol 'relocated_restore_code' was not declared. Should it be static?
> /stuff/linux/arch/riscv/kernel/hibernate.c:36:7: warning: symbol 'resume_pg_dir' was not declared. Should it be static?
Thanks. will improve it.
> > +
> > +/**
> > + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> > + * @uts_version: to save the build number and date so that the we are not resume with
> 
> nit: "so that we do not resume"
sure. thanks
> 
> > + *		a different kernel
> > + */
> > +struct arch_hibernate_hdr_invariants {
> > +	char		uts_version[__NEW_UTS_LEN + 1];
> > +};
> > +
> > +/**
> > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> > + * @invariants: container to store kernel build version
> > + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> 
> nit: s/executing/executes
sure. thanks
> 
> Also, my OCD is triggered by the inconsistent full stops at EOL.
oops. will fix it. thanks
> 
> > + * @saved_satp: original page table used by the hibernated image.
> > + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> > + */
> > +static struct arch_hibernate_hdr {
> > +	struct arch_hibernate_hdr_invariants invariants;
> > +	unsigned long	hartid;
> > +	unsigned long	saved_satp;
> > +	unsigned long	restore_cpu_addr;
> > +} resume_hdr;
> > +
> > +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> > +{
> > +	memset(i, 0, sizeof(*i));
> > +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> > +}
> > +
> > +/*
> > + * Check if the given pfn is in the 'nosave' section.
> > + */
> > +int pfn_is_nosave(unsigned long pfn)
> > +{
> > +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> > +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> > +
> > +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> > +}
> > +
> > +void notrace save_processor_state(void)
> > +{
> > +	WARN_ON(num_online_cpus() != 1);
> > +}
> > +
> > +void notrace restore_processor_state(void)
> > +{
> > +}
> > +
> > +/*
> > + * Helper parameters need to be saved to the hibernation image header.
> > + */
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> > +{
> > +	struct arch_hibernate_hdr *hdr = addr;
> > +
> > +	if (max_size < sizeof(*hdr))
> > +		return -EOVERFLOW;
> > +
> > +	arch_hdr_invariants(&hdr->invariants);
> > +
> > +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> > +	hdr->saved_satp = csr_read(CSR_SATP);
> > +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(arch_hibernation_header_save);
> 
> EXPORT_SYMBOL_GPL(), no? Same below.
you are right. will fix it
> 
> > +/*
> > + * Retrieve the helper parameters from the hibernation image header
> > + */
> > +int arch_hibernation_header_restore(void *addr)
> > +{
> > +	struct arch_hibernate_hdr_invariants invariants;
> > +	struct arch_hibernate_hdr *hdr = addr;
> > +	int ret = 0;
> > +
> > +	arch_hdr_invariants(&invariants);
> > +
> > +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> > +		pr_crit("Hibernate image not generated by this kernel!\n");
> 
> Out of curiosity more than anything else, why pr_crit()? Copy-paste from
> arm64?
The idea is from arm, and I am OK with the log level.
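As an illustration of the check being discussed, here is a userspace sketch of the save/verify round trip, assuming hypothetical demo_* names and a plain string in place of init_utsname()->version. The save side refuses a buffer that is too small; the restore side rebuilds the invariants for the running kernel and compares them byte for byte with the stored ones.

```c
#include <errno.h>
#include <string.h>

#define DEMO_UTS_LEN 64	/* same value as the kernel's __NEW_UTS_LEN */

struct demo_hdr_invariants {
	char uts_version[DEMO_UTS_LEN + 1];
};

struct demo_hdr {
	struct demo_hdr_invariants invariants;
	unsigned long hartid;
};

/* Zero the struct first so that padding and unused bytes compare equal. */
static void demo_fill_invariants(struct demo_hdr_invariants *i,
				 const char *version)
{
	memset(i, 0, sizeof(*i));
	/* strncpy because this sketch takes an arbitrary string as input */
	strncpy(i->uts_version, version, sizeof(i->uts_version) - 1);
}

int demo_header_save(void *addr, unsigned int max_size,
		     const char *version, unsigned long hartid)
{
	struct demo_hdr *hdr = addr;

	if (max_size < sizeof(*hdr))
		return -EOVERFLOW;

	demo_fill_invariants(&hdr->invariants, version);
	hdr->hartid = hartid;
	return 0;
}

int demo_header_restore(const void *addr, const char *running_version)
{
	const struct demo_hdr *hdr = addr;
	struct demo_hdr_invariants now;

	demo_fill_invariants(&now, running_version);
	if (memcmp(&hdr->invariants, &now, sizeof(now)))
		return -EINVAL;	/* image came from a different build */
	return 0;
}
```

The memset() before filling matters here: without it, the memcmp() over the whole struct could fail on uninitialised bytes even when the version strings match.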
> 
> > +		return -EINVAL;
> > +	}
> > +
> > +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> > +	if (sleep_cpu < 0) {
> > +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> > +		sleep_cpu = -EINVAL;
> > +		return -EINVAL;
> > +	}
> > +
> > +#ifdef CONFIG_SMP
> > +	ret = bringup_hibernate_cpu(sleep_cpu);
> > +	if (ret) {
> > +		sleep_cpu = -EINVAL;
> > +		return ret;
> > +	}
> > +#endif
> > +	resume_hdr = *hdr;
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(arch_hibernation_header_restore);
> > +
> > +int swsusp_arch_suspend(void)
> > +{
> > +	int ret = 0;
> > +
> > +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> > +		sleep_cpu = smp_processor_id();
> > +		suspend_save_csrs(hibernate_cpu_context);
> > +		ret = swsusp_save();
> > +	} else {
> > +		suspend_restore_csrs(hibernate_cpu_context);
> > +		flush_tlb_all();
> > +
> > +		/* Invalidated Icache */
> 
> Think this comment can go, no?
sure. will remove it.
> 
> > +		flush_icache_all();
> > +
> > +		/*
> > +		 * Tell the hibernation core that we've just restored
> > +		 * the memory
> 
> I noticed arm64 manipulates the crash kernel in this function too.
> How come we don't?
kexec_tool doesn't support RISCV yet. We can enable it once it is supported by kexec_tool.
> 
> > +		 */
> > +		in_suspend = 0;
> > +		sleep_cpu = -EINVAL;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> 
> The page table stuff here is beyond me... Hopefully Alex can take a look!
> 
> I noticed arm64's one of these is not gated, what is different about
> RISC-V that requires it to be?
Do you mean CONFIG_PM_SLEEP_SMP? It causes a compile error if built with SMP disabled. I am not sure about arm, as I didn't try it.
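On the control flow of swsusp_arch_suspend() quoted above: as I read it, __cpu_suspend_enter() effectively returns twice, once with a nonzero value when the context is first saved, and once with zero after the image has been restored and __hibernate_cpu_resume() has reloaded that context. A rough userspace analogy with setjmp()/longjmp() follows. The demo_* names are hypothetical, and setjmp() uses the inverse return convention (zero first, nonzero on the way back).

```c
#include <setjmp.h>

static jmp_buf demo_context;	/* stands in for hibernate_cpu_context */
static int demo_in_suspend = 1;	/* stands in for in_suspend */
static int demo_image_saved;

/* Stand-in for the restore path jumping back into the saved context. */
static void demo_resume(void)
{
	longjmp(demo_context, 1);
}

int demo_arch_suspend(void)
{
	if (setjmp(demo_context) == 0) {
		/* First return: context captured, now "save the image". */
		demo_image_saved = 1;
		/* In a real hibernation this happens after power-off and boot. */
		demo_resume();
	}

	/* Second return: running again from the restored context. */
	demo_in_suspend = 0;
	return 0;
}
```

The state touched on both sides of the jump is kept in file-scope variables on purpose, since non-volatile locals modified between setjmp() and longjmp() have indeterminate values afterwards.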
> 
> > +int hibernate_resume_nonboot_cpu_disable(void)
> > +{
> > +	if (sleep_cpu < 0) {
> > +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	return freeze_secondary_cpus(sleep_cpu);
> > +}
> > +#endif
> > +
> > +static int __init riscv_hibernate_init(void)
> > +{
> > +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> > +
> > +	if (WARN_ON(!hibernate_cpu_context))
> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> 
> Thanks,
> Conor.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-01-30 23:30     ` Conor Dooley
@ 2023-02-03  3:43       ` JeeHeng Sia
  -1 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-03  3:43 UTC (permalink / raw)
  To: Conor Dooley, Alexandre Ghiti
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	Leyfoon Tan, Mason Huo



> -----Original Message-----
> From: Conor Dooley <conor@kernel.org>
> Sent: Tuesday, 31 January, 2023 7:31 AM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; Alexandre Ghiti <alexghiti@rivosinc.com>
> Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-riscv@lists.infradead.org; linux-
> kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> +CC Alex
> 
> Alex, could you take a look at the page table bits here when you get a
> chance please?
> 
> > + * @a0 - destination
> > + * @a1 - source
> > + */
> > +	.macro	copy_page a0, a1
> > +		lui	a2, 0x1
> > +		add	a2, a2, a0
> > +.1 :
> > +		REG_L	t0, 0(a1)
> > +		REG_L	t1, SZREG(a1)
> > +
> > +		REG_S	t0, 0(a0)
> > +		REG_S	t1, SZREG(a0)
> > +
> > +		addi	a0, a0, 2 * SZREG
> > +		addi	a1, a1, 2 * SZREG
> > +		bne	a2, a0, .1
> 
> allmodconfig, clang 15.0.4:
> 
> <instantiation>:3:1: error: unexpected token at start of statement
> .1 :
> ^
> /stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
>  copy_page a0, a1
>  ^
> <instantiation>:12:15: error: unknown operand
>   bne a2, a0, .1
>               ^
> /stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
>  copy_page a0, a1
>  ^
> make[5]: *** [/stuff/linux/scripts/Makefile.build:384: arch/riscv/kernel/hibernate-asm.o] Error 1
Hi Conor, I couldn't reproduce the above error. Could you share the build command, please?
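For anyone following the macro without reading the assembly, the loop can be paraphrased in C roughly as below. This is only a sketch of the intended semantics with hypothetical demo_* names, not a proposed replacement: `lui a2, 0x1` yields 0x1000, so the end pointer is the destination plus 4 KiB, and each iteration moves two register-sized words.

```c
#define DEMO_PAGE_SIZE 4096UL	/* 0x1000, the value loaded by lui a2, 0x1 */

/* Copy one page, two machine words per iteration, like the macro. */
void demo_copy_page(unsigned long *dst, const unsigned long *src)
{
	unsigned long *end = dst + DEMO_PAGE_SIZE / sizeof(unsigned long);

	do {
		unsigned long t0 = src[0];	/* REG_L t0, 0(a1) */
		unsigned long t1 = src[1];	/* REG_L t1, SZREG(a1) */

		dst[0] = t0;			/* REG_S t0, 0(a0) */
		dst[1] = t1;			/* REG_S t1, SZREG(a0) */

		dst += 2;			/* addi a0, a0, 2 * SZREG */
		src += 2;			/* addi a1, a1, 2 * SZREG */
	} while (dst != end);			/* bne a2, a0, <loop label> */
}
```

The clang errors quoted above concern only how the loop label is spelled, not this logic.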
> 
> > +	.endm
> > +
> >  #endif	/* __ASM_ASSEMBLER_H */
 
> Thanks,
> Conor.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-02-03  3:43       ` JeeHeng Sia
@ 2023-02-03  6:30         ` Conor Dooley
  -1 siblings, 0 replies; 52+ messages in thread
From: Conor Dooley @ 2023-02-03  6:30 UTC (permalink / raw)
  To: JeeHeng Sia, Alexandre Ghiti
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	Leyfoon Tan, Mason Huo



On 3 February 2023 03:43:35 GMT, JeeHeng Sia <jeeheng.sia@starfivetech.com> wrote:
>
>
>> -----Original Message-----
>> From: Conor Dooley <conor@kernel.org>
>> Sent: Tuesday, 31 January, 2023 7:31 AM
>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; Alexandre Ghiti <alexghiti@rivosinc.com>
>> Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-riscv@lists.infradead.org; linux-
>> kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo <mason.huo@starfivetech.com>
>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>> 
>> +CC Alex
>> 
>> Alex, could you take a look at the page table bits here when you get a
>> chance please?
>> 
>> > + * @a0 - destination
>> > + * @a1 - source
>> > + */
>> > +	.macro	copy_page a0, a1
>> > +		lui	a2, 0x1
>> > +		add	a2, a2, a0
>> > +.1 :
>> > +		REG_L	t0, 0(a1)
>> > +		REG_L	t1, SZREG(a1)
>> > +
>> > +		REG_S	t0, 0(a0)
>> > +		REG_S	t1, SZREG(a0)
>> > +
>> > +		addi	a0, a0, 2 * SZREG
>> > +		addi	a1, a1, 2 * SZREG
>> > +		bne	a2, a0, .1
>> 
>> allmodconfig, clang 15.0.4:
>> 
>> <instantiation>:3:1: error: unexpected token at start of statement
>> .1 :
>> ^
>> /stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
>>  copy_page a0, a1
>>  ^
>> <instantiation>:12:15: error: unknown operand
>>   bne a2, a0, .1
>>               ^
>> /stuff/linux/arch/riscv/kernel/hibernate-asm.S:83:2: note: while in macro instantiation
>>  copy_page a0, a1
>>  ^
>> make[5]: *** [/stuff/linux/scripts/Makefile.build:384: arch/riscv/kernel/hibernate-asm.o] Error 1
>Hi Conor, I couldn't reproduce the above error, could you share the build command please?

It was just allmodconfig with LLVM=1

>> 
>> > +	.endm
>> > +
>> >  #endif	/* __ASM_ASSEMBLER_H */
> 
>> Thanks,
>> Conor.
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-01-27  9:10   ` Sia Jee Heng
@ 2023-02-04 20:42     ` kernel test robot
  -1 siblings, 0 replies; 52+ messages in thread
From: kernel test robot @ 2023-02-04 20:42 UTC (permalink / raw)
  To: Sia Jee Heng, paul.walmsley, palmer, aou
  Cc: oe-kbuild-all, linux-riscv, linux-kernel, jeeheng.sia,
	leyfoon.tan, mason.huo

Hi Sia,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on 7c46948a6e9cf47ed03b0d489fde894ad46f1437]

url:    https://github.com/intel-lab-lkp/linux/commits/Sia-Jee-Heng/RISC-V-Change-suspend_save_csrs-and-suspend_restore_csrs-to-public-function/20230128-114249
base:   7c46948a6e9cf47ed03b0d489fde894ad46f1437
patch link:    https://lore.kernel.org/r/20230127091051.1465278-5-jeeheng.sia%40starfivetech.com
patch subject: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
config: riscv-randconfig-r001-20230205 (https://download.01.org/0day-ci/archive/20230205/202302050450.VM99IQpW-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/642c1b119b3d33fe0ee22ff6823085cb847cfe79
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Sia-Jee-Heng/RISC-V-Change-suspend_save_csrs-and-suspend_restore_csrs-to-public-function/20230128-114249
        git checkout 642c1b119b3d33fe0ee22ff6823085cb847cfe79
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   riscv64-linux-ld: arch/riscv/kernel/hibernate.o: in function `arch_hibernation_header_restore':
>> arch/riscv/kernel/hibernate.c:133: undefined reference to `__cpu_suspend_enter'
>> riscv64-linux-ld: arch/riscv/kernel/hibernate.c:137: undefined reference to `suspend_save_csrs'
>> riscv64-linux-ld: arch/riscv/kernel/hibernate.c:137: undefined reference to `suspend_restore_csrs'
   pahole: .tmp_vmlinux.btf: No such file or directory
   .btf.vmlinux.bin.o: file not recognized: file format not recognized


vim +133 arch/riscv/kernel/hibernate.c

   106	
   107	/*
   108	 * Retrieve the helper parameters from the hibernation image header
   109	 */
   110	int arch_hibernation_header_restore(void *addr)
   111	{
   112		struct arch_hibernate_hdr_invariants invariants;
   113		struct arch_hibernate_hdr *hdr = addr;
   114		int ret = 0;
   115	
   116		arch_hdr_invariants(&invariants);
   117	
   118		if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
   119			pr_crit("Hibernate image not generated by this kernel!\n");
   120			return -EINVAL;
   121		}
   122	
   123		sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
   124		if (sleep_cpu < 0) {
   125			pr_crit("Hibernated on a CPU not known to this kernel!\n");
   126			sleep_cpu = -EINVAL;
   127			return -EINVAL;
   128		}
   129	
   130	#ifdef CONFIG_SMP
   131		ret = bringup_hibernate_cpu(sleep_cpu);
   132		if (ret) {
 > 133			sleep_cpu = -EINVAL;
   134			return ret;
   135		}
   136	#endif
 > 137		resume_hdr = *hdr;
   138	
   139		return ret;
   140	}
   141	EXPORT_SYMBOL(arch_hibernation_header_restore);
   142	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-01-31  9:59       ` Alexandre Ghiti
@ 2023-02-07  4:58         ` JeeHeng Sia
  -1 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-07  4:58 UTC (permalink / raw)
  To: Alexandre Ghiti, Conor Dooley
  Cc: paul.walmsley, palmer, aou, linux-riscv, linux-kernel,
	Leyfoon Tan, Mason Huo



> -----Original Message-----
> From: Alexandre Ghiti <alexghiti@rivosinc.com>
> Sent: Tuesday, 31 January, 2023 6:00 PM
> To: Conor Dooley <conor@kernel.org>
> Cc: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-
> riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> Hi,
> 
> On Tue, Jan 31, 2023 at 12:30 AM Conor Dooley <conor@kernel.org> wrote:
> >
> > +CC Alex
> >
> > Alex, could you take a look at the page table bits here when you get a
> > chance please?
> 
> Yes, I'll do that soon.
Hi Alex, do you have any comments? I will send out v4 soon; could you provide your comments on the v4 series?
> 
> Thanks,
> 
> Alex
> 
> >

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-02-07  4:58         ` JeeHeng Sia
@ 2023-02-07  5:27           ` Alexandre Ghiti
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-07  5:27 UTC (permalink / raw)
  To: JeeHeng Sia
  Cc: Conor Dooley, paul.walmsley, palmer, aou, linux-riscv,
	linux-kernel, Leyfoon Tan, Mason Huo

On Tue, Feb 7, 2023 at 5:58 AM JeeHeng Sia <jeeheng.sia@starfivetech.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Alexandre Ghiti <alexghiti@rivosinc.com>
> > Sent: Tuesday, 31 January, 2023 6:00 PM
> > To: Conor Dooley <conor@kernel.org>
> > Cc: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-
> > riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> > <mason.huo@starfivetech.com>
> > Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > Hi,
> >
> > On Tue, Jan 31, 2023 at 12:30 AM Conor Dooley <conor@kernel.org> wrote:
> > >
> > > +CC Alex
> > >
> > > Alex, could you take a look at the page table bits here when you get a
> > > chance please?
> >
> > Yes, I'll do that soon.
> Hi Alex, do you have any comments? I will send out v4 soon; could you provide your comments on the v4 series?

I'll do that today!

> >
> > Thanks,
> >
> > Alex
> >
> > >

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-01-27  9:10   ` Sia Jee Heng
@ 2023-02-07 15:46     ` Alexandre Ghiti
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-07 15:46 UTC (permalink / raw)
  To: Sia Jee Heng, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, leyfoon.tan, mason.huo

Hi Sia,

On 1/27/23 10:10, Sia Jee Heng wrote:
> Low level Arch functions were created to support hibernation.
> swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
> the CPU state onto the stack, then calls swsusp_save() to save the memory
> image.
>
> An arch-specific hibernation header is implemented and is used by the
> arch_hibernation_header_restore() and arch_hibernation_header_save()
> functions. The arch-specific hibernation header consists of satp, hartid,
> and the cpu_resume address. The kernel build version also needs to be
> saved into the hibernation image header to make sure that only the same
> kernel is restored on resume.
>
> swsusp_arch_resume() creates a temporary page table that covers only
> the linear map. It copies the restore code to a 'safe' page, then starts
> to restore the memory image. Once completed, it restores the original
> kernel's page table. It then calls into __hibernate_cpu_resume()
> to restore the CPU context. Finally, it follows the normal hibernation
> path back to the hibernation core.
>
> To enable hibernation/suspend to disk on RISC-V, the config options below
> need to be enabled:
> - CONFIG_ARCH_HIBERNATION_HEADER
> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>
> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> ---
>   arch/riscv/Kconfig                 |   7 +
>   arch/riscv/include/asm/assembler.h |  20 ++
>   arch/riscv/include/asm/suspend.h   |  21 ++
>   arch/riscv/kernel/Makefile         |   1 +
>   arch/riscv/kernel/asm-offsets.c    |   5 +
>   arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>   arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>   7 files changed, 503 insertions(+)
>   create mode 100644 arch/riscv/kernel/hibernate-asm.S
>   create mode 100644 arch/riscv/kernel/hibernate.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..4555848a817f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -690,6 +690,13 @@ menu "Power management options"
>   
>   source "kernel/power/Kconfig"
>   
> +config ARCH_HIBERNATION_POSSIBLE
> +	def_bool y
> +
> +config ARCH_HIBERNATION_HEADER
> +	def_bool y
> +	depends on HIBERNATION
> +
>   endmenu # "Power management options"
>   
>   menu "CPU Power Management"
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> index ef1283d04b70..3de70d3e6ceb 100644
> --- a/arch/riscv/include/asm/assembler.h
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -59,4 +59,24 @@
>   		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>   	.endm
>   
> +/**
> + * copy_page - copy 1 page (4KB) of data from source to destination
> + * @a0 - destination
> + * @a1 - source
> + */
> +	.macro	copy_page a0, a1
> +		lui	a2, 0x1
> +		add	a2, a2, a0
> +.1 :
> +		REG_L	t0, 0(a1)
> +		REG_L	t1, SZREG(a1)
> +
> +		REG_S	t0, 0(a0)
> +		REG_S	t1, SZREG(a0)
> +
> +		addi	a0, a0, 2 * SZREG
> +		addi	a1, a1, 2 * SZREG
> +		bne	a2, a0, .1
> +	.endm
> +
>   #endif	/* __ASM_ASSEMBLER_H */
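
As a reading aid, the loop in the copy_page macro quoted above can be paraphrased in C. This is only a sketch under assumptions: copy_page_sketch, PAGE_SIZE_SKETCH, and the use of unsigned long for one register-width word are illustrative names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB page, matching "lui a2, 0x1" (0x1000) in the macro. */
#define PAGE_SIZE_SKETCH 4096

/*
 * Copy one page from src to dst, two register-width words per iteration,
 * mirroring the REG_L/REG_S pairs in the assembly macro.
 */
static void copy_page_sketch(unsigned long *dst, const unsigned long *src)
{
	/* a2 = a0 + 0x1000: the end-of-page address used as the loop bound */
	unsigned long *end = dst + PAGE_SIZE_SKETCH / sizeof(unsigned long);

	/* The loop only terminates if the page is a multiple of the stride. */
	assert(PAGE_SIZE_SKETCH % (2 * sizeof(unsigned long)) == 0);

	do {
		unsigned long t0 = src[0];	/* REG_L t0, 0(a1) */
		unsigned long t1 = src[1];	/* REG_L t1, SZREG(a1) */

		dst[0] = t0;			/* REG_S t0, 0(a0) */
		dst[1] = t1;			/* REG_S t1, SZREG(a0) */

		dst += 2;			/* addi a0, a0, 2 * SZREG */
		src += 2;			/* addi a1, a1, 2 * SZREG */
	} while (dst != end);			/* bne a2, a0, <loop label> */
}
```

The loop label is the part the clang build reported earlier in this thread rejects. A numeric local label (1: referenced as 1b) is the form normally used inside assembler macros, since it may be defined more than once; whether that alone fixes the LLVM=1 build is not confirmed here.
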
> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> index 75419c5ca272..db40ae433aa9 100644
> --- a/arch/riscv/include/asm/suspend.h
> +++ b/arch/riscv/include/asm/suspend.h
> @@ -21,6 +21,12 @@ struct suspend_context {
>   #endif
>   };
>   
> +/*
> + * This parameter will be assigned to 0 during resume and will be used by
> + * hibernation core for the subsequent resume sequence
> + */
> +extern int in_suspend;
> +
>   /* Low-level CPU suspend entry function */
>   int __cpu_suspend_enter(struct suspend_context *context);
>   
> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>   /* Used to save and restore the csr */
>   void suspend_save_csrs(struct suspend_context *context);
>   void suspend_restore_csrs(struct suspend_context *context);
> +
> +/* Low-level API to support hibernation */
> +int swsusp_arch_suspend(void);
> +int swsusp_arch_resume(void);
> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> +int arch_hibernation_header_restore(void *addr);
> +int __hibernate_cpu_resume(void);
> +
> +/* Used to resume on the CPU we hibernated on */
> +int hibernate_resume_nonboot_cpu_disable(void);
> +
> +/* Used to restore the hibernated image */
> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> +				unsigned long cpu_resume);
> +asmlinkage int core_restore_code(void);
>   #endif
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 4cf303a779ab..daab341d55e4 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>   obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>   
>   obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>   
>   obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>   obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index df9444397908..d6a75aac1d27 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -9,6 +9,7 @@
>   #include <linux/kbuild.h>
>   #include <linux/mm.h>
>   #include <linux/sched.h>
> +#include <linux/suspend.h>
>   #include <asm/kvm_host.h>
>   #include <asm/thread_info.h>
>   #include <asm/ptrace.h>
> @@ -116,6 +117,10 @@ void asm_offsets(void)
>   
>   	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>   
> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> +
>   	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>   	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>   	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> new file mode 100644
> index 000000000000..a83d534b89bd
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate-asm.S
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Hibernation support specific for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/csr.h>
> +
> +#include <linux/linkage.h>
> +
> +/*
> + * This code is executed when resume from the hibernation.
> + *
> + * It begins with loading the temporary page table then restores the memory image.
> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> + * swsusp_arch_suspend().
> + */
> +
> +/*
> + * int __hibernate_cpu_resume(void)
> + * Switch back to the hibernated image's page table prior to restore the CPU
> + * context.
> + *
> + * Always returns 0 to the C code.
> + */
> +ENTRY(__hibernate_cpu_resume)
> +	/* switch to hibernated image's page table */
> +	csrw CSR_SATP, s0
> +	sfence.vma
> +
> +	REG_L	a0, hibernate_cpu_context
> +
> +	/* Restore CSRs */
> +	restore_csr
> +
> +	/* Restore registers (except A0 and T0-T6) */
> +	restore_reg
> +
> +	/* Return zero value */
> +	add	a0, zero, zero
> +
> +	/* Return to C code */
> +	ret
> +END(__hibernate_cpu_resume)
> +
> +/*
> + * Prepare to restore the image.
> + * a0: satp of saved page tables
> + * a1: satp of temporary page tables
> + * a2: cpu_resume
> + */
> +ENTRY(restore_image)
> +	mv	s0, a0
> +	mv	s1, a1
> +	mv	s2, a2
> +	REG_L	s4, restore_pblist
> +	REG_L	a1, relocated_restore_code
> +
> +	jalr	a1
> +END(restore_image)
> +
> +/*
> + * The below code will be executed from a 'safe' page.
> + * It first switches to the temporary page table, then start to copy the pages
> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> + * to restore the CPU context.
> + */
> +ENTRY(core_restore_code)
> +	/* switch to temp page table */
> +	csrw satp, s1
> +	sfence.vma
> +.Lcopy:
> +	/* The below code will restore the hibernated image. */
> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> +
> +	copy_page a0, a1
> +
> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> +	bnez	s4, .Lcopy
> +
> +	jalr	s2
> +END(core_restore_code)
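
For readers less used to the assembly, the .Lcopy loop above amounts to walking the restore_pblist linked list and copying each saved page back to its original location. A minimal userspace sketch, assuming a stand-in struct pbe_sketch with the three fields exported through asm-offsets.c (address, orig_address, next); the names restore_pages_sketch and PAGE_SIZE_SKETCH are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096	/* assumed 4 KiB page */

/* Stand-in for the struct pbe fields used via HIBERN_PBE_* offsets */
struct pbe_sketch {
	void *address;			/* where the saved copy currently lives */
	void *orig_address;		/* where the page must be restored to */
	struct pbe_sketch *next;	/* next entry, NULL terminates the list */
};

/* Walk the list and copy every page back; returns the number of pages. */
static size_t restore_pages_sketch(struct pbe_sketch *pblist)
{
	size_t restored = 0;
	struct pbe_sketch *p;

	for (p = pblist; p != NULL; p = p->next) {
		assert(p->address != NULL && p->orig_address != NULL);
		/* REG_L a1/a0 from the entry, then copy_page a0, a1 */
		memcpy(p->orig_address, p->address, PAGE_SIZE_SKETCH);
		restored++;
	}
	return restored;
}
```

One difference is deliberate: the assembly loads from s4 before testing it, so it assumes a non-empty list, while the sketch checks for NULL first.
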
> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> new file mode 100644
> index 000000000000..bf7f3c781820
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate.c
> @@ -0,0 +1,360 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hibernation support specific for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#include <asm/barrier.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu_context.h>
> +#include <asm/page.h>
> +#include <asm/pgtable.h>
> +#include <asm/sections.h>
> +#include <asm/set_memory.h>
> +#include <asm/smp.h>
> +#include <asm/suspend.h>
> +
> +#include <linux/cpu.h>
> +#include <linux/memblock.h>
> +#include <linux/pm.h>
> +#include <linux/sched.h>
> +#include <linux/suspend.h>
> +#include <linux/utsname.h>
> +
> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> +static int sleep_cpu = -EINVAL;
> +
> +/* CPU context to be saved */
> +struct suspend_context *hibernate_cpu_context;
> +
> +unsigned long relocated_restore_code;
> +
> +/* Pointer to the temporary resume page table */
> +pgd_t *resume_pg_dir;
> +
> +/**
> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> + * @uts_version: to save the build number and date so that the we are not resume with
> + *		a different kernel
> + */
> +struct arch_hibernate_hdr_invariants {
> +	char		uts_version[__NEW_UTS_LEN + 1];
> +};
> +
> +/**
> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> + * @invariants: container to store kernel build version
> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> + * @saved_satp: original page table used by the hibernated image.
> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> + */
> +static struct arch_hibernate_hdr {
> +	struct arch_hibernate_hdr_invariants invariants;
> +	unsigned long	hartid;
> +	unsigned long	saved_satp;
> +	unsigned long	restore_cpu_addr;
> +} resume_hdr;
> +
> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> +{
> +	memset(i, 0, sizeof(*i));
> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> +}
> +
> +/*
> + * Check if the given pfn is in the 'nosave' section.
> + */
> +int pfn_is_nosave(unsigned long pfn)
> +{
> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> +
> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> +}
> +
> +void notrace save_processor_state(void)
> +{
> +	WARN_ON(num_online_cpus() != 1);
> +}
> +
> +void notrace restore_processor_state(void)
> +{
> +}
> +
> +/*
> + * Helper parameters need to be saved to the hibernation image header.
> + */
> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> +{
> +	struct arch_hibernate_hdr *hdr = addr;
> +
> +	if (max_size < sizeof(*hdr))
> +		return -EOVERFLOW;
> +
> +	arch_hdr_invariants(&hdr->invariants);
> +
> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> +	hdr->saved_satp = csr_read(CSR_SATP);
> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(arch_hibernation_header_save);
> +
> +/*
> + * Retrieve the helper parameters from the hibernation image header
> + */
> +int arch_hibernation_header_restore(void *addr)
> +{
> +	struct arch_hibernate_hdr_invariants invariants;
> +	struct arch_hibernate_hdr *hdr = addr;
> +	int ret = 0;
> +
> +	arch_hdr_invariants(&invariants);
> +
> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> +		pr_crit("Hibernate image not generated by this kernel!\n");
> +		return -EINVAL;
> +	}
> +
> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> +	if (sleep_cpu < 0) {
> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> +		sleep_cpu = -EINVAL;
> +		return -EINVAL;
> +	}
> +
> +#ifdef CONFIG_SMP
> +	ret = bringup_hibernate_cpu(sleep_cpu);
> +	if (ret) {
> +		sleep_cpu = -EINVAL;
> +		return ret;
> +	}
> +#endif
> +	resume_hdr = *hdr;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(arch_hibernation_header_restore);
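
The "same kernel" check in the two header functions above boils down to filling a zeroed struct with the version string and comparing it bytewise. A small userspace sketch of that idea follows; UTS_LEN_SKETCH stands in for __NEW_UTS_LEN, the version string is passed in rather than read from init_utsname(), and all *_sketch names are hypothetical:

```c
#include <assert.h>
#include <string.h>

#define UTS_LEN_SKETCH 64	/* stands in for __NEW_UTS_LEN */

struct hdr_invariants_sketch {
	char uts_version[UTS_LEN_SKETCH + 1];
};

struct hibernate_hdr_sketch {
	struct hdr_invariants_sketch invariants;
	unsigned long hartid;
	unsigned long saved_satp;
	unsigned long restore_cpu_addr;
};

/* Zero the struct first so that the later memcmp() sees no stale bytes. */
static void fill_invariants_sketch(struct hdr_invariants_sketch *inv,
				   const char *version)
{
	memset(inv, 0, sizeof(*inv));
	/* Copies at most UTS_LEN_SKETCH bytes; the last byte stays NUL. */
	strncpy(inv->uts_version, version, UTS_LEN_SKETCH);
}

/* Returns 0 if the header matches the running version, -1 otherwise. */
static int check_header_sketch(const struct hibernate_hdr_sketch *hdr,
			       const char *running_version)
{
	struct hdr_invariants_sketch now;

	assert(hdr != NULL && running_version != NULL);
	fill_invariants_sketch(&now, running_version);
	return memcmp(&hdr->invariants, &now, sizeof(now)) ? -1 : 0;
}
```

The memset() before the copy matters: without it, bytes after the terminating NUL would be indeterminate and the bytewise comparison could fail for identical version strings.
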
> +
> +int swsusp_arch_suspend(void)
> +{
> +	int ret = 0;
> +
> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> +		sleep_cpu = smp_processor_id();
> +		suspend_save_csrs(hibernate_cpu_context);
> +		ret = swsusp_save();
> +	} else {
> +		suspend_restore_csrs(hibernate_cpu_context);
> +		flush_tlb_all();
> +
> +		/* Invalidated Icache */
> +		flush_icache_all();
> +
> +		/*
> +		 * Tell the hibernation core that we've just restored
> +		 * the memory
> +		 */
> +		in_suspend = 0;
> +		sleep_cpu = -EINVAL;
> +	}
> +
> +	return ret;
> +}
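
The if/else in swsusp_arch_suspend() relies on __cpu_suspend_enter() effectively returning twice: once on the way into hibernation and once more after the image has been restored. A loose userspace analogy is setjmp()/longjmp(). Note this is only an analogy with hypothetical names, and the polarity is inverted: in the quoted code a non-zero return selects the save path, whereas setjmp() returns 0 on the first pass.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf ctx_sketch;		/* plays the role of hibernate_cpu_context */
static int in_suspend_sketch = 1;	/* plays the role of in_suspend */
static int image_saved_sketch;		/* set once the "image" has been written */

/* Stand-in for the resume path jumping back into the saved context. */
static void resume_from_image_sketch(void)
{
	assert(image_saved_sketch);	/* resuming only makes sense after a save */
	longjmp(ctx_sketch, 1);
}

static int suspend_resume_sketch(void)
{
	if (setjmp(ctx_sketch) == 0) {
		/* First return: context captured, now "save the image". */
		image_saved_sketch = 1;
		resume_from_image_sketch();	/* does not return here */
	} else {
		/* Second return: we arrived via the restored context. */
		in_suspend_sketch = 0;
	}
	return 0;
}
```

The state touched on both paths is kept in static variables, since locals modified between setjmp() and longjmp() would have indeterminate values.
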
> +
> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t pte_idx = pte_index(vaddr);
> +
> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
> +
> +	return 0;
> +}
> +
> +#ifndef __PAGETABLE_PMD_FOLDED
> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
> +		(pgtable_l5_enabled ?					\
> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
> +		(pgtable_l4_enabled ?					\
> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
> +
> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t pmd_idx = pmd_index(vaddr);
> +	pte_t *ptep;
> +
> +	if (pmd_none(pmdp[pmd_idx])) {
> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> +		if (!ptep)
> +			return -ENOMEM;
> +
> +		memset(ptep, 0, PAGE_SIZE);
> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
> +	} else {
> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
> +	}
> +
> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
> +}
> +
> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t pud_index = pud_index(vaddr);
> +	pmd_t *pmdp;
> +
> +	if (pud_val(pudp[pud_index]) == 0) {
> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> +		if (!pmdp)
> +			return -ENOMEM;
> +
> +		memset(pmdp, 0, PAGE_SIZE);
> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
> +	} else {
> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
> +	}
> +
> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
> +}
> +
> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t p4d_index = p4d_index(vaddr);
> +	pud_t *pudp;
> +
> +	if (p4d_val(p4dp[p4d_index]) == 0) {
> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> +		if (!pudp)
> +			return -ENOMEM;
> +
> +		memset(pudp, 0, PAGE_SIZE);
> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
> +	} else {
> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
> +	}
> +
> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
> +}
> +
> +#else
> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
> +#endif /* __PAGETABLE_PMD_FOLDED */
> +
> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t pgd_idx = pgd_index(vaddr);
> +	void *nextp;
> +
> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
> +		if (!nextp)
> +			return -ENOMEM;
> +
> +		memset(nextp, 0, PAGE_SIZE);
> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
> +	} else {
> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
> +	}
> +
> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
> +}
> +
> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> +{
> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
> +}
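
The temp_pgtable_map_*() helpers above all follow one pattern: index into the current level, allocate and zero the next-level table if the slot is empty, then descend. A toy three-level version in plain C is sketched below. It assumes 512-entry tables and 4 KiB pages, uses calloc() in place of get_safe_page(), stores plain pointers instead of PTEs, and every *_sketch name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES_SKETCH		512	/* entries per table */
#define PAGE_SHIFT_SKETCH	12	/* 4 KiB pages */
#define LEVEL_BITS_SKETCH	9	/* log2(ENTRIES_SKETCH) */

struct table_sketch {
	void *slot[ENTRIES_SKETCH];
};

/* Index of vaddr within the table at the given level (0 = leaf level). */
static unsigned int idx_sketch(uintptr_t vaddr, int level)
{
	return (vaddr >> (PAGE_SHIFT_SKETCH + level * LEVEL_BITS_SKETCH)) &
	       (ENTRIES_SKETCH - 1);
}

/* Map vaddr to leaf, allocating intermediate tables on demand. */
static int map_sketch(struct table_sketch *top, uintptr_t vaddr, void *leaf)
{
	struct table_sketch *cur = top;
	int level;

	assert(top != NULL);
	for (level = 2; level > 0; level--) {
		unsigned int i = idx_sketch(vaddr, level);

		if (cur->slot[i] == NULL) {
			/* Like get_safe_page() + memset(0) in the patch */
			cur->slot[i] = calloc(1, sizeof(struct table_sketch));
			if (cur->slot[i] == NULL)
				return -1;	/* -ENOMEM in the patch */
		}
		cur = cur->slot[i];
	}
	cur->slot[idx_sketch(vaddr, 0)] = leaf;
	return 0;
}

/* Walk the table without allocating; NULL if vaddr is not mapped. */
static void *lookup_sketch(const struct table_sketch *top, uintptr_t vaddr)
{
	const struct table_sketch *cur = top;
	int level;

	for (level = 2; level > 0; level--) {
		cur = cur->slot[idx_sketch(vaddr, level)];
		if (cur == NULL)
			return NULL;
	}
	return cur->slot[idx_sketch(vaddr, 0)];
}
```

Two addresses in the same 2 MiB region share the intermediate tables, which is why the patch checks for an existing entry before allocating.
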
> +
> +static unsigned long relocate_restore_code(void)
> +{
> +	unsigned long ret;
> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
> +
> +	if (!page)
> +		return -ENOMEM;
> +
> +	copy_page(page, core_restore_code);
> +
> +	/* Make the page containing the relocated code executable */
> +	set_memory_x((unsigned long)page, 1);
> +
> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
> +	if (ret)
> +		return ret;
> +
> +	return (unsigned long)page;
> +}
> +
> +int swsusp_arch_resume(void)
> +{
> +	unsigned long addr = PAGE_OFFSET;
> +	unsigned long ret;
> +
> +	/*
> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> +	 * we don't need to free it here.
> +	 */
> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> +	if (!resume_pg_dir)
> +		return -ENOMEM;
> +
> +	/*
> +	 * The pages need to be writable when restoring the image.
> +	 * Create a second copy of page table just for the linear map, and use this when
> +	 * restoring.
> +	 */
> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
> +		if (ret)
> +			return (int)ret;
> +	}
> +


This looks wrong to me, as it does not account for the real physical 
mapping layout: can't you simply copy the linear mapping from 
swapper_pg_dir?

That said, I struggle to see why this temporary page table is needed 
at all: all we need is for the linear mapping to be writable, so why 
not simply call set_memory_rw() on the linear mapping?


> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
> +	relocated_restore_code = relocate_restore_code();


And do we really need this relocation as well? The restore code can only 
be overwritten by an identical copy of itself, right?

Thanks,

Alex


> +	if (relocated_restore_code == -ENOMEM)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> +	 * restore code can jump to it after finished restore the image. The next execution
> +	 * code doesn't find itself in a different address space after switching over to the
> +	 * original page table used by the hibernated image.
> +	 */
> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
> +					PAGE_KERNEL_READ_EXEC);
> +	if (ret)
> +		return ret;
> +
> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> +			resume_hdr.restore_cpu_addr);
> +
> +	return 0;
> +}
> +
> +#ifdef CONFIG_PM_SLEEP_SMP
> +int hibernate_resume_nonboot_cpu_disable(void)
> +{
> +	if (sleep_cpu < 0) {
> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> +		return -ENODEV;
> +	}
> +
> +	return freeze_secondary_cpus(sleep_cpu);
> +}
> +#endif
> +
> +static int __init riscv_hibernate_init(void)
> +{
> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> +
> +	if (WARN_ON(!hibernate_cpu_context))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +early_initcall(riscv_hibernate_init);

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
@ 2023-02-07 15:46     ` Alexandre Ghiti
  0 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-07 15:46 UTC (permalink / raw)
  To: Sia Jee Heng, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, leyfoon.tan, mason.huo

Hi Sia,

On 1/27/23 10:10, Sia Jee Heng wrote:
> Low-level arch functions are added to support hibernation.
> swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
> the CPU state onto the stack, then calls swsusp_save() to save the memory
> image.
>
> An arch-specific hibernation header is implemented and is used by the
> arch_hibernation_header_restore() and arch_hibernation_header_save()
> functions. The header consists of satp, hartid, and the cpu_resume
> address. The kernel build version is also saved in the hibernation image
> header to make sure that only the same kernel is restored on resume.
>
> swsusp_arch_resume() creates a temporary page table that covers only
> the linear map. It copies the restore code to a 'safe' page, then starts
> to restore the memory image. Once completed, it restores the original
> kernel's page table. It then calls into __hibernate_cpu_resume()
> to restore the CPU context. Finally, it follows the normal hibernation
> path back to the hibernation core.
>
> To enable hibernation/suspend-to-disk on RISC-V, the config options below
> need to be enabled:
> - CONFIG_ARCH_HIBERNATION_HEADER
> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>
> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> ---
>   arch/riscv/Kconfig                 |   7 +
>   arch/riscv/include/asm/assembler.h |  20 ++
>   arch/riscv/include/asm/suspend.h   |  21 ++
>   arch/riscv/kernel/Makefile         |   1 +
>   arch/riscv/kernel/asm-offsets.c    |   5 +
>   arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>   arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>   7 files changed, 503 insertions(+)
>   create mode 100644 arch/riscv/kernel/hibernate-asm.S
>   create mode 100644 arch/riscv/kernel/hibernate.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..4555848a817f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -690,6 +690,13 @@ menu "Power management options"
>   
>   source "kernel/power/Kconfig"
>   
> +config ARCH_HIBERNATION_POSSIBLE
> +	def_bool y
> +
> +config ARCH_HIBERNATION_HEADER
> +	def_bool y
> +	depends on HIBERNATION
> +
>   endmenu # "Power management options"
>   
>   menu "CPU Power Management"
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> index ef1283d04b70..3de70d3e6ceb 100644
> --- a/arch/riscv/include/asm/assembler.h
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -59,4 +59,24 @@
>   		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>   	.endm
>   
> +/**
> + * copy_page - copy 1 page (4KB) of data from source to destination
> + * @a0 - destination
> + * @a1 - source
> + */
> +	.macro	copy_page a0, a1
> +		lui	a2, 0x1
> +		add	a2, a2, a0
> +.1 :
> +		REG_L	t0, 0(a1)
> +		REG_L	t1, SZREG(a1)
> +
> +		REG_S	t0, 0(a0)
> +		REG_S	t1, SZREG(a0)
> +
> +		addi	a0, a0, 2 * SZREG
> +		addi	a1, a1, 2 * SZREG
> +		bne	a2, a0, .1
> +	.endm
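
For readers following the macro above, here is a rough user-space C model of
what the copy_page loop does: two register-sized words are loaded and stored
per iteration until the destination pointer reaches the end of the page. The
names model_copy_page and MODEL_PAGE_SIZE are made up for illustration and
are not part of the patch; the 4 KiB page size is taken from the macro's own
comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages, as in the macro's comment */

/*
 * Hypothetical C model of the copy_page assembler macro: copy one page,
 * two register-sized words per iteration, until dst reaches the page end.
 */
static void model_copy_page(void *dst_page, const void *src_page)
{
	uintptr_t *dst = dst_page;
	const uintptr_t *src = src_page;
	uintptr_t *end = dst + MODEL_PAGE_SIZE / sizeof(uintptr_t);

	/* The "dst != end" exit test only works if the word count is even. */
	assert((MODEL_PAGE_SIZE / sizeof(uintptr_t)) % 2 == 0);

	while (dst != end) {
		uintptr_t w0 = src[0];	/* REG_L t0, 0(a1)     */
		uintptr_t w1 = src[1];	/* REG_L t1, SZREG(a1) */

		dst[0] = w0;		/* REG_S t0, 0(a0)     */
		dst[1] = w1;		/* REG_S t1, SZREG(a0) */

		dst += 2;		/* addi a0, a0, 2 * SZREG */
		src += 2;		/* addi a1, a1, 2 * SZREG */
	}
}
```

Like the macro, the model advances both pointers as it goes, so the caller
must not rely on the original register values after the copy.
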
> +
>   #endif	/* __ASM_ASSEMBLER_H */
> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> index 75419c5ca272..db40ae433aa9 100644
> --- a/arch/riscv/include/asm/suspend.h
> +++ b/arch/riscv/include/asm/suspend.h
> @@ -21,6 +21,12 @@ struct suspend_context {
>   #endif
>   };
>   
> +/*
> + * This parameter will be assigned to 0 during resume and will be used by
> + * hibernation core for the subsequent resume sequence
> + */
> +extern int in_suspend;
> +
>   /* Low-level CPU suspend entry function */
>   int __cpu_suspend_enter(struct suspend_context *context);
>   
> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>   /* Used to save and restore the csr */
>   void suspend_save_csrs(struct suspend_context *context);
>   void suspend_restore_csrs(struct suspend_context *context);
> +
> +/* Low-level API to support hibernation */
> +int swsusp_arch_suspend(void);
> +int swsusp_arch_resume(void);
> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> +int arch_hibernation_header_restore(void *addr);
> +int __hibernate_cpu_resume(void);
> +
> +/* Used to resume on the CPU we hibernated on */
> +int hibernate_resume_nonboot_cpu_disable(void);
> +
> +/* Used to restore the hibernated image */
> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> +				unsigned long cpu_resume);
> +asmlinkage int core_restore_code(void);
>   #endif
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 4cf303a779ab..daab341d55e4 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>   obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>   
>   obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>   
>   obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>   obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index df9444397908..d6a75aac1d27 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -9,6 +9,7 @@
>   #include <linux/kbuild.h>
>   #include <linux/mm.h>
>   #include <linux/sched.h>
> +#include <linux/suspend.h>
>   #include <asm/kvm_host.h>
>   #include <asm/thread_info.h>
>   #include <asm/ptrace.h>
> @@ -116,6 +117,10 @@ void asm_offsets(void)
>   
>   	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>   
> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> +
>   	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>   	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>   	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> new file mode 100644
> index 000000000000..a83d534b89bd
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate-asm.S
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Hibernation support specific for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/csr.h>
> +
> +#include <linux/linkage.h>
> +
> +/*
> + * This code is executed when resume from the hibernation.
> + *
> + * It begins with loading the temporary page table then restores the memory image.
> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> + * swsusp_arch_suspend().
> + */
> +
> +/*
> + * int __hibernate_cpu_resume(void)
> + * Switch back to the hibernated image's page table prior to restore the CPU
> + * context.
> + *
> + * Always returns 0 to the C code.
> + */
> +ENTRY(__hibernate_cpu_resume)
> +	/* switch to hibernated image's page table */
> +	csrw CSR_SATP, s0
> +	sfence.vma
> +
> +	REG_L	a0, hibernate_cpu_context
> +
> +	/* Restore CSRs */
> +	restore_csr
> +
> +	/* Restore registers (except A0 and T0-T6) */
> +	restore_reg
> +
> +	/* Return zero value */
> +	add	a0, zero, zero
> +
> +	/* Return to C code */
> +	ret
> +END(__hibernate_cpu_resume)
> +
> +/*
> + * Prepare to restore the image.
> + * a0: satp of saved page tables
> + * a1: satp of temporary page tables
> + * a2: cpu_resume
> + */
> +ENTRY(restore_image)
> +	mv	s0, a0
> +	mv	s1, a1
> +	mv	s2, a2
> +	REG_L	s4, restore_pblist
> +	REG_L	a1, relocated_restore_code
> +
> +	jalr	a1
> +END(restore_image)
> +
> +/*
> + * The below code will be executed from a 'safe' page.
> + * It first switches to the temporary page table, then start to copy the pages
> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> + * to restore the CPU context.
> + */
> +ENTRY(core_restore_code)
> +	/* switch to temp page table */
> +	csrw satp, s1
> +	sfence.vma
> +.Lcopy:
> +	/* The below code will restore the hibernated image. */
> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> +
> +	copy_page a0, a1
> +
> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> +	bnez	s4, .Lcopy
> +
> +	jalr	s2
> +END(core_restore_code)
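
To make the .Lcopy loop easier to reason about, here is a small user-space
sketch of the same traversal in C. It walks a singly linked list whose nodes
carry the three fields exported through asm-offsets.c above (address,
orig_address, next) and copies each saved page back to its original
location. struct model_pbe and model_restore_pages are illustrative names,
and memcpy stands in for the copy_page macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Mirrors the three pbe fields used by the assembly (hypothetical type). */
struct model_pbe {
	void *address;		/* where the saved copy currently lives */
	void *orig_address;	/* where the page has to end up */
	struct model_pbe *next;	/* NULL terminates the list */
};

/* Copy every saved page back in list order; returns the number of pages copied. */
static size_t model_restore_pages(const struct model_pbe *pbe)
{
	size_t copied = 0;

	/* Like the assembly, this loads from the first node before testing it. */
	assert(pbe != NULL);

	do {
		memcpy(pbe->orig_address, pbe->address, MODEL_PAGE_SIZE);
		copied++;
		pbe = pbe->next;	/* REG_L s4, HIBERN_PBE_NEXT(s4) */
	} while (pbe != NULL);		/* bnez s4, .Lcopy */

	return copied;
}
```

Any orig_address in the list may be overwritten while the loop runs, which is
the situation the 'safe' page for the restore code is meant to address.
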
> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> new file mode 100644
> index 000000000000..bf7f3c781820
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate.c
> @@ -0,0 +1,360 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hibernation support specific for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> + */
> +
> +#include <asm/barrier.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu_context.h>
> +#include <asm/page.h>
> +#include <asm/pgtable.h>
> +#include <asm/sections.h>
> +#include <asm/set_memory.h>
> +#include <asm/smp.h>
> +#include <asm/suspend.h>
> +
> +#include <linux/cpu.h>
> +#include <linux/memblock.h>
> +#include <linux/pm.h>
> +#include <linux/sched.h>
> +#include <linux/suspend.h>
> +#include <linux/utsname.h>
> +
> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> +static int sleep_cpu = -EINVAL;
> +
> +/* CPU context to be saved */
> +struct suspend_context *hibernate_cpu_context;
> +
> +unsigned long relocated_restore_code;
> +
> +/* Pointer to the temporary resume page table */
> +pgd_t *resume_pg_dir;
> +
> +/**
> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> + * @uts_version: to save the build number and date so that the we are not resume with
> + *		a different kernel
> + */
> +struct arch_hibernate_hdr_invariants {
> +	char		uts_version[__NEW_UTS_LEN + 1];
> +};
> +
> +/**
> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> + * @invariants: container to store kernel build version
> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> + * @saved_satp: original page table used by the hibernated image.
> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> + */
> +static struct arch_hibernate_hdr {
> +	struct arch_hibernate_hdr_invariants invariants;
> +	unsigned long	hartid;
> +	unsigned long	saved_satp;
> +	unsigned long	restore_cpu_addr;
> +} resume_hdr;
> +
> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> +{
> +	memset(i, 0, sizeof(*i));
> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> +}
> +
> +/*
> + * Check if the given pfn is in the 'nosave' section.
> + */
> +int pfn_is_nosave(unsigned long pfn)
> +{
> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> +
> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> +}
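
As a side note on the range check above: because the end PFN is computed from
the last byte of the section, the comparison is inclusive on both ends. A
minimal user-space sketch of that arithmetic, with plain addresses standing
in for the linker symbols and a 12-bit page shift assumed
(model_pfn_is_nosave is a made-up name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical model of pfn_is_nosave(): 'begin' is the first byte of the
 * section and 'end' is one past its last byte.
 */
static bool model_pfn_is_nosave(unsigned long pfn, uintptr_t begin, uintptr_t end)
{
	unsigned long begin_pfn, last_pfn;

	assert(end > begin);

	begin_pfn = (unsigned long)(begin >> MODEL_PAGE_SHIFT);
	/* Using end - 1 keeps a page-aligned 'end' from pulling in the next page. */
	last_pfn = (unsigned long)((end - 1) >> MODEL_PAGE_SHIFT);

	return pfn >= begin_pfn && pfn <= last_pfn;
}
```

With a section spanning 0x80201000 to 0x80203000, PFNs 0x80201 and 0x80202
are reported as nosave while 0x80200 and 0x80203 are not.
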
> +
> +void notrace save_processor_state(void)
> +{
> +	WARN_ON(num_online_cpus() != 1);
> +}
> +
> +void notrace restore_processor_state(void)
> +{
> +}
> +
> +/*
> + * Helper parameters need to be saved to the hibernation image header.
> + */
> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> +{
> +	struct arch_hibernate_hdr *hdr = addr;
> +
> +	if (max_size < sizeof(*hdr))
> +		return -EOVERFLOW;
> +
> +	arch_hdr_invariants(&hdr->invariants);
> +
> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> +	hdr->saved_satp = csr_read(CSR_SATP);
> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(arch_hibernation_header_save);
> +
> +/*
> + * Retrieve the helper parameters from the hibernation image header
> + */
> +int arch_hibernation_header_restore(void *addr)
> +{
> +	struct arch_hibernate_hdr_invariants invariants;
> +	struct arch_hibernate_hdr *hdr = addr;
> +	int ret = 0;
> +
> +	arch_hdr_invariants(&invariants);
> +
> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> +		pr_crit("Hibernate image not generated by this kernel!\n");
> +		return -EINVAL;
> +	}
> +
> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> +	if (sleep_cpu < 0) {
> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> +		sleep_cpu = -EINVAL;
> +		return -EINVAL;
> +	}
> +
> +#ifdef CONFIG_SMP
> +	ret = bringup_hibernate_cpu(sleep_cpu);
> +	if (ret) {
> +		sleep_cpu = -EINVAL;
> +		return ret;
> +	}
> +#endif
> +	resume_hdr = *hdr;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(arch_hibernation_header_restore);
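
The invariants check above boils down to "zero a fixed-size buffer, fill in
the version string, memcmp the whole buffer". A user-space sketch of that
idea follows. The names are illustrative, MODEL_UTS_LEN stands in for
__NEW_UTS_LEN, and strncpy replaces the patch's fixed-size memcpy only
because the sample strings here are shorter than the buffer.

```c
#include <assert.h>
#include <string.h>

#define MODEL_UTS_LEN 64	/* assumption: stands in for __NEW_UTS_LEN */

/* Hypothetical stand-in for struct arch_hibernate_hdr_invariants. */
struct model_invariants {
	char uts_version[MODEL_UTS_LEN + 1];
};

/* Zero the whole structure first so that a later memcmp() sees no stale bytes. */
static void model_fill_invariants(struct model_invariants *inv, const char *version)
{
	memset(inv, 0, sizeof(*inv));
	strncpy(inv->uts_version, version, MODEL_UTS_LEN);
}

/* Returns 1 if the image was written by a kernel with the same version string. */
static int model_same_kernel(const struct model_invariants *image,
			     const char *running_version)
{
	struct model_invariants now;

	assert(image != NULL);

	model_fill_invariants(&now, running_version);
	return memcmp(image, &now, sizeof(now)) == 0;
}
```

Zeroing before filling is what makes the byte-wise comparison safe: two equal
version strings always produce identical buffers.
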
> +
> +int swsusp_arch_suspend(void)
> +{
> +	int ret = 0;
> +
> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> +		sleep_cpu = smp_processor_id();
> +		suspend_save_csrs(hibernate_cpu_context);
> +		ret = swsusp_save();
> +	} else {
> +		suspend_restore_csrs(hibernate_cpu_context);
> +		flush_tlb_all();
> +
> +		/* Invalidated Icache */
> +		flush_icache_all();
> +
> +		/*
> +		 * Tell the hibernation core that we've just restored
> +		 * the memory
> +		 */
> +		in_suspend = 0;
> +		sleep_cpu = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t pte_idx = pte_index(vaddr);
> +
> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
> +
> +	return 0;
> +}
> +
> +#ifndef __PAGETABLE_PMD_FOLDED
> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
> +		(pgtable_l5_enabled ?					\
> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
> +		(pgtable_l4_enabled ?					\
> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
> +
> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t pmd_idx = pmd_index(vaddr);
> +	pte_t *ptep;
> +
> +	if (pmd_none(pmdp[pmd_idx])) {
> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> +		if (!ptep)
> +			return -ENOMEM;
> +
> +		memset(ptep, 0, PAGE_SIZE);
> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
> +	} else {
> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
> +	}
> +
> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
> +}
> +
> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t pud_index = pud_index(vaddr);
> +	pmd_t *pmdp;
> +
> +	if (pud_val(pudp[pud_index]) == 0) {
> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> +		if (!pmdp)
> +			return -ENOMEM;
> +
> +		memset(pmdp, 0, PAGE_SIZE);
> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
> +	} else {
> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
> +	}
> +
> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
> +}
> +
> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t p4d_index = p4d_index(vaddr);
> +	pud_t *pudp;
> +
> +	if (p4d_val(p4dp[p4d_index]) == 0) {
> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> +		if (!pudp)
> +			return -ENOMEM;
> +
> +		memset(pudp, 0, PAGE_SIZE);
> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
> +	} else {
> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
> +	}
> +
> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
> +}
> +
> +#else
> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
> +#endif /* __PAGETABLE_PMD_FOLDED */
> +
> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> +{
> +	uintptr_t pgd_idx = pgd_index(vaddr);
> +	void *nextp;
> +
> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
> +		if (!nextp)
> +			return -ENOMEM;
> +
> +		memset(nextp, 0, PAGE_SIZE);
> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
> +	} else {
> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
> +	}
> +
> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
> +}
> +
> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> +{
> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
> +}
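
The temp_pgtable_map_*() helpers above all follow one pattern: index into the
current level, allocate and zero the next-level table if the entry is empty,
then descend. Below is a toy user-space model of that pattern, assuming three
levels with 9 index bits each and 4 KiB pages. Everything named model_* is
hypothetical, table entries hold plain pointers instead of PFNs, and calloc()
stands in for get_safe_page(). It is only meant to show the walk, not the
real RISC-V PTE format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_IDX_BITS   9			/* 512 entries per table */
#define MODEL_ENTRIES    (1u << MODEL_IDX_BITS)
#define MODEL_LEVELS     3
#define MODEL_VALID      1ull
#define MODEL_PAGE_MASK  ((1ull << MODEL_PAGE_SHIFT) - 1)
#define MODEL_UNMAPPED   UINT64_MAX

struct model_table {
	uint64_t entry[MODEL_ENTRIES];
};

/* Extract the table index used at the given level (0 is the leaf level). */
static unsigned int model_index(uint64_t vaddr, int level)
{
	return (unsigned int)((vaddr >> (MODEL_PAGE_SHIFT + MODEL_IDX_BITS * level)) &
			      (MODEL_ENTRIES - 1));
}

/* Map one page, allocating intermediate tables on demand; 0 on success, -1 on OOM. */
static int model_map(struct model_table *root, uint64_t vaddr, uint64_t paddr)
{
	struct model_table *table = root;
	int level;

	assert(root != NULL);

	for (level = MODEL_LEVELS - 1; level > 0; level--) {
		unsigned int idx = model_index(vaddr, level);

		if (table->entry[idx] == 0) {
			/* calloc() returns zeroed memory, like get_safe_page() + memset() */
			struct model_table *next = calloc(1, sizeof(*next));

			if (!next)
				return -1;
			table->entry[idx] = (uint64_t)(uintptr_t)next;
		}
		table = (struct model_table *)(uintptr_t)table->entry[idx];
	}

	table->entry[model_index(vaddr, 0)] = (paddr & ~MODEL_PAGE_MASK) | MODEL_VALID;
	return 0;
}

/* Translate vaddr, or return MODEL_UNMAPPED if any level is missing. */
static uint64_t model_lookup(const struct model_table *root, uint64_t vaddr)
{
	const struct model_table *table = root;
	uint64_t leaf;
	int level;

	for (level = MODEL_LEVELS - 1; level > 0; level--) {
		uint64_t e = table->entry[model_index(vaddr, level)];

		if (e == 0)
			return MODEL_UNMAPPED;
		table = (const struct model_table *)(uintptr_t)e;
	}

	leaf = table->entry[model_index(vaddr, 0)];
	if (!(leaf & MODEL_VALID))
		return MODEL_UNMAPPED;

	return (leaf & ~MODEL_PAGE_MASK) | (vaddr & MODEL_PAGE_MASK);
}
```

The toy never frees its tables, loosely mirroring the comment in
swsusp_arch_resume() that pages from get_safe_page() are handled by the
hibernation core.
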
> +
> +static unsigned long relocate_restore_code(void)
> +{
> +	unsigned long ret;
> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
> +
> +	if (!page)
> +		return -ENOMEM;
> +
> +	copy_page(page, core_restore_code);
> +
> +	/* Make the page containing the relocated code executable */
> +	set_memory_x((unsigned long)page, 1);
> +
> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
> +	if (ret)
> +		return ret;
> +
> +	return (unsigned long)page;
> +}
> +
> +int swsusp_arch_resume(void)
> +{
> +	unsigned long addr = PAGE_OFFSET;
> +	unsigned long ret;
> +
> +	/*
> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> +	 * we don't need to free it here.
> +	 */
> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> +	if (!resume_pg_dir)
> +		return -ENOMEM;
> +
> +	/*
> +	 * The pages need to be writable when restoring the image.
> +	 * Create a second copy of page table just for the linear map, and use this when
> +	 * restoring.
> +	 */
> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
> +		if (ret)
> +			return (int)ret;
> +	}
> +


This looks wrong to me, as it does not account for the real physical 
mapping layout: can't you simply copy the linear mapping from 
swapper_pg_dir?

That said, I struggle to see why this temporary page table is needed 
at all: all we need is for the linear mapping to be writable, so why 
not simply call set_memory_rw() on the linear mapping?


> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
> +	relocated_restore_code = relocate_restore_code();


And do we really need this relocation as well? The restore code can only 
be overwritten by an identical copy of itself, right?

Thanks,

Alex


> +	if (relocated_restore_code == -ENOMEM)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> +	 * restore code can jump to it after finished restore the image. The next execution
> +	 * code doesn't find itself in a different address space after switching over to the
> +	 * original page table used by the hibernated image.
> +	 */
> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
> +					PAGE_KERNEL_READ_EXEC);
> +	if (ret)
> +		return ret;
> +
> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> +			resume_hdr.restore_cpu_addr);
> +
> +	return 0;
> +}
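
One detail in the function above is that relocate_restore_code() returns
either a page address or a negative errno in the same unsigned long, and the
caller compares against -ENOMEM only. As quoted, the mapping helpers return
nothing but 0 or -ENOMEM, so that comparison covers today's cases. The sketch
below shows a broader range check in user space, modelled on the kernel's
IS_ERR_VALUE() convention that the top 4095 values encode errors. The
model_* names and the -EINVAL path are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_ERRNO 4095UL	/* assumption: same bound as the kernel's MAX_ERRNO */

/* True if v is a negative errno stored in an unsigned long. */
static bool model_is_err_value(unsigned long v)
{
	return v >= (unsigned long)-MODEL_MAX_ERRNO;
}

/*
 * Hypothetical stand-in for relocate_restore_code(): returns the page
 * address on success, or a negative errno folded into the return value.
 */
static unsigned long model_relocate(bool alloc_ok, bool map_ok, unsigned long page)
{
	/* A valid page address must never fall into the error range. */
	assert(!model_is_err_value(page));

	if (!alloc_ok)
		return (unsigned long)-ENOMEM;
	if (!map_ok)
		return (unsigned long)-EINVAL;	/* made-up second failure mode */

	return page;
}
```

A caller that tests model_is_err_value() catches both failure paths, whereas
an equality test against -ENOMEM would let the made-up -EINVAL case through
and treat it as a code address.
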
> +
> +#ifdef CONFIG_PM_SLEEP_SMP
> +int hibernate_resume_nonboot_cpu_disable(void)
> +{
> +	if (sleep_cpu < 0) {
> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> +		return -ENODEV;
> +	}
> +
> +	return freeze_secondary_cpus(sleep_cpu);
> +}
> +#endif
> +
> +static int __init riscv_hibernate_init(void)
> +{
> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> +
> +	if (WARN_ON(!hibernate_cpu_context))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +early_initcall(riscv_hibernate_init);

* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-02-07 15:46     ` Alexandre Ghiti
@ 2023-02-08  4:43       ` JeeHeng Sia
  -1 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-08  4:43 UTC (permalink / raw)
  To: Alexandre Ghiti, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo



> -----Original Message-----
> From: Alexandre Ghiti <alex@ghiti.fr>
> Sent: Tuesday, 7 February, 2023 11:46 PM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> Hi Sia,
> 
> On 1/27/23 10:10, Sia Jee Heng wrote:
> > Low-level arch functions are added to support hibernation.
> > swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
> > the CPU state onto the stack, then calls swsusp_save() to save the memory
> > image.
> >
> > An arch-specific hibernation header is implemented and is used by the
> > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > functions. The header consists of satp, hartid, and the cpu_resume
> > address. The kernel build version is also saved in the hibernation image
> > header to make sure that only the same kernel is restored on resume.
> >
> > swsusp_arch_resume() creates a temporary page table that covers only
> > the linear map. It copies the restore code to a 'safe' page, then starts
> > to restore the memory image. Once completed, it restores the original
> > kernel's page table. It then calls into __hibernate_cpu_resume()
> > to restore the CPU context. Finally, it follows the normal hibernation
> > path back to the hibernation core.
> >
> > To enable hibernation/suspend-to-disk on RISC-V, the config options below
> > need to be enabled:
> > - CONFIG_ARCH_HIBERNATION_HEADER
> > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >
> > Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> > Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> > Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> > ---
> >   arch/riscv/Kconfig                 |   7 +
> >   arch/riscv/include/asm/assembler.h |  20 ++
> >   arch/riscv/include/asm/suspend.h   |  21 ++
> >   arch/riscv/kernel/Makefile         |   1 +
> >   arch/riscv/kernel/asm-offsets.c    |   5 +
> >   arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
> >   arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
> >   7 files changed, 503 insertions(+)
> >   create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >   create mode 100644 arch/riscv/kernel/hibernate.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..4555848a817f 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -690,6 +690,13 @@ menu "Power management options"
> >
> >   source "kernel/power/Kconfig"
> >
> > +config ARCH_HIBERNATION_POSSIBLE
> > +	def_bool y
> > +
> > +config ARCH_HIBERNATION_HEADER
> > +	def_bool y
> > +	depends on HIBERNATION
> > +
> >   endmenu # "Power management options"
> >
> >   menu "CPU Power Management"
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > index ef1283d04b70..3de70d3e6ceb 100644
> > --- a/arch/riscv/include/asm/assembler.h
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -59,4 +59,24 @@
> >   		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >   	.endm
> >
> > +/**
> > + * copy_page - copy 1 page (4KB) of data from source to destination
> > + * @a0 - destination
> > + * @a1 - source
> > + */
> > +	.macro	copy_page a0, a1
> > +		lui	a2, 0x1
> > +		add	a2, a2, a0
> > +.1 :
> > +		REG_L	t0, 0(a1)
> > +		REG_L	t1, SZREG(a1)
> > +
> > +		REG_S	t0, 0(a0)
> > +		REG_S	t1, SZREG(a0)
> > +
> > +		addi	a0, a0, 2 * SZREG
> > +		addi	a1, a1, 2 * SZREG
> > +		bne	a2, a0, .1
> > +	.endm
> > +
> >   #endif	/* __ASM_ASSEMBLER_H */
> > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > index 75419c5ca272..db40ae433aa9 100644
> > --- a/arch/riscv/include/asm/suspend.h
> > +++ b/arch/riscv/include/asm/suspend.h
> > @@ -21,6 +21,12 @@ struct suspend_context {
> >   #endif
> >   };
> >
> > +/*
> > + * This parameter will be assigned to 0 during resume and will be used by
> > + * hibernation core for the subsequent resume sequence
> > + */
> > +extern int in_suspend;
> > +
> >   /* Low-level CPU suspend entry function */
> >   int __cpu_suspend_enter(struct suspend_context *context);
> >
> > @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >   /* Used to save and restore the csr */
> >   void suspend_save_csrs(struct suspend_context *context);
> >   void suspend_restore_csrs(struct suspend_context *context);
> > +
> > +/* Low-level API to support hibernation */
> > +int swsusp_arch_suspend(void);
> > +int swsusp_arch_resume(void);
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > +int arch_hibernation_header_restore(void *addr);
> > +int __hibernate_cpu_resume(void);
> > +
> > +/* Used to resume on the CPU we hibernated on */
> > +int hibernate_resume_nonboot_cpu_disable(void);
> > +
> > +/* Used to restore the hibernated image */
> > +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > +				unsigned long cpu_resume);
> > +asmlinkage int core_restore_code(void);
> >   #endif
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 4cf303a779ab..daab341d55e4 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
> >   obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
> >
> >   obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> > +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
> >
> >   obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> >   obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > index df9444397908..d6a75aac1d27 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -9,6 +9,7 @@
> >   #include <linux/kbuild.h>
> >   #include <linux/mm.h>
> >   #include <linux/sched.h>
> > +#include <linux/suspend.h>
> >   #include <asm/kvm_host.h>
> >   #include <asm/thread_info.h>
> >   #include <asm/ptrace.h>
> > @@ -116,6 +117,10 @@ void asm_offsets(void)
> >
> >   	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >
> > +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > +
> >   	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >   	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >   	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > new file mode 100644
> > index 000000000000..a83d534b89bd
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate-asm.S
> > @@ -0,0 +1,89 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Hibernation support specific for RISCV
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > +#include <asm/csr.h>
> > +
> > +#include <linux/linkage.h>
> > +
> > +/*
> > + * This code is executed when resume from the hibernation.
> > + *
> > + * It begins with loading the temporary page table then restores the memory image.
> > + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> > + * swsusp_arch_suspend().
> > + */
> > +
> > +/*
> > + * int __hibernate_cpu_resume(void)
> > + * Switch back to the hibernated image's page table prior to restore the CPU
> > + * context.
> > + *
> > + * Always returns 0 to the C code.
> > + */
> > +ENTRY(__hibernate_cpu_resume)
> > +	/* switch to hibernated image's page table */
> > +	csrw CSR_SATP, s0
> > +	sfence.vma
> > +
> > +	REG_L	a0, hibernate_cpu_context
> > +
> > +	/* Restore CSRs */
> > +	restore_csr
> > +
> > +	/* Restore registers (except A0 and T0-T6) */
> > +	restore_reg
> > +
> > +	/* Return zero value */
> > +	add	a0, zero, zero
> > +
> > +	/* Return to C code */
> > +	ret
> > +END(__hibernate_cpu_resume)
> > +
> > +/*
> > + * Prepare to restore the image.
> > + * a0: satp of saved page tables
> > + * a1: satp of temporary page tables
> > + * a2: cpu_resume
> > + */
> > +ENTRY(restore_image)
> > +	mv	s0, a0
> > +	mv	s1, a1
> > +	mv	s2, a2
> > +	REG_L	s4, restore_pblist
> > +	REG_L	a1, relocated_restore_code
> > +
> > +	jalr	a1
> > +END(restore_image)
> > +
> > +/*
> > + * The below code will be executed from a 'safe' page.
> > + * It first switches to the temporary page table, then start to copy the pages
> > + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> > + * to restore the CPU context.
> > + */
> > +ENTRY(core_restore_code)
> > +	/* switch to temp page table */
> > +	csrw satp, s1
> > +	sfence.vma
> > +.Lcopy:
> > +	/* The below code will restore the hibernated image. */
> > +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> > +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> > +
> > +	copy_page a0, a1
> > +
> > +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> > +	bnez	s4, .Lcopy
> > +
> > +	jalr	s2
> > +END(core_restore_code)
> > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> > new file mode 100644
> > index 000000000000..bf7f3c781820
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate.c
> > @@ -0,0 +1,360 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Hibernation support specific for RISCV
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#include <asm/barrier.h>
> > +#include <asm/cacheflush.h>
> > +#include <asm/mmu_context.h>
> > +#include <asm/page.h>
> > +#include <asm/pgtable.h>
> > +#include <asm/sections.h>
> > +#include <asm/set_memory.h>
> > +#include <asm/smp.h>
> > +#include <asm/suspend.h>
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/memblock.h>
> > +#include <linux/pm.h>
> > +#include <linux/sched.h>
> > +#include <linux/suspend.h>
> > +#include <linux/utsname.h>
> > +
> > +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> > +static int sleep_cpu = -EINVAL;
> > +
> > +/* CPU context to be saved */
> > +struct suspend_context *hibernate_cpu_context;
> > +
> > +unsigned long relocated_restore_code;
> > +
> > +/* Pointer to the temporary resume page table */
> > +pgd_t *resume_pg_dir;
> > +
> > +/**
> > + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> > + * @uts_version: to save the build number and date so that the we are not resume with
> > + *		a different kernel
> > + */
> > +struct arch_hibernate_hdr_invariants {
> > +	char		uts_version[__NEW_UTS_LEN + 1];
> > +};
> > +
> > +/**
> > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> > + * @invariants: container to store kernel build version
> > + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> > + * @saved_satp: original page table used by the hibernated image.
> > + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> > + */
> > +static struct arch_hibernate_hdr {
> > +	struct arch_hibernate_hdr_invariants invariants;
> > +	unsigned long	hartid;
> > +	unsigned long	saved_satp;
> > +	unsigned long	restore_cpu_addr;
> > +} resume_hdr;
> > +
> > +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> > +{
> > +	memset(i, 0, sizeof(*i));
> > +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> > +}
> > +
> > +/*
> > + * Check if the given pfn is in the 'nosave' section.
> > + */
> > +int pfn_is_nosave(unsigned long pfn)
> > +{
> > +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> > +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> > +
> > +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> > +}
> > +
> > +void notrace save_processor_state(void)
> > +{
> > +	WARN_ON(num_online_cpus() != 1);
> > +}
> > +
> > +void notrace restore_processor_state(void)
> > +{
> > +}
> > +
> > +/*
> > + * Helper parameters need to be saved to the hibernation image header.
> > + */
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> > +{
> > +	struct arch_hibernate_hdr *hdr = addr;
> > +
> > +	if (max_size < sizeof(*hdr))
> > +		return -EOVERFLOW;
> > +
> > +	arch_hdr_invariants(&hdr->invariants);
> > +
> > +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> > +	hdr->saved_satp = csr_read(CSR_SATP);
> > +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(arch_hibernation_header_save);
> > +
> > +/*
> > + * Retrieve the helper parameters from the hibernation image header
> > + */
> > +int arch_hibernation_header_restore(void *addr)
> > +{
> > +	struct arch_hibernate_hdr_invariants invariants;
> > +	struct arch_hibernate_hdr *hdr = addr;
> > +	int ret = 0;
> > +
> > +	arch_hdr_invariants(&invariants);
> > +
> > +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> > +		pr_crit("Hibernate image not generated by this kernel!\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> > +	if (sleep_cpu < 0) {
> > +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> > +		sleep_cpu = -EINVAL;
> > +		return -EINVAL;
> > +	}
> > +
> > +#ifdef CONFIG_SMP
> > +	ret = bringup_hibernate_cpu(sleep_cpu);
> > +	if (ret) {
> > +		sleep_cpu = -EINVAL;
> > +		return ret;
> > +	}
> > +#endif
> > +	resume_hdr = *hdr;
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(arch_hibernation_header_restore);
> > +
> > +int swsusp_arch_suspend(void)
> > +{
> > +	int ret = 0;
> > +
> > +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> > +		sleep_cpu = smp_processor_id();
> > +		suspend_save_csrs(hibernate_cpu_context);
> > +		ret = swsusp_save();
> > +	} else {
> > +		suspend_restore_csrs(hibernate_cpu_context);
> > +		flush_tlb_all();
> > +
> > +		/* Invalidated Icache */
> > +		flush_icache_all();
> > +
> > +		/*
> > +		 * Tell the hibernation core that we've just restored
> > +		 * the memory
> > +		 */
> > +		in_suspend = 0;
> > +		sleep_cpu = -EINVAL;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t pte_idx = pte_index(vaddr);
> > +
> > +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
> > +
> > +	return 0;
> > +}
> > +
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
> > +		(pgtable_l5_enabled ?					\
> > +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
> > +		(pgtable_l4_enabled ?					\
> > +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
> > +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
> > +
> > +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t pmd_idx = pmd_index(vaddr);
> > +	pte_t *ptep;
> > +
> > +	if (pmd_none(pmdp[pmd_idx])) {
> > +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> > +		if (!ptep)
> > +			return -ENOMEM;
> > +
> > +		memset(ptep, 0, PAGE_SIZE);
> > +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
> > +	} else {
> > +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
> > +	}
> > +
> > +	return temp_pgtable_map_pte(ptep, vaddr, prot);
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t pud_index = pud_index(vaddr);
> > +	pmd_t *pmdp;
> > +
> > +	if (pud_val(pudp[pud_index]) == 0) {
> > +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> > +		if (!pmdp)
> > +			return -ENOMEM;
> > +
> > +		memset(pmdp, 0, PAGE_SIZE);
> > +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
> > +	} else {
> > +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
> > +	}
> > +
> > +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
> > +}
> > +
> > +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t p4d_index = p4d_index(vaddr);
> > +	pud_t *pudp;
> > +
> > +	if (p4d_val(p4dp[p4d_index]) == 0) {
> > +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> > +		if (!pudp)
> > +			return -ENOMEM;
> > +
> > +		memset(pudp, 0, PAGE_SIZE);
> > +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
> > +	} else {
> > +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
> > +	}
> > +
> > +	return temp_pgtable_map_pud(pudp, vaddr, prot);
> > +}
> > +
> > +#else
> > +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
> > +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
> > +#endif /* __PAGETABLE_PMD_FOLDED */
> > +
> > +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t pgd_idx = pgd_index(vaddr);
> > +	void *nextp;
> > +
> > +	if (pgd_val(pgdp[pgd_idx]) == 0) {
> > +		nextp = (void *)get_safe_page(GFP_ATOMIC);
> > +		if (!nextp)
> > +			return -ENOMEM;
> > +
> > +		memset(nextp, 0, PAGE_SIZE);
> > +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
> > +	} else {
> > +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
> > +	}
> > +
> > +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
> > +}
> > +
> > +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
> > +}
> > +
> > +static unsigned long relocate_restore_code(void)
> > +{
> > +	unsigned long ret;
> > +	void *page = (void *)get_safe_page(GFP_ATOMIC);
> > +
> > +	if (!page)
> > +		return -ENOMEM;
> > +
> > +	copy_page(page, core_restore_code);
> > +
> > +	/* Make the page containing the relocated code executable */
> > +	set_memory_x((unsigned long)page, 1);
> > +
> > +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return (unsigned long)page;
> > +}
> > +
> > +int swsusp_arch_resume(void)
> > +{
> > +	unsigned long addr = PAGE_OFFSET;
> > +	unsigned long ret;
> > +
> > +	/*
> > +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> > +	 * we don't need to free it here.
> > +	 */
> > +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> > +	if (!resume_pg_dir)
> > +		return -ENOMEM;
> > +
> > +	/*
> > +	 * The pages need to be writable when restoring the image.
> > +	 * Create a second copy of page table just for the linear map, and use this when
> > +	 * restoring.
> > +	 */
> > +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
> > +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
> > +		if (ret)
> > +			return (int)ret;
> > +	}
> > +
> 
> 
> To me this is wrong as this does not account for the real physical
> mapping layout: can't you simply copy the linear mapping from
> swapper_pg_dir?
Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from swapper_pg_dir, as we are not supposed to modify its contents.
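
As a rough illustration of building a private table from scratch, here is a
userspace sketch. It is not code from the patch: every toy_* name is
hypothetical, the table has only two levels, and calloc() stands in for
get_safe_page(). The PTE layout (flags in the low bits, PPN from bit 10)
loosely follows RISC-V. Each lower-level table is allocated the first time an
address needs it, so the live table is never touched.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SHIFT	12
#define TOY_LEVEL_BITS	9
#define TOY_ENTRIES	(1u << TOY_LEVEL_BITS)
#define TOY_PTE_V	0x1	/* valid */
#define TOY_PTE_R	0x2	/* readable */
#define TOY_PTE_W	0x4	/* writable */

/* Hypothetical two-level table; the real code walks up to five levels. */
struct toy_leaf { uint64_t pte[TOY_ENTRIES]; };
struct toy_root { struct toy_leaf *next[TOY_ENTRIES]; };

/* Map one page at byte offset 'off'; allocate the leaf table on first use. */
static int toy_map_page(struct toy_root *root, uint64_t off, uint64_t prot)
{
	unsigned int top = (off >> (TOY_PAGE_SHIFT + TOY_LEVEL_BITS)) & (TOY_ENTRIES - 1);
	unsigned int idx = (off >> TOY_PAGE_SHIFT) & (TOY_ENTRIES - 1);

	assert(root != NULL);
	if (!root->next[top]) {
		root->next[top] = calloc(1, sizeof(struct toy_leaf));
		if (!root->next[top])
			return -1;
	}
	/* Linear map: page frame number equals the page index of the offset. */
	root->next[top]->pte[idx] = ((off >> TOY_PAGE_SHIFT) << 10) | prot;
	return 0;
}

/* Map the first 'npages' pages one by one, like the loop in the patch. */
static int toy_map_linear(struct toy_root *root, uint64_t npages, uint64_t prot)
{
	for (uint64_t i = 0; i < npages; i++)
		if (toy_map_page(root, i << TOY_PAGE_SHIFT, prot))
			return -1;
	return 0;
}

/* Return the PTE for 'off', or 0 if nothing is mapped there. */
static uint64_t toy_lookup(const struct toy_root *root, uint64_t off)
{
	unsigned int top = (off >> (TOY_PAGE_SHIFT + TOY_LEVEL_BITS)) & (TOY_ENTRIES - 1);
	unsigned int idx = (off >> TOY_PAGE_SHIFT) & (TOY_ENTRIES - 1);

	return root->next[top] ? root->next[top]->pte[idx] : 0;
}
```

A caller would zero-allocate one struct toy_root, call toy_map_linear() with
TOY_PTE_V | TOY_PTE_R | TOY_PTE_W, and then only switch to the new table once
every page has been mapped successfully.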
> 
> But I have to admit that I struggle to understand the need for this
> temporary page table: all we need to do is to allow to write to the
> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
Similar to the above comment: when we restore the memory contents, we need to make sure the pages are writable. If you modify swapper_pg_dir, the kernel will crash afterwards.
That's why we need a second page table to do the restore.
> 
> 
> > +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
> > +	relocated_restore_code = relocate_restore_code();
> 
> 
> And do we really need to do that too? The code in question can only be
> overwritten by the same code right?
Yes, we need to move the restore code to a new page to prevent the code from overwriting itself while it restores the memory.
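
For reference, the copy loop that runs from the safe page can be sketched in
userspace C as below. This is only an illustration, not the patch's assembly:
struct toy_pbe is a hypothetical stand-in that mirrors the address,
orig_address and next fields the assembly reads through the HIBERN_PBE_*
offsets, and memcpy() replaces the copy_page macro. Unlike the assembly, the
sketch checks for an empty list before the first copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for the page backup entry used by the restore loop. */
struct toy_pbe {
	void *address;		/* where the saved copy currently lives */
	void *orig_address;	/* where the page has to end up */
	struct toy_pbe *next;	/* next entry, NULL at the end of the list */
};

/* Walk the list and copy every saved page back; return the number copied. */
static size_t toy_restore_image(const struct toy_pbe *list)
{
	size_t copied = 0;

	for (const struct toy_pbe *p = list; p; p = p->next) {
		assert(p->address != NULL && p->orig_address != NULL);
		memcpy(p->orig_address, p->address, TOY_PAGE_SIZE);
		copied++;
	}
	return copied;
}
```

Any orig_address in the list may be the page that holds the running copy of
this loop in the boot kernel, which is why the loop itself has to execute from
a page that is not a restore target.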
> 
> Thanks,
> 
> Alex
> 
> 
> > +	if (relocated_restore_code == -ENOMEM)
> > +		return -ENOMEM;
> > +
> > +	/*
> > +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> > +	 * restore code can jump to it after finished restore the image. The next execution
> > +	 * code doesn't find itself in a different address space after switching over to the
> > +	 * original page table used by the hibernated image.
> > +	 */
> > +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
> > +					PAGE_KERNEL_READ_EXEC);
> > +	if (ret)
> > +		return ret;
> > +
> > +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> > +			resume_hdr.restore_cpu_addr);
> > +
> > +	return 0;
> > +}
> > +
> > +#ifdef CONFIG_PM_SLEEP_SMP
> > +int hibernate_resume_nonboot_cpu_disable(void)
> > +{
> > +	if (sleep_cpu < 0) {
> > +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	return freeze_secondary_cpus(sleep_cpu);
> > +}
> > +#endif
> > +
> > +static int __init riscv_hibernate_init(void)
> > +{
> > +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> > +
> > +	if (WARN_ON(!hibernate_cpu_context))
> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> > +
> > +early_initcall(riscv_hibernate_init);


* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
@ 2023-02-08  4:43       ` JeeHeng Sia
  0 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-08  4:43 UTC (permalink / raw)
  To: Alexandre Ghiti, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo



> -----Original Message-----
> From: Alexandre Ghiti <alex@ghiti.fr>
> Sent: Tuesday, 7 February, 2023 11:46 PM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> Hi Sia,
> 
> On 1/27/23 10:10, Sia Jee Heng wrote:
> > Low level Arch functions were created to support hibernation.
> > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > cpu state onto the stack, then calling swsusp_save() to save the memory
> > image.
> >
> > Arch specific hibernation header is implemented and is utilized by the
> > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > functions. The arch specific hibernation header consists of satp, hartid,
> > and the cpu_resume address. The kernel built version is also need to be
> > saved into the hibernation image header to making sure only the same
> > kernel is restore when resume.
> >
> > swsusp_arch_resume() creates a temporary page table that covering only
> > the linear map. It copies the restore code to a 'safe' page, then start
> > to restore the memory image. Once completed, it restores the original
> > kernel's page table. It then calls into __hibernate_cpu_resume()
> > to restore the CPU context. Finally, it follows the normal hibernation
> > path back to the hibernation core.
> >
> > To enable hibernation/suspend to disk into RISCV, the below config
> > need to be enabled:
> > - CONFIG_ARCH_HIBERNATION_HEADER
> > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >
> > Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> > Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> > Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> > ---
> >   arch/riscv/Kconfig                 |   7 +
> >   arch/riscv/include/asm/assembler.h |  20 ++
> >   arch/riscv/include/asm/suspend.h   |  21 ++
> >   arch/riscv/kernel/Makefile         |   1 +
> >   arch/riscv/kernel/asm-offsets.c    |   5 +
> >   arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
> >   arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
> >   7 files changed, 503 insertions(+)
> >   create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >   create mode 100644 arch/riscv/kernel/hibernate.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..4555848a817f 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -690,6 +690,13 @@ menu "Power management options"
> >
> >   source "kernel/power/Kconfig"
> >
> > +config ARCH_HIBERNATION_POSSIBLE
> > +	def_bool y
> > +
> > +config ARCH_HIBERNATION_HEADER
> > +	def_bool y
> > +	depends on HIBERNATION
> > +
> >   endmenu # "Power management options"
> >
> >   menu "CPU Power Management"
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > index ef1283d04b70..3de70d3e6ceb 100644
> > --- a/arch/riscv/include/asm/assembler.h
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -59,4 +59,24 @@
> >   		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >   	.endm
> >
> > +/**
> > + * copy_page - copy 1 page (4KB) of data from source to destination
> > + * @a0 - destination
> > + * @a1 - source
> > + */
> > +	.macro	copy_page a0, a1
> > +		lui	a2, 0x1
> > +		add	a2, a2, a0
> > +.1 :
> > +		REG_L	t0, 0(a1)
> > +		REG_L	t1, SZREG(a1)
> > +
> > +		REG_S	t0, 0(a0)
> > +		REG_S	t1, SZREG(a0)
> > +
> > +		addi	a0, a0, 2 * SZREG
> > +		addi	a1, a1, 2 * SZREG
> > +		bne	a2, a0, .1
> > +	.endm
> > +
> >   #endif	/* __ASM_ASSEMBLER_H */
> > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > index 75419c5ca272..db40ae433aa9 100644
> > --- a/arch/riscv/include/asm/suspend.h
> > +++ b/arch/riscv/include/asm/suspend.h
> > @@ -21,6 +21,12 @@ struct suspend_context {
> >   #endif
> >   };
> >
> > +/*
> > + * This parameter will be assigned to 0 during resume and will be used by
> > + * hibernation core for the subsequent resume sequence
> > + */
> > +extern int in_suspend;
> > +
> >   /* Low-level CPU suspend entry function */
> >   int __cpu_suspend_enter(struct suspend_context *context);
> >
> > @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >   /* Used to save and restore the csr */
> >   void suspend_save_csrs(struct suspend_context *context);
> >   void suspend_restore_csrs(struct suspend_context *context);
> > +
> > +/* Low-level API to support hibernation */
> > +int swsusp_arch_suspend(void);
> > +int swsusp_arch_resume(void);
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > +int arch_hibernation_header_restore(void *addr);
> > +int __hibernate_cpu_resume(void);
> > +
> > +/* Used to resume on the CPU we hibernated on */
> > +int hibernate_resume_nonboot_cpu_disable(void);
> > +
> > +/* Used to restore the hibernated image */
> > +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > +				unsigned long cpu_resume);
> > +asmlinkage int core_restore_code(void);
> >   #endif
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 4cf303a779ab..daab341d55e4 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
> >   obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
> >
> >   obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> > +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
> >
> >   obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> >   obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > index df9444397908..d6a75aac1d27 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -9,6 +9,7 @@
> >   #include <linux/kbuild.h>
> >   #include <linux/mm.h>
> >   #include <linux/sched.h>
> > +#include <linux/suspend.h>
> >   #include <asm/kvm_host.h>
> >   #include <asm/thread_info.h>
> >   #include <asm/ptrace.h>
> > @@ -116,6 +117,10 @@ void asm_offsets(void)
> >
> >   	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >
> > +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > +
> >   	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >   	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >   	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > new file mode 100644
> > index 000000000000..a83d534b89bd
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate-asm.S
> > @@ -0,0 +1,89 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Hibernation support specific for RISCV
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > +#include <asm/csr.h>
> > +
> > +#include <linux/linkage.h>
> > +
> > +/*
> > + * This code is executed when resume from the hibernation.
> > + *
> > + * It begins with loading the temporary page table then restores the memory image.
> > + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> > + * swsusp_arch_suspend().
> > + */
> > +
> > +/*
> > + * int __hibernate_cpu_resume(void)
> > + * Switch back to the hibernated image's page table prior to restore the CPU
> > + * context.
> > + *
> > + * Always returns 0 to the C code.
> > + */
> > +ENTRY(__hibernate_cpu_resume)
> > +	/* switch to hibernated image's page table */
> > +	csrw CSR_SATP, s0
> > +	sfence.vma
> > +
> > +	REG_L	a0, hibernate_cpu_context
> > +
> > +	/* Restore CSRs */
> > +	restore_csr
> > +
> > +	/* Restore registers (except A0 and T0-T6) */
> > +	restore_reg
> > +
> > +	/* Return zero value */
> > +	add	a0, zero, zero
> > +
> > +	/* Return to C code */
> > +	ret
> > +END(__hibernate_cpu_resume)
> > +
> > +/*
> > + * Prepare to restore the image.
> > + * a0: satp of saved page tables
> > + * a1: satp of temporary page tables
> > + * a2: cpu_resume
> > + */
> > +ENTRY(restore_image)
> > +	mv	s0, a0
> > +	mv	s1, a1
> > +	mv	s2, a2
> > +	REG_L	s4, restore_pblist
> > +	REG_L	a1, relocated_restore_code
> > +
> > +	jalr	a1
> > +END(restore_image)
> > +
> > +/*
> > + * The below code will be executed from a 'safe' page.
> > + * It first switches to the temporary page table, then start to copy the pages
> > + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> > + * to restore the CPU context.
> > + */
> > +ENTRY(core_restore_code)
> > +	/* switch to temp page table */
> > +	csrw satp, s1
> > +	sfence.vma
> > +.Lcopy:
> > +	/* The below code will restore the hibernated image. */
> > +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> > +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> > +
> > +	copy_page a0, a1
> > +
> > +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> > +	bnez	s4, .Lcopy
> > +
> > +	jalr	s2
> > +END(core_restore_code)
> > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> > new file mode 100644
> > index 000000000000..bf7f3c781820
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate.c
> > @@ -0,0 +1,360 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Hibernation support specific for RISCV
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> > + */
> > +
> > +#include <asm/barrier.h>
> > +#include <asm/cacheflush.h>
> > +#include <asm/mmu_context.h>
> > +#include <asm/page.h>
> > +#include <asm/pgtable.h>
> > +#include <asm/sections.h>
> > +#include <asm/set_memory.h>
> > +#include <asm/smp.h>
> > +#include <asm/suspend.h>
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/memblock.h>
> > +#include <linux/pm.h>
> > +#include <linux/sched.h>
> > +#include <linux/suspend.h>
> > +#include <linux/utsname.h>
> > +
> > +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> > +static int sleep_cpu = -EINVAL;
> > +
> > +/* CPU context to be saved */
> > +struct suspend_context *hibernate_cpu_context;
> > +
> > +unsigned long relocated_restore_code;
> > +
> > +/* Pointer to the temporary resume page table */
> > +pgd_t *resume_pg_dir;
> > +
> > +/**
> > + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> > + * @uts_version: to save the build number and date so that the we are not resume with
> > + *		a different kernel
> > + */
> > +struct arch_hibernate_hdr_invariants {
> > +	char		uts_version[__NEW_UTS_LEN + 1];
> > +};
> > +
> > +/**
> > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> > + * @invariants: container to store kernel build version
> > + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> > + * @saved_satp: original page table used by the hibernated image.
> > + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> > + */
> > +static struct arch_hibernate_hdr {
> > +	struct arch_hibernate_hdr_invariants invariants;
> > +	unsigned long	hartid;
> > +	unsigned long	saved_satp;
> > +	unsigned long	restore_cpu_addr;
> > +} resume_hdr;
> > +
> > +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> > +{
> > +	memset(i, 0, sizeof(*i));
> > +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> > +}
> > +
> > +/*
> > + * Check if the given pfn is in the 'nosave' section.
> > + */
> > +int pfn_is_nosave(unsigned long pfn)
> > +{
> > +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> > +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> > +
> > +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> > +}
> > +
> > +void notrace save_processor_state(void)
> > +{
> > +	WARN_ON(num_online_cpus() != 1);
> > +}
> > +
> > +void notrace restore_processor_state(void)
> > +{
> > +}
> > +
> > +/*
> > + * Helper parameters need to be saved to the hibernation image header.
> > + */
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> > +{
> > +	struct arch_hibernate_hdr *hdr = addr;
> > +
> > +	if (max_size < sizeof(*hdr))
> > +		return -EOVERFLOW;
> > +
> > +	arch_hdr_invariants(&hdr->invariants);
> > +
> > +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> > +	hdr->saved_satp = csr_read(CSR_SATP);
> > +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(arch_hibernation_header_save);
> > +
> > +/*
> > + * Retrieve the helper parameters from the hibernation image header
> > + */
> > +int arch_hibernation_header_restore(void *addr)
> > +{
> > +	struct arch_hibernate_hdr_invariants invariants;
> > +	struct arch_hibernate_hdr *hdr = addr;
> > +	int ret = 0;
> > +
> > +	arch_hdr_invariants(&invariants);
> > +
> > +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> > +		pr_crit("Hibernate image not generated by this kernel!\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> > +	if (sleep_cpu < 0) {
> > +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> > +		sleep_cpu = -EINVAL;
> > +		return -EINVAL;
> > +	}
> > +
> > +#ifdef CONFIG_SMP
> > +	ret = bringup_hibernate_cpu(sleep_cpu);
> > +	if (ret) {
> > +		sleep_cpu = -EINVAL;
> > +		return ret;
> > +	}
> > +#endif
> > +	resume_hdr = *hdr;
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(arch_hibernation_header_restore);
> > +
> > +int swsusp_arch_suspend(void)
> > +{
> > +	int ret = 0;
> > +
> > +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> > +		sleep_cpu = smp_processor_id();
> > +		suspend_save_csrs(hibernate_cpu_context);
> > +		ret = swsusp_save();
> > +	} else {
> > +		suspend_restore_csrs(hibernate_cpu_context);
> > +		flush_tlb_all();
> > +
> > +		/* Invalidated Icache */
> > +		flush_icache_all();
> > +
> > +		/*
> > +		 * Tell the hibernation core that we've just restored
> > +		 * the memory
> > +		 */
> > +		in_suspend = 0;
> > +		sleep_cpu = -EINVAL;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t pte_idx = pte_index(vaddr);
> > +
> > +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
> > +
> > +	return 0;
> > +}
> > +
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
> > +		(pgtable_l5_enabled ?					\
> > +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
> > +		(pgtable_l4_enabled ?					\
> > +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
> > +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
> > +
> > +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t pmd_idx = pmd_index(vaddr);
> > +	pte_t *ptep;
> > +
> > +	if (pmd_none(pmdp[pmd_idx])) {
> > +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> > +		if (!ptep)
> > +			return -ENOMEM;
> > +
> > +		memset(ptep, 0, PAGE_SIZE);
> > +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
> > +	} else {
> > +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
> > +	}
> > +
> > +	return temp_pgtable_map_pte(ptep, vaddr, prot);
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t pud_index = pud_index(vaddr);
> > +	pmd_t *pmdp;
> > +
> > +	if (pud_val(pudp[pud_index]) == 0) {
> > +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> > +		if (!pmdp)
> > +			return -ENOMEM;
> > +
> > +		memset(pmdp, 0, PAGE_SIZE);
> > +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
> > +	} else {
> > +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
> > +	}
> > +
> > +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
> > +}
> > +
> > +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t p4d_index = p4d_index(vaddr);
> > +	pud_t *pudp;
> > +
> > +	if (p4d_val(p4dp[p4d_index]) == 0) {
> > +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> > +		if (!pudp)
> > +			return -ENOMEM;
> > +
> > +		memset(pudp, 0, PAGE_SIZE);
> > +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
> > +	} else {
> > +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
> > +	}
> > +
> > +	return temp_pgtable_map_pud(pudp, vaddr, prot);
> > +}
> > +
> > +#else
> > +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
> > +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
> > +#endif /* __PAGETABLE_PMD_FOLDED */
> > +
> > +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	uintptr_t pgd_idx = pgd_index(vaddr);
> > +	void *nextp;
> > +
> > +	if (pgd_val(pgdp[pgd_idx]) == 0) {
> > +		nextp = (void *)get_safe_page(GFP_ATOMIC);
> > +		if (!nextp)
> > +			return -ENOMEM;
> > +
> > +		memset(nextp, 0, PAGE_SIZE);
> > +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
> > +	} else {
> > +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
> > +	}
> > +
> > +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
> > +}
> > +
> > +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> > +{
> > +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
> > +}
> > +
> > +static unsigned long relocate_restore_code(void)
> > +{
> > +	unsigned long ret;
> > +	void *page = (void *)get_safe_page(GFP_ATOMIC);
> > +
> > +	if (!page)
> > +		return -ENOMEM;
> > +
> > +	copy_page(page, core_restore_code);
> > +
> > +	/* Make the page containing the relocated code executable */
> > +	set_memory_x((unsigned long)page, 1);
> > +
> > +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return (unsigned long)page;
> > +}
> > +
> > +int swsusp_arch_resume(void)
> > +{
> > +	unsigned long addr = PAGE_OFFSET;
> > +	unsigned long ret;
> > +
> > +	/*
> > +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> > +	 * we don't need to free it here.
> > +	 */
> > +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> > +	if (!resume_pg_dir)
> > +		return -ENOMEM;
> > +
> > +	/*
> > +	 * The pages need to be writable when restoring the image.
> > +	 * Create a second copy of page table just for the linear map, and use this when
> > +	 * restoring.
> > +	 */
> > +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
> > +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
> > +		if (ret)
> > +			return (int)ret;
> > +	}
> > +
> 
> 
> To me this is wrong as this does not account for the real physical
> mapping layout: can't you simply copy the linear mapping from
> swapper_pg_dir?
Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from swapper_pg_dir, as we are not supposed to modify its contents.
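
For readers following along, the page-by-page walk can be modelled in user space roughly as below. This is only a toy sketch under stated assumptions, not the kernel code: it uses two levels and a 30-bit address space, toy_table, toy_map_page(), toy_map_range() and toy_lookup() are made-up names, calloc() stands in for get_safe_page(), and a raw pointer stored in the top-level entry stands in for the pfn_pgd()/__va() conversions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	((uintptr_t)1 << TOY_PAGE_SHIFT)
#define TOY_PTRS	512	/* 9 index bits per level, 30-bit toy address space */
#define TOY_VALID	((uintptr_t)1)

/* One table page; both levels share this layout in the toy model. */
struct toy_table {
	uintptr_t e[TOY_PTRS];
};

static unsigned int toy_pgd_index(uintptr_t va)
{
	return (va >> (TOY_PAGE_SHIFT + 9)) & (TOY_PTRS - 1);
}

static unsigned int toy_pte_index(uintptr_t va)
{
	return (va >> TOY_PAGE_SHIFT) & (TOY_PTRS - 1);
}

/* Map one page, allocating the leaf table on first use of a top-level slot. */
static int toy_map_page(struct toy_table *pgd, uintptr_t va, uintptr_t pa,
			uintptr_t prot)
{
	unsigned int idx = toy_pgd_index(va);
	struct toy_table *pte;

	if (pgd->e[idx] == 0) {
		pte = calloc(1, sizeof(*pte));	/* stand-in for get_safe_page() */
		if (!pte)
			return -1;
		pgd->e[idx] = (uintptr_t)pte;
	} else {
		pte = (struct toy_table *)pgd->e[idx];
	}

	pte->e[toy_pte_index(va)] = (pa & ~(TOY_PAGE_SIZE - 1)) | prot | TOY_VALID;
	return 0;
}

/* Map [start, end) one page at a time; pa = va - offset, like a linear map. */
static int toy_map_range(struct toy_table *pgd, uintptr_t start, uintptr_t end,
			 uintptr_t offset, uintptr_t prot)
{
	uintptr_t va;

	assert(pgd && start <= end);

	for (va = start; va < end; va += TOY_PAGE_SIZE) {
		if (toy_map_page(pgd, va, va - offset, prot))
			return -1;
	}
	return 0;
}

/* Return the leaf entry for va, or 0 if nothing is mapped there. */
static uintptr_t toy_lookup(const struct toy_table *pgd, uintptr_t va)
{
	const struct toy_table *pte =
		(const struct toy_table *)pgd->e[toy_pgd_index(va)];

	return pte ? pte->e[toy_pte_index(va)] : 0;
}
```

Note that the sketch maps every page in the range it is given and has no notion of holes inside that range, which is the property being questioned in the review.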
> 
> But I have to admit that I struggle to understand the need for this
> temporary page table: all we need to do is to allow to write to the
> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
Similar to the above comment: when we restore the memory contents, we need to make sure the pages are writable. If you modify swapper_pg_dir, the kernel will crash afterwards.
That's why we need a second page table to do the restore.
> 
> 
> > +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
> > +	relocated_restore_code = relocate_restore_code();
> 
> 
> And do we really need to do that too? The code in question can only be
> overwritten by the same code right?
Yes, we need to move the restore code to a new page to prevent the code from overwriting itself when restoring the memory.
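
As a rough illustration of the concern, the .Lcopy loop in core_restore_code can be written in C as below. This is a user-space sketch with assumed names (toy_pbe, toy_restore_pages(), TOY_PAGE_SIZE); only the three fields address, orig_address and next come from the patch, via its asm-offsets entries. Unlike the assembly, which reads the first entry before testing for the end of the list, the sketch also tolerates an empty list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Same three fields the patch exports through asm-offsets.c. */
struct toy_pbe {
	void *address;		/* where the saved copy currently sits */
	void *orig_address;	/* where the page has to end up */
	struct toy_pbe *next;
};

/* Copy every saved page back to its original place; return the page count. */
static size_t toy_restore_pages(const struct toy_pbe *pbe)
{
	size_t n = 0;

	for (; pbe; pbe = pbe->next) {
		assert(pbe->address && pbe->orig_address);
		memcpy(pbe->orig_address, pbe->address, TOY_PAGE_SIZE);
		n++;
	}
	return n;
}
```

Nothing in the loop restricts orig_address, so in principle it can name the page the loop itself is running from; running the copy from a page obtained with get_safe_page() takes that page out of the set of destinations.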
> 
> Thanks,
> 
> Alex
> 
> 
> > +	if (relocated_restore_code == -ENOMEM)
> > +		return -ENOMEM;
> > +
> > +	/*
> > +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> > +	 * restore code can jump to it after finished restore the image. The next execution
> > +	 * code doesn't find itself in a different address space after switching over to the
> > +	 * original page table used by the hibernated image.
> > +	 */
> > +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
> > +					PAGE_KERNEL_READ_EXEC);
> > +	if (ret)
> > +		return ret;
> > +
> > +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> > +			resume_hdr.restore_cpu_addr);
> > +
> > +	return 0;
> > +}
> > +
> > +#ifdef CONFIG_PM_SLEEP_SMP
> > +int hibernate_resume_nonboot_cpu_disable(void)
> > +{
> > +	if (sleep_cpu < 0) {
> > +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	return freeze_secondary_cpus(sleep_cpu);
> > +}
> > +#endif
> > +
> > +static int __init riscv_hibernate_init(void)
> > +{
> > +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> > +
> > +	if (WARN_ON(!hibernate_cpu_context))
> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> > +
> > +early_initcall(riscv_hibernate_init);
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-02-08  4:43       ` JeeHeng Sia
@ 2023-02-08 12:04         ` Alexandre Ghiti
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-08 12:04 UTC (permalink / raw)
  To: JeeHeng Sia, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

Hi Sia,

On 2/8/23 05:43, JeeHeng Sia wrote:
>
>> -----Original Message-----
>> From: Alexandre Ghiti <alex@ghiti.fr>
>> Sent: Tuesday, 7 February, 2023 11:46 PM
>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>> <mason.huo@starfivetech.com>
>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>
>> Hi Sia,
>>
>> On 1/27/23 10:10, Sia Jee Heng wrote:
>>> Low level Arch functions were created to support hibernation.
>>> swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
>>> the CPU state onto the stack, then calls swsusp_save() to save the memory
>>> image.
>>>
>>> An arch-specific hibernation header is implemented and is used by the
>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
>>> functions. The arch-specific hibernation header consists of satp, hartid,
>>> and the cpu_resume address. The kernel build version also needs to be
>>> saved into the hibernation image header to make sure that only the same
>>> kernel is restored on resume.
>>>
>>> swsusp_arch_resume() creates a temporary page table that covers only
>>> the linear map. It copies the restore code to a 'safe' page, then starts
>>> to restore the memory image. Once completed, it restores the original
>>> kernel's page table. It then calls into __hibernate_cpu_resume()
>>> to restore the CPU context. Finally, it follows the normal hibernation
>>> path back to the hibernation core.
>>>
>>> To enable hibernation/suspend to disk on RISC-V, the configs below
>>> need to be enabled:
>>> - CONFIG_ARCH_HIBERNATION_HEADER
>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>>>
>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
>>> ---
>>>    arch/riscv/Kconfig                 |   7 +
>>>    arch/riscv/include/asm/assembler.h |  20 ++
>>>    arch/riscv/include/asm/suspend.h   |  21 ++
>>>    arch/riscv/kernel/Makefile         |   1 +
>>>    arch/riscv/kernel/asm-offsets.c    |   5 +
>>>    arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>>>    arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>>>    7 files changed, 503 insertions(+)
>>>    create mode 100644 arch/riscv/kernel/hibernate-asm.S
>>>    create mode 100644 arch/riscv/kernel/hibernate.c
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index e2b656043abf..4555848a817f 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -690,6 +690,13 @@ menu "Power management options"
>>>
>>>    source "kernel/power/Kconfig"
>>>
>>> +config ARCH_HIBERNATION_POSSIBLE
>>> +	def_bool y
>>> +
>>> +config ARCH_HIBERNATION_HEADER
>>> +	def_bool y
>>> +	depends on HIBERNATION
>>> +
>>>    endmenu # "Power management options"
>>>
>>>    menu "CPU Power Management"
>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
>>> index ef1283d04b70..3de70d3e6ceb 100644
>>> --- a/arch/riscv/include/asm/assembler.h
>>> +++ b/arch/riscv/include/asm/assembler.h
>>> @@ -59,4 +59,24 @@
>>>    		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>>>    	.endm
>>>
>>> +/**
>>> + * copy_page - copy 1 page (4KB) of data from source to destination
>>> + * @a0 - destination
>>> + * @a1 - source
>>> + */
>>> +	.macro	copy_page a0, a1
>>> +		lui	a2, 0x1
>>> +		add	a2, a2, a0
>>> +.1 :
>>> +		REG_L	t0, 0(a1)
>>> +		REG_L	t1, SZREG(a1)
>>> +
>>> +		REG_S	t0, 0(a0)
>>> +		REG_S	t1, SZREG(a0)
>>> +
>>> +		addi	a0, a0, 2 * SZREG
>>> +		addi	a1, a1, 2 * SZREG
>>> +		bne	a2, a0, .1
>>> +	.endm
>>> +
>>>    #endif	/* __ASM_ASSEMBLER_H */
>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
>>> index 75419c5ca272..db40ae433aa9 100644
>>> --- a/arch/riscv/include/asm/suspend.h
>>> +++ b/arch/riscv/include/asm/suspend.h
>>> @@ -21,6 +21,12 @@ struct suspend_context {
>>>    #endif
>>>    };
>>>
>>> +/*
>>> + * This parameter will be assigned to 0 during resume and will be used by
>>> + * hibernation core for the subsequent resume sequence
>>> + */
>>> +extern int in_suspend;
>>> +
>>>    /* Low-level CPU suspend entry function */
>>>    int __cpu_suspend_enter(struct suspend_context *context);
>>>
>>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>>>    /* Used to save and restore the csr */
>>>    void suspend_save_csrs(struct suspend_context *context);
>>>    void suspend_restore_csrs(struct suspend_context *context);
>>> +
>>> +/* Low-level API to support hibernation */
>>> +int swsusp_arch_suspend(void);
>>> +int swsusp_arch_resume(void);
>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
>>> +int arch_hibernation_header_restore(void *addr);
>>> +int __hibernate_cpu_resume(void);
>>> +
>>> +/* Used to resume on the CPU we hibernated on */
>>> +int hibernate_resume_nonboot_cpu_disable(void);
>>> +
>>> +/* Used to restore the hibernated image */
>>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
>>> +				unsigned long cpu_resume);
>>> +asmlinkage int core_restore_code(void);
>>>    #endif
>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>> index 4cf303a779ab..daab341d55e4 100644
>>> --- a/arch/riscv/kernel/Makefile
>>> +++ b/arch/riscv/kernel/Makefile
>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>>>    obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>>>
>>>    obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
>>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>>>
>>>    obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>>>    obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>> index df9444397908..d6a75aac1d27 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -9,6 +9,7 @@
>>>    #include <linux/kbuild.h>
>>>    #include <linux/mm.h>
>>>    #include <linux/sched.h>
>>> +#include <linux/suspend.h>
>>>    #include <asm/kvm_host.h>
>>>    #include <asm/thread_info.h>
>>>    #include <asm/ptrace.h>
>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
>>>
>>>    	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>>>
>>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
>>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
>>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
>>> +
>>>    	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>>>    	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>>>    	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
>>> new file mode 100644
>>> index 000000000000..a83d534b89bd
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/hibernate-asm.S
>>> @@ -0,0 +1,89 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Hibernation support specific for RISCV
>>> + *
>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>> + *
>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>> + */
>>> +
>>> +#include <asm/asm.h>
>>> +#include <asm/asm-offsets.h>
>>> +#include <asm/assembler.h>
>>> +#include <asm/csr.h>
>>> +
>>> +#include <linux/linkage.h>
>>> +
>>> +/*
>>> + * This code is executed when resume from the hibernation.
>>> + *
>>> + * It begins with loading the temporary page table then restores the memory image.
>>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
>>> + * swsusp_arch_suspend().
>>> + */
>>> +
>>> +/*
>>> + * int __hibernate_cpu_resume(void)
>>> + * Switch back to the hibernated image's page table prior to restore the CPU
>>> + * context.
>>> + *
>>> + * Always returns 0 to the C code.
>>> + */
>>> +ENTRY(__hibernate_cpu_resume)
>>> +	/* switch to hibernated image's page table */
>>> +	csrw CSR_SATP, s0
>>> +	sfence.vma
>>> +
>>> +	REG_L	a0, hibernate_cpu_context
>>> +
>>> +	/* Restore CSRs */
>>> +	restore_csr
>>> +
>>> +	/* Restore registers (except A0 and T0-T6) */
>>> +	restore_reg
>>> +
>>> +	/* Return zero value */
>>> +	add	a0, zero, zero
>>> +
>>> +	/* Return to C code */
>>> +	ret
>>> +END(__hibernate_cpu_resume)
>>> +
>>> +/*
>>> + * Prepare to restore the image.
>>> + * a0: satp of saved page tables
>>> + * a1: satp of temporary page tables
>>> + * a2: cpu_resume
>>> + */
>>> +ENTRY(restore_image)
>>> +	mv	s0, a0
>>> +	mv	s1, a1
>>> +	mv	s2, a2
>>> +	REG_L	s4, restore_pblist
>>> +	REG_L	a1, relocated_restore_code
>>> +
>>> +	jalr	a1
>>> +END(restore_image)
>>> +
>>> +/*
>>> + * The below code will be executed from a 'safe' page.
>>> + * It first switches to the temporary page table, then start to copy the pages
>>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
>>> + * to restore the CPU context.
>>> + */
>>> +ENTRY(core_restore_code)
>>> +	/* switch to temp page table */
>>> +	csrw satp, s1
>>> +	sfence.vma
>>> +.Lcopy:
>>> +	/* The below code will restore the hibernated image. */
>>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
>>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
>>> +
>>> +	copy_page a0, a1
>>> +
>>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
>>> +	bnez	s4, .Lcopy
>>> +
>>> +	jalr	s2
>>> +END(core_restore_code)
>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
>>> new file mode 100644
>>> index 000000000000..bf7f3c781820
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/hibernate.c
>>> @@ -0,0 +1,360 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/*
>>> + * Hibernation support specific for RISCV
>>> + *
>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>> + *
>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>> + */
>>> +
>>> +#include <asm/barrier.h>
>>> +#include <asm/cacheflush.h>
>>> +#include <asm/mmu_context.h>
>>> +#include <asm/page.h>
>>> +#include <asm/pgtable.h>
>>> +#include <asm/sections.h>
>>> +#include <asm/set_memory.h>
>>> +#include <asm/smp.h>
>>> +#include <asm/suspend.h>
>>> +
>>> +#include <linux/cpu.h>
>>> +#include <linux/memblock.h>
>>> +#include <linux/pm.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/suspend.h>
>>> +#include <linux/utsname.h>
>>> +
>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
>>> +static int sleep_cpu = -EINVAL;
>>> +
>>> +/* CPU context to be saved */
>>> +struct suspend_context *hibernate_cpu_context;
>>> +
>>> +unsigned long relocated_restore_code;
>>> +
>>> +/* Pointer to the temporary resume page table */
>>> +pgd_t *resume_pg_dir;
>>> +
>>> +/**
>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
>>> + * @uts_version: to save the build number and date so that the we are not resume with
>>> + *		a different kernel
>>> + */
>>> +struct arch_hibernate_hdr_invariants {
>>> +	char		uts_version[__NEW_UTS_LEN + 1];
>>> +};
>>> +
>>> +/**
>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
>>> + * @invariants: container to store kernel build version
>>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
>>> + * @saved_satp: original page table used by the hibernated image.
>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
>>> + */
>>> +static struct arch_hibernate_hdr {
>>> +	struct arch_hibernate_hdr_invariants invariants;
>>> +	unsigned long	hartid;
>>> +	unsigned long	saved_satp;
>>> +	unsigned long	restore_cpu_addr;
>>> +} resume_hdr;
>>> +
>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
>>> +{
>>> +	memset(i, 0, sizeof(*i));
>>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
>>> +}
>>> +
>>> +/*
>>> + * Check if the given pfn is in the 'nosave' section.
>>> + */
>>> +int pfn_is_nosave(unsigned long pfn)
>>> +{
>>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
>>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
>>> +
>>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
>>> +}
>>> +
>>> +void notrace save_processor_state(void)
>>> +{
>>> +	WARN_ON(num_online_cpus() != 1);
>>> +}
>>> +
>>> +void notrace restore_processor_state(void)
>>> +{
>>> +}
>>> +
>>> +/*
>>> + * Helper parameters need to be saved to the hibernation image header.
>>> + */
>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
>>> +{
>>> +	struct arch_hibernate_hdr *hdr = addr;
>>> +
>>> +	if (max_size < sizeof(*hdr))
>>> +		return -EOVERFLOW;
>>> +
>>> +	arch_hdr_invariants(&hdr->invariants);
>>> +
>>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
>>> +	hdr->saved_satp = csr_read(CSR_SATP);
>>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(arch_hibernation_header_save);
>>> +
>>> +/*
>>> + * Retrieve the helper parameters from the hibernation image header
>>> + */
>>> +int arch_hibernation_header_restore(void *addr)
>>> +{
>>> +	struct arch_hibernate_hdr_invariants invariants;
>>> +	struct arch_hibernate_hdr *hdr = addr;
>>> +	int ret = 0;
>>> +
>>> +	arch_hdr_invariants(&invariants);
>>> +
>>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
>>> +		pr_crit("Hibernate image not generated by this kernel!\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
>>> +	if (sleep_cpu < 0) {
>>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
>>> +		sleep_cpu = -EINVAL;
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +#ifdef CONFIG_SMP
>>> +	ret = bringup_hibernate_cpu(sleep_cpu);
>>> +	if (ret) {
>>> +		sleep_cpu = -EINVAL;
>>> +		return ret;
>>> +	}
>>> +#endif
>>> +	resume_hdr = *hdr;
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
>>> +
>>> +int swsusp_arch_suspend(void)
>>> +{
>>> +	int ret = 0;
>>> +
>>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
>>> +		sleep_cpu = smp_processor_id();
>>> +		suspend_save_csrs(hibernate_cpu_context);
>>> +		ret = swsusp_save();
>>> +	} else {
>>> +		suspend_restore_csrs(hibernate_cpu_context);
>>> +		flush_tlb_all();
>>> +
>>> +		/* Invalidated Icache */
>>> +		flush_icache_all();
>>> +
>>> +		/*
>>> +		 * Tell the hibernation core that we've just restored
>>> +		 * the memory
>>> +		 */
>>> +		in_suspend = 0;
>>> +		sleep_cpu = -EINVAL;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t pte_idx = pte_index(vaddr);
>>> +
>>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +#ifndef __PAGETABLE_PMD_FOLDED
>>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
>>> +		(pgtable_l5_enabled ?					\
>>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
>>> +		(pgtable_l4_enabled ?					\
>>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
>>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
>>> +
>>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t pmd_idx = pmd_index(vaddr);
>>> +	pte_t *ptep;
>>> +
>>> +	if (pmd_none(pmdp[pmd_idx])) {
>>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
>>> +		if (!ptep)
>>> +			return -ENOMEM;
>>> +
>>> +		memset(ptep, 0, PAGE_SIZE);
>>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
>>> +	} else {
>>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
>>> +	}
>>> +
>>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t pud_index = pud_index(vaddr);
>>> +	pmd_t *pmdp;
>>> +
>>> +	if (pud_val(pudp[pud_index]) == 0) {
>>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
>>> +		if (!pmdp)
>>> +			return -ENOMEM;
>>> +
>>> +		memset(pmdp, 0, PAGE_SIZE);
>>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
>>> +	} else {
>>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
>>> +	}
>>> +
>>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t p4d_index = p4d_index(vaddr);
>>> +	pud_t *pudp;
>>> +
>>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
>>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
>>> +		if (!pudp)
>>> +			return -ENOMEM;
>>> +
>>> +		memset(pudp, 0, PAGE_SIZE);
>>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
>>> +	} else {
>>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
>>> +	}
>>> +
>>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
>>> +}
>>> +
>>> +#else
>>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
>>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
>>> +#endif /* __PAGETABLE_PMD_FOLDED */
>>> +
>>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t pgd_idx = pgd_index(vaddr);
>>> +	void *nextp;
>>> +
>>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
>>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
>>> +		if (!nextp)
>>> +			return -ENOMEM;
>>> +
>>> +		memset(nextp, 0, PAGE_SIZE);
>>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
>>> +	} else {
>>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
>>> +	}
>>> +
>>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
>>> +}
>>> +
>>> +static unsigned long relocate_restore_code(void)
>>> +{
>>> +	unsigned long ret;
>>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
>>> +
>>> +	if (!page)
>>> +		return -ENOMEM;
>>> +
>>> +	copy_page(page, core_restore_code);
>>> +
>>> +	/* Make the page containing the relocated code executable */
>>> +	set_memory_x((unsigned long)page, 1);
>>> +
>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	return (unsigned long)page;
>>> +}
>>> +
>>> +int swsusp_arch_resume(void)
>>> +{
>>> +	unsigned long addr = PAGE_OFFSET;
>>> +	unsigned long ret;
>>> +
>>> +	/*
>>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
>>> +	 * we don't need to free it here.
>>> +	 */
>>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
>>> +	if (!resume_pg_dir)
>>> +		return -ENOMEM;
>>> +
>>> +	/*
>>> +	 * The pages need to be writable when restoring the image.
>>> +	 * Create a second copy of page table just for the linear map, and use this when
>>> +	 * restoring.
>>> +	 */
>>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
>>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
>>> +		if (ret)
>>> +			return (int)ret;
>>> +	}
>>> +
>>
>> To me this is wrong as this does not account for the real physical
>> mapping layout: can't you simply copy the linear mapping from
>> swapper_pg_dir?
> Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from swapper_pg_dir, as we are not supposed to modify its contents.


First, you're right, we need the temporary page table as swapper_pg_dir 
will get overwritten under our feet.

Now, I still disagree with mapping all the memory: the linear mapping is 
sparse because we only map what memblock gives us (some regions are 
marked as "nomap" for a reason).

I just took a look at arm64, and they do exactly that: they go through 
swapper_pg_dir, copy the linear mapping and enable write at every leaf 
level 
(https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
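
A minimal user-space sketch of that idea, assuming a toy two-level table rather than the real arm64 or RISC-V helpers: walk the source table, skip slots that are not populated so the copy stays as sparse as the original, and set a write bit on every valid leaf entry that is copied. The names toy_table and toy_copy_linear_map() and the flag values are illustrative only, and calloc() stands in for get_safe_page().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PTRS	512
#define TOY_VALID	((uintptr_t)0x1)	/* toy flag bits, chosen for the sketch */
#define TOY_WRITE	((uintptr_t)0x4)

struct toy_table {
	uintptr_t e[TOY_PTRS];
};

/*
 * Copy a two-level table: allocate a fresh leaf table for every populated
 * top-level slot, copy valid leaf entries with the write bit set, and leave
 * holes as holes. Returns the number of leaf entries copied, or -1 if an
 * allocation fails.
 */
static long toy_copy_linear_map(struct toy_table *dst,
				const struct toy_table *src)
{
	long copied = 0;
	size_t i, j;

	assert(dst && src);

	for (i = 0; i < TOY_PTRS; i++) {
		const struct toy_table *src_pte =
			(const struct toy_table *)src->e[i];
		struct toy_table *dst_pte;

		if (!src_pte)
			continue;	/* nothing mapped here: stay sparse */

		dst_pte = calloc(1, sizeof(*dst_pte));
		if (!dst_pte)
			return -1;
		dst->e[i] = (uintptr_t)dst_pte;

		for (j = 0; j < TOY_PTRS; j++) {
			if (!(src_pte->e[j] & TOY_VALID))
				continue;
			dst_pte->e[j] = src_pte->e[j] | TOY_WRITE;
			copied++;
		}
	}
	return copied;
}
```

The source table is only read, never modified, and regions that were left unmapped in the source are never mapped in the copy.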


>> But I have to admit that I struggle to understand the need for this
>> temporary page table: all we need to do is to allow to write to the
>> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
> Similar to the above comment: when we restore the memory contents, we need to make sure the pages are writable. If you modify swapper_pg_dir, the kernel will crash afterwards.
> That's why we need a second page table to do the restore.
>>
>>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
>>> +	relocated_restore_code = relocate_restore_code();
>>
>> And do we really need to do that too? The code in question can only be
>> overwritten by the same code right?
> Yes, we need to move the restore code to a new page to prevent the code from overwriting itself when restoring the memory.
>> Thanks,
>>
>> Alex
>>
>>
>>> +	if (relocated_restore_code == -ENOMEM)
>>> +		return -ENOMEM;
>>> +
>>> +	/*
>>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
>>> +	 * restore code can jump to it after finished restore the image. The next execution
>>> +	 * code doesn't find itself in a different address space after switching over to the
>>> +	 * original page table used by the hibernated image.
>>> +	 */
>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
>>> +					PAGE_KERNEL_READ_EXEC);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
>>> +			resume_hdr.restore_cpu_addr);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +#ifdef CONFIG_PM_SLEEP_SMP
>>> +int hibernate_resume_nonboot_cpu_disable(void)
>>> +{
>>> +	if (sleep_cpu < 0) {
>>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
>>> +		return -ENODEV;
>>> +	}
>>> +
>>> +	return freeze_secondary_cpus(sleep_cpu);
>>> +}
>>> +#endif
>>> +
>>> +static int __init riscv_hibernate_init(void)
>>> +{
>>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
>>> +
>>> +	if (WARN_ON(!hibernate_cpu_context))
>>> +		return -ENOMEM;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +early_initcall(riscv_hibernate_init);

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
@ 2023-02-08 12:04         ` Alexandre Ghiti
  0 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-08 12:04 UTC (permalink / raw)
  To: JeeHeng Sia, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

Hi Sia,

On 2/8/23 05:43, JeeHeng Sia wrote:
>
>> -----Original Message-----
>> From: Alexandre Ghiti <alex@ghiti.fr>
>> Sent: Tuesday, 7 February, 2023 11:46 PM
>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>> <mason.huo@starfivetech.com>
>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>
>> Hi Sia,
>>
>> On 1/27/23 10:10, Sia Jee Heng wrote:
>>> Low level Arch functions were created to support hibernation.
>>> swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
>>> the CPU state onto the stack, then calls swsusp_save() to save the memory
>>> image.
>>>
>>> An arch-specific hibernation header is implemented and is used by the
>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
>>> functions. The arch-specific hibernation header consists of satp, hartid,
>>> and the cpu_resume address. The kernel build version also needs to be
>>> saved into the hibernation image header to make sure that only the same
>>> kernel is restored on resume.
>>>
>>> swsusp_arch_resume() creates a temporary page table that covers only
>>> the linear map. It copies the restore code to a 'safe' page, then starts
>>> to restore the memory image. Once completed, it restores the original
>>> kernel's page table. It then calls into __hibernate_cpu_resume()
>>> to restore the CPU context. Finally, it follows the normal hibernation
>>> path back to the hibernation core.
>>>
>>> To enable hibernation/suspend to disk on RISC-V, the configs below
>>> need to be enabled:
>>> - CONFIG_ARCH_HIBERNATION_HEADER
>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>>>
>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
>>> ---
>>>    arch/riscv/Kconfig                 |   7 +
>>>    arch/riscv/include/asm/assembler.h |  20 ++
>>>    arch/riscv/include/asm/suspend.h   |  21 ++
>>>    arch/riscv/kernel/Makefile         |   1 +
>>>    arch/riscv/kernel/asm-offsets.c    |   5 +
>>>    arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>>>    arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>>>    7 files changed, 503 insertions(+)
>>>    create mode 100644 arch/riscv/kernel/hibernate-asm.S
>>>    create mode 100644 arch/riscv/kernel/hibernate.c
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index e2b656043abf..4555848a817f 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -690,6 +690,13 @@ menu "Power management options"
>>>
>>>    source "kernel/power/Kconfig"
>>>
>>> +config ARCH_HIBERNATION_POSSIBLE
>>> +	def_bool y
>>> +
>>> +config ARCH_HIBERNATION_HEADER
>>> +	def_bool y
>>> +	depends on HIBERNATION
>>> +
>>>    endmenu # "Power management options"
>>>
>>>    menu "CPU Power Management"
>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
>>> index ef1283d04b70..3de70d3e6ceb 100644
>>> --- a/arch/riscv/include/asm/assembler.h
>>> +++ b/arch/riscv/include/asm/assembler.h
>>> @@ -59,4 +59,24 @@
>>>    		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>>>    	.endm
>>>
>>> +/**
>>> + * copy_page - copy 1 page (4KB) of data from source to destination
>>> + * @a0 - destination
>>> + * @a1 - source
>>> + */
>>> +	.macro	copy_page a0, a1
>>> +		lui	a2, 0x1
>>> +		add	a2, a2, a0
>>> +.1 :
>>> +		REG_L	t0, 0(a1)
>>> +		REG_L	t1, SZREG(a1)
>>> +
>>> +		REG_S	t0, 0(a0)
>>> +		REG_S	t1, SZREG(a0)
>>> +
>>> +		addi	a0, a0, 2 * SZREG
>>> +		addi	a1, a1, 2 * SZREG
>>> +		bne	a2, a0, .1
>>> +	.endm
>>> +
>>>    #endif	/* __ASM_ASSEMBLER_H */
>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
>>> index 75419c5ca272..db40ae433aa9 100644
>>> --- a/arch/riscv/include/asm/suspend.h
>>> +++ b/arch/riscv/include/asm/suspend.h
>>> @@ -21,6 +21,12 @@ struct suspend_context {
>>>    #endif
>>>    };
>>>
>>> +/*
>>> + * This parameter will be assigned to 0 during resume and will be used by
>>> + * hibernation core for the subsequent resume sequence
>>> + */
>>> +extern int in_suspend;
>>> +
>>>    /* Low-level CPU suspend entry function */
>>>    int __cpu_suspend_enter(struct suspend_context *context);
>>>
>>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>>>    /* Used to save and restore the csr */
>>>    void suspend_save_csrs(struct suspend_context *context);
>>>    void suspend_restore_csrs(struct suspend_context *context);
>>> +
>>> +/* Low-level API to support hibernation */
>>> +int swsusp_arch_suspend(void);
>>> +int swsusp_arch_resume(void);
>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
>>> +int arch_hibernation_header_restore(void *addr);
>>> +int __hibernate_cpu_resume(void);
>>> +
>>> +/* Used to resume on the CPU we hibernated on */
>>> +int hibernate_resume_nonboot_cpu_disable(void);
>>> +
>>> +/* Used to restore the hibernated image */
>>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
>>> +				unsigned long cpu_resume);
>>> +asmlinkage int core_restore_code(void);
>>>    #endif
>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>> index 4cf303a779ab..daab341d55e4 100644
>>> --- a/arch/riscv/kernel/Makefile
>>> +++ b/arch/riscv/kernel/Makefile
>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>>>    obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>>>
>>>    obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
>>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>>>
>>>    obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>>>    obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>> index df9444397908..d6a75aac1d27 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -9,6 +9,7 @@
>>>    #include <linux/kbuild.h>
>>>    #include <linux/mm.h>
>>>    #include <linux/sched.h>
>>> +#include <linux/suspend.h>
>>>    #include <asm/kvm_host.h>
>>>    #include <asm/thread_info.h>
>>>    #include <asm/ptrace.h>
>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
>>>
>>>    	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>>>
>>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
>>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
>>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
>>> +
>>>    	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>>>    	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>>>    	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
>>> new file mode 100644
>>> index 000000000000..a83d534b89bd
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/hibernate-asm.S
>>> @@ -0,0 +1,89 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Hibernation support specific for RISCV
>>> + *
>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>> + *
>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>> + */
>>> +
>>> +#include <asm/asm.h>
>>> +#include <asm/asm-offsets.h>
>>> +#include <asm/assembler.h>
>>> +#include <asm/csr.h>
>>> +
>>> +#include <linux/linkage.h>
>>> +
>>> +/*
>>> + * This code is executed when resume from the hibernation.
>>> + *
>>> + * It begins with loading the temporary page table then restores the memory image.
>>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
>>> + * swsusp_arch_suspend().
>>> + */
>>> +
>>> +/*
>>> + * int __hibernate_cpu_resume(void)
>>> + * Switch back to the hibernated image's page table prior to restore the CPU
>>> + * context.
>>> + *
>>> + * Always returns 0 to the C code.
>>> + */
>>> +ENTRY(__hibernate_cpu_resume)
>>> +	/* switch to hibernated image's page table */
>>> +	csrw CSR_SATP, s0
>>> +	sfence.vma
>>> +
>>> +	REG_L	a0, hibernate_cpu_context
>>> +
>>> +	/* Restore CSRs */
>>> +	restore_csr
>>> +
>>> +	/* Restore registers (except A0 and T0-T6) */
>>> +	restore_reg
>>> +
>>> +	/* Return zero value */
>>> +	add	a0, zero, zero
>>> +
>>> +	/* Return to C code */
>>> +	ret
>>> +END(__hibernate_cpu_resume)
>>> +
>>> +/*
>>> + * Prepare to restore the image.
>>> + * a0: satp of saved page tables
>>> + * a1: satp of temporary page tables
>>> + * a2: cpu_resume
>>> + */
>>> +ENTRY(restore_image)
>>> +	mv	s0, a0
>>> +	mv	s1, a1
>>> +	mv	s2, a2
>>> +	REG_L	s4, restore_pblist
>>> +	REG_L	a1, relocated_restore_code
>>> +
>>> +	jalr	a1
>>> +END(restore_image)
>>> +
>>> +/*
>>> + * The below code will be executed from a 'safe' page.
>>> + * It first switches to the temporary page table, then start to copy the pages
>>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
>>> + * to restore the CPU context.
>>> + */
>>> +ENTRY(core_restore_code)
>>> +	/* switch to temp page table */
>>> +	csrw satp, s1
>>> +	sfence.vma
>>> +.Lcopy:
>>> +	/* The below code will restore the hibernated image. */
>>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
>>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
>>> +
>>> +	copy_page a0, a1
>>> +
>>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
>>> +	bnez	s4, .Lcopy
>>> +
>>> +	jalr	s2
>>> +END(core_restore_code)
>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
>>> new file mode 100644
>>> index 000000000000..bf7f3c781820
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/hibernate.c
>>> @@ -0,0 +1,360 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/*
>>> + * Hibernation support specific for RISCV
>>> + *
>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>> + *
>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>> + */
>>> +
>>> +#include <asm/barrier.h>
>>> +#include <asm/cacheflush.h>
>>> +#include <asm/mmu_context.h>
>>> +#include <asm/page.h>
>>> +#include <asm/pgtable.h>
>>> +#include <asm/sections.h>
>>> +#include <asm/set_memory.h>
>>> +#include <asm/smp.h>
>>> +#include <asm/suspend.h>
>>> +
>>> +#include <linux/cpu.h>
>>> +#include <linux/memblock.h>
>>> +#include <linux/pm.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/suspend.h>
>>> +#include <linux/utsname.h>
>>> +
>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
>>> +static int sleep_cpu = -EINVAL;
>>> +
>>> +/* CPU context to be saved */
>>> +struct suspend_context *hibernate_cpu_context;
>>> +
>>> +unsigned long relocated_restore_code;
>>> +
>>> +/* Pointer to the temporary resume page table */
>>> +pgd_t *resume_pg_dir;
>>> +
>>> +/**
>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
>>> + * @uts_version: to save the build number and date so that the we are not resume with
>>> + *		a different kernel
>>> + */
>>> +struct arch_hibernate_hdr_invariants {
>>> +	char		uts_version[__NEW_UTS_LEN + 1];
>>> +};
>>> +
>>> +/**
>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
>>> + * @invariants: container to store kernel build version
>>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
>>> + * @saved_satp: original page table used by the hibernated image.
>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
>>> + */
>>> +static struct arch_hibernate_hdr {
>>> +	struct arch_hibernate_hdr_invariants invariants;
>>> +	unsigned long	hartid;
>>> +	unsigned long	saved_satp;
>>> +	unsigned long	restore_cpu_addr;
>>> +} resume_hdr;
>>> +
>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
>>> +{
>>> +	memset(i, 0, sizeof(*i));
>>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
>>> +}
>>> +
>>> +/*
>>> + * Check if the given pfn is in the 'nosave' section.
>>> + */
>>> +int pfn_is_nosave(unsigned long pfn)
>>> +{
>>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
>>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
>>> +
>>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
>>> +}
>>> +
>>> +void notrace save_processor_state(void)
>>> +{
>>> +	WARN_ON(num_online_cpus() != 1);
>>> +}
>>> +
>>> +void notrace restore_processor_state(void)
>>> +{
>>> +}
>>> +
>>> +/*
>>> + * Helper parameters need to be saved to the hibernation image header.
>>> + */
>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
>>> +{
>>> +	struct arch_hibernate_hdr *hdr = addr;
>>> +
>>> +	if (max_size < sizeof(*hdr))
>>> +		return -EOVERFLOW;
>>> +
>>> +	arch_hdr_invariants(&hdr->invariants);
>>> +
>>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
>>> +	hdr->saved_satp = csr_read(CSR_SATP);
>>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(arch_hibernation_header_save);
>>> +
>>> +/*
>>> + * Retrieve the helper parameters from the hibernation image header
>>> + */
>>> +int arch_hibernation_header_restore(void *addr)
>>> +{
>>> +	struct arch_hibernate_hdr_invariants invariants;
>>> +	struct arch_hibernate_hdr *hdr = addr;
>>> +	int ret = 0;
>>> +
>>> +	arch_hdr_invariants(&invariants);
>>> +
>>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
>>> +		pr_crit("Hibernate image not generated by this kernel!\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
>>> +	if (sleep_cpu < 0) {
>>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
>>> +		sleep_cpu = -EINVAL;
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +#ifdef CONFIG_SMP
>>> +	ret = bringup_hibernate_cpu(sleep_cpu);
>>> +	if (ret) {
>>> +		sleep_cpu = -EINVAL;
>>> +		return ret;
>>> +	}
>>> +#endif
>>> +	resume_hdr = *hdr;
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
>>> +
>>> +int swsusp_arch_suspend(void)
>>> +{
>>> +	int ret = 0;
>>> +
>>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
>>> +		sleep_cpu = smp_processor_id();
>>> +		suspend_save_csrs(hibernate_cpu_context);
>>> +		ret = swsusp_save();
>>> +	} else {
>>> +		suspend_restore_csrs(hibernate_cpu_context);
>>> +		flush_tlb_all();
>>> +
>>> +		/* Invalidated Icache */
>>> +		flush_icache_all();
>>> +
>>> +		/*
>>> +		 * Tell the hibernation core that we've just restored
>>> +		 * the memory
>>> +		 */
>>> +		in_suspend = 0;
>>> +		sleep_cpu = -EINVAL;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t pte_idx = pte_index(vaddr);
>>> +
>>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +#ifndef __PAGETABLE_PMD_FOLDED
>>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
>>> +		(pgtable_l5_enabled ?					\
>>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
>>> +		(pgtable_l4_enabled ?					\
>>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
>>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
>>> +
>>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t pmd_idx = pmd_index(vaddr);
>>> +	pte_t *ptep;
>>> +
>>> +	if (pmd_none(pmdp[pmd_idx])) {
>>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
>>> +		if (!ptep)
>>> +			return -ENOMEM;
>>> +
>>> +		memset(ptep, 0, PAGE_SIZE);
>>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
>>> +	} else {
>>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
>>> +	}
>>> +
>>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t pud_index = pud_index(vaddr);
>>> +	pmd_t *pmdp;
>>> +
>>> +	if (pud_val(pudp[pud_index]) == 0) {
>>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
>>> +		if (!pmdp)
>>> +			return -ENOMEM;
>>> +
>>> +		memset(pmdp, 0, PAGE_SIZE);
>>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
>>> +	} else {
>>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
>>> +	}
>>> +
>>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t p4d_index = p4d_index(vaddr);
>>> +	pud_t *pudp;
>>> +
>>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
>>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
>>> +		if (!pudp)
>>> +			return -ENOMEM;
>>> +
>>> +		memset(pudp, 0, PAGE_SIZE);
>>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
>>> +	} else {
>>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
>>> +	}
>>> +
>>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
>>> +}
>>> +
>>> +#else
>>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
>>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
>>> +#endif /* __PAGETABLE_PMD_FOLDED */
>>> +
>>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	uintptr_t pgd_idx = pgd_index(vaddr);
>>> +	void *nextp;
>>> +
>>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
>>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
>>> +		if (!nextp)
>>> +			return -ENOMEM;
>>> +
>>> +		memset(nextp, 0, PAGE_SIZE);
>>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
>>> +	} else {
>>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
>>> +	}
>>> +
>>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>> +{
>>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
>>> +}
>>> +
>>> +static unsigned long relocate_restore_code(void)
>>> +{
>>> +	unsigned long ret;
>>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
>>> +
>>> +	if (!page)
>>> +		return -ENOMEM;
>>> +
>>> +	copy_page(page, core_restore_code);
>>> +
>>> +	/* Make the page containing the relocated code executable */
>>> +	set_memory_x((unsigned long)page, 1);
>>> +
>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	return (unsigned long)page;
>>> +}
>>> +
>>> +int swsusp_arch_resume(void)
>>> +{
>>> +	unsigned long addr = PAGE_OFFSET;
>>> +	unsigned long ret;
>>> +
>>> +	/*
>>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
>>> +	 * we don't need to free it here.
>>> +	 */
>>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
>>> +	if (!resume_pg_dir)
>>> +		return -ENOMEM;
>>> +
>>> +	/*
>>> +	 * The pages need to be writable when restoring the image.
>>> +	 * Create a second copy of page table just for the linear map, and use this when
>>> +	 * restoring.
>>> +	 */
>>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
>>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
>>> +		if (ret)
>>> +			return (int)ret;
>>> +	}
>>> +
>>
>> To me this is wrong as this does not account for the real physical
>> mapping layout: can't you simply copy the linear mapping from
>> swapper_pg_dir?
> Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from swapper_pg_dir, as we are not supposed to modify its content.


First, you're right, we need the temporary page table as swapper_pg_dir 
will get overwritten under our feet.

Now, I still disagree with mapping all the memory: the linear mapping is 
sparse because we only map what memblock gives us (some regions are 
marked as "nomap" for a reason).

I just took a look at arm64, and they do exactly that: they go through 
swapper_pg_dir, copy the linear mapping and enable write at every leaf 
level 
(https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
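
To illustrate the concern, here is a rough user-space toy (not kernel 
code). struct region, example_layout and both helpers are made up for 
this sketch and are not memblock's API; the addresses are invented too. 
It only compares how many pages a flat walk over the whole range would 
map against a walk that skips "nomap" regions and holes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Hypothetical stand-in for a memory region; not the kernel's memblock type. */
struct region {
	uint64_t base;	/* physical start, page aligned */
	uint64_t size;	/* length in bytes, multiple of PAGE_SIZE */
	bool nomap;	/* true: must not appear in the linear mapping */
};

/* Invented layout: a firmware "nomap" range, RAM, a hole, then more RAM. */
static const struct region example_layout[] = {
	{ 0x80000000ULL, 0x00200000ULL, true  },
	{ 0x80200000ULL, 0x04000000ULL, false },
	{ 0x90000000ULL, 0x01000000ULL, false },
};
#define EXAMPLE_NREGIONS (sizeof(example_layout) / sizeof(example_layout[0]))

/* Pages mapped by a flat walk from the lowest base to the highest end. */
static size_t pages_flat(const struct region *r, size_t n)
{
	uint64_t lo, hi;

	assert(n > 0);
	lo = r[0].base;
	hi = r[0].base + r[0].size;
	for (size_t i = 1; i < n; i++) {
		if (r[i].base < lo)
			lo = r[i].base;
		if (r[i].base + r[i].size > hi)
			hi = r[i].base + r[i].size;
	}
	return (size_t)((hi - lo) / PAGE_SIZE);
}

/* Pages mapped when only regions that may be mapped are walked. */
static size_t pages_sparse(const struct region *r, size_t n)
{
	size_t pages = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].nomap)
			continue;	/* leave "nomap" memory unmapped */
		pages += (size_t)(r[i].size / PAGE_SIZE);
	}
	return pages;
}
```

With this invented layout the flat walk covers the firmware range and 
the hole as well, so it maps far more pages than the sparse walk does. 
Copying the existing linear mapping, as arm64 does, gets the sparse 
result without having to re-derive the layout.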


>> But I have to admit that I struggle to understand the need for this
>> temporary page table: all we need to do is to allow to write to the
>> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
> Similar to the above comment. When we restore the memory content, we need to make sure the pages are writable. If you modify swapper_pg_dir, the kernel will crash afterwards.
> That's why we need a second page table to do the restore.
>>
>>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
>>> +	relocated_restore_code = relocate_restore_code();
>>
>> And do we really need to do that too? The code in question can only be
>> overwritten by the same code right?
> Yes, we need to move the restore code to a new page to prevent it from overwriting itself while the memory is being restored.
>> Thanks,
>>
>> Alex
>>
>>
>>> +	if (relocated_restore_code == -ENOMEM)
>>> +		return -ENOMEM;
>>> +
>>> +	/*
>>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
>>> +	 * restore code can jump to it after finished restore the image. The next execution
>>> +	 * code doesn't find itself in a different address space after switching over to the
>>> +	 * original page table used by the hibernated image.
>>> +	 */
>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
>>> +					PAGE_KERNEL_READ_EXEC);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
>>> +			resume_hdr.restore_cpu_addr);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +#ifdef CONFIG_PM_SLEEP_SMP
>>> +int hibernate_resume_nonboot_cpu_disable(void)
>>> +{
>>> +	if (sleep_cpu < 0) {
>>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
>>> +		return -ENODEV;
>>> +	}
>>> +
>>> +	return freeze_secondary_cpus(sleep_cpu);
>>> +}
>>> +#endif
>>> +
>>> +static int __init riscv_hibernate_init(void)
>>> +{
>>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
>>> +
>>> +	if (WARN_ON(!hibernate_cpu_context))
>>> +		return -ENOMEM;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +early_initcall(riscv_hibernate_init);


^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-02-08 12:04         ` Alexandre Ghiti
@ 2023-02-09  6:12           ` JeeHeng Sia
  -1 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-09  6:12 UTC (permalink / raw)
  To: Alexandre Ghiti, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

Hi Alex,

> -----Original Message-----
> From: Alexandre Ghiti <alex@ghiti.fr>
> Sent: Wednesday, 8 February, 2023 8:05 PM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> Hi Sia,
> 
> On 2/8/23 05:43, JeeHeng Sia wrote:
> >
> >> -----Original Message-----
> >> From: Alexandre Ghiti <alex@ghiti.fr>
> >> Sent: Tuesday, 7 February, 2023 11:46 PM
> >> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> >> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> >> <mason.huo@starfivetech.com>
> >> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >>
> >> Hi Sia,
> >>
> >> On 1/27/23 10:10, Sia Jee Heng wrote:
> >>> Low level Arch functions were created to support hibernation.
> >>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> >>> cpu state onto the stack, then calling swsusp_save() to save the memory
> >>> image.
> >>>
> >>> Arch specific hibernation header is implemented and is utilized by the
> >>> arch_hibernation_header_restore() and arch_hibernation_header_save()
> >>> functions. The arch specific hibernation header consists of satp, hartid,
> >>> and the cpu_resume address. The kernel built version is also need to be
> >>> saved into the hibernation image header to making sure only the same
> >>> kernel is restore when resume.
> >>>
> >>> swsusp_arch_resume() creates a temporary page table that covering only
> >>> the linear map. It copies the restore code to a 'safe' page, then start
> >>> to restore the memory image. Once completed, it restores the original
> >>> kernel's page table. It then calls into __hibernate_cpu_resume()
> >>> to restore the CPU context. Finally, it follows the normal hibernation
> >>> path back to the hibernation core.
> >>>
> >>> To enable hibernation/suspend to disk into RISCV, the below config
> >>> need to be enabled:
> >>> - CONFIG_ARCH_HIBERNATION_HEADER
> >>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >>>
> >>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> >>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> >>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> >>> ---
> >>>    arch/riscv/Kconfig                 |   7 +
> >>>    arch/riscv/include/asm/assembler.h |  20 ++
> >>>    arch/riscv/include/asm/suspend.h   |  21 ++
> >>>    arch/riscv/kernel/Makefile         |   1 +
> >>>    arch/riscv/kernel/asm-offsets.c    |   5 +
> >>>    arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
> >>>    arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
> >>>    7 files changed, 503 insertions(+)
> >>>    create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >>>    create mode 100644 arch/riscv/kernel/hibernate.c
> >>>
> >>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> >>> index e2b656043abf..4555848a817f 100644
> >>> --- a/arch/riscv/Kconfig
> >>> +++ b/arch/riscv/Kconfig
> >>> @@ -690,6 +690,13 @@ menu "Power management options"
> >>>
> >>>    source "kernel/power/Kconfig"
> >>>
> >>> +config ARCH_HIBERNATION_POSSIBLE
> >>> +	def_bool y
> >>> +
> >>> +config ARCH_HIBERNATION_HEADER
> >>> +	def_bool y
> >>> +	depends on HIBERNATION
> >>> +
> >>>    endmenu # "Power management options"
> >>>
> >>>    menu "CPU Power Management"
> >>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> >>> index ef1283d04b70..3de70d3e6ceb 100644
> >>> --- a/arch/riscv/include/asm/assembler.h
> >>> +++ b/arch/riscv/include/asm/assembler.h
> >>> @@ -59,4 +59,24 @@
> >>>    		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >>>    	.endm
> >>>
> >>> +/**
> >>> + * copy_page - copy 1 page (4KB) of data from source to destination
> >>> + * @a0 - destination
> >>> + * @a1 - source
> >>> + */
> >>> +	.macro	copy_page a0, a1
> >>> +		lui	a2, 0x1
> >>> +		add	a2, a2, a0
> >>> +.1 :
> >>> +		REG_L	t0, 0(a1)
> >>> +		REG_L	t1, SZREG(a1)
> >>> +
> >>> +		REG_S	t0, 0(a0)
> >>> +		REG_S	t1, SZREG(a0)
> >>> +
> >>> +		addi	a0, a0, 2 * SZREG
> >>> +		addi	a1, a1, 2 * SZREG
> >>> +		bne	a2, a0, .1
> >>> +	.endm
> >>> +
> >>>    #endif	/* __ASM_ASSEMBLER_H */
> >>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> >>> index 75419c5ca272..db40ae433aa9 100644
> >>> --- a/arch/riscv/include/asm/suspend.h
> >>> +++ b/arch/riscv/include/asm/suspend.h
> >>> @@ -21,6 +21,12 @@ struct suspend_context {
> >>>    #endif
> >>>    };
> >>>
> >>> +/*
> >>> + * This parameter will be assigned to 0 during resume and will be used by
> >>> + * hibernation core for the subsequent resume sequence
> >>> + */
> >>> +extern int in_suspend;
> >>> +
> >>>    /* Low-level CPU suspend entry function */
> >>>    int __cpu_suspend_enter(struct suspend_context *context);
> >>>
> >>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >>>    /* Used to save and restore the csr */
> >>>    void suspend_save_csrs(struct suspend_context *context);
> >>>    void suspend_restore_csrs(struct suspend_context *context);
> >>> +
> >>> +/* Low-level API to support hibernation */
> >>> +int swsusp_arch_suspend(void);
> >>> +int swsusp_arch_resume(void);
> >>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> >>> +int arch_hibernation_header_restore(void *addr);
> >>> +int __hibernate_cpu_resume(void);
> >>> +
> >>> +/* Used to resume on the CPU we hibernated on */
> >>> +int hibernate_resume_nonboot_cpu_disable(void);
> >>> +
> >>> +/* Used to restore the hibernated image */
> >>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> >>> +				unsigned long cpu_resume);
> >>> +asmlinkage int core_restore_code(void);
> >>>    #endif
> >>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> >>> index 4cf303a779ab..daab341d55e4 100644
> >>> --- a/arch/riscv/kernel/Makefile
> >>> +++ b/arch/riscv/kernel/Makefile
> >>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
> >>>    obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
> >>>
> >>>    obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> >>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
> >>>
> >>>    obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> >>>    obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> >>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> >>> index df9444397908..d6a75aac1d27 100644
> >>> --- a/arch/riscv/kernel/asm-offsets.c
> >>> +++ b/arch/riscv/kernel/asm-offsets.c
> >>> @@ -9,6 +9,7 @@
> >>>    #include <linux/kbuild.h>
> >>>    #include <linux/mm.h>
> >>>    #include <linux/sched.h>
> >>> +#include <linux/suspend.h>
> >>>    #include <asm/kvm_host.h>
> >>>    #include <asm/thread_info.h>
> >>>    #include <asm/ptrace.h>
> >>> @@ -116,6 +117,10 @@ void asm_offsets(void)
> >>>
> >>>    	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >>>
> >>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> >>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> >>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> >>> +
> >>>    	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >>>    	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >>>    	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> >>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> >>> new file mode 100644
> >>> index 000000000000..a83d534b89bd
> >>> --- /dev/null
> >>> +++ b/arch/riscv/kernel/hibernate-asm.S
> >>> @@ -0,0 +1,89 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0-only */
> >>> +/*
> >>> + * Hibernation support specific for RISCV
> >>> + *
> >>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>> + *
> >>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> >>> + */
> >>> +
> >>> +#include <asm/asm.h>
> >>> +#include <asm/asm-offsets.h>
> >>> +#include <asm/assembler.h>
> >>> +#include <asm/csr.h>
> >>> +
> >>> +#include <linux/linkage.h>
> >>> +
> >>> +/*
> >>> + * This code is executed when resume from the hibernation.
> >>> + *
> >>> + * It begins with loading the temporary page table then restores the memory image.
> >>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> >>> + * swsusp_arch_suspend().
> >>> + */
> >>> +
> >>> +/*
> >>> + * int __hibernate_cpu_resume(void)
> >>> + * Switch back to the hibernated image's page table prior to restore the CPU
> >>> + * context.
> >>> + *
> >>> + * Always returns 0 to the C code.
> >>> + */
> >>> +ENTRY(__hibernate_cpu_resume)
> >>> +	/* switch to hibernated image's page table */
> >>> +	csrw CSR_SATP, s0
> >>> +	sfence.vma
> >>> +
> >>> +	REG_L	a0, hibernate_cpu_context
> >>> +
> >>> +	/* Restore CSRs */
> >>> +	restore_csr
> >>> +
> >>> +	/* Restore registers (except A0 and T0-T6) */
> >>> +	restore_reg
> >>> +
> >>> +	/* Return zero value */
> >>> +	add	a0, zero, zero
> >>> +
> >>> +	/* Return to C code */
> >>> +	ret
> >>> +END(__hibernate_cpu_resume)
> >>> +
> >>> +/*
> >>> + * Prepare to restore the image.
> >>> + * a0: satp of saved page tables
> >>> + * a1: satp of temporary page tables
> >>> + * a2: cpu_resume
> >>> + */
> >>> +ENTRY(restore_image)
> >>> +	mv	s0, a0
> >>> +	mv	s1, a1
> >>> +	mv	s2, a2
> >>> +	REG_L	s4, restore_pblist
> >>> +	REG_L	a1, relocated_restore_code
> >>> +
> >>> +	jalr	a1
> >>> +END(restore_image)
> >>> +
> >>> +/*
> >>> + * The below code will be executed from a 'safe' page.
> >>> + * It first switches to the temporary page table, then start to copy the pages
> >>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> >>> + * to restore the CPU context.
> >>> + */
> >>> +ENTRY(core_restore_code)
> >>> +	/* switch to temp page table */
> >>> +	csrw satp, s1
> >>> +	sfence.vma
> >>> +.Lcopy:
> >>> +	/* The below code will restore the hibernated image. */
> >>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> >>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> >>> +
> >>> +	copy_page a0, a1
> >>> +
> >>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> >>> +	bnez	s4, .Lcopy
> >>> +
> >>> +	jalr	s2
> >>> +END(core_restore_code)
> >>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> >>> new file mode 100644
> >>> index 000000000000..bf7f3c781820
> >>> --- /dev/null
> >>> +++ b/arch/riscv/kernel/hibernate.c
> >>> @@ -0,0 +1,360 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/*
> >>> + * Hibernation support specific for RISCV
> >>> + *
> >>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>> + *
> >>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> >>> + */
> >>> +
> >>> +#include <asm/barrier.h>
> >>> +#include <asm/cacheflush.h>
> >>> +#include <asm/mmu_context.h>
> >>> +#include <asm/page.h>
> >>> +#include <asm/pgtable.h>
> >>> +#include <asm/sections.h>
> >>> +#include <asm/set_memory.h>
> >>> +#include <asm/smp.h>
> >>> +#include <asm/suspend.h>
> >>> +
> >>> +#include <linux/cpu.h>
> >>> +#include <linux/memblock.h>
> >>> +#include <linux/pm.h>
> >>> +#include <linux/sched.h>
> >>> +#include <linux/suspend.h>
> >>> +#include <linux/utsname.h>
> >>> +
> >>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> >>> +static int sleep_cpu = -EINVAL;
> >>> +
> >>> +/* CPU context to be saved */
> >>> +struct suspend_context *hibernate_cpu_context;
> >>> +
> >>> +unsigned long relocated_restore_code;
> >>> +
> >>> +/* Pointer to the temporary resume page table */
> >>> +pgd_t *resume_pg_dir;
> >>> +
> >>> +/**
> >>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> >>> + * @uts_version: to save the build number and date so that the we are not resume with
> >>> + *		a different kernel
> >>> + */
> >>> +struct arch_hibernate_hdr_invariants {
> >>> +	char		uts_version[__NEW_UTS_LEN + 1];
> >>> +};
> >>> +
> >>> +/**
> >>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> >>> + * @invariants: container to store kernel build version
> >>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> >>> + * @saved_satp: original page table used by the hibernated image.
> >>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> >>> + */
> >>> +static struct arch_hibernate_hdr {
> >>> +	struct arch_hibernate_hdr_invariants invariants;
> >>> +	unsigned long	hartid;
> >>> +	unsigned long	saved_satp;
> >>> +	unsigned long	restore_cpu_addr;
> >>> +} resume_hdr;
> >>> +
> >>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> >>> +{
> >>> +	memset(i, 0, sizeof(*i));
> >>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> >>> +}
> >>> +
> >>> +/*
> >>> + * Check if the given pfn is in the 'nosave' section.
> >>> + */
> >>> +int pfn_is_nosave(unsigned long pfn)
> >>> +{
> >>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> >>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> >>> +
> >>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> >>> +}
> >>> +
> >>> +void notrace save_processor_state(void)
> >>> +{
> >>> +	WARN_ON(num_online_cpus() != 1);
> >>> +}
> >>> +
> >>> +void notrace restore_processor_state(void)
> >>> +{
> >>> +}
> >>> +
> >>> +/*
> >>> + * Helper parameters need to be saved to the hibernation image header.
> >>> + */
> >>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> >>> +{
> >>> +	struct arch_hibernate_hdr *hdr = addr;
> >>> +
> >>> +	if (max_size < sizeof(*hdr))
> >>> +		return -EOVERFLOW;
> >>> +
> >>> +	arch_hdr_invariants(&hdr->invariants);
> >>> +
> >>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> >>> +	hdr->saved_satp = csr_read(CSR_SATP);
> >>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +EXPORT_SYMBOL(arch_hibernation_header_save);
> >>> +
> >>> +/*
> >>> + * Retrieve the helper parameters from the hibernation image header
> >>> + */
> >>> +int arch_hibernation_header_restore(void *addr)
> >>> +{
> >>> +	struct arch_hibernate_hdr_invariants invariants;
> >>> +	struct arch_hibernate_hdr *hdr = addr;
> >>> +	int ret = 0;
> >>> +
> >>> +	arch_hdr_invariants(&invariants);
> >>> +
> >>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> >>> +		pr_crit("Hibernate image not generated by this kernel!\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> >>> +	if (sleep_cpu < 0) {
> >>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> >>> +		sleep_cpu = -EINVAL;
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +#ifdef CONFIG_SMP
> >>> +	ret = bringup_hibernate_cpu(sleep_cpu);
> >>> +	if (ret) {
> >>> +		sleep_cpu = -EINVAL;
> >>> +		return ret;
> >>> +	}
> >>> +#endif
> >>> +	resume_hdr = *hdr;
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
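For readers following the header check above, here is a minimal user-space sketch of the same idea: zero a fixed-size container, copy the build version string into it, and compare the whole container with memcmp(). The names hdr_invariants, fill_invariants() and image_matches() are hypothetical and not part of the patch; UTS_LEN mirrors the kernel's __NEW_UTS_LEN (64).

```c
#include <string.h>

#define UTS_LEN 64	/* mirrors __NEW_UTS_LEN in the kernel */

/* Hypothetical stand-in for struct arch_hibernate_hdr_invariants. */
struct hdr_invariants {
	char uts_version[UTS_LEN + 1];
};

/* Zero the container first so that unused bytes compare equal later. */
static void fill_invariants(struct hdr_invariants *i, const char *version)
{
	memset(i, 0, sizeof(*i));
	strncpy(i->uts_version, version, UTS_LEN);
}

/* Return 1 if the saved header was produced by the same build, 0 otherwise. */
static int image_matches(const struct hdr_invariants *saved,
			 const char *running_version)
{
	struct hdr_invariants now;

	fill_invariants(&now, running_version);
	return memcmp(saved, &now, sizeof(now)) == 0;
}
```

Zeroing before the copy matters because memcmp() covers the whole structure, not just the string.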
> >>> +
> >>> +int swsusp_arch_suspend(void)
> >>> +{
> >>> +	int ret = 0;
> >>> +
> >>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> >>> +		sleep_cpu = smp_processor_id();
> >>> +		suspend_save_csrs(hibernate_cpu_context);
> >>> +		ret = swsusp_save();
> >>> +	} else {
> >>> +		suspend_restore_csrs(hibernate_cpu_context);
> >>> +		flush_tlb_all();
> >>> +
> >>> +		/* Invalidated Icache */
> >>> +		flush_icache_all();
> >>> +
> >>> +		/*
> >>> +		 * Tell the hibernation core that we've just restored
> >>> +		 * the memory
> >>> +		 */
> >>> +		in_suspend = 0;
> >>> +		sleep_cpu = -EINVAL;
> >>> +	}
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t pte_idx = pte_index(vaddr);
> >>> +
> >>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +#ifndef __PAGETABLE_PMD_FOLDED
> >>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
> >>> +		(pgtable_l5_enabled ?					\
> >>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
> >>> +		(pgtable_l4_enabled ?					\
> >>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
> >>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
> >>> +
> >>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t pmd_idx = pmd_index(vaddr);
> >>> +	pte_t *ptep;
> >>> +
> >>> +	if (pmd_none(pmdp[pmd_idx])) {
> >>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> >>> +		if (!ptep)
> >>> +			return -ENOMEM;
> >>> +
> >>> +		memset(ptep, 0, PAGE_SIZE);
> >>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
> >>> +	} else {
> >>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
> >>> +	}
> >>> +
> >>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t pud_index = pud_index(vaddr);
> >>> +	pmd_t *pmdp;
> >>> +
> >>> +	if (pud_val(pudp[pud_index]) == 0) {
> >>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> >>> +		if (!pmdp)
> >>> +			return -ENOMEM;
> >>> +
> >>> +		memset(pmdp, 0, PAGE_SIZE);
> >>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
> >>> +	} else {
> >>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
> >>> +	}
> >>> +
> >>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t p4d_index = p4d_index(vaddr);
> >>> +	pud_t *pudp;
> >>> +
> >>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
> >>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> >>> +		if (!pudp)
> >>> +			return -ENOMEM;
> >>> +
> >>> +		memset(pudp, 0, PAGE_SIZE);
> >>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
> >>> +	} else {
> >>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
> >>> +	}
> >>> +
> >>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
> >>> +}
> >>> +
> >>> +#else
> >>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
> >>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
> >>> +#endif /* __PAGETABLE_PMD_FOLDED */
> >>> +
> >>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t pgd_idx = pgd_index(vaddr);
> >>> +	void *nextp;
> >>> +
> >>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
> >>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
> >>> +		if (!nextp)
> >>> +			return -ENOMEM;
> >>> +
> >>> +		memset(nextp, 0, PAGE_SIZE);
> >>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
> >>> +	} else {
> >>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
> >>> +	}
> >>> +
> >>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
> >>> +}
> >>> +
> >>> +static unsigned long relocate_restore_code(void)
> >>> +{
> >>> +	unsigned long ret;
> >>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
> >>> +
> >>> +	if (!page)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	copy_page(page, core_restore_code);
> >>> +
> >>> +	/* Make the page containing the relocated code executable */
> >>> +	set_memory_x((unsigned long)page, 1);
> >>> +
> >>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	return (unsigned long)page;
> >>> +}
> >>> +
> >>> +int swsusp_arch_resume(void)
> >>> +{
> >>> +	unsigned long addr = PAGE_OFFSET;
> >>> +	unsigned long ret;
> >>> +
> >>> +	/*
> >>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> >>> +	 * we don't need to free it here.
> >>> +	 */
> >>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> >>> +	if (!resume_pg_dir)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	/*
> >>> +	 * The pages need to be writable when restoring the image.
> >>> +	 * Create a second copy of page table just for the linear map, and use this when
> >>> +	 * restoring.
> >>> +	 */
> >>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
> >>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
> >>> +		if (ret)
> >>> +			return (int)ret;
> >>> +	}
> >>> +
> >>
> >> To me this is wrong as this does not account for the real physical
> >> mapping layout: can't you simply copy the linear mapping from
> >> swapper_pg_dir?
> > Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from
> > swapper_pg_dir as we are not supposed to modify its content.
> 
> 
> First, you're right, we need the temporary page table as swapper_pg_dir
> will get overwritten under our feet.
> 
> Now, I still disagree with mapping all the memory: the linear mapping is
> sparse because we only map what memblock gives us (some regions are
> marked as "nomap" for a reason).
> 
> I just took a look at arm64, and they do exactly that: they go through
> swapper_pg_dir, copy the linear mapping and enable write at every leaf
> level
> (https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
You're right, but we don't have to copy from swapper_pg_dir. We can call kernel_page_present() in the mapping loop to check that each page is valid before mapping it. Agree?
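A rough user-space sketch of that loop, only to illustrate the idea. Here page_present() is a hypothetical stand-in for kernel_page_present(), the mapped_range array stands in for whatever the linear mapping really covers, and the actual mapping call is replaced by a counter:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one mapped physical range, [start_pfn, end_pfn). */
struct mapped_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Stand-in for kernel_page_present(): is this pfn covered by any range? */
static bool page_present(const struct mapped_range *r, size_t n,
			 unsigned long pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= r[i].start_pfn && pfn < r[i].end_pfn)
			return true;
	return false;
}

/* Walk [first_pfn, last_pfn] and count the pages that would be mapped. */
static unsigned long map_present_pages(const struct mapped_range *r, size_t n,
				       unsigned long first_pfn,
				       unsigned long last_pfn)
{
	unsigned long mapped = 0;

	for (unsigned long pfn = first_pfn; pfn <= last_pfn; pfn++) {
		if (!page_present(r, n, pfn))
			continue;	/* hole or nomap region: leave it unmapped */
		mapped++;		/* the real code would map the page here */
	}
	return mapped;
}
```

With two 16-page ranges separated by a 16-page hole, only the 32 present pages are counted and the hole is skipped.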
> 
> 
> >> But I have to admit that I struggle to understand the need for this
> >> temporary page table: all we need to do is to allow to write to the
> >> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
> > Similar to the above comment. When we restore the memory content, we need to make sure the pages are writable. If you modify
> > swapper_pg_dir, the kernel will crash afterwards.
> > That's why we need a second page table to do the restore.
> >>
> >>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
> >>> +	relocated_restore_code = relocate_restore_code();
> >>
> >> And do we really need to do that too? The code in question can only be
> >> overwritten by the same code right?
> > Yes, we need to move the restore code to a new page to prevent the code from overwriting itself when restoring the memory.
> >> Thanks,
> >>
> >> Alex
> >>
> >>
> >>> +	if (relocated_restore_code == -ENOMEM)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	/*
> >>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> >>> +	 * restore code can jump to it after finished restore the image. The next execution
> >>> +	 * code doesn't find itself in a different address space after switching over to the
> >>> +	 * original page table used by the hibernated image.
> >>> +	 */
> >>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
> >>> +					PAGE_KERNEL_READ_EXEC);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> >>> +			resume_hdr.restore_cpu_addr);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +#ifdef CONFIG_PM_SLEEP_SMP
> >>> +int hibernate_resume_nonboot_cpu_disable(void)
> >>> +{
> >>> +	if (sleep_cpu < 0) {
> >>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> >>> +		return -ENODEV;
> >>> +	}
> >>> +
> >>> +	return freeze_secondary_cpus(sleep_cpu);
> >>> +}
> >>> +#endif
> >>> +
> >>> +static int __init riscv_hibernate_init(void)
> >>> +{
> >>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> >>> +
> >>> +	if (WARN_ON(!hibernate_cpu_context))
> >>> +		return -ENOMEM;
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +early_initcall(riscv_hibernate_init);


* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
@ 2023-02-09  6:12           ` JeeHeng Sia
  0 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-09  6:12 UTC (permalink / raw)
  To: Alexandre Ghiti, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

Hi Alex,

> -----Original Message-----
> From: Alexandre Ghiti <alex@ghiti.fr>
> Sent: Wednesday, 8 February, 2023 8:05 PM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> Hi Sia,
> 
> On 2/8/23 05:43, JeeHeng Sia wrote:
> >
> >> -----Original Message-----
> >> From: Alexandre Ghiti <alex@ghiti.fr>
> >> Sent: Tuesday, 7 February, 2023 11:46 PM
> >> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> >> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> >> <mason.huo@starfivetech.com>
> >> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >>
> >> Hi Sia,
> >>
> >> On 1/27/23 10:10, Sia Jee Heng wrote:
> >>> Low-level arch functions are added to support hibernation.
> >>> swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
> >>> the CPU state onto the stack, then calls swsusp_save() to save the memory
> >>> image.
> >>>
> >>> An arch-specific hibernation header is implemented and is used by the
> >>> arch_hibernation_header_restore() and arch_hibernation_header_save()
> >>> functions. The arch-specific hibernation header consists of satp, hartid,
> >>> and the cpu_resume address. The kernel build version also needs to be
> >>> saved into the hibernation image header to make sure only the same
> >>> kernel is restored on resume.
> >>>
> >>> swsusp_arch_resume() creates a temporary page table that covers only
> >>> the linear map. It copies the restore code to a 'safe' page, then starts
> >>> to restore the memory image. Once completed, it restores the original
> >>> kernel's page table. It then calls into __hibernate_cpu_resume()
> >>> to restore the CPU context. Finally, it follows the normal hibernation
> >>> path back to the hibernation core.
> >>>
> >>> To enable hibernation/suspend to disk on RISC-V, the config options below
> >>> need to be enabled:
> >>> - CONFIG_ARCH_HIBERNATION_HEADER
> >>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >>>
> >>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> >>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> >>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> >>> ---
> >>>    arch/riscv/Kconfig                 |   7 +
> >>>    arch/riscv/include/asm/assembler.h |  20 ++
> >>>    arch/riscv/include/asm/suspend.h   |  21 ++
> >>>    arch/riscv/kernel/Makefile         |   1 +
> >>>    arch/riscv/kernel/asm-offsets.c    |   5 +
> >>>    arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
> >>>    arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
> >>>    7 files changed, 503 insertions(+)
> >>>    create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >>>    create mode 100644 arch/riscv/kernel/hibernate.c
> >>>
> >>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> >>> index e2b656043abf..4555848a817f 100644
> >>> --- a/arch/riscv/Kconfig
> >>> +++ b/arch/riscv/Kconfig
> >>> @@ -690,6 +690,13 @@ menu "Power management options"
> >>>
> >>>    source "kernel/power/Kconfig"
> >>>
> >>> +config ARCH_HIBERNATION_POSSIBLE
> >>> +	def_bool y
> >>> +
> >>> +config ARCH_HIBERNATION_HEADER
> >>> +	def_bool y
> >>> +	depends on HIBERNATION
> >>> +
> >>>    endmenu # "Power management options"
> >>>
> >>>    menu "CPU Power Management"
> >>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> >>> index ef1283d04b70..3de70d3e6ceb 100644
> >>> --- a/arch/riscv/include/asm/assembler.h
> >>> +++ b/arch/riscv/include/asm/assembler.h
> >>> @@ -59,4 +59,24 @@
> >>>    		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >>>    	.endm
> >>>
> >>> +/**
> >>> + * copy_page - copy 1 page (4KB) of data from source to destination
> >>> + * @a0 - destination
> >>> + * @a1 - source
> >>> + */
> >>> +	.macro	copy_page a0, a1
> >>> +		lui	a2, 0x1
> >>> +		add	a2, a2, a0
> >>> +.1 :
> >>> +		REG_L	t0, 0(a1)
> >>> +		REG_L	t1, SZREG(a1)
> >>> +
> >>> +		REG_S	t0, 0(a0)
> >>> +		REG_S	t1, SZREG(a0)
> >>> +
> >>> +		addi	a0, a0, 2 * SZREG
> >>> +		addi	a1, a1, 2 * SZREG
> >>> +		bne	a2, a0, .1
> >>> +	.endm
> >>> +
> >>>    #endif	/* __ASM_ASSEMBLER_H */
> >>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> >>> index 75419c5ca272..db40ae433aa9 100644
> >>> --- a/arch/riscv/include/asm/suspend.h
> >>> +++ b/arch/riscv/include/asm/suspend.h
> >>> @@ -21,6 +21,12 @@ struct suspend_context {
> >>>    #endif
> >>>    };
> >>>
> >>> +/*
> >>> + * This parameter will be assigned to 0 during resume and will be used by
> >>> + * hibernation core for the subsequent resume sequence
> >>> + */
> >>> +extern int in_suspend;
> >>> +
> >>>    /* Low-level CPU suspend entry function */
> >>>    int __cpu_suspend_enter(struct suspend_context *context);
> >>>
> >>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >>>    /* Used to save and restore the csr */
> >>>    void suspend_save_csrs(struct suspend_context *context);
> >>>    void suspend_restore_csrs(struct suspend_context *context);
> >>> +
> >>> +/* Low-level API to support hibernation */
> >>> +int swsusp_arch_suspend(void);
> >>> +int swsusp_arch_resume(void);
> >>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> >>> +int arch_hibernation_header_restore(void *addr);
> >>> +int __hibernate_cpu_resume(void);
> >>> +
> >>> +/* Used to resume on the CPU we hibernated on */
> >>> +int hibernate_resume_nonboot_cpu_disable(void);
> >>> +
> >>> +/* Used to restore the hibernated image */
> >>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> >>> +				unsigned long cpu_resume);
> >>> +asmlinkage int core_restore_code(void);
> >>>    #endif
> >>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> >>> index 4cf303a779ab..daab341d55e4 100644
> >>> --- a/arch/riscv/kernel/Makefile
> >>> +++ b/arch/riscv/kernel/Makefile
> >>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
> >>>    obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
> >>>
> >>>    obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> >>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
> >>>
> >>>    obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> >>>    obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> >>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> >>> index df9444397908..d6a75aac1d27 100644
> >>> --- a/arch/riscv/kernel/asm-offsets.c
> >>> +++ b/arch/riscv/kernel/asm-offsets.c
> >>> @@ -9,6 +9,7 @@
> >>>    #include <linux/kbuild.h>
> >>>    #include <linux/mm.h>
> >>>    #include <linux/sched.h>
> >>> +#include <linux/suspend.h>
> >>>    #include <asm/kvm_host.h>
> >>>    #include <asm/thread_info.h>
> >>>    #include <asm/ptrace.h>
> >>> @@ -116,6 +117,10 @@ void asm_offsets(void)
> >>>
> >>>    	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >>>
> >>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> >>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> >>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> >>> +
> >>>    	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >>>    	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >>>    	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> >>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> >>> new file mode 100644
> >>> index 000000000000..a83d534b89bd
> >>> --- /dev/null
> >>> +++ b/arch/riscv/kernel/hibernate-asm.S
> >>> @@ -0,0 +1,89 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0-only */
> >>> +/*
> >>> + * Hibernation support specific for RISCV
> >>> + *
> >>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>> + *
> >>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> >>> + */
> >>> +
> >>> +#include <asm/asm.h>
> >>> +#include <asm/asm-offsets.h>
> >>> +#include <asm/assembler.h>
> >>> +#include <asm/csr.h>
> >>> +
> >>> +#include <linux/linkage.h>
> >>> +
> >>> +/*
> >>> + * This code is executed when resume from the hibernation.
> >>> + *
> >>> + * It begins with loading the temporary page table then restores the memory image.
> >>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> >>> + * swsusp_arch_suspend().
> >>> + */
> >>> +
> >>> +/*
> >>> + * int __hibernate_cpu_resume(void)
> >>> + * Switch back to the hibernated image's page table prior to restore the CPU
> >>> + * context.
> >>> + *
> >>> + * Always returns 0 to the C code.
> >>> + */
> >>> +ENTRY(__hibernate_cpu_resume)
> >>> +	/* switch to hibernated image's page table */
> >>> +	csrw CSR_SATP, s0
> >>> +	sfence.vma
> >>> +
> >>> +	REG_L	a0, hibernate_cpu_context
> >>> +
> >>> +	/* Restore CSRs */
> >>> +	restore_csr
> >>> +
> >>> +	/* Restore registers (except A0 and T0-T6) */
> >>> +	restore_reg
> >>> +
> >>> +	/* Return zero value */
> >>> +	add	a0, zero, zero
> >>> +
> >>> +	/* Return to C code */
> >>> +	ret
> >>> +END(__hibernate_cpu_resume)
> >>> +
> >>> +/*
> >>> + * Prepare to restore the image.
> >>> + * a0: satp of saved page tables
> >>> + * a1: satp of temporary page tables
> >>> + * a2: cpu_resume
> >>> + */
> >>> +ENTRY(restore_image)
> >>> +	mv	s0, a0
> >>> +	mv	s1, a1
> >>> +	mv	s2, a2
> >>> +	REG_L	s4, restore_pblist
> >>> +	REG_L	a1, relocated_restore_code
> >>> +
> >>> +	jalr	a1
> >>> +END(restore_image)
> >>> +
> >>> +/*
> >>> + * The below code will be executed from a 'safe' page.
> >>> + * It first switches to the temporary page table, then start to copy the pages
> >>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> >>> + * to restore the CPU context.
> >>> + */
> >>> +ENTRY(core_restore_code)
> >>> +	/* switch to temp page table */
> >>> +	csrw satp, s1
> >>> +	sfence.vma
> >>> +.Lcopy:
> >>> +	/* The below code will restore the hibernated image. */
> >>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> >>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> >>> +
> >>> +	copy_page a0, a1
> >>> +
> >>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> >>> +	bnez	s4, .Lcopy
> >>> +
> >>> +	jalr	s2
> >>> +END(core_restore_code)
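For reference, a C rendering of what the .Lcopy loop above does, as a user-space sketch. The struct has the same three fields that the HIBERN_PBE_* offsets refer to; restore_pages() is a hypothetical name, and memcpy() stands in for the copy_page macro. Unlike the assembly, this version also tolerates an empty list.

```c
#include <string.h>

#define PAGE_SIZE 4096

/* Same three fields as the kernel's struct pbe. */
struct pbe {
	void *address;		/* where the saved copy currently lives */
	void *orig_address;	/* where the page must end up */
	struct pbe *next;	/* next entry, NULL at the end of the list */
};

/* Copy every saved page back to its original location; return the count. */
static unsigned long restore_pages(struct pbe *list)
{
	unsigned long n = 0;

	for (struct pbe *p = list; p; p = p->next) {
		memcpy(p->orig_address, p->address, PAGE_SIZE);
		n++;
	}
	return n;
}
```

Because any page in the list may be the one holding the loop itself, the loop has to run from a page that is not a copy destination, which is what the 'safe' page provides.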
> >>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> >>> new file mode 100644
> >>> index 000000000000..bf7f3c781820
> >>> --- /dev/null
> >>> +++ b/arch/riscv/kernel/hibernate.c
> >>> @@ -0,0 +1,360 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/*
> >>> + * Hibernation support specific for RISCV
> >>> + *
> >>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>> + *
> >>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> >>> + */
> >>> +
> >>> +#include <asm/barrier.h>
> >>> +#include <asm/cacheflush.h>
> >>> +#include <asm/mmu_context.h>
> >>> +#include <asm/page.h>
> >>> +#include <asm/pgtable.h>
> >>> +#include <asm/sections.h>
> >>> +#include <asm/set_memory.h>
> >>> +#include <asm/smp.h>
> >>> +#include <asm/suspend.h>
> >>> +
> >>> +#include <linux/cpu.h>
> >>> +#include <linux/memblock.h>
> >>> +#include <linux/pm.h>
> >>> +#include <linux/sched.h>
> >>> +#include <linux/suspend.h>
> >>> +#include <linux/utsname.h>
> >>> +
> >>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> >>> +static int sleep_cpu = -EINVAL;
> >>> +
> >>> +/* CPU context to be saved */
> >>> +struct suspend_context *hibernate_cpu_context;
> >>> +
> >>> +unsigned long relocated_restore_code;
> >>> +
> >>> +/* Pointer to the temporary resume page table */
> >>> +pgd_t *resume_pg_dir;
> >>> +
> >>> +/**
> >>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> >>> + * @uts_version: to save the build number and date so that the we are not resume with
> >>> + *		a different kernel
> >>> + */
> >>> +struct arch_hibernate_hdr_invariants {
> >>> +	char		uts_version[__NEW_UTS_LEN + 1];
> >>> +};
> >>> +
> >>> +/**
> >>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> >>> + * @invariants: container to store kernel build version
> >>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> >>> + * @saved_satp: original page table used by the hibernated image.
> >>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> >>> + */
> >>> +static struct arch_hibernate_hdr {
> >>> +	struct arch_hibernate_hdr_invariants invariants;
> >>> +	unsigned long	hartid;
> >>> +	unsigned long	saved_satp;
> >>> +	unsigned long	restore_cpu_addr;
> >>> +} resume_hdr;
> >>> +
> >>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> >>> +{
> >>> +	memset(i, 0, sizeof(*i));
> >>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> >>> +}
> >>> +
> >>> +/*
> >>> + * Check if the given pfn is in the 'nosave' section.
> >>> + */
> >>> +int pfn_is_nosave(unsigned long pfn)
> >>> +{
> >>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> >>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> >>> +
> >>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> >>> +}
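The range check above is inclusive on both ends because the end pfn is taken from the last byte inside the section, not from the first byte after it. A small sketch under assumed values, where pfn_in_nosave() is a hypothetical helper working on plain physical addresses and PAGE_SHIFT is assumed to be 12:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* begin/end are hypothetical physical bounds of the section, end exclusive. */
static bool pfn_in_nosave(uintptr_t begin, uintptr_t end, unsigned long pfn)
{
	unsigned long first = begin >> PAGE_SHIFT;
	unsigned long last = (end - 1) >> PAGE_SHIFT;	/* last byte inside */

	return pfn >= first && pfn <= last;
}
```

For a section that occupies exactly one page, first and last are the same pfn, so only that page is reported as nosave.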
> >>> +
> >>> +void notrace save_processor_state(void)
> >>> +{
> >>> +	WARN_ON(num_online_cpus() != 1);
> >>> +}
> >>> +
> >>> +void notrace restore_processor_state(void)
> >>> +{
> >>> +}
> >>> +
> >>> +/*
> >>> + * Helper parameters need to be saved to the hibernation image header.
> >>> + */
> >>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> >>> +{
> >>> +	struct arch_hibernate_hdr *hdr = addr;
> >>> +
> >>> +	if (max_size < sizeof(*hdr))
> >>> +		return -EOVERFLOW;
> >>> +
> >>> +	arch_hdr_invariants(&hdr->invariants);
> >>> +
> >>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> >>> +	hdr->saved_satp = csr_read(CSR_SATP);
> >>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +EXPORT_SYMBOL(arch_hibernation_header_save);
> >>> +
> >>> +/*
> >>> + * Retrieve the helper parameters from the hibernation image header
> >>> + */
> >>> +int arch_hibernation_header_restore(void *addr)
> >>> +{
> >>> +	struct arch_hibernate_hdr_invariants invariants;
> >>> +	struct arch_hibernate_hdr *hdr = addr;
> >>> +	int ret = 0;
> >>> +
> >>> +	arch_hdr_invariants(&invariants);
> >>> +
> >>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> >>> +		pr_crit("Hibernate image not generated by this kernel!\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> >>> +	if (sleep_cpu < 0) {
> >>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> >>> +		sleep_cpu = -EINVAL;
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +#ifdef CONFIG_SMP
> >>> +	ret = bringup_hibernate_cpu(sleep_cpu);
> >>> +	if (ret) {
> >>> +		sleep_cpu = -EINVAL;
> >>> +		return ret;
> >>> +	}
> >>> +#endif
> >>> +	resume_hdr = *hdr;
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
> >>> +
> >>> +int swsusp_arch_suspend(void)
> >>> +{
> >>> +	int ret = 0;
> >>> +
> >>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> >>> +		sleep_cpu = smp_processor_id();
> >>> +		suspend_save_csrs(hibernate_cpu_context);
> >>> +		ret = swsusp_save();
> >>> +	} else {
> >>> +		suspend_restore_csrs(hibernate_cpu_context);
> >>> +		flush_tlb_all();
> >>> +
> >>> +		/* Invalidated Icache */
> >>> +		flush_icache_all();
> >>> +
> >>> +		/*
> >>> +		 * Tell the hibernation core that we've just restored
> >>> +		 * the memory
> >>> +		 */
> >>> +		in_suspend = 0;
> >>> +		sleep_cpu = -EINVAL;
> >>> +	}
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t pte_idx = pte_index(vaddr);
> >>> +
> >>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +#ifndef __PAGETABLE_PMD_FOLDED
> >>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
> >>> +		(pgtable_l5_enabled ?					\
> >>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
> >>> +		(pgtable_l4_enabled ?					\
> >>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
> >>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
> >>> +
> >>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t pmd_idx = pmd_index(vaddr);
> >>> +	pte_t *ptep;
> >>> +
> >>> +	if (pmd_none(pmdp[pmd_idx])) {
> >>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> >>> +		if (!ptep)
> >>> +			return -ENOMEM;
> >>> +
> >>> +		memset(ptep, 0, PAGE_SIZE);
> >>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
> >>> +	} else {
> >>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
> >>> +	}
> >>> +
> >>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t pud_index = pud_index(vaddr);
> >>> +	pmd_t *pmdp;
> >>> +
> >>> +	if (pud_val(pudp[pud_index]) == 0) {
> >>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> >>> +		if (!pmdp)
> >>> +			return -ENOMEM;
> >>> +
> >>> +		memset(pmdp, 0, PAGE_SIZE);
> >>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
> >>> +	} else {
> >>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
> >>> +	}
> >>> +
> >>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t p4d_index = p4d_index(vaddr);
> >>> +	pud_t *pudp;
> >>> +
> >>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
> >>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> >>> +		if (!pudp)
> >>> +			return -ENOMEM;
> >>> +
> >>> +		memset(pudp, 0, PAGE_SIZE);
> >>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
> >>> +	} else {
> >>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
> >>> +	}
> >>> +
> >>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
> >>> +}
> >>> +
> >>> +#else
> >>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
> >>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
> >>> +#endif /* __PAGETABLE_PMD_FOLDED */
> >>> +
> >>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	uintptr_t pgd_idx = pgd_index(vaddr);
> >>> +	void *nextp;
> >>> +
> >>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
> >>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
> >>> +		if (!nextp)
> >>> +			return -ENOMEM;
> >>> +
> >>> +		memset(nextp, 0, PAGE_SIZE);
> >>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
> >>> +	} else {
> >>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
> >>> +	}
> >>> +
> >>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> >>> +{
> >>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
> >>> +}
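The helpers above all follow one pattern: index the current level, allocate and zero the next-level table if the slot is empty, then descend. A toy user-space model of that pattern, assuming a three-level Sv39-style layout with 9 index bits per level and 4 KiB pages. All names here (vpn_index, toy_map, toy_lookup, struct table) are hypothetical, and calloc() stands in for get_safe_page() plus memset().

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT	12
#define PTRS_PER_TABLE	512	/* 9 index bits per level */
#define LEVELS		3	/* assumed Sv39-style layout */

/* Index into the table at `level` (2 = root, 0 = leaf) for an address. */
static unsigned int vpn_index(uint64_t vaddr, int level)
{
	return (vaddr >> (PAGE_SHIFT + 9 * level)) & (PTRS_PER_TABLE - 1);
}

/* Toy table: a slot holds a child pointer or, at the leaf, pfn + 1. */
struct table {
	uintptr_t slot[PTRS_PER_TABLE];
};

/* Map vaddr to pfn, allocating intermediate tables on demand. */
static int toy_map(struct table *root, uint64_t vaddr, uint64_t pfn)
{
	struct table *t = root;

	for (int level = LEVELS - 1; level > 0; level--) {
		unsigned int idx = vpn_index(vaddr, level);

		if (!t->slot[idx]) {
			struct table *next = calloc(1, sizeof(*next));

			if (!next)
				return -1;
			t->slot[idx] = (uintptr_t)next;
		}
		t = (struct table *)t->slot[idx];
	}
	/* Store pfn + 1 so that an empty slot (0) stays distinguishable. */
	t->slot[vpn_index(vaddr, 0)] = (uintptr_t)pfn + 1;
	return 0;
}

/* Return 1 and store the pfn if vaddr is mapped, 0 otherwise. */
static int toy_lookup(const struct table *root, uint64_t vaddr, uint64_t *pfn)
{
	const struct table *t = root;

	for (int level = LEVELS - 1; level > 0; level--) {
		uintptr_t next = t->slot[vpn_index(vaddr, level)];

		if (!next)
			return 0;
		t = (const struct table *)next;
	}
	uintptr_t leaf = t->slot[vpn_index(vaddr, 0)];

	if (!leaf)
		return 0;
	*pfn = leaf - 1;
	return 1;
}
```

The patch selects the number of levels at run time through pgtable_l4_enabled and pgtable_l5_enabled; the toy model fixes it at three to keep the walk short.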
> >>> +
> >>> +static unsigned long relocate_restore_code(void)
> >>> +{
> >>> +	unsigned long ret;
> >>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
> >>> +
> >>> +	if (!page)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	copy_page(page, core_restore_code);
> >>> +
> >>> +	/* Make the page containing the relocated code executable */
> >>> +	set_memory_x((unsigned long)page, 1);
> >>> +
> >>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	return (unsigned long)page;
> >>> +}
> >>> +
> >>> +int swsusp_arch_resume(void)
> >>> +{
> >>> +	unsigned long addr = PAGE_OFFSET;
> >>> +	unsigned long ret;
> >>> +
> >>> +	/*
> >>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> >>> +	 * we don't need to free it here.
> >>> +	 */
> >>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> >>> +	if (!resume_pg_dir)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	/*
> >>> +	 * The pages need to be writable when restoring the image.
> >>> +	 * Create a second copy of page table just for the linear map, and use this when
> >>> +	 * restoring.
> >>> +	 */
> >>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
> >>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
> >>> +		if (ret)
> >>> +			return (int)ret;
> >>> +	}
> >>> +
> >>
> >> To me this is wrong as this does not account for the real physical
> >> mapping layout: can't you simply copy the linear mapping from
> >> swapper_pg_dir?
> > Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from
> > swapper_pg_dir as we are not supposed to modify its content.
> 
> 
> First, you're right, we need the temporary page table as swapper_pg_dir
> will get overwritten under our feet.
> 
> Now, I still disagree with mapping all the memory: the linear mapping is
> sparse because we only map what memblock gives us (some regions are
> marked as "nomap" for a reason).
> 
> I just took a look at arm64, and they do exactly that: they go through
> swapper_pg_dir, copy the linear mapping and enable write at every leaf
> level
> (https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
You're right, but we don't have to copy from swapper_pg_dir. We can call kernel_page_present() in the mapping loop to check that each page is valid before mapping it. Agree?
> 
> 
> >> But I have to admit that I struggle to understand the need for this
> >> temporary page table: all we need to do is to allow to write to the
> >> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
> > Similar to the above comment. When we restore the memory content, we need to make sure the pages are writable. If you modify
> > swapper_pg_dir, the kernel will crash afterwards.
> > That's why we need a second page table to do the restore.
> >>
> >>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
> >>> +	relocated_restore_code = relocate_restore_code();
> >>
> >> And do we really need to do that too? The code in question can only be
> >> overwritten by the same code right?
> > Yes, we need to move the recovering code to a new page to prevent the code from overwrite itself when restoring the memory.
> >> Thanks,
> >>
> >> Alex
> >>
> >>
> >>> +	if (relocated_restore_code == -ENOMEM)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	/*
> >>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> >>> +	 * restore code can jump to it after finished restore the image. The next execution
> >>> +	 * code doesn't find itself in a different address space after switching over to the
> >>> +	 * original page table used by the hibernated image.
> >>> +	 */
> >>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
> >>> +					PAGE_KERNEL_READ_EXEC);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> >>> +			resume_hdr.restore_cpu_addr);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +#ifdef CONFIG_PM_SLEEP_SMP
> >>> +int hibernate_resume_nonboot_cpu_disable(void)
> >>> +{
> >>> +	if (sleep_cpu < 0) {
> >>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> >>> +		return -ENODEV;
> >>> +	}
> >>> +
> >>> +	return freeze_secondary_cpus(sleep_cpu);
> >>> +}
> >>> +#endif
> >>> +
> >>> +static int __init riscv_hibernate_init(void)
> >>> +{
> >>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> >>> +
> >>> +	if (WARN_ON(!hibernate_cpu_context))
> >>> +		return -ENOMEM;
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +early_initcall(riscv_hibernate_init);
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-02-09  6:12           ` JeeHeng Sia
@ 2023-02-10 13:24             ` Alexandre Ghiti
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-10 13:24 UTC (permalink / raw)
  To: JeeHeng Sia, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

On 2/9/23 07:12, JeeHeng Sia wrote:
> Hi Alex,
>
>> -----Original Message-----
>> From: Alexandre Ghiti <alex@ghiti.fr>
>> Sent: Wednesday, 8 February, 2023 8:05 PM
>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>> <mason.huo@starfivetech.com>
>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>
>> Hi Sia,
>>
>> On 2/8/23 05:43, JeeHeng Sia wrote:
>>>> -----Original Message-----
>>>> From: Alexandre Ghiti <alex@ghiti.fr>
>>>> Sent: Tuesday, 7 February, 2023 11:46 PM
>>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>>>> <mason.huo@starfivetech.com>
>>>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>>>
>>>> Hi Sia,
>>>>
>>>> On 1/27/23 10:10, Sia Jee Heng wrote:
>>>>> Low level Arch functions were created to support hibernation.
>>>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
>>>>> cpu state onto the stack, then calling swsusp_save() to save the memory
>>>>> image.
>>>>>
>>>>> Arch specific hibernation header is implemented and is utilized by the
>>>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
>>>>> functions. The arch specific hibernation header consists of satp, hartid,
>>>>> and the cpu_resume address. The kernel built version is also need to be
>>>>> saved into the hibernation image header to making sure only the same
>>>>> kernel is restore when resume.
>>>>>
>>>>> swsusp_arch_resume() creates a temporary page table that covering only
>>>>> the linear map. It copies the restore code to a 'safe' page, then start
>>>>> to restore the memory image. Once completed, it restores the original
>>>>> kernel's page table. It then calls into __hibernate_cpu_resume()
>>>>> to restore the CPU context. Finally, it follows the normal hibernation
>>>>> path back to the hibernation core.
>>>>>
>>>>> To enable hibernation/suspend to disk into RISCV, the below config
>>>>> need to be enabled:
>>>>> - CONFIG_ARCH_HIBERNATION_HEADER
>>>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>>>>>
>>>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
>>>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
>>>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
>>>>> ---
>>>>>     arch/riscv/Kconfig                 |   7 +
>>>>>     arch/riscv/include/asm/assembler.h |  20 ++
>>>>>     arch/riscv/include/asm/suspend.h   |  21 ++
>>>>>     arch/riscv/kernel/Makefile         |   1 +
>>>>>     arch/riscv/kernel/asm-offsets.c    |   5 +
>>>>>     arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>>>>>     arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>>>>>     7 files changed, 503 insertions(+)
>>>>>     create mode 100644 arch/riscv/kernel/hibernate-asm.S
>>>>>     create mode 100644 arch/riscv/kernel/hibernate.c
>>>>>
>>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>>>> index e2b656043abf..4555848a817f 100644
>>>>> --- a/arch/riscv/Kconfig
>>>>> +++ b/arch/riscv/Kconfig
>>>>> @@ -690,6 +690,13 @@ menu "Power management options"
>>>>>
>>>>>     source "kernel/power/Kconfig"
>>>>>
>>>>> +config ARCH_HIBERNATION_POSSIBLE
>>>>> +	def_bool y
>>>>> +
>>>>> +config ARCH_HIBERNATION_HEADER
>>>>> +	def_bool y
>>>>> +	depends on HIBERNATION
>>>>> +
>>>>>     endmenu # "Power management options"
>>>>>
>>>>>     menu "CPU Power Management"
>>>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
>>>>> index ef1283d04b70..3de70d3e6ceb 100644
>>>>> --- a/arch/riscv/include/asm/assembler.h
>>>>> +++ b/arch/riscv/include/asm/assembler.h
>>>>> @@ -59,4 +59,24 @@
>>>>>     		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>>>>>     	.endm
>>>>>
>>>>> +/**
>>>>> + * copy_page - copy 1 page (4KB) of data from source to destination
>>>>> + * @a0 - destination
>>>>> + * @a1 - source
>>>>> + */
>>>>> +	.macro	copy_page a0, a1
>>>>> +		lui	a2, 0x1
>>>>> +		add	a2, a2, a0
>>>>> +.1 :
>>>>> +		REG_L	t0, 0(a1)
>>>>> +		REG_L	t1, SZREG(a1)
>>>>> +
>>>>> +		REG_S	t0, 0(a0)
>>>>> +		REG_S	t1, SZREG(a0)
>>>>> +
>>>>> +		addi	a0, a0, 2 * SZREG
>>>>> +		addi	a1, a1, 2 * SZREG
>>>>> +		bne	a2, a0, .1
>>>>> +	.endm
>>>>> +
>>>>>     #endif	/* __ASM_ASSEMBLER_H */
>>>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
>>>>> index 75419c5ca272..db40ae433aa9 100644
>>>>> --- a/arch/riscv/include/asm/suspend.h
>>>>> +++ b/arch/riscv/include/asm/suspend.h
>>>>> @@ -21,6 +21,12 @@ struct suspend_context {
>>>>>     #endif
>>>>>     };
>>>>>
>>>>> +/*
>>>>> + * This parameter will be assigned to 0 during resume and will be used by
>>>>> + * hibernation core for the subsequent resume sequence
>>>>> + */
>>>>> +extern int in_suspend;
>>>>> +
>>>>>     /* Low-level CPU suspend entry function */
>>>>>     int __cpu_suspend_enter(struct suspend_context *context);
>>>>>
>>>>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>>>>>     /* Used to save and restore the csr */
>>>>>     void suspend_save_csrs(struct suspend_context *context);
>>>>>     void suspend_restore_csrs(struct suspend_context *context);
>>>>> +
>>>>> +/* Low-level API to support hibernation */
>>>>> +int swsusp_arch_suspend(void);
>>>>> +int swsusp_arch_resume(void);
>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
>>>>> +int arch_hibernation_header_restore(void *addr);
>>>>> +int __hibernate_cpu_resume(void);
>>>>> +
>>>>> +/* Used to resume on the CPU we hibernated on */
>>>>> +int hibernate_resume_nonboot_cpu_disable(void);
>>>>> +
>>>>> +/* Used to restore the hibernated image */
>>>>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
>>>>> +				unsigned long cpu_resume);
>>>>> +asmlinkage int core_restore_code(void);
>>>>>     #endif
>>>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>>>> index 4cf303a779ab..daab341d55e4 100644
>>>>> --- a/arch/riscv/kernel/Makefile
>>>>> +++ b/arch/riscv/kernel/Makefile
>>>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>>>>>     obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>>>>>
>>>>>     obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
>>>>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>>>>>
>>>>>     obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>>>>>     obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>>>> index df9444397908..d6a75aac1d27 100644
>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>> @@ -9,6 +9,7 @@
>>>>>     #include <linux/kbuild.h>
>>>>>     #include <linux/mm.h>
>>>>>     #include <linux/sched.h>
>>>>> +#include <linux/suspend.h>
>>>>>     #include <asm/kvm_host.h>
>>>>>     #include <asm/thread_info.h>
>>>>>     #include <asm/ptrace.h>
>>>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
>>>>>
>>>>>     	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>>>>>
>>>>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
>>>>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
>>>>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
>>>>> +
>>>>>     	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>>>>>     	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>>>>>     	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
>>>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
>>>>> new file mode 100644
>>>>> index 000000000000..a83d534b89bd
>>>>> --- /dev/null
>>>>> +++ b/arch/riscv/kernel/hibernate-asm.S
>>>>> @@ -0,0 +1,89 @@
>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>>> +/*
>>>>> + * Hibernation support specific for RISCV
>>>>> + *
>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>>>> + *
>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>>>> + */
>>>>> +
>>>>> +#include <asm/asm.h>
>>>>> +#include <asm/asm-offsets.h>
>>>>> +#include <asm/assembler.h>
>>>>> +#include <asm/csr.h>
>>>>> +
>>>>> +#include <linux/linkage.h>
>>>>> +
>>>>> +/*
>>>>> + * This code is executed when resume from the hibernation.
>>>>> + *
>>>>> + * It begins with loading the temporary page table then restores the memory image.
>>>>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
>>>>> + * swsusp_arch_suspend().
>>>>> + */
>>>>> +
>>>>> +/*
>>>>> + * int __hibernate_cpu_resume(void)
>>>>> + * Switch back to the hibernated image's page table prior to restore the CPU
>>>>> + * context.
>>>>> + *
>>>>> + * Always returns 0 to the C code.
>>>>> + */
>>>>> +ENTRY(__hibernate_cpu_resume)
>>>>> +	/* switch to hibernated image's page table */
>>>>> +	csrw CSR_SATP, s0
>>>>> +	sfence.vma
>>>>> +
>>>>> +	REG_L	a0, hibernate_cpu_context
>>>>> +
>>>>> +	/* Restore CSRs */
>>>>> +	restore_csr
>>>>> +
>>>>> +	/* Restore registers (except A0 and T0-T6) */
>>>>> +	restore_reg
>>>>> +
>>>>> +	/* Return zero value */
>>>>> +	add	a0, zero, zero
>>>>> +
>>>>> +	/* Return to C code */
>>>>> +	ret
>>>>> +END(__hibernate_cpu_resume)
>>>>> +
>>>>> +/*
>>>>> + * Prepare to restore the image.
>>>>> + * a0: satp of saved page tables
>>>>> + * a1: satp of temporary page tables
>>>>> + * a2: cpu_resume
>>>>> + */
>>>>> +ENTRY(restore_image)
>>>>> +	mv	s0, a0
>>>>> +	mv	s1, a1
>>>>> +	mv	s2, a2
>>>>> +	REG_L	s4, restore_pblist
>>>>> +	REG_L	a1, relocated_restore_code
>>>>> +
>>>>> +	jalr	a1
>>>>> +END(restore_image)
>>>>> +
>>>>> +/*
>>>>> + * The below code will be executed from a 'safe' page.
>>>>> + * It first switches to the temporary page table, then start to copy the pages
>>>>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
>>>>> + * to restore the CPU context.
>>>>> + */
>>>>> +ENTRY(core_restore_code)
>>>>> +	/* switch to temp page table */
>>>>> +	csrw satp, s1
>>>>> +	sfence.vma
>>>>> +.Lcopy:
>>>>> +	/* The below code will restore the hibernated image. */
>>>>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
>>>>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
>>>>> +
>>>>> +	copy_page a0, a1
>>>>> +
>>>>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
>>>>> +	bnez	s4, .Lcopy
>>>>> +
>>>>> +	jalr	s2
>>>>> +END(core_restore_code)
>>>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
>>>>> new file mode 100644
>>>>> index 000000000000..bf7f3c781820
>>>>> --- /dev/null
>>>>> +++ b/arch/riscv/kernel/hibernate.c
>>>>> @@ -0,0 +1,360 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>>> +/*
>>>>> + * Hibernation support specific for RISCV
>>>>> + *
>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>>>> + *
>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>>>> + */
>>>>> +
>>>>> +#include <asm/barrier.h>
>>>>> +#include <asm/cacheflush.h>
>>>>> +#include <asm/mmu_context.h>
>>>>> +#include <asm/page.h>
>>>>> +#include <asm/pgtable.h>
>>>>> +#include <asm/sections.h>
>>>>> +#include <asm/set_memory.h>
>>>>> +#include <asm/smp.h>
>>>>> +#include <asm/suspend.h>
>>>>> +
>>>>> +#include <linux/cpu.h>
>>>>> +#include <linux/memblock.h>
>>>>> +#include <linux/pm.h>
>>>>> +#include <linux/sched.h>
>>>>> +#include <linux/suspend.h>
>>>>> +#include <linux/utsname.h>
>>>>> +
>>>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
>>>>> +static int sleep_cpu = -EINVAL;
>>>>> +
>>>>> +/* CPU context to be saved */
>>>>> +struct suspend_context *hibernate_cpu_context;
>>>>> +
>>>>> +unsigned long relocated_restore_code;
>>>>> +
>>>>> +/* Pointer to the temporary resume page table */
>>>>> +pgd_t *resume_pg_dir;
>>>>> +
>>>>> +/**
>>>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
>>>>> + * @uts_version: to save the build number and date so that the we are not resume with
>>>>> + *		a different kernel
>>>>> + */
>>>>> +struct arch_hibernate_hdr_invariants {
>>>>> +	char		uts_version[__NEW_UTS_LEN + 1];
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
>>>>> + * @invariants: container to store kernel build version
>>>>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
>>>>> + * @saved_satp: original page table used by the hibernated image.
>>>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
>>>>> + */
>>>>> +static struct arch_hibernate_hdr {
>>>>> +	struct arch_hibernate_hdr_invariants invariants;
>>>>> +	unsigned long	hartid;
>>>>> +	unsigned long	saved_satp;
>>>>> +	unsigned long	restore_cpu_addr;
>>>>> +} resume_hdr;
>>>>> +
>>>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
>>>>> +{
>>>>> +	memset(i, 0, sizeof(*i));
>>>>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Check if the given pfn is in the 'nosave' section.
>>>>> + */
>>>>> +int pfn_is_nosave(unsigned long pfn)
>>>>> +{
>>>>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
>>>>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
>>>>> +
>>>>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
>>>>> +}
>>>>> +
>>>>> +void notrace save_processor_state(void)
>>>>> +{
>>>>> +	WARN_ON(num_online_cpus() != 1);
>>>>> +}
>>>>> +
>>>>> +void notrace restore_processor_state(void)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Helper parameters need to be saved to the hibernation image header.
>>>>> + */
>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
>>>>> +{
>>>>> +	struct arch_hibernate_hdr *hdr = addr;
>>>>> +
>>>>> +	if (max_size < sizeof(*hdr))
>>>>> +		return -EOVERFLOW;
>>>>> +
>>>>> +	arch_hdr_invariants(&hdr->invariants);
>>>>> +
>>>>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
>>>>> +	hdr->saved_satp = csr_read(CSR_SATP);
>>>>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +EXPORT_SYMBOL(arch_hibernation_header_save);
>>>>> +
>>>>> +/*
>>>>> + * Retrieve the helper parameters from the hibernation image header
>>>>> + */
>>>>> +int arch_hibernation_header_restore(void *addr)
>>>>> +{
>>>>> +	struct arch_hibernate_hdr_invariants invariants;
>>>>> +	struct arch_hibernate_hdr *hdr = addr;
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	arch_hdr_invariants(&invariants);
>>>>> +
>>>>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
>>>>> +		pr_crit("Hibernate image not generated by this kernel!\n");
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
>>>>> +	if (sleep_cpu < 0) {
>>>>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
>>>>> +		sleep_cpu = -EINVAL;
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +#ifdef CONFIG_SMP
>>>>> +	ret = bringup_hibernate_cpu(sleep_cpu);
>>>>> +	if (ret) {
>>>>> +		sleep_cpu = -EINVAL;
>>>>> +		return ret;
>>>>> +	}
>>>>> +#endif
>>>>> +	resume_hdr = *hdr;
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
>>>>> +
>>>>> +int swsusp_arch_suspend(void)
>>>>> +{
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
>>>>> +		sleep_cpu = smp_processor_id();
>>>>> +		suspend_save_csrs(hibernate_cpu_context);
>>>>> +		ret = swsusp_save();
>>>>> +	} else {
>>>>> +		suspend_restore_csrs(hibernate_cpu_context);
>>>>> +		flush_tlb_all();
>>>>> +
>>>>> +		/* Invalidated Icache */
>>>>> +		flush_icache_all();
>>>>> +
>>>>> +		/*
>>>>> +		 * Tell the hibernation core that we've just restored
>>>>> +		 * the memory
>>>>> +		 */
>>>>> +		in_suspend = 0;
>>>>> +		sleep_cpu = -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t pte_idx = pte_index(vaddr);
>>>>> +
>>>>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +#ifndef __PAGETABLE_PMD_FOLDED
>>>>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
>>>>> +		(pgtable_l5_enabled ?					\
>>>>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
>>>>> +		(pgtable_l4_enabled ?					\
>>>>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
>>>>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t pmd_idx = pmd_index(vaddr);
>>>>> +	pte_t *ptep;
>>>>> +
>>>>> +	if (pmd_none(pmdp[pmd_idx])) {
>>>>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
>>>>> +		if (!ptep)
>>>>> +			return -ENOMEM;
>>>>> +
>>>>> +		memset(ptep, 0, PAGE_SIZE);
>>>>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
>>>>> +	} else {
>>>>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
>>>>> +	}
>>>>> +
>>>>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
>>>>> +}
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t pud_index = pud_index(vaddr);
>>>>> +	pmd_t *pmdp;
>>>>> +
>>>>> +	if (pud_val(pudp[pud_index]) == 0) {
>>>>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
>>>>> +		if (!pmdp)
>>>>> +			return -ENOMEM;
>>>>> +
>>>>> +		memset(pmdp, 0, PAGE_SIZE);
>>>>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
>>>>> +	} else {
>>>>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
>>>>> +	}
>>>>> +
>>>>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
>>>>> +}
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t p4d_index = p4d_index(vaddr);
>>>>> +	pud_t *pudp;
>>>>> +
>>>>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
>>>>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
>>>>> +		if (!pudp)
>>>>> +			return -ENOMEM;
>>>>> +
>>>>> +		memset(pudp, 0, PAGE_SIZE);
>>>>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
>>>>> +	} else {
>>>>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
>>>>> +	}
>>>>> +
>>>>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
>>>>> +}
>>>>> +
>>>>> +#else
>>>>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
>>>>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
>>>>> +#endif /* __PAGETABLE_PMD_FOLDED */
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t pgd_idx = pgd_index(vaddr);
>>>>> +	void *nextp;
>>>>> +
>>>>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
>>>>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
>>>>> +		if (!nextp)
>>>>> +			return -ENOMEM;
>>>>> +
>>>>> +		memset(nextp, 0, PAGE_SIZE);
>>>>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
>>>>> +	} else {
>>>>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
>>>>> +	}
>>>>> +
>>>>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);


Is it possible to use the "standard" way of walking a page table
instead of the _next macros? I mean something like this (example
from the arm64 code
https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/trans_pgd.c#L174
or from my recent kasan patchset
https://patchwork.kernel.org/project/linux-riscv/patch/20230203075232.274282-3-alexghiti@rivosinc.com/):

         do {
                 next = pgd_addr_end(vaddr, end);

                 if (pgd_none(*pgd_k)) {
                         nextp = get_safe_page(GFP_ATOMIC);
                         memset(nextp, 0, PAGE_SIZE);
                         set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE));
                         continue;
                 }

                 kasan_shallow_populate_p4d(pgd_k, vaddr, next);
         } while (pgd_k++, vaddr = next, vaddr != end);


I have the same change to our early page table code on my todo list.
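
To make the idiom concrete, here is a minimal user-space sketch of such a range walk. It is a toy model rather than kernel code: the 1 MiB span per entry, the 16-entry table and all toy_* names are assumptions chosen for illustration. Only the loop shape and the clamping helper mirror what pgd_addr_end()-style walkers do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy geometry (not the RISC-V one): each top-level entry spans 1 MiB. */
#define TOY_PGDIR_SHIFT		20
#define TOY_PGDIR_SIZE		((uintptr_t)1 << TOY_PGDIR_SHIFT)
#define TOY_PGDIR_MASK		(~(TOY_PGDIR_SIZE - 1))
#define TOY_PTRS_PER_PGD	16

/* End of the span covered by the current entry, clamped to 'end'. */
static uintptr_t toy_pgd_addr_end(uintptr_t addr, uintptr_t end)
{
	uintptr_t boundary = (addr + TOY_PGDIR_SIZE) & TOY_PGDIR_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

static size_t toy_pgd_index(uintptr_t addr)
{
	return (addr >> TOY_PGDIR_SHIFT) & (TOY_PTRS_PER_PGD - 1);
}

/*
 * Walk [vaddr, end) one top-level entry at a time and mark each entry
 * that the range touches. Returns the number of entries visited.
 */
static size_t toy_walk_pgd(unsigned char *pgd, uintptr_t vaddr, uintptr_t end)
{
	unsigned char *pgdp = pgd + toy_pgd_index(vaddr);
	uintptr_t next;
	size_t visited = 0;

	assert(vaddr < end);
	assert(end <= (uintptr_t)TOY_PTRS_PER_PGD * TOY_PGDIR_SIZE);

	do {
		next = toy_pgd_addr_end(vaddr, end);
		*pgdp = 1;	/* a real walker would allocate or descend here */
		visited++;
	} while (pgdp++, vaddr = next, vaddr != end);

	return visited;
}
```

Walking 0x180000..0x380000 in this model visits entries 1, 2 and 3 even though neither end of the range is aligned to an entry boundary, which is the case the clamping helper exists for.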


>>>>> +}
>>>>> +
>>>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
>>>>> +}
>>>>> +
>>>>> +static unsigned long relocate_restore_code(void)
>>>>> +{
>>>>> +	unsigned long ret;
>>>>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
>>>>> +
>>>>> +	if (!page)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	copy_page(page, core_restore_code);
>>>>> +
>>>>> +	/* Make the page containing the relocated code executable */
>>>>> +	set_memory_x((unsigned long)page, 1);
>>>>> +
>>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>> +	return (unsigned long)page;
>>>>> +}
>>>>> +
>>>>> +int swsusp_arch_resume(void)
>>>>> +{
>>>>> +	unsigned long addr = PAGE_OFFSET;
>>>>> +	unsigned long ret;
>>>>> +
>>>>> +	/*
>>>>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
>>>>> +	 * we don't need to free it here.
>>>>> +	 */
>>>>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
>>>>> +	if (!resume_pg_dir)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	/*
>>>>> +	 * The pages need to be writable when restoring the image.
>>>>> +	 * Create a second copy of page table just for the linear map, and use this when
>>>>> +	 * restoring.
>>>>> +	 */
>>>>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
>>>>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
>>>>> +		if (ret)
>>>>> +			return (int)ret;
>>>>> +	}
>>>>> +
>>>> To me this is wrong as this does not account for the real physical
>>>> mapping layout: can't you simply copy the linear mapping from
>>>> swapper_pg_dir?
>>> Hi, we covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from the
>> swapper_pg_dir as we are not suppose to modify its content.
>>
>>
>> First, you're right, we need the temporary page table as swapper_pg_dir
>> will get overwritten under our feet.
>>
>> Now, I still disagree with mapping all the memory: the linear mapping is
>> sparse because we only map what memblock gives us (some regions are
>> marked as "nomap" for a reason).
>>
>> I just took a look at arm64, and they do exactly that: they go through
>> swapper_pg_dir, copy the linear mapping and enable write at every leaf
>> level
>> (https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
> You're right, but we don’t have to copy from the swapper_pg_dir. We can insert kernel_page_present() to the function to check the page validity prior to do the mapping. Agree?


That would work, though we'd lose the benefit of huge pages. I'm not
opposed at all, but if we can leverage the existing arm64 code, that
would be even better: only the PTE write flag is different!
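
To illustrate what I mean by copying the mapping and enabling write at every leaf, here is a small user-space sketch. It is only a toy two-level table: the flag values, the struct layout and every toy_* name are invented for this example and do not match the RISC-V or arm64 page table formats. The point it tries to show is that holes stay holes and a huge-page leaf is kept as one entry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_ENTRIES	4
#define TOY_VALID	0x1u
#define TOY_WRITE	0x2u
#define TOY_LEAF	0x4u	/* top-level entry that maps a huge page */

struct toy_l2 {
	uint32_t pte[TOY_ENTRIES];
};

struct toy_l1 {
	uint32_t flags[TOY_ENTRIES];		/* VALID, plus LEAF for huge mappings */
	struct toy_l2 *next[TOY_ENTRIES];	/* child table if VALID and not LEAF */
};

/*
 * Copy 'src' into 'dst', setting the write flag on every valid leaf.
 * Invalid entries are copied as holes; huge leaves are not split.
 * Returns 0 on success, -1 on allocation failure (children allocated so
 * far are left in 'dst' for the caller to release).
 */
static int toy_copy_linear_map(struct toy_l1 *dst, const struct toy_l1 *src)
{
	for (int i = 0; i < TOY_ENTRIES; i++) {
		dst->flags[i] = 0;
		dst->next[i] = NULL;

		if (!(src->flags[i] & TOY_VALID))
			continue;	/* unmapped in the source: stay unmapped */

		if (src->flags[i] & TOY_LEAF) {
			/* huge page: keep the single entry, just make it writable */
			dst->flags[i] = src->flags[i] | TOY_WRITE;
			continue;
		}

		struct toy_l2 *child = calloc(1, sizeof(*child));
		if (!child)
			return -1;

		for (int j = 0; j < TOY_ENTRIES; j++) {
			uint32_t pte = src->next[i]->pte[j];

			child->pte[j] = (pte & TOY_VALID) ? (pte | TOY_WRITE) : 0;
		}

		dst->flags[i] = src->flags[i];
		dst->next[i] = child;
	}

	return 0;
}
```

The source table is never modified, and the copy gets its own second-level tables. Compared with a per-page kernel_page_present() loop, the huge leaf costs one entry instead of one mapping per base page.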

Thanks,

Alex


>>
>>>> But I have to admit that I struggle to understand the need for this
>>>> temporary page table: all we need to do is to allow to write to the
>>>> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
>>> Similar to the above comment. When we restore the memory content, we need to make sure the pages are write-able. if you modify
>> the swapper_pg_dir, the kernel will crash afterwards.
>>> That’s why we need a second page table to do the recovering job.
>>>>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
>>>>> +	relocated_restore_code = relocate_restore_code();
>>>> And do we really need to do that too? The code in question can only be
>>>> overwritten by the same code right?
>>> Yes, we need to move the recovering code to a new page to prevent the code from overwrite itself when restoring the memory.
>>>> Thanks,
>>>>
>>>> Alex
>>>>
>>>>
>>>>> +	if (relocated_restore_code == -ENOMEM)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	/*
>>>>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
>>>>> +	 * restore code can jump to it after finished restore the image. The next execution
>>>>> +	 * code doesn't find itself in a different address space after switching over to the
>>>>> +	 * original page table used by the hibernated image.
>>>>> +	 */
>>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
>>>>> +					PAGE_KERNEL_READ_EXEC);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
>>>>> +			resume_hdr.restore_cpu_addr);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +#ifdef CONFIG_PM_SLEEP_SMP
>>>>> +int hibernate_resume_nonboot_cpu_disable(void)
>>>>> +{
>>>>> +	if (sleep_cpu < 0) {
>>>>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
>>>>> +		return -ENODEV;
>>>>> +	}
>>>>> +
>>>>> +	return freeze_secondary_cpus(sleep_cpu);
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>> +static int __init riscv_hibernate_init(void)
>>>>> +{
>>>>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
>>>>> +
>>>>> +	if (WARN_ON(!hibernate_cpu_context))
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +early_initcall(riscv_hibernate_init);

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
@ 2023-02-10 13:24             ` Alexandre Ghiti
  0 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-10 13:24 UTC (permalink / raw)
  To: JeeHeng Sia, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

On 2/9/23 07:12, JeeHeng Sia wrote:
> Hi Alex,
>
>> -----Original Message-----
>> From: Alexandre Ghiti <alex@ghiti.fr>
>> Sent: Wednesday, 8 February, 2023 8:05 PM
>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>> <mason.huo@starfivetech.com>
>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>
>> Hi Sia,
>>
>> On 2/8/23 05:43, JeeHeng Sia wrote:
>>>> -----Original Message-----
>>>> From: Alexandre Ghiti <alex@ghiti.fr>
>>>> Sent: Tuesday, 7 February, 2023 11:46 PM
>>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>>>> <mason.huo@starfivetech.com>
>>>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>>>
>>>> Hi Sia,
>>>>
>>>> On 1/27/23 10:10, Sia Jee Heng wrote:
>>>>> Low level Arch functions were created to support hibernation.
>>>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
>>>>> cpu state onto the stack, then calling swsusp_save() to save the memory
>>>>> image.
>>>>>
>>>>> Arch specific hibernation header is implemented and is utilized by the
>>>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
>>>>> functions. The arch specific hibernation header consists of satp, hartid,
>>>>> and the cpu_resume address. The kernel built version is also need to be
>>>>> saved into the hibernation image header to making sure only the same
>>>>> kernel is restore when resume.
>>>>>
>>>>> swsusp_arch_resume() creates a temporary page table that covering only
>>>>> the linear map. It copies the restore code to a 'safe' page, then start
>>>>> to restore the memory image. Once completed, it restores the original
>>>>> kernel's page table. It then calls into __hibernate_cpu_resume()
>>>>> to restore the CPU context. Finally, it follows the normal hibernation
>>>>> path back to the hibernation core.
>>>>>
>>>>> To enable hibernation/suspend to disk into RISCV, the below config
>>>>> need to be enabled:
>>>>> - CONFIG_ARCH_HIBERNATION_HEADER
>>>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>>>>>
>>>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
>>>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
>>>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
>>>>> ---
>>>>>     arch/riscv/Kconfig                 |   7 +
>>>>>     arch/riscv/include/asm/assembler.h |  20 ++
>>>>>     arch/riscv/include/asm/suspend.h   |  21 ++
>>>>>     arch/riscv/kernel/Makefile         |   1 +
>>>>>     arch/riscv/kernel/asm-offsets.c    |   5 +
>>>>>     arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>>>>>     arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>>>>>     7 files changed, 503 insertions(+)
>>>>>     create mode 100644 arch/riscv/kernel/hibernate-asm.S
>>>>>     create mode 100644 arch/riscv/kernel/hibernate.c
>>>>>
>>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>>>> index e2b656043abf..4555848a817f 100644
>>>>> --- a/arch/riscv/Kconfig
>>>>> +++ b/arch/riscv/Kconfig
>>>>> @@ -690,6 +690,13 @@ menu "Power management options"
>>>>>
>>>>>     source "kernel/power/Kconfig"
>>>>>
>>>>> +config ARCH_HIBERNATION_POSSIBLE
>>>>> +	def_bool y
>>>>> +
>>>>> +config ARCH_HIBERNATION_HEADER
>>>>> +	def_bool y
>>>>> +	depends on HIBERNATION
>>>>> +
>>>>>     endmenu # "Power management options"
>>>>>
>>>>>     menu "CPU Power Management"
>>>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
>>>>> index ef1283d04b70..3de70d3e6ceb 100644
>>>>> --- a/arch/riscv/include/asm/assembler.h
>>>>> +++ b/arch/riscv/include/asm/assembler.h
>>>>> @@ -59,4 +59,24 @@
>>>>>     		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>>>>>     	.endm
>>>>>
>>>>> +/**
>>>>> + * copy_page - copy 1 page (4KB) of data from source to destination
>>>>> + * @a0 - destination
>>>>> + * @a1 - source
>>>>> + */
>>>>> +	.macro	copy_page a0, a1
>>>>> +		lui	a2, 0x1
>>>>> +		add	a2, a2, a0
>>>>> +.1 :
>>>>> +		REG_L	t0, 0(a1)
>>>>> +		REG_L	t1, SZREG(a1)
>>>>> +
>>>>> +		REG_S	t0, 0(a0)
>>>>> +		REG_S	t1, SZREG(a0)
>>>>> +
>>>>> +		addi	a0, a0, 2 * SZREG
>>>>> +		addi	a1, a1, 2 * SZREG
>>>>> +		bne	a2, a0, .1
>>>>> +	.endm
>>>>> +
>>>>>     #endif	/* __ASM_ASSEMBLER_H */
>>>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
>>>>> index 75419c5ca272..db40ae433aa9 100644
>>>>> --- a/arch/riscv/include/asm/suspend.h
>>>>> +++ b/arch/riscv/include/asm/suspend.h
>>>>> @@ -21,6 +21,12 @@ struct suspend_context {
>>>>>     #endif
>>>>>     };
>>>>>
>>>>> +/*
>>>>> + * This parameter will be assigned to 0 during resume and will be used by
>>>>> + * hibernation core for the subsequent resume sequence
>>>>> + */
>>>>> +extern int in_suspend;
>>>>> +
>>>>>     /* Low-level CPU suspend entry function */
>>>>>     int __cpu_suspend_enter(struct suspend_context *context);
>>>>>
>>>>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>>>>>     /* Used to save and restore the csr */
>>>>>     void suspend_save_csrs(struct suspend_context *context);
>>>>>     void suspend_restore_csrs(struct suspend_context *context);
>>>>> +
>>>>> +/* Low-level API to support hibernation */
>>>>> +int swsusp_arch_suspend(void);
>>>>> +int swsusp_arch_resume(void);
>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
>>>>> +int arch_hibernation_header_restore(void *addr);
>>>>> +int __hibernate_cpu_resume(void);
>>>>> +
>>>>> +/* Used to resume on the CPU we hibernated on */
>>>>> +int hibernate_resume_nonboot_cpu_disable(void);
>>>>> +
>>>>> +/* Used to restore the hibernated image */
>>>>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
>>>>> +				unsigned long cpu_resume);
>>>>> +asmlinkage int core_restore_code(void);
>>>>>     #endif
>>>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>>>> index 4cf303a779ab..daab341d55e4 100644
>>>>> --- a/arch/riscv/kernel/Makefile
>>>>> +++ b/arch/riscv/kernel/Makefile
>>>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>>>>>     obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>>>>>
>>>>>     obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
>>>>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>>>>>
>>>>>     obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>>>>>     obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>>>> index df9444397908..d6a75aac1d27 100644
>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>> @@ -9,6 +9,7 @@
>>>>>     #include <linux/kbuild.h>
>>>>>     #include <linux/mm.h>
>>>>>     #include <linux/sched.h>
>>>>> +#include <linux/suspend.h>
>>>>>     #include <asm/kvm_host.h>
>>>>>     #include <asm/thread_info.h>
>>>>>     #include <asm/ptrace.h>
>>>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
>>>>>
>>>>>     	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>>>>>
>>>>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
>>>>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
>>>>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
>>>>> +
>>>>>     	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>>>>>     	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>>>>>     	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
>>>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
>>>>> new file mode 100644
>>>>> index 000000000000..a83d534b89bd
>>>>> --- /dev/null
>>>>> +++ b/arch/riscv/kernel/hibernate-asm.S
>>>>> @@ -0,0 +1,89 @@
>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>>> +/*
>>>>> + * Hibernation support specific for RISCV
>>>>> + *
>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>>>> + *
>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>>>> + */
>>>>> +
>>>>> +#include <asm/asm.h>
>>>>> +#include <asm/asm-offsets.h>
>>>>> +#include <asm/assembler.h>
>>>>> +#include <asm/csr.h>
>>>>> +
>>>>> +#include <linux/linkage.h>
>>>>> +
>>>>> +/*
>>>>> + * This code is executed when resume from the hibernation.
>>>>> + *
>>>>> + * It begins with loading the temporary page table then restores the memory image.
>>>>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
>>>>> + * swsusp_arch_suspend().
>>>>> + */
>>>>> +
>>>>> +/*
>>>>> + * int __hibernate_cpu_resume(void)
>>>>> + * Switch back to the hibernated image's page table prior to restore the CPU
>>>>> + * context.
>>>>> + *
>>>>> + * Always returns 0 to the C code.
>>>>> + */
>>>>> +ENTRY(__hibernate_cpu_resume)
>>>>> +	/* switch to hibernated image's page table */
>>>>> +	csrw CSR_SATP, s0
>>>>> +	sfence.vma
>>>>> +
>>>>> +	REG_L	a0, hibernate_cpu_context
>>>>> +
>>>>> +	/* Restore CSRs */
>>>>> +	restore_csr
>>>>> +
>>>>> +	/* Restore registers (except A0 and T0-T6) */
>>>>> +	restore_reg
>>>>> +
>>>>> +	/* Return zero value */
>>>>> +	add	a0, zero, zero
>>>>> +
>>>>> +	/* Return to C code */
>>>>> +	ret
>>>>> +END(__hibernate_cpu_resume)
>>>>> +
>>>>> +/*
>>>>> + * Prepare to restore the image.
>>>>> + * a0: satp of saved page tables
>>>>> + * a1: satp of temporary page tables
>>>>> + * a2: cpu_resume
>>>>> + */
>>>>> +ENTRY(restore_image)
>>>>> +	mv	s0, a0
>>>>> +	mv	s1, a1
>>>>> +	mv	s2, a2
>>>>> +	REG_L	s4, restore_pblist
>>>>> +	REG_L	a1, relocated_restore_code
>>>>> +
>>>>> +	jalr	a1
>>>>> +END(restore_image)
>>>>> +
>>>>> +/*
>>>>> + * The below code will be executed from a 'safe' page.
>>>>> + * It first switches to the temporary page table, then start to copy the pages
>>>>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
>>>>> + * to restore the CPU context.
>>>>> + */
>>>>> +ENTRY(core_restore_code)
>>>>> +	/* switch to temp page table */
>>>>> +	csrw satp, s1
>>>>> +	sfence.vma
>>>>> +.Lcopy:
>>>>> +	/* The below code will restore the hibernated image. */
>>>>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
>>>>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
>>>>> +
>>>>> +	copy_page a0, a1
>>>>> +
>>>>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
>>>>> +	bnez	s4, .Lcopy
>>>>> +
>>>>> +	jalr	s2
>>>>> +END(core_restore_code)
>>>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
>>>>> new file mode 100644
>>>>> index 000000000000..bf7f3c781820
>>>>> --- /dev/null
>>>>> +++ b/arch/riscv/kernel/hibernate.c
>>>>> @@ -0,0 +1,360 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>>> +/*
>>>>> + * Hibernation support specific for RISCV
>>>>> + *
>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>>>> + *
>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>>>> + */
>>>>> +
>>>>> +#include <asm/barrier.h>
>>>>> +#include <asm/cacheflush.h>
>>>>> +#include <asm/mmu_context.h>
>>>>> +#include <asm/page.h>
>>>>> +#include <asm/pgtable.h>
>>>>> +#include <asm/sections.h>
>>>>> +#include <asm/set_memory.h>
>>>>> +#include <asm/smp.h>
>>>>> +#include <asm/suspend.h>
>>>>> +
>>>>> +#include <linux/cpu.h>
>>>>> +#include <linux/memblock.h>
>>>>> +#include <linux/pm.h>
>>>>> +#include <linux/sched.h>
>>>>> +#include <linux/suspend.h>
>>>>> +#include <linux/utsname.h>
>>>>> +
>>>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
>>>>> +static int sleep_cpu = -EINVAL;
>>>>> +
>>>>> +/* CPU context to be saved */
>>>>> +struct suspend_context *hibernate_cpu_context;
>>>>> +
>>>>> +unsigned long relocated_restore_code;
>>>>> +
>>>>> +/* Pointer to the temporary resume page table */
>>>>> +pgd_t *resume_pg_dir;
>>>>> +
>>>>> +/**
>>>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
>>>>> + * @uts_version: to save the build number and date so that the we are not resume with
>>>>> + *		a different kernel
>>>>> + */
>>>>> +struct arch_hibernate_hdr_invariants {
>>>>> +	char		uts_version[__NEW_UTS_LEN + 1];
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
>>>>> + * @invariants: container to store kernel build version
>>>>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
>>>>> + * @saved_satp: original page table used by the hibernated image.
>>>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
>>>>> + */
>>>>> +static struct arch_hibernate_hdr {
>>>>> +	struct arch_hibernate_hdr_invariants invariants;
>>>>> +	unsigned long	hartid;
>>>>> +	unsigned long	saved_satp;
>>>>> +	unsigned long	restore_cpu_addr;
>>>>> +} resume_hdr;
>>>>> +
>>>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
>>>>> +{
>>>>> +	memset(i, 0, sizeof(*i));
>>>>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Check if the given pfn is in the 'nosave' section.
>>>>> + */
>>>>> +int pfn_is_nosave(unsigned long pfn)
>>>>> +{
>>>>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
>>>>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
>>>>> +
>>>>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
>>>>> +}
>>>>> +
>>>>> +void notrace save_processor_state(void)
>>>>> +{
>>>>> +	WARN_ON(num_online_cpus() != 1);
>>>>> +}
>>>>> +
>>>>> +void notrace restore_processor_state(void)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Helper parameters need to be saved to the hibernation image header.
>>>>> + */
>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
>>>>> +{
>>>>> +	struct arch_hibernate_hdr *hdr = addr;
>>>>> +
>>>>> +	if (max_size < sizeof(*hdr))
>>>>> +		return -EOVERFLOW;
>>>>> +
>>>>> +	arch_hdr_invariants(&hdr->invariants);
>>>>> +
>>>>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
>>>>> +	hdr->saved_satp = csr_read(CSR_SATP);
>>>>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +EXPORT_SYMBOL(arch_hibernation_header_save);
>>>>> +
>>>>> +/*
>>>>> + * Retrieve the helper parameters from the hibernation image header
>>>>> + */
>>>>> +int arch_hibernation_header_restore(void *addr)
>>>>> +{
>>>>> +	struct arch_hibernate_hdr_invariants invariants;
>>>>> +	struct arch_hibernate_hdr *hdr = addr;
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	arch_hdr_invariants(&invariants);
>>>>> +
>>>>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
>>>>> +		pr_crit("Hibernate image not generated by this kernel!\n");
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
>>>>> +	if (sleep_cpu < 0) {
>>>>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
>>>>> +		sleep_cpu = -EINVAL;
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +#ifdef CONFIG_SMP
>>>>> +	ret = bringup_hibernate_cpu(sleep_cpu);
>>>>> +	if (ret) {
>>>>> +		sleep_cpu = -EINVAL;
>>>>> +		return ret;
>>>>> +	}
>>>>> +#endif
>>>>> +	resume_hdr = *hdr;
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
>>>>> +
>>>>> +int swsusp_arch_suspend(void)
>>>>> +{
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
>>>>> +		sleep_cpu = smp_processor_id();
>>>>> +		suspend_save_csrs(hibernate_cpu_context);
>>>>> +		ret = swsusp_save();
>>>>> +	} else {
>>>>> +		suspend_restore_csrs(hibernate_cpu_context);
>>>>> +		flush_tlb_all();
>>>>> +
>>>>> +		/* Invalidated Icache */
>>>>> +		flush_icache_all();
>>>>> +
>>>>> +		/*
>>>>> +		 * Tell the hibernation core that we've just restored
>>>>> +		 * the memory
>>>>> +		 */
>>>>> +		in_suspend = 0;
>>>>> +		sleep_cpu = -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t pte_idx = pte_index(vaddr);
>>>>> +
>>>>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +#ifndef __PAGETABLE_PMD_FOLDED
>>>>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
>>>>> +		(pgtable_l5_enabled ?					\
>>>>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
>>>>> +		(pgtable_l4_enabled ?					\
>>>>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
>>>>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t pmd_idx = pmd_index(vaddr);
>>>>> +	pte_t *ptep;
>>>>> +
>>>>> +	if (pmd_none(pmdp[pmd_idx])) {
>>>>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
>>>>> +		if (!ptep)
>>>>> +			return -ENOMEM;
>>>>> +
>>>>> +		memset(ptep, 0, PAGE_SIZE);
>>>>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
>>>>> +	} else {
>>>>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
>>>>> +	}
>>>>> +
>>>>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
>>>>> +}
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t pud_index = pud_index(vaddr);
>>>>> +	pmd_t *pmdp;
>>>>> +
>>>>> +	if (pud_val(pudp[pud_index]) == 0) {
>>>>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
>>>>> +		if (!pmdp)
>>>>> +			return -ENOMEM;
>>>>> +
>>>>> +		memset(pmdp, 0, PAGE_SIZE);
>>>>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
>>>>> +	} else {
>>>>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
>>>>> +	}
>>>>> +
>>>>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
>>>>> +}
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t p4d_index = p4d_index(vaddr);
>>>>> +	pud_t *pudp;
>>>>> +
>>>>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
>>>>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
>>>>> +		if (!pudp)
>>>>> +			return -ENOMEM;
>>>>> +
>>>>> +		memset(pudp, 0, PAGE_SIZE);
>>>>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
>>>>> +	} else {
>>>>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
>>>>> +	}
>>>>> +
>>>>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
>>>>> +}
>>>>> +
>>>>> +#else
>>>>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
>>>>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
>>>>> +#endif /* __PAGETABLE_PMD_FOLDED */
>>>>> +
>>>>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	uintptr_t pgd_idx = pgd_index(vaddr);
>>>>> +	void *nextp;
>>>>> +
>>>>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
>>>>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
>>>>> +		if (!nextp)
>>>>> +			return -ENOMEM;
>>>>> +
>>>>> +		memset(nextp, 0, PAGE_SIZE);
>>>>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
>>>>> +	} else {
>>>>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
>>>>> +	}
>>>>> +
>>>>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);


Is it possible to use the "standard" way of walking a page table
instead of the _next macros? I mean something like this (see the
arm64 code at
https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/trans_pgd.c#L174
or my recent kasan patchset at
https://patchwork.kernel.org/project/linux-riscv/patch/20230203075232.274282-3-alexghiti@rivosinc.com/):

         do {
                 next = pgd_addr_end(vaddr, end);

                 if (pgd_none(*pgd_k)) {
                         nextp = get_safe_page(GFP_ATOMIC);
                         memset(nextp, 0, PAGE_SIZE);
                         set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE));
                         continue;
                 }

                 kasan_shallow_populate_p4d(pgd_k, vaddr, next);
         } while (pgd_k++, vaddr = next, vaddr != end);


I have the same change to our early page table code on my todo list.
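
For illustration, here is a minimal userspace sketch of that range-based
walk. The toy_* names and the 2 MiB span per top-level entry are
assumptions made up for this example and are not kernel API; only the
boundary arithmetic follows the shape of the kernel's pgd_addr_end()
helper. It shows how each iteration is clamped either to the end of the
current top-level entry or to the end of the range, so the loop visits
one entry per step instead of one page per step.

```c
#include <assert.h>

/* Toy geometry, assumed for illustration only. */
#define TOY_PAGE_SHIFT	12			/* 4 KiB pages */
#define TOY_PGDIR_SHIFT	21			/* one top-level entry spans 2 MiB */
#define TOY_PGDIR_SIZE	(1UL << TOY_PGDIR_SHIFT)
#define TOY_PGDIR_MASK	(~(TOY_PGDIR_SIZE - 1))

/*
 * Same arithmetic as pgd_addr_end(): return the next top-level boundary
 * after addr, or end if the range stops before that boundary.
 */
static unsigned long toy_pgd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + TOY_PGDIR_SIZE) & TOY_PGDIR_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Walk [start, end) one top-level entry at a time. Returns the number of
 * pages covered and stores the number of top-level entries visited.
 */
static unsigned long toy_walk(unsigned long start, unsigned long end,
			      unsigned long *entries)
{
	unsigned long addr = start, next, pages = 0;

	assert(start < end);
	*entries = 0;

	do {
		next = toy_pgd_addr_end(addr, end);
		(*entries)++;
		/* A real walker would descend to the next level here. */
		pages += (next - addr) >> TOY_PAGE_SHIFT;
	} while (addr = next, addr != end);

	return pages;
}
```

Calling toy_walk() on a range that starts one page below a 2 MiB boundary
and ends one page above another shows the first and last iterations being
clamped to partial entries while the middle ones cover full entries.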


>>>>> +}
>>>>> +
>>>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>>>> +{
>>>>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
>>>>> +}
>>>>> +
>>>>> +static unsigned long relocate_restore_code(void)
>>>>> +{
>>>>> +	unsigned long ret;
>>>>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
>>>>> +
>>>>> +	if (!page)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	copy_page(page, core_restore_code);
>>>>> +
>>>>> +	/* Make the page containing the relocated code executable */
>>>>> +	set_memory_x((unsigned long)page, 1);
>>>>> +
>>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>> +	return (unsigned long)page;
>>>>> +}
>>>>> +
>>>>> +int swsusp_arch_resume(void)
>>>>> +{
>>>>> +	unsigned long addr = PAGE_OFFSET;
>>>>> +	unsigned long ret;
>>>>> +
>>>>> +	/*
>>>>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
>>>>> +	 * we don't need to free it here.
>>>>> +	 */
>>>>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
>>>>> +	if (!resume_pg_dir)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	/*
>>>>> +	 * The pages need to be writable when restoring the image.
>>>>> +	 * Create a second copy of page table just for the linear map, and use this when
>>>>> +	 * restoring.
>>>>> +	 */
>>>>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
>>>>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
>>>>> +		if (ret)
>>>>> +			return (int)ret;
>>>>> +	}
>>>>> +
>>>> To me this is wrong as this does not account for the real physical
>>>> mapping layout: can't you simply copy the linear mapping from
>>>> swapper_pg_dir?
>>> Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from the
>>> swapper_pg_dir as we are not supposed to modify its content.
>>
>>
>> First, you're right, we need the temporary page table as swapper_pg_dir
>> will get overwritten under our feet.
>>
>> Now, I still disagree with mapping all the memory: the linear mapping is
>> sparse because we only map what memblock gives us (some regions are
>> marked as "nomap" for a reason).
>>
>> I just took a look at arm64, and they do exactly that: they go through
>> swapper_pg_dir, copy the linear mapping and enable write at every leaf
>> level
>> (https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
> You're right, but we don't have to copy from the swapper_pg_dir. We can add a kernel_page_present() check to the function to verify that a page is mapped before creating the mapping. Agree?


That would work, though we'd lose the benefit of huge pages. I'm not
opposed at all, but if we can leverage the existing arm64 code, that would
be even better: only the PTE write flag is different!
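
To make the check-before-map idea concrete, here is a small userspace
sketch. Everything in it is hypothetical: toy_page_present() merely stands
in for kernel_page_present(), and the linear map is modelled as a plain
array of booleans rather than a real page table. It only illustrates that
pages absent from the live mapping (for example "nomap" regions) are
skipped, so the temporary mapping stays as sparse as the original.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the live linear map: present[i] is true if page i is mapped. */
struct toy_linear_map {
	const bool *present;
	size_t nr_pages;
};

/* Hypothetical stand-in for kernel_page_present(). */
static bool toy_page_present(const struct toy_linear_map *map, size_t idx)
{
	return idx < map->nr_pages && map->present[idx];
}

/*
 * Mirror only the pages that are present in the live map into the
 * temporary map. Returns the number of pages that were mapped.
 */
static size_t toy_build_temp_map(const struct toy_linear_map *live,
				 bool *temp, size_t nr_pages)
{
	size_t mapped = 0;

	assert(nr_pages <= live->nr_pages);

	for (size_t i = 0; i < nr_pages; i++) {
		/* Holes in the live map stay holes in the temporary map. */
		temp[i] = toy_page_present(live, i);
		if (temp[i])
			mapped++;
	}

	return mapped;
}
```

Note that this per-page loop is exactly where the huge-page benefit is
lost: every present page gets its own entry, whereas copying the existing
tables would keep large leaf mappings intact.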

Thanks,

Alex


>>
>>>> But I have to admit that I struggle to understand the need for this
>>>> temporary page table: all we need to do is to allow writes to the
>>>> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
>>> Similar to the above comment. When we restore the memory content, we need to make sure the pages are writable. If you modify
>>> the swapper_pg_dir, the kernel will crash afterwards.
>>> That's why we need a second page table to do the recovery job.
>>>>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
>>>>> +	relocated_restore_code = relocate_restore_code();
>>>> And do we really need to do that too? The code in question can only be
>>>> overwritten by the same code right?
>>> Yes, we need to move the restore code to a new page to prevent the code from overwriting itself when restoring the memory.
>>>> Thanks,
>>>>
>>>> Alex
>>>>
>>>>
>>>>> +	if (relocated_restore_code == -ENOMEM)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	/*
>>>>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
>>>>> +	 * restore code can jump to it after finished restore the image. The next execution
>>>>> +	 * code doesn't find itself in a different address space after switching over to the
>>>>> +	 * original page table used by the hibernated image.
>>>>> +	 */
>>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
>>>>> +					PAGE_KERNEL_READ_EXEC);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
>>>>> +			resume_hdr.restore_cpu_addr);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +#ifdef CONFIG_PM_SLEEP_SMP
>>>>> +int hibernate_resume_nonboot_cpu_disable(void)
>>>>> +{
>>>>> +	if (sleep_cpu < 0) {
>>>>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
>>>>> +		return -ENODEV;
>>>>> +	}
>>>>> +
>>>>> +	return freeze_secondary_cpus(sleep_cpu);
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>> +static int __init riscv_hibernate_init(void)
>>>>> +{
>>>>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
>>>>> +
>>>>> +	if (WARN_ON(!hibernate_cpu_context))
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +early_initcall(riscv_hibernate_init);

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-02-10 13:24             ` Alexandre Ghiti
@ 2023-02-13  1:51               ` JeeHeng Sia
  -1 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-13  1:51 UTC (permalink / raw)
  To: Alexandre Ghiti, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

Hi Alex,

> -----Original Message-----
> From: Alexandre Ghiti <alex@ghiti.fr>
> Sent: Friday, 10 February, 2023 9:24 PM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> On 2/9/23 07:12, JeeHeng Sia wrote:
> > Hi Alex,
> >
> >> -----Original Message-----
> >> From: Alexandre Ghiti <alex@ghiti.fr>
> >> Sent: Wednesday, 8 February, 2023 8:05 PM
> >> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> >> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> >> <mason.huo@starfivetech.com>
> >> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >>
> >> Hi Sia,
> >>
> >> On 2/8/23 05:43, JeeHeng Sia wrote:
> >>>> -----Original Message-----
> >>>> From: Alexandre Ghiti <alex@ghiti.fr>
> >>>> Sent: Tuesday, 7 February, 2023 11:46 PM
> >>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> >>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> >>>> <mason.huo@starfivetech.com>
> >>>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >>>>
> >>>> Hi Sia,
> >>>>
> >>>> On 1/27/23 10:10, Sia Jee Heng wrote:
> >>>>> Low-level arch functions are added to support hibernation.
> >>>>> swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
> >>>>> the CPU state onto the stack, then calls swsusp_save() to save the memory
> >>>>> image.
> >>>>>
> >>>>> An arch-specific hibernation header is implemented and is used by the
> >>>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
> >>>>> functions. The arch-specific hibernation header consists of satp, hartid,
> >>>>> and the cpu_resume address. The kernel build version also needs to be
> >>>>> saved into the hibernation image header to make sure that only the same
> >>>>> kernel is restored on resume.
> >>>>>
> >>>>> swsusp_arch_resume() creates a temporary page table that covers only
> >>>>> the linear map. It copies the restore code to a 'safe' page, then starts
> >>>>> to restore the memory image. Once completed, it restores the original
> >>>>> kernel's page table. It then calls into __hibernate_cpu_resume()
> >>>>> to restore the CPU context. Finally, it follows the normal hibernation
> >>>>> path back to the hibernation core.
> >>>>>
> >>>>> To enable hibernation/suspend-to-disk on RISC-V, the config options
> >>>>> below need to be enabled:
> >>>>> - CONFIG_ARCH_HIBERNATION_HEADER
> >>>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >>>>>
> >>>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> >>>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> >>>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> >>>>> ---
> >>>>>     arch/riscv/Kconfig                 |   7 +
> >>>>>     arch/riscv/include/asm/assembler.h |  20 ++
> >>>>>     arch/riscv/include/asm/suspend.h   |  21 ++
> >>>>>     arch/riscv/kernel/Makefile         |   1 +
> >>>>>     arch/riscv/kernel/asm-offsets.c    |   5 +
> >>>>>     arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
> >>>>>     arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
> >>>>>     7 files changed, 503 insertions(+)
> >>>>>     create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >>>>>     create mode 100644 arch/riscv/kernel/hibernate.c
> >>>>>
> >>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> >>>>> index e2b656043abf..4555848a817f 100644
> >>>>> --- a/arch/riscv/Kconfig
> >>>>> +++ b/arch/riscv/Kconfig
> >>>>> @@ -690,6 +690,13 @@ menu "Power management options"
> >>>>>
> >>>>>     source "kernel/power/Kconfig"
> >>>>>
> >>>>> +config ARCH_HIBERNATION_POSSIBLE
> >>>>> +	def_bool y
> >>>>> +
> >>>>> +config ARCH_HIBERNATION_HEADER
> >>>>> +	def_bool y
> >>>>> +	depends on HIBERNATION
> >>>>> +
> >>>>>     endmenu # "Power management options"
> >>>>>
> >>>>>     menu "CPU Power Management"
> >>>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> >>>>> index ef1283d04b70..3de70d3e6ceb 100644
> >>>>> --- a/arch/riscv/include/asm/assembler.h
> >>>>> +++ b/arch/riscv/include/asm/assembler.h
> >>>>> @@ -59,4 +59,24 @@
> >>>>>     		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >>>>>     	.endm
> >>>>>
> >>>>> +/**
> >>>>> + * copy_page - copy 1 page (4KB) of data from source to destination
> >>>>> + * @a0 - destination
> >>>>> + * @a1 - source
> >>>>> + */
> >>>>> +	.macro	copy_page a0, a1
> >>>>> +		lui	a2, 0x1
> >>>>> +		add	a2, a2, a0
> >>>>> +.1 :
> >>>>> +		REG_L	t0, 0(a1)
> >>>>> +		REG_L	t1, SZREG(a1)
> >>>>> +
> >>>>> +		REG_S	t0, 0(a0)
> >>>>> +		REG_S	t1, SZREG(a0)
> >>>>> +
> >>>>> +		addi	a0, a0, 2 * SZREG
> >>>>> +		addi	a1, a1, 2 * SZREG
> >>>>> +		bne	a2, a0, .1
> >>>>> +	.endm
> >>>>> +
> >>>>>     #endif	/* __ASM_ASSEMBLER_H */
> >>>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> >>>>> index 75419c5ca272..db40ae433aa9 100644
> >>>>> --- a/arch/riscv/include/asm/suspend.h
> >>>>> +++ b/arch/riscv/include/asm/suspend.h
> >>>>> @@ -21,6 +21,12 @@ struct suspend_context {
> >>>>>     #endif
> >>>>>     };
> >>>>>
> >>>>> +/*
> >>>>> + * This parameter will be assigned to 0 during resume and will be used by
> >>>>> + * hibernation core for the subsequent resume sequence
> >>>>> + */
> >>>>> +extern int in_suspend;
> >>>>> +
> >>>>>     /* Low-level CPU suspend entry function */
> >>>>>     int __cpu_suspend_enter(struct suspend_context *context);
> >>>>>
> >>>>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >>>>>     /* Used to save and restore the csr */
> >>>>>     void suspend_save_csrs(struct suspend_context *context);
> >>>>>     void suspend_restore_csrs(struct suspend_context *context);
> >>>>> +
> >>>>> +/* Low-level API to support hibernation */
> >>>>> +int swsusp_arch_suspend(void);
> >>>>> +int swsusp_arch_resume(void);
> >>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> >>>>> +int arch_hibernation_header_restore(void *addr);
> >>>>> +int __hibernate_cpu_resume(void);
> >>>>> +
> >>>>> +/* Used to resume on the CPU we hibernated on */
> >>>>> +int hibernate_resume_nonboot_cpu_disable(void);
> >>>>> +
> >>>>> +/* Used to restore the hibernated image */
> >>>>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> >>>>> +				unsigned long cpu_resume);
> >>>>> +asmlinkage int core_restore_code(void);
> >>>>>     #endif
> >>>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> >>>>> index 4cf303a779ab..daab341d55e4 100644
> >>>>> --- a/arch/riscv/kernel/Makefile
> >>>>> +++ b/arch/riscv/kernel/Makefile
> >>>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
> >>>>>     obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
> >>>>>
> >>>>>     obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> >>>>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
> >>>>>
> >>>>>     obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> >>>>>     obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> >>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> >>>>> index df9444397908..d6a75aac1d27 100644
> >>>>> --- a/arch/riscv/kernel/asm-offsets.c
> >>>>> +++ b/arch/riscv/kernel/asm-offsets.c
> >>>>> @@ -9,6 +9,7 @@
> >>>>>     #include <linux/kbuild.h>
> >>>>>     #include <linux/mm.h>
> >>>>>     #include <linux/sched.h>
> >>>>> +#include <linux/suspend.h>
> >>>>>     #include <asm/kvm_host.h>
> >>>>>     #include <asm/thread_info.h>
> >>>>>     #include <asm/ptrace.h>
> >>>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
> >>>>>
> >>>>>     	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >>>>>
> >>>>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> >>>>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> >>>>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> >>>>> +
> >>>>>     	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >>>>>     	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >>>>>     	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> >>>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> >>>>> new file mode 100644
> >>>>> index 000000000000..a83d534b89bd
> >>>>> --- /dev/null
> >>>>> +++ b/arch/riscv/kernel/hibernate-asm.S
> >>>>> @@ -0,0 +1,89 @@
> >>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
> >>>>> +/*
> >>>>> + * Hibernation support specific for RISCV
> >>>>> + *
> >>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>>>> + *
> >>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> >>>>> + */
> >>>>> +
> >>>>> +#include <asm/asm.h>
> >>>>> +#include <asm/asm-offsets.h>
> >>>>> +#include <asm/assembler.h>
> >>>>> +#include <asm/csr.h>
> >>>>> +
> >>>>> +#include <linux/linkage.h>
> >>>>> +
> >>>>> +/*
> >>>>> + * This code is executed when resume from the hibernation.
> >>>>> + *
> >>>>> + * It begins with loading the temporary page table then restores the memory image.
> >>>>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> >>>>> + * swsusp_arch_suspend().
> >>>>> + */
> >>>>> +
> >>>>> +/*
> >>>>> + * int __hibernate_cpu_resume(void)
> >>>>> + * Switch back to the hibernated image's page table prior to restore the CPU
> >>>>> + * context.
> >>>>> + *
> >>>>> + * Always returns 0 to the C code.
> >>>>> + */
> >>>>> +ENTRY(__hibernate_cpu_resume)
> >>>>> +	/* switch to hibernated image's page table */
> >>>>> +	csrw CSR_SATP, s0
> >>>>> +	sfence.vma
> >>>>> +
> >>>>> +	REG_L	a0, hibernate_cpu_context
> >>>>> +
> >>>>> +	/* Restore CSRs */
> >>>>> +	restore_csr
> >>>>> +
> >>>>> +	/* Restore registers (except A0 and T0-T6) */
> >>>>> +	restore_reg
> >>>>> +
> >>>>> +	/* Return zero value */
> >>>>> +	add	a0, zero, zero
> >>>>> +
> >>>>> +	/* Return to C code */
> >>>>> +	ret
> >>>>> +END(__hibernate_cpu_resume)
> >>>>> +
> >>>>> +/*
> >>>>> + * Prepare to restore the image.
> >>>>> + * a0: satp of saved page tables
> >>>>> + * a1: satp of temporary page tables
> >>>>> + * a2: cpu_resume
> >>>>> + */
> >>>>> +ENTRY(restore_image)
> >>>>> +	mv	s0, a0
> >>>>> +	mv	s1, a1
> >>>>> +	mv	s2, a2
> >>>>> +	REG_L	s4, restore_pblist
> >>>>> +	REG_L	a1, relocated_restore_code
> >>>>> +
> >>>>> +	jalr	a1
> >>>>> +END(restore_image)
> >>>>> +
> >>>>> +/*
> >>>>> + * The below code will be executed from a 'safe' page.
> >>>>> + * It first switches to the temporary page table, then start to copy the pages
> >>>>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> >>>>> + * to restore the CPU context.
> >>>>> + */
> >>>>> +ENTRY(core_restore_code)
> >>>>> +	/* switch to temp page table */
> >>>>> +	csrw satp, s1
> >>>>> +	sfence.vma
> >>>>> +.Lcopy:
> >>>>> +	/* The below code will restore the hibernated image. */
> >>>>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> >>>>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> >>>>> +
> >>>>> +	copy_page a0, a1
> >>>>> +
> >>>>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> >>>>> +	bnez	s4, .Lcopy
> >>>>> +
> >>>>> +	jalr	s2
> >>>>> +END(core_restore_code)
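
For readers following the assembly, the copy loop in core_restore_code can be approximated in plain C. This is only a userspace sketch under stated assumptions: struct sketch_pbe and sketch_restore() are hypothetical names that mirror the three struct pbe fields exported through asm-offsets.c, and memcpy() stands in for the copy_page macro. Unlike the assembly, which assumes a non-empty list, the sketch also tolerates an empty one.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical mirror of the struct pbe fields read by the assembly. */
struct sketch_pbe {
	void *address;			/* where the saved copy currently sits */
	void *orig_address;		/* where the page has to end up */
	struct sketch_pbe *next;	/* next entry, NULL terminates the list */
};

/*
 * Walk the list and copy every saved page back to its original location.
 * Returns the number of pages copied.
 */
static unsigned long sketch_restore(struct sketch_pbe *pbe)
{
	unsigned long copied = 0;

	for (; pbe; pbe = pbe->next) {
		/* Stand-in for the copy_page macro in assembler.h. */
		memcpy(pbe->orig_address, pbe->address, SKETCH_PAGE_SIZE);
		copied++;
	}

	return copied;
}
```

Any page named in orig_address may be overwritten by this loop, which is why the code that runs it has to live on a page that is not one of the destinations.
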
> >>>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> >>>>> new file mode 100644
> >>>>> index 000000000000..bf7f3c781820
> >>>>> --- /dev/null
> >>>>> +++ b/arch/riscv/kernel/hibernate.c
> >>>>> @@ -0,0 +1,360 @@
> >>>>> +// SPDX-License-Identifier: GPL-2.0-only
> >>>>> +/*
> >>>>> + * Hibernation support specific for RISCV
> >>>>> + *
> >>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>>>> + *
> >>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> >>>>> + */
> >>>>> +
> >>>>> +#include <asm/barrier.h>
> >>>>> +#include <asm/cacheflush.h>
> >>>>> +#include <asm/mmu_context.h>
> >>>>> +#include <asm/page.h>
> >>>>> +#include <asm/pgtable.h>
> >>>>> +#include <asm/sections.h>
> >>>>> +#include <asm/set_memory.h>
> >>>>> +#include <asm/smp.h>
> >>>>> +#include <asm/suspend.h>
> >>>>> +
> >>>>> +#include <linux/cpu.h>
> >>>>> +#include <linux/memblock.h>
> >>>>> +#include <linux/pm.h>
> >>>>> +#include <linux/sched.h>
> >>>>> +#include <linux/suspend.h>
> >>>>> +#include <linux/utsname.h>
> >>>>> +
> >>>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> >>>>> +static int sleep_cpu = -EINVAL;
> >>>>> +
> >>>>> +/* CPU context to be saved */
> >>>>> +struct suspend_context *hibernate_cpu_context;
> >>>>> +
> >>>>> +unsigned long relocated_restore_code;
> >>>>> +
> >>>>> +/* Pointer to the temporary resume page table */
> >>>>> +pgd_t *resume_pg_dir;
> >>>>> +
> >>>>> +/**
> >>>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> >>>>> + * @uts_version: to save the build number and date so that the we are not resume with
> >>>>> + *		a different kernel
> >>>>> + */
> >>>>> +struct arch_hibernate_hdr_invariants {
> >>>>> +	char		uts_version[__NEW_UTS_LEN + 1];
> >>>>> +};
> >>>>> +
> >>>>> +/**
> >>>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> >>>>> + * @invariants: container to store kernel build version
> >>>>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> >>>>> + * @saved_satp: original page table used by the hibernated image.
> >>>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> >>>>> + */
> >>>>> +static struct arch_hibernate_hdr {
> >>>>> +	struct arch_hibernate_hdr_invariants invariants;
> >>>>> +	unsigned long	hartid;
> >>>>> +	unsigned long	saved_satp;
> >>>>> +	unsigned long	restore_cpu_addr;
> >>>>> +} resume_hdr;
> >>>>> +
> >>>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> >>>>> +{
> >>>>> +	memset(i, 0, sizeof(*i));
> >>>>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> >>>>> +}
> >>>>> +
> >>>>> +/*
> >>>>> + * Check if the given pfn is in the 'nosave' section.
> >>>>> + */
> >>>>> +int pfn_is_nosave(unsigned long pfn)
> >>>>> +{
> >>>>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> >>>>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> >>>>> +
> >>>>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> >>>>> +}
> >>>>> +
> >>>>> +void notrace save_processor_state(void)
> >>>>> +{
> >>>>> +	WARN_ON(num_online_cpus() != 1);
> >>>>> +}
> >>>>> +
> >>>>> +void notrace restore_processor_state(void)
> >>>>> +{
> >>>>> +}
> >>>>> +
> >>>>> +/*
> >>>>> + * Helper parameters need to be saved to the hibernation image header.
> >>>>> + */
> >>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> >>>>> +{
> >>>>> +	struct arch_hibernate_hdr *hdr = addr;
> >>>>> +
> >>>>> +	if (max_size < sizeof(*hdr))
> >>>>> +		return -EOVERFLOW;
> >>>>> +
> >>>>> +	arch_hdr_invariants(&hdr->invariants);
> >>>>> +
> >>>>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> >>>>> +	hdr->saved_satp = csr_read(CSR_SATP);
> >>>>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +EXPORT_SYMBOL(arch_hibernation_header_save);
> >>>>> +
> >>>>> +/*
> >>>>> + * Retrieve the helper parameters from the hibernation image header
> >>>>> + */
> >>>>> +int arch_hibernation_header_restore(void *addr)
> >>>>> +{
> >>>>> +	struct arch_hibernate_hdr_invariants invariants;
> >>>>> +	struct arch_hibernate_hdr *hdr = addr;
> >>>>> +	int ret = 0;
> >>>>> +
> >>>>> +	arch_hdr_invariants(&invariants);
> >>>>> +
> >>>>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> >>>>> +		pr_crit("Hibernate image not generated by this kernel!\n");
> >>>>> +		return -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> >>>>> +	if (sleep_cpu < 0) {
> >>>>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> >>>>> +		sleep_cpu = -EINVAL;
> >>>>> +		return -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +#ifdef CONFIG_SMP
> >>>>> +	ret = bringup_hibernate_cpu(sleep_cpu);
> >>>>> +	if (ret) {
> >>>>> +		sleep_cpu = -EINVAL;
> >>>>> +		return ret;
> >>>>> +	}
> >>>>> +#endif
> >>>>> +	resume_hdr = *hdr;
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
> >>>>> +
> >>>>> +int swsusp_arch_suspend(void)
> >>>>> +{
> >>>>> +	int ret = 0;
> >>>>> +
> >>>>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> >>>>> +		sleep_cpu = smp_processor_id();
> >>>>> +		suspend_save_csrs(hibernate_cpu_context);
> >>>>> +		ret = swsusp_save();
> >>>>> +	} else {
> >>>>> +		suspend_restore_csrs(hibernate_cpu_context);
> >>>>> +		flush_tlb_all();
> >>>>> +
> >>>>> +		/* Invalidated Icache */
> >>>>> +		flush_icache_all();
> >>>>> +
> >>>>> +		/*
> >>>>> +		 * Tell the hibernation core that we've just restored
> >>>>> +		 * the memory
> >>>>> +		 */
> >>>>> +		in_suspend = 0;
> >>>>> +		sleep_cpu = -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t pte_idx = pte_index(vaddr);
> >>>>> +
> >>>>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +#ifndef __PAGETABLE_PMD_FOLDED
> >>>>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
> >>>>> +		(pgtable_l5_enabled ?					\
> >>>>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
> >>>>> +		(pgtable_l4_enabled ?					\
> >>>>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
> >>>>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t pmd_idx = pmd_index(vaddr);
> >>>>> +	pte_t *ptep;
> >>>>> +
> >>>>> +	if (pmd_none(pmdp[pmd_idx])) {
> >>>>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> >>>>> +		if (!ptep)
> >>>>> +			return -ENOMEM;
> >>>>> +
> >>>>> +		memset(ptep, 0, PAGE_SIZE);
> >>>>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
> >>>>> +	} else {
> >>>>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
> >>>>> +	}
> >>>>> +
> >>>>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t pud_index = pud_index(vaddr);
> >>>>> +	pmd_t *pmdp;
> >>>>> +
> >>>>> +	if (pud_val(pudp[pud_index]) == 0) {
> >>>>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> >>>>> +		if (!pmdp)
> >>>>> +			return -ENOMEM;
> >>>>> +
> >>>>> +		memset(pmdp, 0, PAGE_SIZE);
> >>>>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
> >>>>> +	} else {
> >>>>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
> >>>>> +	}
> >>>>> +
> >>>>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t p4d_index = p4d_index(vaddr);
> >>>>> +	pud_t *pudp;
> >>>>> +
> >>>>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
> >>>>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> >>>>> +		if (!pudp)
> >>>>> +			return -ENOMEM;
> >>>>> +
> >>>>> +		memset(pudp, 0, PAGE_SIZE);
> >>>>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
> >>>>> +	} else {
> >>>>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
> >>>>> +	}
> >>>>> +
> >>>>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
> >>>>> +}
> >>>>> +
> >>>>> +#else
> >>>>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
> >>>>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
> >>>>> +#endif /* __PAGETABLE_PMD_FOLDED */
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t pgd_idx = pgd_index(vaddr);
> >>>>> +	void *nextp;
> >>>>> +
> >>>>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
> >>>>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
> >>>>> +		if (!nextp)
> >>>>> +			return -ENOMEM;
> >>>>> +
> >>>>> +		memset(nextp, 0, PAGE_SIZE);
> >>>>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
> >>>>> +	} else {
> >>>>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
> >>>>> +	}
> >>>>> +
> >>>>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
> 
> 
> Is it possible to use "standard" way of going through a page table
> instead of using the _next macros? I mean something like this (example
> from arm64 code
> https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/trans_pgd.c#L174
> or my recent kasan patchset
> https://patchwork.kernel.org/project/linux-riscv/patch/20230203075232.274282-3-alexghiti@rivosinc.com/):
> 
>          do {
>                  next = pgd_addr_end(vaddr, end);
> 
>                  if (pgd_none(*pgd_k)) {
>                          nextp = get_safe_page(GFP_ATOMIC);
>                          memset(nextp, 0, PAGE_SIZE);
>                          set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(nextp)),
>                                                 PAGE_TABLE));
>                          continue;
>                  }
> 
>                  kasan_shallow_populate_p4d(pgd_k, vaddr, next);
>          } while (pgd_k++, vaddr = next, vaddr != end);
> 
> 
> I have the same change to our early page table code on my todo list.
> 
> 
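
For reference, the range-walking idiom in the loop quoted above can be sketched in userspace. This is a minimal sketch, not the kernel implementation: sketch_addr_end() is a hypothetical function modelled on the kernel's pgd_addr_end() macro, and the 2 MiB span size is an arbitrary assumption chosen for illustration.

```c
#include <assert.h>

/* Assumed span covered by one table entry at this level (2 MiB here). */
#define SKETCH_SPAN_SHIFT	21
#define SKETCH_SPAN_SIZE	(1UL << SKETCH_SPAN_SHIFT)
#define SKETCH_SPAN_MASK	(~(SKETCH_SPAN_SIZE - 1))

/*
 * Return the end of the current entry's span, clamped to 'end'.
 * The "- 1" comparison keeps the result correct when the boundary
 * wraps around to 0 at the top of the address space.
 */
static unsigned long sketch_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + SKETCH_SPAN_SIZE) & SKETCH_SPAN_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Count the entries a do/while walk over [addr, end) visits.
 * The caller must guarantee addr < end.
 */
static unsigned long sketch_count_entries(unsigned long addr, unsigned long end)
{
	unsigned long next, visited = 0;

	do {
		next = sketch_addr_end(addr, end);
		visited++;	/* a real walker would descend one level here */
	} while (addr = next, addr != end);

	return visited;
}
```

Each level handles one entry per iteration and passes the clamped sub-range [addr, next) to the next level, so no per-level "_next" dispatch macro is needed.
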
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long relocate_restore_code(void)
> >>>>> +{
> >>>>> +	unsigned long ret;
> >>>>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
> >>>>> +
> >>>>> +	if (!page)
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	copy_page(page, core_restore_code);
> >>>>> +
> >>>>> +	/* Make the page containing the relocated code executable */
> >>>>> +	set_memory_x((unsigned long)page, 1);
> >>>>> +
> >>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
> >>>>> +	if (ret)
> >>>>> +		return ret;
> >>>>> +
> >>>>> +	return (unsigned long)page;
> >>>>> +}
> >>>>> +
> >>>>> +int swsusp_arch_resume(void)
> >>>>> +{
> >>>>> +	unsigned long addr = PAGE_OFFSET;
> >>>>> +	unsigned long ret;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> >>>>> +	 * we don't need to free it here.
> >>>>> +	 */
> >>>>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> >>>>> +	if (!resume_pg_dir)
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * The pages need to be writable when restoring the image.
> >>>>> +	 * Create a second copy of page table just for the linear map, and use this when
> >>>>> +	 * restoring.
> >>>>> +	 */
> >>>>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
> >>>>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
> >>>>> +		if (ret)
> >>>>> +			return (int)ret;
> >>>>> +	}
> >>>>> +
> >>>> To me this is wrong as this does not account for the real physical
> >>>> mapping layout: can't you simply copy the linear mapping from
> >>>> swapper_pg_dir?
> >>> Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from the
> >>> swapper_pg_dir as we are not supposed to modify its content.
> >>
> >>
> >> First, you're right, we need the temporary page table as swapper_pg_dir
> >> will get overwritten under our feet.
> >>
> >> Now, I still disagree with mapping all the memory: the linear mapping is
> >> sparse because we only map what memblock gives us (some regions are
> >> marked as "nomap" for a reason).
> >>
> >> I just took a look at arm64, and they do exactly that: they go through
> >> swapper_pg_dir, copy the linear mapping and enable write at every leaf
> >> level
> >> (https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
> > You're right, but we don’t have to copy from the swapper_pg_dir. We can call kernel_page_present() in the function to check
> > page validity before doing the mapping. Agree?
> 
> 
> That would work, we'd lose the benefit of huge pages though, I'm not
> opposed at all but if we can leverage existing arm64 code, that would
> even be better, only the PTE write flag is different!
That's the thing. Arm64 uses two TTBR registers to translate virtual addresses. The single page of relocated restore code is mapped through ttbr0, while the temporary page table, which supports huge pages, is installed in ttbr1. So arch/arm64 has no need to split huge pages in the temporary page table.
RISC-V has only one satp. If we try to support huge pages for hibernation, we need to handle huge page splitting in arch/riscv for the temporary page table in order to map the single page of relocated restore code. On current RISC-V we can call kernel_page_present() to check page validity, but I doubt we should implement huge page splitting in arch/riscv, since that is normally handled by the buddy allocator.
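
To make the kernel_page_present() idea concrete, here is a rough userspace sketch of a page-granular walk that skips holes in the linear map. Everything prefixed with sketch_ is hypothetical: sketch_page_present() only stands in for kernel_page_present() (which takes a struct page in the kernel, not an address), and the region table is a simplified stand-in for what memblock has actually mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for one mapped region of the linear map. */
struct sketch_region {
	unsigned long start;	/* inclusive, page aligned */
	unsigned long end;	/* exclusive, page aligned */
};

/* Hypothetical stand-in for kernel_page_present(): is addr mapped? */
static bool sketch_page_present(const struct sketch_region *r, size_t n,
				unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].start && addr < r[i].end)
			return true;

	return false;
}

/*
 * Walk [start, end) one page at a time and count the pages that would be
 * entered into the temporary page table. Pages in a hole are skipped.
 */
static unsigned long sketch_map_present_pages(const struct sketch_region *r,
					      size_t n, unsigned long start,
					      unsigned long end)
{
	unsigned long mapped = 0;

	for (unsigned long addr = start; addr < end; addr += SKETCH_PAGE_SIZE)
		if (sketch_page_present(r, n, addr))
			mapped++;	/* the patch would call temp_pgtable_mapping() here */

	return mapped;
}
```

The trade-off is the one discussed above: the walk stays simple and respects "nomap" holes, but every mapping is a base page, so the temporary page table gets no benefit from huge pages.
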
> 
> Thanks,
> 
> Alex
> 
> 
> >>
> >>>> But I have to admit that I struggle to understand the need for this
> >>>> temporary page table: all we need to do is to allow to write to the
> >>>> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
> >>> Similar to the above comment. When we restore the memory content, we need to make sure the pages are writable. If you modify
> >>> the swapper_pg_dir, the kernel will crash afterwards.
> >>> That’s why we need a second page table to do the recovery job.
> >>>>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
> >>>>> +	relocated_restore_code = relocate_restore_code();
> >>>> And do we really need to do that too? The code in question can only be
> >>>> overwritten by the same code right?
> >>> Yes, we need to move the restore code to a new page to prevent the code from overwriting itself when restoring the memory.
> >>>> Thanks,
> >>>>
> >>>> Alex
> >>>>
> >>>>
> >>>>> +	if (relocated_restore_code == -ENOMEM)
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> >>>>> +	 * restore code can jump to it after finished restore the image. The next execution
> >>>>> +	 * code doesn't find itself in a different address space after switching over to the
> >>>>> +	 * original page table used by the hibernated image.
> >>>>> +	 */
> >>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
> >>>>> +					PAGE_KERNEL_READ_EXEC);
> >>>>> +	if (ret)
> >>>>> +		return ret;
> >>>>> +
> >>>>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> >>>>> +			resume_hdr.restore_cpu_addr);
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +#ifdef CONFIG_PM_SLEEP_SMP
> >>>>> +int hibernate_resume_nonboot_cpu_disable(void)
> >>>>> +{
> >>>>> +	if (sleep_cpu < 0) {
> >>>>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> >>>>> +		return -ENODEV;
> >>>>> +	}
> >>>>> +
> >>>>> +	return freeze_secondary_cpus(sleep_cpu);
> >>>>> +}
> >>>>> +#endif
> >>>>> +
> >>>>> +static int __init riscv_hibernate_init(void)
> >>>>> +{
> >>>>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> >>>>> +
> >>>>> +	if (WARN_ON(!hibernate_cpu_context))
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +early_initcall(riscv_hibernate_init);
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv@lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv

* RE: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
@ 2023-02-13  1:51               ` JeeHeng Sia
  0 siblings, 0 replies; 52+ messages in thread
From: JeeHeng Sia @ 2023-02-13  1:51 UTC (permalink / raw)
  To: Alexandre Ghiti, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

Hi Alex,

> -----Original Message-----
> From: Alexandre Ghiti <alex@ghiti.fr>
> Sent: Friday, 10 February, 2023 9:24 PM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> 
> On 2/9/23 07:12, JeeHeng Sia wrote:
> > Hi Alex,
> >
> >> -----Original Message-----
> >> From: Alexandre Ghiti <alex@ghiti.fr>
> >> Sent: Wednesday, 8 February, 2023 8:05 PM
> >> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> >> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> >> <mason.huo@starfivetech.com>
> >> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >>
> >> Hi Sia,
> >>
> >> On 2/8/23 05:43, JeeHeng Sia wrote:
> >>>> -----Original Message-----
> >>>> From: Alexandre Ghiti <alex@ghiti.fr>
> >>>> Sent: Tuesday, 7 February, 2023 11:46 PM
> >>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
> >>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
> >>>> <mason.huo@starfivetech.com>
> >>>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >>>>
> >>>> Hi Sia,
> >>>>
> >>>> On 1/27/23 10:10, Sia Jee Heng wrote:
> >>>>> Low level Arch functions were created to support hibernation.
> >>>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> >>>>> cpu state onto the stack, then calling swsusp_save() to save the memory
> >>>>> image.
> >>>>>
> >>>>> Arch specific hibernation header is implemented and is utilized by the
> >>>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
> >>>>> functions. The arch specific hibernation header consists of satp, hartid,
> >>>>> and the cpu_resume address. The kernel built version is also need to be
> >>>>> saved into the hibernation image header to making sure only the same
> >>>>> kernel is restore when resume.
> >>>>>
> >>>>> swsusp_arch_resume() creates a temporary page table that covering only
> >>>>> the linear map. It copies the restore code to a 'safe' page, then start
> >>>>> to restore the memory image. Once completed, it restores the original
> >>>>> kernel's page table. It then calls into __hibernate_cpu_resume()
> >>>>> to restore the CPU context. Finally, it follows the normal hibernation
> >>>>> path back to the hibernation core.
> >>>>>
> >>>>> To enable hibernation/suspend to disk into RISCV, the below config
> >>>>> need to be enabled:
> >>>>> - CONFIG_ARCH_HIBERNATION_HEADER
> >>>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >>>>>
> >>>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
> >>>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
> >>>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
> >>>>> ---
> >>>>>     arch/riscv/Kconfig                 |   7 +
> >>>>>     arch/riscv/include/asm/assembler.h |  20 ++
> >>>>>     arch/riscv/include/asm/suspend.h   |  21 ++
> >>>>>     arch/riscv/kernel/Makefile         |   1 +
> >>>>>     arch/riscv/kernel/asm-offsets.c    |   5 +
> >>>>>     arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
> >>>>>     arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
> >>>>>     7 files changed, 503 insertions(+)
> >>>>>     create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >>>>>     create mode 100644 arch/riscv/kernel/hibernate.c
> >>>>>
> >>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> >>>>> index e2b656043abf..4555848a817f 100644
> >>>>> --- a/arch/riscv/Kconfig
> >>>>> +++ b/arch/riscv/Kconfig
> >>>>> @@ -690,6 +690,13 @@ menu "Power management options"
> >>>>>
> >>>>>     source "kernel/power/Kconfig"
> >>>>>
> >>>>> +config ARCH_HIBERNATION_POSSIBLE
> >>>>> +	def_bool y
> >>>>> +
> >>>>> +config ARCH_HIBERNATION_HEADER
> >>>>> +	def_bool y
> >>>>> +	depends on HIBERNATION
> >>>>> +
> >>>>>     endmenu # "Power management options"
> >>>>>
> >>>>>     menu "CPU Power Management"
> >>>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> >>>>> index ef1283d04b70..3de70d3e6ceb 100644
> >>>>> --- a/arch/riscv/include/asm/assembler.h
> >>>>> +++ b/arch/riscv/include/asm/assembler.h
> >>>>> @@ -59,4 +59,24 @@
> >>>>>     		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >>>>>     	.endm
> >>>>>
> >>>>> +/**
> >>>>> + * copy_page - copy 1 page (4KB) of data from source to destination
> >>>>> + * @a0 - destination
> >>>>> + * @a1 - source
> >>>>> + */
> >>>>> +	.macro	copy_page a0, a1
> >>>>> +		lui	a2, 0x1
> >>>>> +		add	a2, a2, a0
> >>>>> +.1 :
> >>>>> +		REG_L	t0, 0(a1)
> >>>>> +		REG_L	t1, SZREG(a1)
> >>>>> +
> >>>>> +		REG_S	t0, 0(a0)
> >>>>> +		REG_S	t1, SZREG(a0)
> >>>>> +
> >>>>> +		addi	a0, a0, 2 * SZREG
> >>>>> +		addi	a1, a1, 2 * SZREG
> >>>>> +		bne	a2, a0, .1
> >>>>> +	.endm
> >>>>> +
> >>>>>     #endif	/* __ASM_ASSEMBLER_H */
> >>>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> >>>>> index 75419c5ca272..db40ae433aa9 100644
> >>>>> --- a/arch/riscv/include/asm/suspend.h
> >>>>> +++ b/arch/riscv/include/asm/suspend.h
> >>>>> @@ -21,6 +21,12 @@ struct suspend_context {
> >>>>>     #endif
> >>>>>     };
> >>>>>
> >>>>> +/*
> >>>>> + * This parameter will be assigned to 0 during resume and will be used by
> >>>>> + * hibernation core for the subsequent resume sequence
> >>>>> + */
> >>>>> +extern int in_suspend;
> >>>>> +
> >>>>>     /* Low-level CPU suspend entry function */
> >>>>>     int __cpu_suspend_enter(struct suspend_context *context);
> >>>>>
> >>>>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >>>>>     /* Used to save and restore the csr */
> >>>>>     void suspend_save_csrs(struct suspend_context *context);
> >>>>>     void suspend_restore_csrs(struct suspend_context *context);
> >>>>> +
> >>>>> +/* Low-level API to support hibernation */
> >>>>> +int swsusp_arch_suspend(void);
> >>>>> +int swsusp_arch_resume(void);
> >>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> >>>>> +int arch_hibernation_header_restore(void *addr);
> >>>>> +int __hibernate_cpu_resume(void);
> >>>>> +
> >>>>> +/* Used to resume on the CPU we hibernated on */
> >>>>> +int hibernate_resume_nonboot_cpu_disable(void);
> >>>>> +
> >>>>> +/* Used to restore the hibernated image */
> >>>>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
> >>>>> +				unsigned long cpu_resume);
> >>>>> +asmlinkage int core_restore_code(void);
> >>>>>     #endif
> >>>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> >>>>> index 4cf303a779ab..daab341d55e4 100644
> >>>>> --- a/arch/riscv/kernel/Makefile
> >>>>> +++ b/arch/riscv/kernel/Makefile
> >>>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
> >>>>>     obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
> >>>>>
> >>>>>     obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
> >>>>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
> >>>>>
> >>>>>     obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
> >>>>>     obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
> >>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> >>>>> index df9444397908..d6a75aac1d27 100644
> >>>>> --- a/arch/riscv/kernel/asm-offsets.c
> >>>>> +++ b/arch/riscv/kernel/asm-offsets.c
> >>>>> @@ -9,6 +9,7 @@
> >>>>>     #include <linux/kbuild.h>
> >>>>>     #include <linux/mm.h>
> >>>>>     #include <linux/sched.h>
> >>>>> +#include <linux/suspend.h>
> >>>>>     #include <asm/kvm_host.h>
> >>>>>     #include <asm/thread_info.h>
> >>>>>     #include <asm/ptrace.h>
> >>>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
> >>>>>
> >>>>>     	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >>>>>
> >>>>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
> >>>>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> >>>>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
> >>>>> +
> >>>>>     	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >>>>>     	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >>>>>     	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> >>>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> >>>>> new file mode 100644
> >>>>> index 000000000000..a83d534b89bd
> >>>>> --- /dev/null
> >>>>> +++ b/arch/riscv/kernel/hibernate-asm.S
> >>>>> @@ -0,0 +1,89 @@
> >>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
> >>>>> +/*
> >>>>> + * Hibernation support specific for RISCV
> >>>>> + *
> >>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>>>> + *
> >>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> >>>>> + */
> >>>>> +
> >>>>> +#include <asm/asm.h>
> >>>>> +#include <asm/asm-offsets.h>
> >>>>> +#include <asm/assembler.h>
> >>>>> +#include <asm/csr.h>
> >>>>> +
> >>>>> +#include <linux/linkage.h>
> >>>>> +
> >>>>> +/*
> >>>>> + * This code is executed when resume from the hibernation.
> >>>>> + *
> >>>>> + * It begins with loading the temporary page table then restores the memory image.
> >>>>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
> >>>>> + * swsusp_arch_suspend().
> >>>>> + */
> >>>>> +
> >>>>> +/*
> >>>>> + * int __hibernate_cpu_resume(void)
> >>>>> + * Switch back to the hibernated image's page table prior to restore the CPU
> >>>>> + * context.
> >>>>> + *
> >>>>> + * Always returns 0 to the C code.
> >>>>> + */
> >>>>> +ENTRY(__hibernate_cpu_resume)
> >>>>> +	/* switch to hibernated image's page table */
> >>>>> +	csrw CSR_SATP, s0
> >>>>> +	sfence.vma
> >>>>> +
> >>>>> +	REG_L	a0, hibernate_cpu_context
> >>>>> +
> >>>>> +	/* Restore CSRs */
> >>>>> +	restore_csr
> >>>>> +
> >>>>> +	/* Restore registers (except A0 and T0-T6) */
> >>>>> +	restore_reg
> >>>>> +
> >>>>> +	/* Return zero value */
> >>>>> +	add	a0, zero, zero
> >>>>> +
> >>>>> +	/* Return to C code */
> >>>>> +	ret
> >>>>> +END(__hibernate_cpu_resume)
> >>>>> +
> >>>>> +/*
> >>>>> + * Prepare to restore the image.
> >>>>> + * a0: satp of saved page tables
> >>>>> + * a1: satp of temporary page tables
> >>>>> + * a2: cpu_resume
> >>>>> + */
> >>>>> +ENTRY(restore_image)
> >>>>> +	mv	s0, a0
> >>>>> +	mv	s1, a1
> >>>>> +	mv	s2, a2
> >>>>> +	REG_L	s4, restore_pblist
> >>>>> +	REG_L	a1, relocated_restore_code
> >>>>> +
> >>>>> +	jalr	a1
> >>>>> +END(restore_image)
> >>>>> +
> >>>>> +/*
> >>>>> + * The below code will be executed from a 'safe' page.
> >>>>> + * It first switches to the temporary page table, then start to copy the pages
> >>>>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
> >>>>> + * to restore the CPU context.
> >>>>> + */
> >>>>> +ENTRY(core_restore_code)
> >>>>> +	/* switch to temp page table */
> >>>>> +	csrw satp, s1
> >>>>> +	sfence.vma
> >>>>> +.Lcopy:
> >>>>> +	/* The below code will restore the hibernated image. */
> >>>>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
> >>>>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
> >>>>> +
> >>>>> +	copy_page a0, a1
> >>>>> +
> >>>>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
> >>>>> +	bnez	s4, .Lcopy
> >>>>> +
> >>>>> +	jalr	s2
> >>>>> +END(core_restore_code)
> >>>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> >>>>> new file mode 100644
> >>>>> index 000000000000..bf7f3c781820
> >>>>> --- /dev/null
> >>>>> +++ b/arch/riscv/kernel/hibernate.c
> >>>>> @@ -0,0 +1,360 @@
> >>>>> +// SPDX-License-Identifier: GPL-2.0-only
> >>>>> +/*
> >>>>> + * Hibernation support specific for RISCV
> >>>>> + *
> >>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>>>> + *
> >>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
> >>>>> + */
> >>>>> +
> >>>>> +#include <asm/barrier.h>
> >>>>> +#include <asm/cacheflush.h>
> >>>>> +#include <asm/mmu_context.h>
> >>>>> +#include <asm/page.h>
> >>>>> +#include <asm/pgtable.h>
> >>>>> +#include <asm/sections.h>
> >>>>> +#include <asm/set_memory.h>
> >>>>> +#include <asm/smp.h>
> >>>>> +#include <asm/suspend.h>
> >>>>> +
> >>>>> +#include <linux/cpu.h>
> >>>>> +#include <linux/memblock.h>
> >>>>> +#include <linux/pm.h>
> >>>>> +#include <linux/sched.h>
> >>>>> +#include <linux/suspend.h>
> >>>>> +#include <linux/utsname.h>
> >>>>> +
> >>>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
> >>>>> +static int sleep_cpu = -EINVAL;
> >>>>> +
> >>>>> +/* CPU context to be saved */
> >>>>> +struct suspend_context *hibernate_cpu_context;
> >>>>> +
> >>>>> +unsigned long relocated_restore_code;
> >>>>> +
> >>>>> +/* Pointer to the temporary resume page table */
> >>>>> +pgd_t *resume_pg_dir;
> >>>>> +
> >>>>> +/**
> >>>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
> >>>>> + * @uts_version: to save the build number and date so that the we are not resume with
> >>>>> + *		a different kernel
> >>>>> + */
> >>>>> +struct arch_hibernate_hdr_invariants {
> >>>>> +	char		uts_version[__NEW_UTS_LEN + 1];
> >>>>> +};
> >>>>> +
> >>>>> +/**
> >>>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
> >>>>> + * @invariants: container to store kernel build version
> >>>>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
> >>>>> + * @saved_satp: original page table used by the hibernated image.
> >>>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> >>>>> + */
> >>>>> +static struct arch_hibernate_hdr {
> >>>>> +	struct arch_hibernate_hdr_invariants invariants;
> >>>>> +	unsigned long	hartid;
> >>>>> +	unsigned long	saved_satp;
> >>>>> +	unsigned long	restore_cpu_addr;
> >>>>> +} resume_hdr;
> >>>>> +
> >>>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> >>>>> +{
> >>>>> +	memset(i, 0, sizeof(*i));
> >>>>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> >>>>> +}
> >>>>> +
> >>>>> +/*
> >>>>> + * Check if the given pfn is in the 'nosave' section.
> >>>>> + */
> >>>>> +int pfn_is_nosave(unsigned long pfn)
> >>>>> +{
> >>>>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> >>>>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> >>>>> +
> >>>>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> >>>>> +}
> >>>>> +
> >>>>> +void notrace save_processor_state(void)
> >>>>> +{
> >>>>> +	WARN_ON(num_online_cpus() != 1);
> >>>>> +}
> >>>>> +
> >>>>> +void notrace restore_processor_state(void)
> >>>>> +{
> >>>>> +}
> >>>>> +
> >>>>> +/*
> >>>>> + * Helper parameters need to be saved to the hibernation image header.
> >>>>> + */
> >>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> >>>>> +{
> >>>>> +	struct arch_hibernate_hdr *hdr = addr;
> >>>>> +
> >>>>> +	if (max_size < sizeof(*hdr))
> >>>>> +		return -EOVERFLOW;
> >>>>> +
> >>>>> +	arch_hdr_invariants(&hdr->invariants);
> >>>>> +
> >>>>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> >>>>> +	hdr->saved_satp = csr_read(CSR_SATP);
> >>>>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +EXPORT_SYMBOL(arch_hibernation_header_save);
> >>>>> +
> >>>>> +/*
> >>>>> + * Retrieve the helper parameters from the hibernation image header
> >>>>> + */
> >>>>> +int arch_hibernation_header_restore(void *addr)
> >>>>> +{
> >>>>> +	struct arch_hibernate_hdr_invariants invariants;
> >>>>> +	struct arch_hibernate_hdr *hdr = addr;
> >>>>> +	int ret = 0;
> >>>>> +
> >>>>> +	arch_hdr_invariants(&invariants);
> >>>>> +
> >>>>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> >>>>> +		pr_crit("Hibernate image not generated by this kernel!\n");
> >>>>> +		return -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> >>>>> +	if (sleep_cpu < 0) {
> >>>>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
> >>>>> +		sleep_cpu = -EINVAL;
> >>>>> +		return -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +#ifdef CONFIG_SMP
> >>>>> +	ret = bringup_hibernate_cpu(sleep_cpu);
> >>>>> +	if (ret) {
> >>>>> +		sleep_cpu = -EINVAL;
> >>>>> +		return ret;
> >>>>> +	}
> >>>>> +#endif
> >>>>> +	resume_hdr = *hdr;
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
> >>>>> +
> >>>>> +int swsusp_arch_suspend(void)
> >>>>> +{
> >>>>> +	int ret = 0;
> >>>>> +
> >>>>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
> >>>>> +		sleep_cpu = smp_processor_id();
> >>>>> +		suspend_save_csrs(hibernate_cpu_context);
> >>>>> +		ret = swsusp_save();
> >>>>> +	} else {
> >>>>> +		suspend_restore_csrs(hibernate_cpu_context);
> >>>>> +		flush_tlb_all();
> >>>>> +
> >>>>> +		/* Invalidated Icache */
> >>>>> +		flush_icache_all();
> >>>>> +
> >>>>> +		/*
> >>>>> +		 * Tell the hibernation core that we've just restored
> >>>>> +		 * the memory
> >>>>> +		 */
> >>>>> +		in_suspend = 0;
> >>>>> +		sleep_cpu = -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t pte_idx = pte_index(vaddr);
> >>>>> +
> >>>>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +#ifndef __PAGETABLE_PMD_FOLDED
> >>>>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
> >>>>> +		(pgtable_l5_enabled ?					\
> >>>>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
> >>>>> +		(pgtable_l4_enabled ?					\
> >>>>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
> >>>>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t pmd_idx = pmd_index(vaddr);
> >>>>> +	pte_t *ptep;
> >>>>> +
> >>>>> +	if (pmd_none(pmdp[pmd_idx])) {
> >>>>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> >>>>> +		if (!ptep)
> >>>>> +			return -ENOMEM;
> >>>>> +
> >>>>> +		memset(ptep, 0, PAGE_SIZE);
> >>>>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
> >>>>> +	} else {
> >>>>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
> >>>>> +	}
> >>>>> +
> >>>>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t pud_index = pud_index(vaddr);
> >>>>> +	pmd_t *pmdp;
> >>>>> +
> >>>>> +	if (pud_val(pudp[pud_index]) == 0) {
> >>>>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> >>>>> +		if (!pmdp)
> >>>>> +			return -ENOMEM;
> >>>>> +
> >>>>> +		memset(pmdp, 0, PAGE_SIZE);
> >>>>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
> >>>>> +	} else {
> >>>>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
> >>>>> +	}
> >>>>> +
> >>>>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t p4d_index = p4d_index(vaddr);
> >>>>> +	pud_t *pudp;
> >>>>> +
> >>>>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
> >>>>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> >>>>> +		if (!pudp)
> >>>>> +			return -ENOMEM;
> >>>>> +
> >>>>> +		memset(pudp, 0, PAGE_SIZE);
> >>>>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
> >>>>> +	} else {
> >>>>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
> >>>>> +	}
> >>>>> +
> >>>>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
> >>>>> +}
> >>>>> +
> >>>>> +#else
> >>>>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
> >>>>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
> >>>>> +#endif /* __PAGETABLE_PMD_FOLDED */
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	uintptr_t pgd_idx = pgd_index(vaddr);
> >>>>> +	void *nextp;
> >>>>> +
> >>>>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
> >>>>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
> >>>>> +		if (!nextp)
> >>>>> +			return -ENOMEM;
> >>>>> +
> >>>>> +		memset(nextp, 0, PAGE_SIZE);
> >>>>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
> >>>>> +	} else {
> >>>>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
> >>>>> +	}
> >>>>> +
> >>>>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
> 
> 
> Is it possible to use "standard" way of going through a page table
> instead of using the _next macros? I mean something like this (example
> from arm64 code
> https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/trans_pgd.c#L174
> or my recent kasan patchset
> https://patchwork.kernel.org/project/linux-riscv/patch/20230203075232.274282-3-alexghiti@rivosinc.com/):
> 
>          do {
>                  next = pgd_addr_end(vaddr, end);
> 
>                  if (pgd_none(*pgd_k)) {
>                          nextp = get_safe_page(GFP_ATOMIC);
>                          memset(nextp, 0, PAGE_SIZE);
>                          set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(nextp)),
>                                                 PAGE_TABLE));
>                          continue;
>                  }
> 
>                  kasan_shallow_populate_p4d(pgd_k, vaddr, next);
>          } while (pgd_k++, vaddr = next, vaddr != end);
> 
> 
> I have the same change to our early page table code on my todo list.
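For reference, the range-based walk suggested above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: it uses a toy two-level table, and every toy_* name is hypothetical rather than a kernel API. Only the boundary clamp mirrors the kernel's pgd_addr_end() idiom, and the entry layout (PPN from bit 10, valid bit 0) is loosely modelled on a RISC-V PTE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy geometry (assumption): 4 KiB pages, 512 entries per table, two levels. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define PTRS_PER_TBL 512UL
#define PGDIR_SHIFT  (PAGE_SHIFT + 9)
#define PGDIR_SIZE   (1UL << PGDIR_SHIFT)
#define PGDIR_MASK   (~(PGDIR_SIZE - 1))

/* Hypothetical top-level entry: a NULL pte pointer plays the role of pgd_none(). */
typedef struct { uint64_t *pte; } toy_pgd_t;

/* Same idea as pgd_addr_end(): clamp to the next top-level boundary or to end. */
static unsigned long toy_pgd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

static unsigned long toy_pgd_index(unsigned long addr)
{
	return (addr >> PGDIR_SHIFT) % PTRS_PER_TBL;
}

static unsigned long toy_pte_index(unsigned long addr)
{
	return (addr >> PAGE_SHIFT) % PTRS_PER_TBL;
}

/*
 * Map the page-aligned, non-empty range [vaddr, end) one top-level entry at a
 * time, allocating a lower-level table only when the entry is still empty.
 * Returns 0 on success, -1 if an allocation fails.
 */
static int toy_map_range(toy_pgd_t *pgd, unsigned long vaddr, unsigned long end)
{
	toy_pgd_t *pgdp = &pgd[toy_pgd_index(vaddr)];
	unsigned long next, addr;

	assert(vaddr < end);
	assert((vaddr & (PAGE_SIZE - 1)) == 0 && (end & (PAGE_SIZE - 1)) == 0);

	do {
		next = toy_pgd_addr_end(vaddr, end);

		if (!pgdp->pte) {
			pgdp->pte = calloc(PTRS_PER_TBL, sizeof(uint64_t));
			if (!pgdp->pte)
				return -1;
		}

		/* Fill the leaf entries that fall under this top-level entry. */
		for (addr = vaddr; addr != next; addr += PAGE_SIZE)
			pgdp->pte[toy_pte_index(addr)] =
				((uint64_t)(addr >> PAGE_SHIFT) << 10) | 1;
	} while (pgdp++, vaddr = next, vaddr != end);

	return 0;
}
```

Compared with calling a per-page helper that re-walks from the root for every address, the loop visits each top-level entry once and hands the sub-range [vaddr, next) down to the next level, which is the shape the arm64 and kasan code referenced above appears to follow.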
> 
> 
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
> >>>>> +{
> >>>>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned long relocate_restore_code(void)
> >>>>> +{
> >>>>> +	unsigned long ret;
> >>>>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
> >>>>> +
> >>>>> +	if (!page)
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	copy_page(page, core_restore_code);
> >>>>> +
> >>>>> +	/* Make the page containing the relocated code executable */
> >>>>> +	set_memory_x((unsigned long)page, 1);
> >>>>> +
> >>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
> >>>>> +	if (ret)
> >>>>> +		return ret;
> >>>>> +
> >>>>> +	return (unsigned long)page;
> >>>>> +}
> >>>>> +
> >>>>> +int swsusp_arch_resume(void)
> >>>>> +{
> >>>>> +	unsigned long addr = PAGE_OFFSET;
> >>>>> +	unsigned long ret;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> >>>>> +	 * we don't need to free it here.
> >>>>> +	 */
> >>>>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> >>>>> +	if (!resume_pg_dir)
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * The pages need to be writable when restoring the image.
> >>>>> +	 * Create a second copy of page table just for the linear map, and use this when
> >>>>> +	 * restoring.
> >>>>> +	 */
> >>>>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
> >>>>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
> >>>>> +		if (ret)
> >>>>> +			return (int)ret;
> >>>>> +	}
> >>>>> +
> >>>> To me this is wrong as this does not account for the real physical
> >>>> mapping layout: can't you simply copy the linear mapping from
> >>>> swapper_pg_dir?
> >>> Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from
> >>> swapper_pg_dir as we are not supposed to modify its content.
> >>
> >>
> >> First, you're right, we need the temporary page table as swapper_pg_dir
> >> will get overwritten under our feet.
> >>
> >> Now, I still disagree with mapping all the memory: the linear mapping is
> >> sparse because we only map what memblock gives us (some regions are
> >> marked as "nomap" for a reason).
> >>
> >> I just took a look at arm64, and they do exactly that: they go through
> >> swapper_pg_dir, copy the linear mapping and enable write at every leaf
> >> level
> >> (https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
> > You're right, but we don't have to copy from swapper_pg_dir. We can add a kernel_page_present() check to the function to verify
> > page validity before doing the mapping. Agree?
> 
> 
> That would work, we'd lose the benefit of huge pages though, I'm not
> opposed at all but if we can leverage existing arm64 code, that would
> even be better, only the PTE write flag is different!
That's the thing: arm64 uses two TTBR registers to translate virtual addresses. The single page of relocated restore code is mapped through ttbr0, while the temporary page table, which supports huge pages, is installed in ttbr1. So arch/arm64 never needs to split a huge page in the temporary page table.
RISC-V has only one satp. If we try to support huge pages for hibernation, we need to handle the huge page split in arch/riscv so that the temporary page table can also map the single page of relocated restore code. For the current RISC-V arch code, we can call kernel_page_present() to check page validity, but I doubt we should implement the huge page split in arch/riscv, since that is normally handled by the buddy allocator.
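To make the huge page split concern concrete, here is a rough user-space model of what it would involve. It is a sketch under assumptions, not kernel code: the toy_* types, the TOY_* flags and the helper names are all hypothetical. The point it illustrates is that, with a single page-table root, giving one 4 KiB page different permissions from the rest of a huge mapping means replacing the huge leaf with a table of 512 base-page entries first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PTRS_PER_PTE 512

/* Hypothetical permission bits for the toy model. */
#define TOY_VALID 0x1u
#define TOY_WRITE 0x2u
#define TOY_EXEC  0x4u

struct toy_pte {
	uint64_t pfn;
	unsigned int prot;
};

/*
 * Hypothetical mid-level entry: either a huge leaf covering 512 base pages,
 * or a pointer to a table of 512 base-page entries.
 */
struct toy_pmd {
	int is_huge_leaf;
	uint64_t base_pfn;	/* meaningful only for a huge leaf */
	unsigned int prot;	/* meaningful only for a huge leaf */
	struct toy_pte *table;	/* meaningful only for a table entry */
};

/* Replace a huge leaf with 512 base-page entries carrying the same permissions. */
static int toy_split_huge(struct toy_pmd *pmd)
{
	struct toy_pte *table;
	int i;

	if (!pmd->is_huge_leaf)
		return 0;

	table = calloc(PTRS_PER_PTE, sizeof(*table));
	if (!table)
		return -1;

	for (i = 0; i < PTRS_PER_PTE; i++) {
		table[i].pfn = pmd->base_pfn + i;
		table[i].prot = pmd->prot;
	}

	pmd->is_huge_leaf = 0;
	pmd->table = table;

	return 0;
}

/* Change the permissions of one base page, splitting the huge leaf if needed. */
static int toy_set_page_prot(struct toy_pmd *pmd, unsigned int idx, unsigned int prot)
{
	assert(idx < PTRS_PER_PTE);

	if (toy_split_huge(pmd) || !pmd->table)
		return -1;

	pmd->table[idx].prot = prot;

	return 0;
}
```

In this model, marking the page that holds the relocated restore code as executable inside a writable huge mapping costs one extra table and 512 entry writes; whether that bookkeeping belongs in arch/riscv is the open question in this thread.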
> 
> Thanks,
> 
> Alex
> 
> 
> >>
> >>>> But I have to admit that I struggle to understand the need for this
> >>>> temporary page table: all we need to do is to allow to write to the
> >>>> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
> >>> Similar to the above comment: when we restore the memory content, we need to make sure the pages are writable. If you modify
> >>> swapper_pg_dir, the kernel will crash afterwards.
> >>> That's why we need a second page table to do the restore.
> >>>>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
> >>>>> +	relocated_restore_code = relocate_restore_code();
> >>>> And do we really need to do that too? The code in question can only be
> >>>> overwritten by the same code right?
> >>> Yes, we need to move the restore code to a new page to prevent it from overwriting itself while restoring the memory.
> >>>> Thanks,
> >>>>
> >>>> Alex
> >>>>
> >>>>
> >>>>> +	if (relocated_restore_code == -ENOMEM)
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> >>>>> +	 * restore code can jump to it after finished restore the image. The next execution
> >>>>> +	 * code doesn't find itself in a different address space after switching over to the
> >>>>> +	 * original page table used by the hibernated image.
> >>>>> +	 */
> >>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
> >>>>> +					PAGE_KERNEL_READ_EXEC);
> >>>>> +	if (ret)
> >>>>> +		return ret;
> >>>>> +
> >>>>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> >>>>> +			resume_hdr.restore_cpu_addr);
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +#ifdef CONFIG_PM_SLEEP_SMP
> >>>>> +int hibernate_resume_nonboot_cpu_disable(void)
> >>>>> +{
> >>>>> +	if (sleep_cpu < 0) {
> >>>>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
> >>>>> +		return -ENODEV;
> >>>>> +	}
> >>>>> +
> >>>>> +	return freeze_secondary_cpus(sleep_cpu);
> >>>>> +}
> >>>>> +#endif
> >>>>> +
> >>>>> +static int __init riscv_hibernate_init(void)
> >>>>> +{
> >>>>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
> >>>>> +
> >>>>> +	if (WARN_ON(!hibernate_cpu_context))
> >>>>> +		return -ENOMEM;
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +early_initcall(riscv_hibernate_init);
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv@lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
  2023-02-13  1:51               ` JeeHeng Sia
@ 2023-02-14  6:57                 ` Alexandre Ghiti
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-14  6:57 UTC (permalink / raw)
  To: JeeHeng Sia, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

On 2/13/23 02:51, JeeHeng Sia wrote:
> Hi Alex,
>
>> -----Original Message-----
>> From: Alexandre Ghiti <alex@ghiti.fr>
>> Sent: Friday, 10 February, 2023 9:24 PM
>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>> <mason.huo@starfivetech.com>
>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>
>> On 2/9/23 07:12, JeeHeng Sia wrote:
>>> Hi Alex,
>>>
>>>> -----Original Message-----
>>>> From: Alexandre Ghiti <alex@ghiti.fr>
>>>> Sent: Wednesday, 8 February, 2023 8:05 PM
>>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>>>> <mason.huo@starfivetech.com>
>>>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>>>
>>>> Hi Sia,
>>>>
>>>> On 2/8/23 05:43, JeeHeng Sia wrote:
>>>>>> -----Original Message-----
>>>>>> From: Alexandre Ghiti <alex@ghiti.fr>
>>>>>> Sent: Tuesday, 7 February, 2023 11:46 PM
>>>>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>>>>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>>>>>> <mason.huo@starfivetech.com>
>>>>>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>>>>>
>>>>>> Hi Sia,
>>>>>>
>>>>>> On 1/27/23 10:10, Sia Jee Heng wrote:
>>>>>>> Low level Arch functions were created to support hibernation.
>>>>>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
>>>>>>> cpu state onto the stack, then calling swsusp_save() to save the memory
>>>>>>> image.
>>>>>>>
>>>>>>> Arch specific hibernation header is implemented and is utilized by the
>>>>>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
>>>>>>> functions. The arch specific hibernation header consists of satp, hartid,
>>>>>>> and the cpu_resume address. The kernel built version is also need to be
>>>>>>> saved into the hibernation image header to making sure only the same
>>>>>>> kernel is restore when resume.
>>>>>>>
>>>>>>> swsusp_arch_resume() creates a temporary page table that covering only
>>>>>>> the linear map. It copies the restore code to a 'safe' page, then start
>>>>>>> to restore the memory image. Once completed, it restores the original
>>>>>>> kernel's page table. It then calls into __hibernate_cpu_resume()
>>>>>>> to restore the CPU context. Finally, it follows the normal hibernation
>>>>>>> path back to the hibernation core.
>>>>>>>
>>>>>>> To enable hibernation/suspend to disk into RISCV, the below config
>>>>>>> need to be enabled:
>>>>>>> - CONFIG_ARCH_HIBERNATION_HEADER
>>>>>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>>>>>>>
>>>>>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
>>>>>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
>>>>>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
>>>>>>> ---
>>>>>>>      arch/riscv/Kconfig                 |   7 +
>>>>>>>      arch/riscv/include/asm/assembler.h |  20 ++
>>>>>>>      arch/riscv/include/asm/suspend.h   |  21 ++
>>>>>>>      arch/riscv/kernel/Makefile         |   1 +
>>>>>>>      arch/riscv/kernel/asm-offsets.c    |   5 +
>>>>>>>      arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>>>>>>>      arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>>>>>>>      7 files changed, 503 insertions(+)
>>>>>>>      create mode 100644 arch/riscv/kernel/hibernate-asm.S
>>>>>>>      create mode 100644 arch/riscv/kernel/hibernate.c
>>>>>>>
>>>>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>>>>>> index e2b656043abf..4555848a817f 100644
>>>>>>> --- a/arch/riscv/Kconfig
>>>>>>> +++ b/arch/riscv/Kconfig
>>>>>>> @@ -690,6 +690,13 @@ menu "Power management options"
>>>>>>>
>>>>>>>      source "kernel/power/Kconfig"
>>>>>>>
>>>>>>> +config ARCH_HIBERNATION_POSSIBLE
>>>>>>> +	def_bool y
>>>>>>> +
>>>>>>> +config ARCH_HIBERNATION_HEADER
>>>>>>> +	def_bool y
>>>>>>> +	depends on HIBERNATION
>>>>>>> +
>>>>>>>      endmenu # "Power management options"
>>>>>>>
>>>>>>>      menu "CPU Power Management"
>>>>>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
>>>>>>> index ef1283d04b70..3de70d3e6ceb 100644
>>>>>>> --- a/arch/riscv/include/asm/assembler.h
>>>>>>> +++ b/arch/riscv/include/asm/assembler.h
>>>>>>> @@ -59,4 +59,24 @@
>>>>>>>      		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>>>>>>>      	.endm
>>>>>>>
>>>>>>> +/**
>>>>>>> + * copy_page - copy 1 page (4KB) of data from source to destination
>>>>>>> + * @a0 - destination
>>>>>>> + * @a1 - source
>>>>>>> + */
>>>>>>> +	.macro	copy_page a0, a1
>>>>>>> +		lui	a2, 0x1
>>>>>>> +		add	a2, a2, a0
>>>>>>> +.1 :
>>>>>>> +		REG_L	t0, 0(a1)
>>>>>>> +		REG_L	t1, SZREG(a1)
>>>>>>> +
>>>>>>> +		REG_S	t0, 0(a0)
>>>>>>> +		REG_S	t1, SZREG(a0)
>>>>>>> +
>>>>>>> +		addi	a0, a0, 2 * SZREG
>>>>>>> +		addi	a1, a1, 2 * SZREG
>>>>>>> +		bne	a2, a0, .1
>>>>>>> +	.endm
>>>>>>> +
>>>>>>>      #endif	/* __ASM_ASSEMBLER_H */
>>>>>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
>>>>>>> index 75419c5ca272..db40ae433aa9 100644
>>>>>>> --- a/arch/riscv/include/asm/suspend.h
>>>>>>> +++ b/arch/riscv/include/asm/suspend.h
>>>>>>> @@ -21,6 +21,12 @@ struct suspend_context {
>>>>>>>      #endif
>>>>>>>      };
>>>>>>>
>>>>>>> +/*
>>>>>>> + * This parameter will be assigned to 0 during resume and will be used by
>>>>>>> + * hibernation core for the subsequent resume sequence
>>>>>>> + */
>>>>>>> +extern int in_suspend;
>>>>>>> +
>>>>>>>      /* Low-level CPU suspend entry function */
>>>>>>>      int __cpu_suspend_enter(struct suspend_context *context);
>>>>>>>
>>>>>>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>>>>>>>      /* Used to save and restore the csr */
>>>>>>>      void suspend_save_csrs(struct suspend_context *context);
>>>>>>>      void suspend_restore_csrs(struct suspend_context *context);
>>>>>>> +
>>>>>>> +/* Low-level API to support hibernation */
>>>>>>> +int swsusp_arch_suspend(void);
>>>>>>> +int swsusp_arch_resume(void);
>>>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
>>>>>>> +int arch_hibernation_header_restore(void *addr);
>>>>>>> +int __hibernate_cpu_resume(void);
>>>>>>> +
>>>>>>> +/* Used to resume on the CPU we hibernated on */
>>>>>>> +int hibernate_resume_nonboot_cpu_disable(void);
>>>>>>> +
>>>>>>> +/* Used to restore the hibernated image */
>>>>>>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
>>>>>>> +				unsigned long cpu_resume);
>>>>>>> +asmlinkage int core_restore_code(void);
>>>>>>>      #endif
>>>>>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>>>>>> index 4cf303a779ab..daab341d55e4 100644
>>>>>>> --- a/arch/riscv/kernel/Makefile
>>>>>>> +++ b/arch/riscv/kernel/Makefile
>>>>>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>>>>>>>      obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>>>>>>>
>>>>>>>      obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
>>>>>>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>>>>>>>
>>>>>>>      obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>>>>>>>      obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
>>>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>>>>>> index df9444397908..d6a75aac1d27 100644
>>>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>>>> @@ -9,6 +9,7 @@
>>>>>>>      #include <linux/kbuild.h>
>>>>>>>      #include <linux/mm.h>
>>>>>>>      #include <linux/sched.h>
>>>>>>> +#include <linux/suspend.h>
>>>>>>>      #include <asm/kvm_host.h>
>>>>>>>      #include <asm/thread_info.h>
>>>>>>>      #include <asm/ptrace.h>
>>>>>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
>>>>>>>
>>>>>>>      	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>>>>>>>
>>>>>>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
>>>>>>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
>>>>>>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
>>>>>>> +
>>>>>>>      	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>>>>>>>      	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>>>>>>>      	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
>>>>>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..a83d534b89bd
>>>>>>> --- /dev/null
>>>>>>> +++ b/arch/riscv/kernel/hibernate-asm.S
>>>>>>> @@ -0,0 +1,89 @@
>>>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>>>>> +/*
>>>>>>> + * Hibernation support specific for RISCV
>>>>>>> + *
>>>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>>>>>> + *
>>>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>>>>>> + */
>>>>>>> +
>>>>>>> +#include <asm/asm.h>
>>>>>>> +#include <asm/asm-offsets.h>
>>>>>>> +#include <asm/assembler.h>
>>>>>>> +#include <asm/csr.h>
>>>>>>> +
>>>>>>> +#include <linux/linkage.h>
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * This code is executed when resume from the hibernation.
>>>>>>> + *
>>>>>>> + * It begins with loading the temporary page table then restores the memory image.
>>>>>>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
>>>>>>> + * swsusp_arch_suspend().
>>>>>>> + */
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * int __hibernate_cpu_resume(void)
>>>>>>> + * Switch back to the hibernated image's page table prior to restore the CPU
>>>>>>> + * context.
>>>>>>> + *
>>>>>>> + * Always returns 0 to the C code.
>>>>>>> + */
>>>>>>> +ENTRY(__hibernate_cpu_resume)
>>>>>>> +	/* switch to hibernated image's page table */
>>>>>>> +	csrw CSR_SATP, s0
>>>>>>> +	sfence.vma
>>>>>>> +
>>>>>>> +	REG_L	a0, hibernate_cpu_context
>>>>>>> +
>>>>>>> +	/* Restore CSRs */
>>>>>>> +	restore_csr
>>>>>>> +
>>>>>>> +	/* Restore registers (except A0 and T0-T6) */
>>>>>>> +	restore_reg
>>>>>>> +
>>>>>>> +	/* Return zero value */
>>>>>>> +	add	a0, zero, zero
>>>>>>> +
>>>>>>> +	/* Return to C code */
>>>>>>> +	ret
>>>>>>> +END(__hibernate_cpu_resume)
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Prepare to restore the image.
>>>>>>> + * a0: satp of saved page tables
>>>>>>> + * a1: satp of temporary page tables
>>>>>>> + * a2: cpu_resume
>>>>>>> + */
>>>>>>> +ENTRY(restore_image)
>>>>>>> +	mv	s0, a0
>>>>>>> +	mv	s1, a1
>>>>>>> +	mv	s2, a2
>>>>>>> +	REG_L	s4, restore_pblist
>>>>>>> +	REG_L	a1, relocated_restore_code
>>>>>>> +
>>>>>>> +	jalr	a1
>>>>>>> +END(restore_image)
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * The below code will be executed from a 'safe' page.
>>>>>>> + * It first switches to the temporary page table, then start to copy the pages
>>>>>>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
>>>>>>> + * to restore the CPU context.
>>>>>>> + */
>>>>>>> +ENTRY(core_restore_code)
>>>>>>> +	/* switch to temp page table */
>>>>>>> +	csrw satp, s1
>>>>>>> +	sfence.vma
>>>>>>> +.Lcopy:
>>>>>>> +	/* The below code will restore the hibernated image. */
>>>>>>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
>>>>>>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
>>>>>>> +
>>>>>>> +	copy_page a0, a1
>>>>>>> +
>>>>>>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
>>>>>>> +	bnez	s4, .Lcopy
>>>>>>> +
>>>>>>> +	jalr	s2
>>>>>>> +END(core_restore_code)
>>>>>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..bf7f3c781820
>>>>>>> --- /dev/null
>>>>>>> +++ b/arch/riscv/kernel/hibernate.c
>>>>>>> @@ -0,0 +1,360 @@
>>>>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>>>>> +/*
>>>>>>> + * Hibernation support specific for RISCV
>>>>>>> + *
>>>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>>>>>> + *
>>>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>>>>>> + */
>>>>>>> +
>>>>>>> +#include <asm/barrier.h>
>>>>>>> +#include <asm/cacheflush.h>
>>>>>>> +#include <asm/mmu_context.h>
>>>>>>> +#include <asm/page.h>
>>>>>>> +#include <asm/pgtable.h>
>>>>>>> +#include <asm/sections.h>
>>>>>>> +#include <asm/set_memory.h>
>>>>>>> +#include <asm/smp.h>
>>>>>>> +#include <asm/suspend.h>
>>>>>>> +
>>>>>>> +#include <linux/cpu.h>
>>>>>>> +#include <linux/memblock.h>
>>>>>>> +#include <linux/pm.h>
>>>>>>> +#include <linux/sched.h>
>>>>>>> +#include <linux/suspend.h>
>>>>>>> +#include <linux/utsname.h>
>>>>>>> +
>>>>>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
>>>>>>> +static int sleep_cpu = -EINVAL;
>>>>>>> +
>>>>>>> +/* CPU context to be saved */
>>>>>>> +struct suspend_context *hibernate_cpu_context;
>>>>>>> +
>>>>>>> +unsigned long relocated_restore_code;
>>>>>>> +
>>>>>>> +/* Pointer to the temporary resume page table */
>>>>>>> +pgd_t *resume_pg_dir;
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
>>>>>>> + * @uts_version: to save the build number and date so that the we are not resume with
>>>>>>> + *		a different kernel
>>>>>>> + */
>>>>>>> +struct arch_hibernate_hdr_invariants {
>>>>>>> +	char		uts_version[__NEW_UTS_LEN + 1];
>>>>>>> +};
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
>>>>>>> + * @invariants: container to store kernel build version
>>>>>>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
>>>>>>> + * @saved_satp: original page table used by the hibernated image.
>>>>>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
>>>>>>> + */
>>>>>>> +static struct arch_hibernate_hdr {
>>>>>>> +	struct arch_hibernate_hdr_invariants invariants;
>>>>>>> +	unsigned long	hartid;
>>>>>>> +	unsigned long	saved_satp;
>>>>>>> +	unsigned long	restore_cpu_addr;
>>>>>>> +} resume_hdr;
>>>>>>> +
>>>>>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
>>>>>>> +{
>>>>>>> +	memset(i, 0, sizeof(*i));
>>>>>>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Check if the given pfn is in the 'nosave' section.
>>>>>>> + */
>>>>>>> +int pfn_is_nosave(unsigned long pfn)
>>>>>>> +{
>>>>>>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
>>>>>>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
>>>>>>> +
>>>>>>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
>>>>>>> +}
>>>>>>> +
>>>>>>> +void notrace save_processor_state(void)
>>>>>>> +{
>>>>>>> +	WARN_ON(num_online_cpus() != 1);
>>>>>>> +}
>>>>>>> +
>>>>>>> +void notrace restore_processor_state(void)
>>>>>>> +{
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Helper parameters need to be saved to the hibernation image header.
>>>>>>> + */
>>>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
>>>>>>> +{
>>>>>>> +	struct arch_hibernate_hdr *hdr = addr;
>>>>>>> +
>>>>>>> +	if (max_size < sizeof(*hdr))
>>>>>>> +		return -EOVERFLOW;
>>>>>>> +
>>>>>>> +	arch_hdr_invariants(&hdr->invariants);
>>>>>>> +
>>>>>>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
>>>>>>> +	hdr->saved_satp = csr_read(CSR_SATP);
>>>>>>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(arch_hibernation_header_save);
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Retrieve the helper parameters from the hibernation image header
>>>>>>> + */
>>>>>>> +int arch_hibernation_header_restore(void *addr)
>>>>>>> +{
>>>>>>> +	struct arch_hibernate_hdr_invariants invariants;
>>>>>>> +	struct arch_hibernate_hdr *hdr = addr;
>>>>>>> +	int ret = 0;
>>>>>>> +
>>>>>>> +	arch_hdr_invariants(&invariants);
>>>>>>> +
>>>>>>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
>>>>>>> +		pr_crit("Hibernate image not generated by this kernel!\n");
>>>>>>> +		return -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
>>>>>>> +	if (sleep_cpu < 0) {
>>>>>>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
>>>>>>> +		sleep_cpu = -EINVAL;
>>>>>>> +		return -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +#ifdef CONFIG_SMP
>>>>>>> +	ret = bringup_hibernate_cpu(sleep_cpu);
>>>>>>> +	if (ret) {
>>>>>>> +		sleep_cpu = -EINVAL;
>>>>>>> +		return ret;
>>>>>>> +	}
>>>>>>> +#endif
>>>>>>> +	resume_hdr = *hdr;
>>>>>>> +
>>>>>>> +	return ret;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
>>>>>>> +
>>>>>>> +int swsusp_arch_suspend(void)
>>>>>>> +{
>>>>>>> +	int ret = 0;
>>>>>>> +
>>>>>>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
>>>>>>> +		sleep_cpu = smp_processor_id();
>>>>>>> +		suspend_save_csrs(hibernate_cpu_context);
>>>>>>> +		ret = swsusp_save();
>>>>>>> +	} else {
>>>>>>> +		suspend_restore_csrs(hibernate_cpu_context);
>>>>>>> +		flush_tlb_all();
>>>>>>> +
>>>>>>> +		/* Invalidated Icache */
>>>>>>> +		flush_icache_all();
>>>>>>> +
>>>>>>> +		/*
>>>>>>> +		 * Tell the hibernation core that we've just restored
>>>>>>> +		 * the memory
>>>>>>> +		 */
>>>>>>> +		in_suspend = 0;
>>>>>>> +		sleep_cpu = -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t pte_idx = pte_index(vaddr);
>>>>>>> +
>>>>>>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +#ifndef __PAGETABLE_PMD_FOLDED
>>>>>>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
>>>>>>> +		(pgtable_l5_enabled ?					\
>>>>>>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
>>>>>>> +		(pgtable_l4_enabled ?					\
>>>>>>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
>>>>>>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t pmd_idx = pmd_index(vaddr);
>>>>>>> +	pte_t *ptep;
>>>>>>> +
>>>>>>> +	if (pmd_none(pmdp[pmd_idx])) {
>>>>>>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
>>>>>>> +		if (!ptep)
>>>>>>> +			return -ENOMEM;
>>>>>>> +
>>>>>>> +		memset(ptep, 0, PAGE_SIZE);
>>>>>>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
>>>>>>> +	} else {
>>>>>>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t pud_index = pud_index(vaddr);
>>>>>>> +	pmd_t *pmdp;
>>>>>>> +
>>>>>>> +	if (pud_val(pudp[pud_index]) == 0) {
>>>>>>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
>>>>>>> +		if (!pmdp)
>>>>>>> +			return -ENOMEM;
>>>>>>> +
>>>>>>> +		memset(pmdp, 0, PAGE_SIZE);
>>>>>>> +		pudp[pud_index] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
>>>>>>> +	} else {
>>>>>>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_index])));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t p4d_index = p4d_index(vaddr);
>>>>>>> +	pud_t *pudp;
>>>>>>> +
>>>>>>> +	if (p4d_val(p4dp[p4d_index]) == 0) {
>>>>>>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
>>>>>>> +		if (!pudp)
>>>>>>> +			return -ENOMEM;
>>>>>>> +
>>>>>>> +		memset(pudp, 0, PAGE_SIZE);
>>>>>>> +		p4dp[p4d_index] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
>>>>>>> +	} else {
>>>>>>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_index])));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
>>>>>>> +}
>>>>>>> +
>>>>>>> +#else
>>>>>>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
>>>>>>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
>>>>>>> +#endif /* __PAGETABLE_PMD_FOLDED */
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t pgd_idx = pgd_index(vaddr);
>>>>>>> +	void *nextp;
>>>>>>> +
>>>>>>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
>>>>>>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
>>>>>>> +		if (!nextp)
>>>>>>> +			return -ENOMEM;
>>>>>>> +
>>>>>>> +		memset(nextp, 0, PAGE_SIZE);
>>>>>>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
>>>>>>> +	} else {
>>>>>>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
>>
>> Is it possible to use "standard" way of going through a page table
>> instead of using the _next macros? I mean something like this (example
>> from arm64 code
>> https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/trans_pgd.c#L174
>> or my recent kasan patchset
>> https://patchwork.kernel.org/project/linux-riscv/patch/20230203075232.274282-3-alexghiti@rivosinc.com/):
>>
>>           do {
>>                   next = pgd_addr_end(vaddr, end);
>>
>>                   if (pgd_none(*pgd_k)) {
>>                           nextp = get_safe_page(GFP_ATOMIC);
>>                           memset(nextp, 0, PAGE_SIZE);
>>                           set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(nextp)),
>>                                                  PAGE_TABLE));
>>                           continue;
>>                   }
>>
>>                   kasan_shallow_populate_p4d(pgd_k, vaddr, next);
>>           } while (pgd_k++, vaddr = next, vaddr != end);
>>
>>
>> I have the same change to our early page table code on my todo list.
>>
>>
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long relocate_restore_code(void)
>>>>>>> +{
>>>>>>> +	unsigned long ret;
>>>>>>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
>>>>>>> +
>>>>>>> +	if (!page)
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	copy_page(page, core_restore_code);
>>>>>>> +
>>>>>>> +	/* Make the page containing the relocated code executable */
>>>>>>> +	set_memory_x((unsigned long)page, 1);
>>>>>>> +
>>>>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
>>>>>>> +	if (ret)
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	return (unsigned long)page;
>>>>>>> +}
>>>>>>> +
>>>>>>> +int swsusp_arch_resume(void)
>>>>>>> +{
>>>>>>> +	unsigned long addr = PAGE_OFFSET;
>>>>>>> +	unsigned long ret;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
>>>>>>> +	 * we don't need to free it here.
>>>>>>> +	 */
>>>>>>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
>>>>>>> +	if (!resume_pg_dir)
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * The pages need to be writable when restoring the image.
>>>>>>> +	 * Create a second copy of page table just for the linear map, and use this when
>>>>>>> +	 * restoring.
>>>>>>> +	 */
>>>>>>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
>>>>>>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
>>>>>>> +		if (ret)
>>>>>>> +			return (int)ret;
>>>>>>> +	}
>>>>>>> +
>>>>>> To me this is wrong as this does not account for the real physical
>>>>>> mapping layout: can't you simply copy the linear mapping from
>>>>>> swapper_pg_dir?
>>>>> Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from the swapper_pg_dir as we are not supposed to modify its content.
>>>>
>>>>
>>>> First, you're right, we need the temporary page table as swapper_pg_dir
>>>> will get overwritten under our feet.
>>>>
>>>> Now, I still disagree with mapping all the memory: the linear mapping is
>>>> sparse because we only map what memblock gives us (some regions are
>>>> marked as "nomap" for a reason).
>>>>
>>>> I just took a look at arm64, and they do exactly that: they go through
>>>> swapper_pg_dir, copy the linear mapping and enable write at every leaf
>>>> level
>>>> (https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
>>> You're right, but we don't have to copy from the swapper_pg_dir. We can add a kernel_page_present() check to the function to verify page validity prior to doing the mapping. Agree?
>>
>>
>> That would work, we'd lose the benefit of huge pages though, I'm not
>> opposed at all but if we can leverage existing arm64 code, that would
>> even be better, only the PTE write flag is different!
> That's the thing. Arm64 uses two ttbr registers to translate virtual addresses. The single page of relocated execution code is mapped through ttbr0, while the temporary page table, which supports huge pages, is mapped through ttbr1. So there is no need to handle huge page splitting in arch/arm64 for the temporary page table.
> RISC-V has only one satp, so if we try to support huge pages for hibernation, we need to handle huge page splitting in arch/riscv for the temporary page table in order to map the single page of relocated execution code. For the current RISC-V arch code, we can invoke kernel_page_present() to check page validity, but I doubt we should implement huge page splitting in arch/riscv, which is normally handled by the buddy allocator.


You mean that inserting the relocated page into the temporary page table
would be hard because we would have to split huge page entries along the
way, right? But we can still map the relocated page to an address that is
in its own pgd and avoid this: if we only map the linear mapping in the
temporary page table, such an address is easy to find.
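
To illustrate the idea of giving the relocated page its own pgd entry, here is a minimal userspace sketch, not kernel code. It assumes an Sv39-like layout (512 PGD entries, 1 GiB per entry); all sketch_* names and constants are hypothetical and do not come from the patch. It only considers the linear mapping, so a real implementation would also have to avoid any slot used for the __hibernate_cpu_resume() mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative Sv39-like constants; assumptions, not taken from the patch. */
#define SKETCH_PGDIR_SHIFT	30
#define SKETCH_PTRS_PER_PGD	512

/* Index of the top-level (PGD) entry that translates vaddr. */
static unsigned int sketch_pgd_index(uint64_t vaddr)
{
	return (unsigned int)((vaddr >> SKETCH_PGDIR_SHIFT) &
			      (SKETCH_PTRS_PER_PGD - 1));
}

/*
 * Return the first PGD index in the kernel half [256, 511] that is not
 * touched by the linear mapping [lm_start, lm_end), or -1 if none is free.
 * Both addresses are assumed to lie in the kernel half.
 */
static int sketch_find_free_pgd_slot(uint64_t lm_start, uint64_t lm_end)
{
	unsigned int first, last, idx;

	assert(lm_start < lm_end);

	first = sketch_pgd_index(lm_start);
	last = sketch_pgd_index(lm_end - 1);

	for (idx = SKETCH_PTRS_PER_PGD / 2; idx < SKETCH_PTRS_PER_PGD; idx++) {
		if (idx < first || idx > last)
			return (int)idx;
	}

	return -1;
}

/* Lowest canonical (sign-extended) virtual address covered by a kernel-half slot. */
static uint64_t sketch_slot_to_vaddr(unsigned int idx)
{
	assert(idx >= SKETCH_PTRS_PER_PGD / 2 && idx < SKETCH_PTRS_PER_PGD);

	return (~0ULL << 39) | ((uint64_t)idx << SKETCH_PGDIR_SHIFT);
}
```

Because the chosen slot is empty in the temporary page table, mapping the relocated page there never runs into an existing huge-page leaf entry, so no split is needed.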


>> Thanks,
>>
>> Alex
>>
>>
>>>>>> But I have to admit that I struggle to understand the need for this
>>>>>> temporary page table: all we need to do is to allow to write to the
>>>>>> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
>>>>> Similar to the above comment. When we restore the memory content, we need to make sure the pages are writable. If you modify the swapper_pg_dir, the kernel will crash afterwards.
>>>>> That's why we need a second page table to do the recovery job.
>>>>>>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
>>>>>>> +	relocated_restore_code = relocate_restore_code();
>>>>>> And do we really need to do that too? The code in question can only be
>>>>>> overwritten by the same code right?
>>>>> Yes, we need to move the restore code to a new page to prevent the code from overwriting itself when restoring the memory.
>>>>>> Thanks,
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>> +	if (relocated_restore_code == -ENOMEM)
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * Map the __hibernate_cpu_resume() address to the temporary page table so that the
>>>>>>> +	 * restore code can jump to it after finished restore the image. The next execution
>>>>>>> +	 * code doesn't find itself in a different address space after switching over to the
>>>>>>> +	 * original page table used by the hibernated image.
>>>>>>> +	 */
>>>>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
>>>>>>> +					PAGE_KERNEL_READ_EXEC);
>>>>>>> +	if (ret)
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
>>>>>>> +			resume_hdr.restore_cpu_addr);
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +#ifdef CONFIG_PM_SLEEP_SMP
>>>>>>> +int hibernate_resume_nonboot_cpu_disable(void)
>>>>>>> +{
>>>>>>> +	if (sleep_cpu < 0) {
>>>>>>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
>>>>>>> +		return -ENODEV;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return freeze_secondary_cpus(sleep_cpu);
>>>>>>> +}
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +static int __init riscv_hibernate_init(void)
>>>>>>> +{
>>>>>>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
>>>>>>> +
>>>>>>> +	if (WARN_ON(!hibernate_cpu_context))
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +early_initcall(riscv_hibernate_init);
>>>>> _______________________________________________
>>>>> linux-riscv mailing list
>>>>> linux-riscv@lists.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
@ 2023-02-14  6:57                 ` Alexandre Ghiti
  0 siblings, 0 replies; 52+ messages in thread
From: Alexandre Ghiti @ 2023-02-14  6:57 UTC (permalink / raw)
  To: JeeHeng Sia, paul.walmsley, palmer, aou
  Cc: linux-riscv, linux-kernel, Leyfoon Tan, Mason Huo

On 2/13/23 02:51, JeeHeng Sia wrote:
> Hi Alex,
>
>> -----Original Message-----
>> From: Alexandre Ghiti <alex@ghiti.fr>
>> Sent: Friday, 10 February, 2023 9:24 PM
>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>> <mason.huo@starfivetech.com>
>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>
>> On 2/9/23 07:12, JeeHeng Sia wrote:
>>> Hi Alex,
>>>
>>>> -----Original Message-----
>>>> From: Alexandre Ghiti <alex@ghiti.fr>
>>>> Sent: Wednesday, 8 February, 2023 8:05 PM
>>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>>>> <mason.huo@starfivetech.com>
>>>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>>>
>>>> Hi Sia,
>>>>
>>>> On 2/8/23 05:43, JeeHeng Sia wrote:
>>>>>> -----Original Message-----
>>>>>> From: Alexandre Ghiti <alex@ghiti.fr>
>>>>>> Sent: Tuesday, 7 February, 2023 11:46 PM
>>>>>> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>; paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu
>>>>>> Cc: linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo
>>>>>> <mason.huo@starfivetech.com>
>>>>>> Subject: Re: [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>>>>>
>>>>>> Hi Sia,
>>>>>>
>>>>>> On 1/27/23 10:10, Sia Jee Heng wrote:
>>>>>>> Low level Arch functions were created to support hibernation.
>>>>>>> swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
>>>>>>> the CPU state onto the stack, then calls swsusp_save() to save the memory
>>>>>>> image.
>>>>>>>
>>>>>>> An arch-specific hibernation header is implemented and is used by the
>>>>>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
>>>>>>> functions. The arch-specific hibernation header consists of satp, hartid,
>>>>>>> and the cpu_resume address. The kernel build version also needs to be
>>>>>>> saved into the hibernation image header to make sure that only the same
>>>>>>> kernel is restored on resume.
>>>>>>>
>>>>>>> swsusp_arch_resume() creates a temporary page table that covers only
>>>>>>> the linear map. It copies the restore code to a 'safe' page, then starts
>>>>>>> to restore the memory image. Once completed, it restores the original
>>>>>>> kernel's page table. It then calls into __hibernate_cpu_resume()
>>>>>>> to restore the CPU context. Finally, it follows the normal hibernation
>>>>>>> path back to the hibernation core.
>>>>>>>
>>>>>>> To enable hibernation/suspend to disk on RISC-V, the configs below
>>>>>>> need to be enabled:
>>>>>>> - CONFIG_ARCH_HIBERNATION_HEADER
>>>>>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>>>>>>>
>>>>>>> Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
>>>>>>> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
>>>>>>> Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
>>>>>>> ---
>>>>>>>      arch/riscv/Kconfig                 |   7 +
>>>>>>>      arch/riscv/include/asm/assembler.h |  20 ++
>>>>>>>      arch/riscv/include/asm/suspend.h   |  21 ++
>>>>>>>      arch/riscv/kernel/Makefile         |   1 +
>>>>>>>      arch/riscv/kernel/asm-offsets.c    |   5 +
>>>>>>>      arch/riscv/kernel/hibernate-asm.S  |  89 +++++++
>>>>>>>      arch/riscv/kernel/hibernate.c      | 360 +++++++++++++++++++++++++++++
>>>>>>>      7 files changed, 503 insertions(+)
>>>>>>>      create mode 100644 arch/riscv/kernel/hibernate-asm.S
>>>>>>>      create mode 100644 arch/riscv/kernel/hibernate.c
>>>>>>>
>>>>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>>>>>> index e2b656043abf..4555848a817f 100644
>>>>>>> --- a/arch/riscv/Kconfig
>>>>>>> +++ b/arch/riscv/Kconfig
>>>>>>> @@ -690,6 +690,13 @@ menu "Power management options"
>>>>>>>
>>>>>>>      source "kernel/power/Kconfig"
>>>>>>>
>>>>>>> +config ARCH_HIBERNATION_POSSIBLE
>>>>>>> +	def_bool y
>>>>>>> +
>>>>>>> +config ARCH_HIBERNATION_HEADER
>>>>>>> +	def_bool y
>>>>>>> +	depends on HIBERNATION
>>>>>>> +
>>>>>>>      endmenu # "Power management options"
>>>>>>>
>>>>>>>      menu "CPU Power Management"
>>>>>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
>>>>>>> index ef1283d04b70..3de70d3e6ceb 100644
>>>>>>> --- a/arch/riscv/include/asm/assembler.h
>>>>>>> +++ b/arch/riscv/include/asm/assembler.h
>>>>>>> @@ -59,4 +59,24 @@
>>>>>>>      		REG_L	s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>>>>>>>      	.endm
>>>>>>>
>>>>>>> +/**
>>>>>>> + * copy_page - copy 1 page (4KB) of data from source to destination
>>>>>>> + * @a0 - destination
>>>>>>> + * @a1 - source
>>>>>>> + */
>>>>>>> +	.macro	copy_page a0, a1
>>>>>>> +		lui	a2, 0x1
>>>>>>> +		add	a2, a2, a0
>>>>>>> +.1 :
>>>>>>> +		REG_L	t0, 0(a1)
>>>>>>> +		REG_L	t1, SZREG(a1)
>>>>>>> +
>>>>>>> +		REG_S	t0, 0(a0)
>>>>>>> +		REG_S	t1, SZREG(a0)
>>>>>>> +
>>>>>>> +		addi	a0, a0, 2 * SZREG
>>>>>>> +		addi	a1, a1, 2 * SZREG
>>>>>>> +		bne	a2, a0, .1
>>>>>>> +	.endm
>>>>>>> +
>>>>>>>      #endif	/* __ASM_ASSEMBLER_H */
>>>>>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
>>>>>>> index 75419c5ca272..db40ae433aa9 100644
>>>>>>> --- a/arch/riscv/include/asm/suspend.h
>>>>>>> +++ b/arch/riscv/include/asm/suspend.h
>>>>>>> @@ -21,6 +21,12 @@ struct suspend_context {
>>>>>>>      #endif
>>>>>>>      };
>>>>>>>
>>>>>>> +/*
>>>>>>> + * This parameter will be assigned to 0 during resume and will be used by
>>>>>>> + * hibernation core for the subsequent resume sequence
>>>>>>> + */
>>>>>>> +extern int in_suspend;
>>>>>>> +
>>>>>>>      /* Low-level CPU suspend entry function */
>>>>>>>      int __cpu_suspend_enter(struct suspend_context *context);
>>>>>>>
>>>>>>> @@ -36,4 +42,19 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>>>>>>>      /* Used to save and restore the csr */
>>>>>>>      void suspend_save_csrs(struct suspend_context *context);
>>>>>>>      void suspend_restore_csrs(struct suspend_context *context);
>>>>>>> +
>>>>>>> +/* Low-level API to support hibernation */
>>>>>>> +int swsusp_arch_suspend(void);
>>>>>>> +int swsusp_arch_resume(void);
>>>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
>>>>>>> +int arch_hibernation_header_restore(void *addr);
>>>>>>> +int __hibernate_cpu_resume(void);
>>>>>>> +
>>>>>>> +/* Used to resume on the CPU we hibernated on */
>>>>>>> +int hibernate_resume_nonboot_cpu_disable(void);
>>>>>>> +
>>>>>>> +/* Used to restore the hibernated image */
>>>>>>> +asmlinkage void restore_image(unsigned long resume_satp, unsigned long satp_temp,
>>>>>>> +				unsigned long cpu_resume);
>>>>>>> +asmlinkage int core_restore_code(void);
>>>>>>>      #endif
>>>>>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>>>>>> index 4cf303a779ab..daab341d55e4 100644
>>>>>>> --- a/arch/riscv/kernel/Makefile
>>>>>>> +++ b/arch/riscv/kernel/Makefile
>>>>>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES)		+= module.o
>>>>>>>      obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>>>>>>>
>>>>>>>      obj-$(CONFIG_CPU_PM)		+= suspend_entry.o suspend.o
>>>>>>> +obj-$(CONFIG_HIBERNATION)	+= hibernate.o hibernate-asm.o
>>>>>>>
>>>>>>>      obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>>>>>>>      obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
>>>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>>>>>> index df9444397908..d6a75aac1d27 100644
>>>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>>>> @@ -9,6 +9,7 @@
>>>>>>>      #include <linux/kbuild.h>
>>>>>>>      #include <linux/mm.h>
>>>>>>>      #include <linux/sched.h>
>>>>>>> +#include <linux/suspend.h>
>>>>>>>      #include <asm/kvm_host.h>
>>>>>>>      #include <asm/thread_info.h>
>>>>>>>      #include <asm/ptrace.h>
>>>>>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
>>>>>>>
>>>>>>>      	OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>>>>>>>
>>>>>>> +	OFFSET(HIBERN_PBE_ADDR, pbe, address);
>>>>>>> +	OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
>>>>>>> +	OFFSET(HIBERN_PBE_NEXT, pbe, next);
>>>>>>> +
>>>>>>>      	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>>>>>>>      	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>>>>>>>      	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
>>>>>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..a83d534b89bd
>>>>>>> --- /dev/null
>>>>>>> +++ b/arch/riscv/kernel/hibernate-asm.S
>>>>>>> @@ -0,0 +1,89 @@
>>>>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>>>>> +/*
>>>>>>> + * Hibernation support specific for RISCV
>>>>>>> + *
>>>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>>>>>> + *
>>>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>>>>>> + */
>>>>>>> +
>>>>>>> +#include <asm/asm.h>
>>>>>>> +#include <asm/asm-offsets.h>
>>>>>>> +#include <asm/assembler.h>
>>>>>>> +#include <asm/csr.h>
>>>>>>> +
>>>>>>> +#include <linux/linkage.h>
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * This code is executed when resume from the hibernation.
>>>>>>> + *
>>>>>>> + * It begins with loading the temporary page table then restores the memory image.
>>>>>>> + * Finally branches to __hibernate_cpu_resume() to restore the state saved by
>>>>>>> + * swsusp_arch_suspend().
>>>>>>> + */
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * int __hibernate_cpu_resume(void)
>>>>>>> + * Switch back to the hibernated image's page table prior to restore the CPU
>>>>>>> + * context.
>>>>>>> + *
>>>>>>> + * Always returns 0 to the C code.
>>>>>>> + */
>>>>>>> +ENTRY(__hibernate_cpu_resume)
>>>>>>> +	/* switch to hibernated image's page table */
>>>>>>> +	csrw CSR_SATP, s0
>>>>>>> +	sfence.vma
>>>>>>> +
>>>>>>> +	REG_L	a0, hibernate_cpu_context
>>>>>>> +
>>>>>>> +	/* Restore CSRs */
>>>>>>> +	restore_csr
>>>>>>> +
>>>>>>> +	/* Restore registers (except A0 and T0-T6) */
>>>>>>> +	restore_reg
>>>>>>> +
>>>>>>> +	/* Return zero value */
>>>>>>> +	add	a0, zero, zero
>>>>>>> +
>>>>>>> +	/* Return to C code */
>>>>>>> +	ret
>>>>>>> +END(__hibernate_cpu_resume)
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Prepare to restore the image.
>>>>>>> + * a0: satp of saved page tables
>>>>>>> + * a1: satp of temporary page tables
>>>>>>> + * a2: cpu_resume
>>>>>>> + */
>>>>>>> +ENTRY(restore_image)
>>>>>>> +	mv	s0, a0
>>>>>>> +	mv	s1, a1
>>>>>>> +	mv	s2, a2
>>>>>>> +	REG_L	s4, restore_pblist
>>>>>>> +	REG_L	a1, relocated_restore_code
>>>>>>> +
>>>>>>> +	jalr	a1
>>>>>>> +END(restore_image)
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * The below code will be executed from a 'safe' page.
>>>>>>> + * It first switches to the temporary page table, then start to copy the pages
>>>>>>> + * back to the original memory location. Finally, it jumps to the __hibernate_cpu_resume()
>>>>>>> + * to restore the CPU context.
>>>>>>> + */
>>>>>>> +ENTRY(core_restore_code)
>>>>>>> +	/* switch to temp page table */
>>>>>>> +	csrw satp, s1
>>>>>>> +	sfence.vma
>>>>>>> +.Lcopy:
>>>>>>> +	/* The below code will restore the hibernated image. */
>>>>>>> +	REG_L	a1, HIBERN_PBE_ADDR(s4)
>>>>>>> +	REG_L	a0, HIBERN_PBE_ORIG(s4)
>>>>>>> +
>>>>>>> +	copy_page a0, a1
>>>>>>> +
>>>>>>> +	REG_L	s4, HIBERN_PBE_NEXT(s4)
>>>>>>> +	bnez	s4, .Lcopy
>>>>>>> +
>>>>>>> +	jalr	s2
>>>>>>> +END(core_restore_code)
>>>>>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..bf7f3c781820
>>>>>>> --- /dev/null
>>>>>>> +++ b/arch/riscv/kernel/hibernate.c
>>>>>>> @@ -0,0 +1,360 @@
>>>>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>>>>> +/*
>>>>>>> + * Hibernation support specific for RISCV
>>>>>>> + *
>>>>>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>>>>>> + *
>>>>>>> + * Author: Jee Heng Sia <jeeheng.sia@starfivetech.com>
>>>>>>> + */
>>>>>>> +
>>>>>>> +#include <asm/barrier.h>
>>>>>>> +#include <asm/cacheflush.h>
>>>>>>> +#include <asm/mmu_context.h>
>>>>>>> +#include <asm/page.h>
>>>>>>> +#include <asm/pgtable.h>
>>>>>>> +#include <asm/sections.h>
>>>>>>> +#include <asm/set_memory.h>
>>>>>>> +#include <asm/smp.h>
>>>>>>> +#include <asm/suspend.h>
>>>>>>> +
>>>>>>> +#include <linux/cpu.h>
>>>>>>> +#include <linux/memblock.h>
>>>>>>> +#include <linux/pm.h>
>>>>>>> +#include <linux/sched.h>
>>>>>>> +#include <linux/suspend.h>
>>>>>>> +#include <linux/utsname.h>
>>>>>>> +
>>>>>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number */
>>>>>>> +static int sleep_cpu = -EINVAL;
>>>>>>> +
>>>>>>> +/* CPU context to be saved */
>>>>>>> +struct suspend_context *hibernate_cpu_context;
>>>>>>> +
>>>>>>> +unsigned long relocated_restore_code;
>>>>>>> +
>>>>>>> +/* Pointer to the temporary resume page table */
>>>>>>> +pgd_t *resume_pg_dir;
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version
>>>>>>> + * @uts_version: to save the build number and date so that the we are not resume with
>>>>>>> + *		a different kernel
>>>>>>> + */
>>>>>>> +struct arch_hibernate_hdr_invariants {
>>>>>>> +	char		uts_version[__NEW_UTS_LEN + 1];
>>>>>>> +};
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image
>>>>>>> + * @invariants: container to store kernel build version
>>>>>>> + * @hartid: to make sure same boot_cpu executing the hibernate/restore code.
>>>>>>> + * @saved_satp: original page table used by the hibernated image.
>>>>>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
>>>>>>> + */
>>>>>>> +static struct arch_hibernate_hdr {
>>>>>>> +	struct arch_hibernate_hdr_invariants invariants;
>>>>>>> +	unsigned long	hartid;
>>>>>>> +	unsigned long	saved_satp;
>>>>>>> +	unsigned long	restore_cpu_addr;
>>>>>>> +} resume_hdr;
>>>>>>> +
>>>>>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
>>>>>>> +{
>>>>>>> +	memset(i, 0, sizeof(*i));
>>>>>>> +	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Check if the given pfn is in the 'nosave' section.
>>>>>>> + */
>>>>>>> +int pfn_is_nosave(unsigned long pfn)
>>>>>>> +{
>>>>>>> +	unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
>>>>>>> +	unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
>>>>>>> +
>>>>>>> +	return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
>>>>>>> +}
>>>>>>> +
>>>>>>> +void notrace save_processor_state(void)
>>>>>>> +{
>>>>>>> +	WARN_ON(num_online_cpus() != 1);
>>>>>>> +}
>>>>>>> +
>>>>>>> +void notrace restore_processor_state(void)
>>>>>>> +{
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Helper parameters need to be saved to the hibernation image header.
>>>>>>> + */
>>>>>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
>>>>>>> +{
>>>>>>> +	struct arch_hibernate_hdr *hdr = addr;
>>>>>>> +
>>>>>>> +	if (max_size < sizeof(*hdr))
>>>>>>> +		return -EOVERFLOW;
>>>>>>> +
>>>>>>> +	arch_hdr_invariants(&hdr->invariants);
>>>>>>> +
>>>>>>> +	hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
>>>>>>> +	hdr->saved_satp = csr_read(CSR_SATP);
>>>>>>> +	hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(arch_hibernation_header_save);
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Retrieve the helper parameters from the hibernation image header
>>>>>>> + */
>>>>>>> +int arch_hibernation_header_restore(void *addr)
>>>>>>> +{
>>>>>>> +	struct arch_hibernate_hdr_invariants invariants;
>>>>>>> +	struct arch_hibernate_hdr *hdr = addr;
>>>>>>> +	int ret = 0;
>>>>>>> +
>>>>>>> +	arch_hdr_invariants(&invariants);
>>>>>>> +
>>>>>>> +	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
>>>>>>> +		pr_crit("Hibernate image not generated by this kernel!\n");
>>>>>>> +		return -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
>>>>>>> +	if (sleep_cpu < 0) {
>>>>>>> +		pr_crit("Hibernated on a CPU not known to this kernel!\n");
>>>>>>> +		sleep_cpu = -EINVAL;
>>>>>>> +		return -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +#ifdef CONFIG_SMP
>>>>>>> +	ret = bringup_hibernate_cpu(sleep_cpu);
>>>>>>> +	if (ret) {
>>>>>>> +		sleep_cpu = -EINVAL;
>>>>>>> +		return ret;
>>>>>>> +	}
>>>>>>> +#endif
>>>>>>> +	resume_hdr = *hdr;
>>>>>>> +
>>>>>>> +	return ret;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(arch_hibernation_header_restore);
>>>>>>> +
>>>>>>> +int swsusp_arch_suspend(void)
>>>>>>> +{
>>>>>>> +	int ret = 0;
>>>>>>> +
>>>>>>> +	if (__cpu_suspend_enter(hibernate_cpu_context)) {
>>>>>>> +		sleep_cpu = smp_processor_id();
>>>>>>> +		suspend_save_csrs(hibernate_cpu_context);
>>>>>>> +		ret = swsusp_save();
>>>>>>> +	} else {
>>>>>>> +		suspend_restore_csrs(hibernate_cpu_context);
>>>>>>> +		flush_tlb_all();
>>>>>>> +
>>>>>>> +		/* Invalidated Icache */
>>>>>>> +		flush_icache_all();
>>>>>>> +
>>>>>>> +		/*
>>>>>>> +		 * Tell the hibernation core that we've just restored
>>>>>>> +		 * the memory
>>>>>>> +		 */
>>>>>>> +		in_suspend = 0;
>>>>>>> +		sleep_cpu = -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_pte(pte_t *ptep, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t pte_idx = pte_index(vaddr);
>>>>>>> +
>>>>>>> +	ptep[pte_idx] = pfn_pte(PFN_DOWN(__pa(vaddr)), prot);
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +#ifndef __PAGETABLE_PMD_FOLDED
>>>>>>> +#define temp_pgtable_map_pgd_next(pgdp, vaddr, prot)			\
>>>>>>> +		(pgtable_l5_enabled ?					\
>>>>>>> +		temp_pgtable_map_p4d(pgdp, vaddr, prot) :		\
>>>>>>> +		(pgtable_l4_enabled ?					\
>>>>>>> +		temp_pgtable_map_pud((pud_t *)pgdp, vaddr, prot) :	\
>>>>>>> +		temp_pgtable_map_pmd((pmd_t *)pgdp, vaddr, prot)))
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_pmd(pmd_t *pmdp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t pmd_idx = pmd_index(vaddr);
>>>>>>> +	pte_t *ptep;
>>>>>>> +
>>>>>>> +	if (pmd_none(pmdp[pmd_idx])) {
>>>>>>> +		ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
>>>>>>> +		if (!ptep)
>>>>>>> +			return -ENOMEM;
>>>>>>> +
>>>>>>> +		memset(ptep, 0, PAGE_SIZE);
>>>>>>> +		pmdp[pmd_idx] = pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE);
>>>>>>> +	} else {
>>>>>>> +		ptep = (pte_t *)__va(PFN_PHYS(_pmd_pfn(pmdp[pmd_idx])));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return temp_pgtable_map_pte(ptep, vaddr, prot);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_pud(pud_t *pudp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t pud_idx = pud_index(vaddr);
>>>>>>> +	pmd_t *pmdp;
>>>>>>> +
>>>>>>> +	if (pud_val(pudp[pud_idx]) == 0) {
>>>>>>> +		pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
>>>>>>> +		if (!pmdp)
>>>>>>> +			return -ENOMEM;
>>>>>>> +
>>>>>>> +		memset(pmdp, 0, PAGE_SIZE);
>>>>>>> +		pudp[pud_idx] = pfn_pud(PFN_DOWN(__pa(pmdp)), PAGE_TABLE);
>>>>>>> +	} else {
>>>>>>> +		pmdp = (pmd_t *)__va(PFN_PHYS(_pud_pfn(pudp[pud_idx])));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return temp_pgtable_map_pmd(pmdp, vaddr, prot);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_p4d(p4d_t *p4dp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t p4d_idx = p4d_index(vaddr);
>>>>>>> +	pud_t *pudp;
>>>>>>> +
>>>>>>> +	if (p4d_val(p4dp[p4d_idx]) == 0) {
>>>>>>> +		pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
>>>>>>> +		if (!pudp)
>>>>>>> +			return -ENOMEM;
>>>>>>> +
>>>>>>> +		memset(pudp, 0, PAGE_SIZE);
>>>>>>> +		p4dp[p4d_idx] = pfn_p4d(PFN_DOWN(__pa(pudp)), PAGE_TABLE);
>>>>>>> +	} else {
>>>>>>> +		pudp = (pud_t *)__va(PFN_PHYS(_p4d_pfn(p4dp[p4d_idx])));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return temp_pgtable_map_pud(pudp, vaddr, prot);
>>>>>>> +}
>>>>>>> +
>>>>>>> +#else
>>>>>>> +#define temp_pgtable_map_pgd_next(nextp, vaddr, prot)	\
>>>>>>> +	temp_pgtable_map_pte((pte_t *)nextp, vaddr, prot)
>>>>>>> +#endif /* __PAGETABLE_PMD_FOLDED */
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_map_pgd(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	uintptr_t pgd_idx = pgd_index(vaddr);
>>>>>>> +	void *nextp;
>>>>>>> +
>>>>>>> +	if (pgd_val(pgdp[pgd_idx]) == 0) {
>>>>>>> +		nextp = (void *)get_safe_page(GFP_ATOMIC);
>>>>>>> +		if (!nextp)
>>>>>>> +			return -ENOMEM;
>>>>>>> +
>>>>>>> +		memset(nextp, 0, PAGE_SIZE);
>>>>>>> +		pgdp[pgd_idx] = pfn_pgd(PFN_DOWN(__pa(nextp)), PAGE_TABLE);
>>>>>>> +	} else {
>>>>>>> +		nextp = (void *)__va(PFN_PHYS(_pgd_pfn(pgdp[pgd_idx])));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return temp_pgtable_map_pgd_next(nextp, vaddr, prot);
>>
>> Is it possible to use "standard" way of going through a page table
>> instead of using the _next macros? I mean something like this (example
>> from arm64 code
>> https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/trans_pgd.c#L174
>> or my recent kasan patchset
>> https://patchwork.kernel.org/project/linux-riscv/patch/20230203075232.274282-3-alexghiti@rivosinc.com/):
>>
>>           do {
>>                   next = pgd_addr_end(vaddr, end);
>>
>>                   if (pgd_none(*pgd_k)) {
>>                           nextp = get_safe_page(GFP_ATOMIC);
>>                           memset(nextp, 0, PAGE_SIZE);
>>                           set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(nextp)),
>>                                                  PAGE_TABLE));
>>                           continue;
>>                   }
>>
>>                   kasan_shallow_populate_p4d(pgd_k, vaddr, next);
>>           } while (pgd_k++, vaddr = next, vaddr != end);
>>
>>
>> I have the same change to our early page table code on my todo list.
>>
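For reference, the range-walking idiom quoted above can be modelled in user space. This is only a sketch under stated assumptions: it assumes an Sv39-style top level where each pgd entry covers 1 GiB, and model_pgd_addr_end() and count_pgd_steps() are hypothetical names that mirror the shape of the kernel's pgd_addr_end() loop rather than any code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each top-level entry covers 1 GiB, as in Sv39. */
#define PGDIR_SHIFT	30
#define PGDIR_SIZE	(1ULL << PGDIR_SHIFT)
#define PGDIR_MASK	(~(PGDIR_SIZE - 1))

/*
 * Hypothetical model of pgd_addr_end(): return the end of the top-level
 * region containing addr, clamped to end. The "- 1" comparisons keep the
 * result correct when the boundary wraps around to 0.
 */
static uint64_t model_pgd_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Walk [vaddr, end) one top-level entry at a time and count the steps.
 * A real walker would populate or descend into the entry at each step.
 */
static unsigned int count_pgd_steps(uint64_t vaddr, uint64_t end)
{
	unsigned int steps = 0;
	uint64_t next;

	assert(vaddr < end);

	do {
		next = model_pgd_addr_end(vaddr, end);
		steps++;
	} while (vaddr = next, vaddr != end);

	return steps;
}
```

The point of the idiom is that every level clamps its sub-range with the same helper, so the per-level functions take a [vaddr, next) range instead of a single address.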
>>
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp, unsigned long vaddr, pgprot_t prot)
>>>>>>> +{
>>>>>>> +	return temp_pgtable_map_pgd(pgdp, vaddr, prot);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static unsigned long relocate_restore_code(void)
>>>>>>> +{
>>>>>>> +	unsigned long ret;
>>>>>>> +	void *page = (void *)get_safe_page(GFP_ATOMIC);
>>>>>>> +
>>>>>>> +	if (!page)
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	copy_page(page, core_restore_code);
>>>>>>> +
>>>>>>> +	/* Make the page containing the relocated code executable */
>>>>>>> +	set_memory_x((unsigned long)page, 1);
>>>>>>> +
>>>>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)page, PAGE_KERNEL_READ_EXEC);
>>>>>>> +	if (ret)
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	return (unsigned long)page;
>>>>>>> +}
>>>>>>> +
>>>>>>> +int swsusp_arch_resume(void)
>>>>>>> +{
>>>>>>> +	unsigned long addr = PAGE_OFFSET;
>>>>>>> +	unsigned long ret;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
>>>>>>> +	 * we don't need to free it here.
>>>>>>> +	 */
>>>>>>> +	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
>>>>>>> +	if (!resume_pg_dir)
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * The pages need to be writable when restoring the image.
>>>>>>> +	 * Create a second copy of page table just for the linear map, and use this when
>>>>>>> +	 * restoring.
>>>>>>> +	 */
>>>>>>> +	for (; addr <= (unsigned long)pfn_to_virt(max_low_pfn); addr += PAGE_SIZE) {
>>>>>>> +		ret = temp_pgtable_mapping(resume_pg_dir, addr, PAGE_KERNEL);
>>>>>>> +		if (ret)
>>>>>>> +			return (int)ret;
>>>>>>> +	}
>>>>>>> +
>>>>>> To me this is wrong as this does not account for the real physical
>>>>>> mapping layout: can't you simply copy the linear mapping from
>>>>>> swapper_pg_dir?
>>>>> Hi, we are covering the linear mapping from PAGE_OFFSET up to pfn_to_virt(max_low_pfn). We shouldn't copy from
>>>>> swapper_pg_dir as we are not supposed to modify its content.
>>>>
>>>>
>>>> First, you're right, we need the temporary page table as swapper_pg_dir
>>>> will get overwritten under our feet.
>>>>
>>>> Now, I still disagree with mapping all the memory: the linear mapping is
>>>> sparse because we only map what memblock gives us (some regions are
>>>> marked as "nomap" for a reason).
>>>>
>>>> I just took a look at arm64, and they do exactly that: they go through
>>>> swapper_pg_dir, copy the linear mapping and enable write at every leaf
>>>> level
>>>> (https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/hibernate.c#L419).
>>> You're right, but we don't have to copy from swapper_pg_dir. We can call kernel_page_present() in the function to check
>>> page validity before doing the mapping. Agree?
>>
>>
>> That would work, we'd lose the benefit of huge pages though, I'm not
>> opposed at all but if we can leverage existing arm64 code, that would
>> even be better, only the PTE write flag is different!
> That's the thing. Arm64 uses two TTBR registers to translate virtual addresses. The single page of relocated code is mapped through ttbr0, while the temporary page table, which supports huge pages, is installed in ttbr1. So arch/arm64 does not need to split huge pages in the temporary page table.
> RISC-V has only one satp, so if we try to support huge pages for hibernation, we need to handle huge page splitting in arch/riscv so that the temporary page table can map the single page of relocated code. For the current RISC-V arch code, we can call kernel_page_present() to check page validity, but I doubt we should implement huge page splitting in arch/riscv, which is normally handled by the buddy allocator.


You mean that inserting the relocated page into the temporary page table
would be hard because we would have to split huge page entries along the
way, right? But we can avoid this by mapping the relocated page at an
address that sits in its own pgd entry. If the temporary page table maps
only the linear mapping, such an address is easy to find.
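To illustrate what "its own pgd" means here, below is a minimal user-space sketch, not kernel code. It assumes an Sv39 layout (512 top-level entries of 1 GiB each, 39-bit virtual addresses) and models the temporary pgd as a plain array. The helper names pgd_index_of(), pgd_slot_to_vaddr() and find_free_pgd_slot() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Sv39, i.e. 512 top-level entries, each covering 1 GiB. */
#define PGDIR_SHIFT	30
#define PTRS_PER_PGD	512
#define VA_BITS		39

/* Hypothetical helper: top-level index of a virtual address. */
static unsigned int pgd_index_of(uint64_t vaddr)
{
	return (vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Hypothetical helper: lowest canonical address covered by a top-level slot. */
static uint64_t pgd_slot_to_vaddr(unsigned int idx)
{
	uint64_t va;

	assert(idx < PTRS_PER_PGD);

	va = (uint64_t)idx << PGDIR_SHIFT;
	/* Sign-extend bit 38 so that upper-half addresses are canonical. */
	if (va & (1ULL << (VA_BITS - 1)))
		va |= ~((1ULL << VA_BITS) - 1);

	return va;
}

/*
 * Return the first upper-half slot that is still empty in the modelled
 * temporary pgd, or -1 if none is left. Anything mapped under such a slot
 * cannot collide with a huge page entry of the linear mapping, so nothing
 * has to be split.
 */
static int find_free_pgd_slot(const uint64_t *pgd)
{
	for (int i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
		if (pgd[i] == 0)
			return i;
	}

	return -1;
}
```

With only the linear mapping present, most upper-half slots are empty, so the relocated page can be mapped at pgd_slot_to_vaddr(slot) using freshly allocated lower-level tables.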


>> Thanks,
>>
>> Alex
>>
>>
>>>>>> But I have to admit that I struggle to understand the need for this
>>>>>> temporary page table: all we need to do is to allow to write to the
>>>>>> linear mapping, so why don't we simply set_memory_rw(linear mapping)?
>>>>> Similar to the above comment. When we restore the memory content, we need to make sure the pages are writable. If you
>>>>> modify the swapper_pg_dir, the kernel will crash afterwards.
>>>>> That's why we need a second page table to do the recovery job.
>>>>>>> +	/* Move the restore code to a new page so that it doesn't get overwritten by itself */
>>>>>>> +	relocated_restore_code = relocate_restore_code();
>>>>>> And do we really need to do that too? The code in question can only be
>>>>>> overwritten by the same code right?
>>>>> Yes, we need to move the recovery code to a new page to prevent the code from overwriting itself when restoring the memory.
>>>>>> Thanks,
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>> +	if (relocated_restore_code == -ENOMEM)
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * Map the __hibernate_cpu_resume() address in the temporary page table so that the
>>>>>>> +	 * restore code can jump to it after it has finished restoring the image. This way,
>>>>>>> +	 * the code that runs next does not find itself in a different address space after
>>>>>>> +	 * switching over to the original page table used by the hibernated image.
>>>>>>> +	 */
>>>>>>> +	ret = temp_pgtable_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr,
>>>>>>> +					PAGE_KERNEL_READ_EXEC);
>>>>>>> +	if (ret)
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
>>>>>>> +			resume_hdr.restore_cpu_addr);
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +#ifdef CONFIG_PM_SLEEP_SMP
>>>>>>> +int hibernate_resume_nonboot_cpu_disable(void)
>>>>>>> +{
>>>>>>> +	if (sleep_cpu < 0) {
>>>>>>> +		pr_err("Failing to resume from hibernate on an unknown CPU.\n");
>>>>>>> +		return -ENODEV;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return freeze_secondary_cpus(sleep_cpu);
>>>>>>> +}
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +static int __init riscv_hibernate_init(void)
>>>>>>> +{
>>>>>>> +	hibernate_cpu_context = kcalloc(1, sizeof(struct suspend_context), GFP_KERNEL);
>>>>>>> +
>>>>>>> +	if (WARN_ON(!hibernate_cpu_context))
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +early_initcall(riscv_hibernate_init);
>>>>> _______________________________________________
>>>>> linux-riscv mailing list
>>>>> linux-riscv@lists.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2023-02-14  6:57 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-27  9:10 [PATCH v3 0/4] RISC-V Hibernation Support Sia Jee Heng
2023-01-27  9:10 ` Sia Jee Heng
2023-01-27  9:10 ` [PATCH v3 1/4] RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function Sia Jee Heng
2023-01-27  9:10   ` Sia Jee Heng
2023-01-30 23:31   ` Conor Dooley
2023-01-30 23:31     ` Conor Dooley
2023-01-27  9:10 ` [PATCH v3 2/4] RISC-V: Factor out common code of __cpu_resume_enter() Sia Jee Heng
2023-01-27  9:10   ` Sia Jee Heng
2023-01-30 21:49   ` Conor Dooley
2023-01-30 21:49     ` Conor Dooley
2023-02-01  6:19     ` JeeHeng Sia
2023-02-01  6:19       ` JeeHeng Sia
2023-01-27  9:10 ` [PATCH v3 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function Sia Jee Heng
2023-01-27  9:10   ` Sia Jee Heng
2023-01-30 21:57   ` Conor Dooley
2023-01-30 21:57     ` Conor Dooley
2023-01-31  8:19     ` Alexandre Ghiti
2023-01-31  8:19       ` Alexandre Ghiti
2023-02-01  5:48       ` JeeHeng Sia
2023-02-01  5:48         ` JeeHeng Sia
2023-01-27  9:10 ` [PATCH v3 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk Sia Jee Heng
2023-01-27  9:10   ` Sia Jee Heng
2023-01-30 23:30   ` Conor Dooley
2023-01-30 23:30     ` Conor Dooley
2023-01-31  9:59     ` Alexandre Ghiti
2023-01-31  9:59       ` Alexandre Ghiti
2023-02-07  4:58       ` JeeHeng Sia
2023-02-07  4:58         ` JeeHeng Sia
2023-02-07  5:27         ` Alexandre Ghiti
2023-02-07  5:27           ` Alexandre Ghiti
2023-02-02  2:43     ` JeeHeng Sia
2023-02-02  2:43       ` JeeHeng Sia
2023-02-03  3:43     ` JeeHeng Sia
2023-02-03  3:43       ` JeeHeng Sia
2023-02-03  6:30       ` Conor Dooley
2023-02-03  6:30         ` Conor Dooley
2023-02-04 20:42   ` kernel test robot
2023-02-04 20:42     ` kernel test robot
2023-02-07 15:46   ` Alexandre Ghiti
2023-02-07 15:46     ` Alexandre Ghiti
2023-02-08  4:43     ` JeeHeng Sia
2023-02-08  4:43       ` JeeHeng Sia
2023-02-08 12:04       ` Alexandre Ghiti
2023-02-08 12:04         ` Alexandre Ghiti
2023-02-09  6:12         ` JeeHeng Sia
2023-02-09  6:12           ` JeeHeng Sia
2023-02-10 13:24           ` Alexandre Ghiti
2023-02-10 13:24             ` Alexandre Ghiti
2023-02-13  1:51             ` JeeHeng Sia
2023-02-13  1:51               ` JeeHeng Sia
2023-02-14  6:57               ` Alexandre Ghiti
2023-02-14  6:57                 ` Alexandre Ghiti

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.