* [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk
@ 2016-04-01 16:53 James Morse
  2016-04-01 16:53 ` [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP James Morse
                   ` (16 more replies)
  0 siblings, 17 replies; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

This is the latest version of hibernate for arm64, based on v4.6-rc1.

Patch 1 was sent as a fix for KVM [0]. It is included here to avoid conflicts
with patch 7 ("arm64: kvm: allows kvm cpu hotplug") after rc2.

After feedback on Kexec's 'arm64: Convert hcalls to use HVC immediate value'
[1], I've merged two of those patches, keeping the call-number/function-pointer
in x0 convention and the use of 'hvc #0'. The result is patch 6.
It looks like there are no registers that are safe for the hyp-stub to keep
values in, so I've moved the lr save/restore to wrap the 'hvc' call in EL1
(patches 4 and 5).

Patch 7 conflicted with the VHE changes. The main change is that the
hardware enable/disable calls, which used to wrap the gic setup code on one
cpu, now wrap everything in init_subsystems() and run on all cpus. This
is needed by 0d98d00b8d80 ('arm64: KVM: vgic-v3: Reset LRs at boot time'),
which expects KVM to be initialised on all cpus when it runs.
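
With that, init_subsystems() ends up shaped roughly as sketched below (a
condensed paraphrase of patch 7, not the exact code; the vgic_present
bookkeeping and the perf/coproc-table setup are left out). Generic KVM
re-enables the hardware per-cpu once the first VM is created.

  static int init_subsystems(void)
  {
          int err = 0;

          /* HYP/stage2 must be set up on each cpu before the hyp-calls below */
          on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);

          /* CPU low-power notifier, moved here by patch 1 */
          hyp_cpu_pm_init();

          err = kvm_vgic_hyp_init();
          if (err == -ENODEV || err == -ENXIO)
                  err = 0;                /* no vgic: carry on without it */
          if (!err)
                  err = kvm_timer_hyp_init();

          /* Generic KVM re-enables the hardware when the first VM is created */
          on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);

          return err;
  }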

Patch 8 (arm64: kernel: Rework finisher callback out of __cpu_suspend_enter())
and later are just a straight rebase.

This series can be retrieved from:
git://linux-arm.org/linux-jm.git -b hibernate/v7

[0] http://www.spinics.net/lists/arm-kernel/msg494214.html
[1] http://www.spinics.net/lists/arm-kernel/msg490781.html

Changes since v6:
 * Rebase on v4.6-rc1
 * Merged hcall patches, changed to keep the switch on x0
 * Allowed lr to be clobbered by hvc calls
 * Added missing icache maintenance + isb from __kvm_hyp_reset()

Changes since v5:
 * Call el2_setup() from hibernate's assembly code
 * Use UTS_VERSION to catch hibernate/resume with different builds

Changes since v4:
 * Mask in current T0SZ bits when restoring tcr_el1.
 * Use el2_setup() to reconfigure el2.
 * Remove fiq calls.
 * Removed kvm_arch_hardware_disable() call.
 * Added 'in_suspend = 0' to avoid 'if (uninitialised-memory) ...'.
 * Removed icache-flush from hibernate-asm.S, this was done in hibernate.c.
 * Commented tlbis.
 * Removed dsb after copy_page(), this behaviour is guaranteed by the
   architecture.

Changes since v3:
 * To work with kaslr:
   * hibernate now uses the arch-header to store the address of the page
     tables, and the point at which to re-enter the resumed kernel.
   * The el2 vectors are reloaded to point to the 'safe' page, then back to the
     resumed kernel.
   * PoC cleaning is done after the jump to the resumed kernel, as we don't
     know the restored kernel's boundaries in advance.
   * Some variables are accessed via aliases in the linear map, as the kernel
     text is not mapped during resume. restore_pblist is one example.
   * Execute the safe page from the bottom of memory, not the top, so that we
     can restore the resumed kernel's page tables directly.

 * Rebased the common patches onto v13 of kexec
 * Changed hibernate-asm.S to use the new copy_page macro.
 * Changed copy_p?d()s to use the do { } while(); pattern.
 * Added some missing barriers. (dsb after ic ialluis).

Changes since v2:
 * Rewrote the restore in-place patch - we can't clean pages in copy_page();
   we need to publish a list for the architecture to clean
 * Added missing pgprot_val() in hibernate.c, spotted by STRICT_MM_TYPECHECKS
 * Removed 'tcr_set_idmap_t0sz' from proc.S - I missed this when rebase-ing
 * Re-imported the first four patches from kexec v12
 * Rebased onto v4.4-rc2
 * Changes from Pavel Machek's comments

Changes since v1:
 * Removed for_each_process(){ for_each_vma() { } }; cache cleaning, replaced
   with icache_flush_range() call in core hibernate code
 * Rebased onto conflicting tcr_el1.t0sz bug-fix patch

[v6] http://www.spinics.net/lists/arm-kernel/msg487467.html
[v5] http://www.spinics.net/lists/arm-kernel/msg483595.html
[v4] http://www.spinics.net/lists/arm-kernel/msg477769.html
[v3] http://www.spinics.net/lists/arm-kernel/msg463590.html
[v2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/376450.html
[v1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/376450.html


AKASHI Takahiro (1):
  arm64: kvm: allows kvm cpu hotplug

Geoff Levand (4):
  arm64: Fold proc-macros.S into assembler.h
  arm64: Cleanup SCTLR flags
  arm64: hyp/kvm: Extend hyp-stub API to allow function calls at EL2
  arm64: Add new asm macro copy_page

James Morse (11):
  arm64: KVM: Register CPU notifiers when the kernel runs at HYP
  arm64: kvm: Move the do_el2_call macro to a header file
  arm64: kvm: Move lr save/restore from do_el2_call into EL1
  arm64: kernel: Rework finisher callback out of __cpu_suspend_enter().
  arm64: Change cpu_resume() to enable mmu early then access sleep_sp by
    va
  arm64: kernel: Include _AC definition in page.h
  arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
  arm64: head.S: el2_setup() to accept sctlr_el1 as an argument
  PM / Hibernate: Call flush_icache_range() on pages restored in-place
  arm64: kernel: Add support for hibernate/suspend-to-disk
  arm64: hibernate: Prevent resume from a different kernel version

 arch/arm/include/asm/kvm_host.h    |  10 +-
 arch/arm/include/asm/kvm_mmu.h     |   1 +
 arch/arm/kvm/arm.c                 | 148 +++++++-----
 arch/arm/kvm/mmu.c                 |   5 +
 arch/arm64/Kconfig                 |   7 +
 arch/arm64/include/asm/assembler.h | 111 ++++++++-
 arch/arm64/include/asm/kvm_arm.h   |  11 -
 arch/arm64/include/asm/kvm_asm.h   |   1 +
 arch/arm64/include/asm/kvm_host.h  |  13 +-
 arch/arm64/include/asm/kvm_mmu.h   |   1 +
 arch/arm64/include/asm/memory.h    |   3 +
 arch/arm64/include/asm/page.h      |   2 +
 arch/arm64/include/asm/suspend.h   |  32 ++-
 arch/arm64/include/asm/sysreg.h    |  19 +-
 arch/arm64/include/asm/virt.h      |  30 ++-
 arch/arm64/kernel/Makefile         |   1 +
 arch/arm64/kernel/asm-offsets.c    |  10 +-
 arch/arm64/kernel/head.S           |  24 +-
 arch/arm64/kernel/hibernate-asm.S  | 166 +++++++++++++
 arch/arm64/kernel/hibernate.c      | 478 +++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/hyp-stub.S       |  43 +++-
 arch/arm64/kernel/setup.c          |   1 -
 arch/arm64/kernel/sleep.S          | 155 +++++-------
 arch/arm64/kernel/suspend.c        |  96 ++++----
 arch/arm64/kernel/vmlinux.lds.S    |  15 ++
 arch/arm64/kvm/hyp-init.S          |  43 +++-
 arch/arm64/kvm/hyp.S               |  11 +-
 arch/arm64/kvm/hyp/hyp-entry.S     |  23 +-
 arch/arm64/kvm/reset.c             |  14 ++
 arch/arm64/mm/cache.S              |   2 -
 arch/arm64/mm/proc-macros.S        |  98 --------
 arch/arm64/mm/proc.S               |  56 ++---
 kernel/power/swap.c                |  18 ++
 33 files changed, 1236 insertions(+), 412 deletions(-)
 create mode 100644 arch/arm64/kernel/hibernate-asm.S
 create mode 100644 arch/arm64/kernel/hibernate.c
 delete mode 100644 arch/arm64/mm/proc-macros.S

-- 
2.8.0.rc3

* [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-18 16:10   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 02/16] arm64: Fold proc-macros.S into assembler.h James Morse
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

When the kernel is running at EL2, it doesn't need init_hyp_mode() to
configure page tables for HYP. This function also registers the CPU
hotplug and low-power notifiers that cause HYP to be re-initialised
after the CPU has been reset.

To avoid losing the register state that controls stage2 translation, move
the registering of these notifiers into init_subsystems(), and add an
is_kernel_in_hyp_mode() path to each callback.

Fixes: 1e947bad0b6 ("arm64: KVM: Skip HYP setup when already running in HYP")
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm/kvm/arm.c | 52 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 19 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6accd66d26f0..b5384311dec4 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1061,15 +1061,27 @@ static void cpu_init_hyp_mode(void *dummy)
 	kvm_arm_init_debug();
 }
 
+static void cpu_hyp_reinit(void)
+{
+	if (is_kernel_in_hyp_mode()) {
+		/*
+		 * cpu_init_stage2() is safe to call even if the PM
+		 * event was cancelled before the CPU was reset.
+		 */
+		cpu_init_stage2(NULL);
+	} else {
+		if (__hyp_get_vectors() == hyp_default_vectors)
+			cpu_init_hyp_mode(NULL);
+	}
+}
+
 static int hyp_init_cpu_notify(struct notifier_block *self,
 			       unsigned long action, void *cpu)
 {
 	switch (action) {
 	case CPU_STARTING:
 	case CPU_STARTING_FROZEN:
-		if (__hyp_get_vectors() == hyp_default_vectors)
-			cpu_init_hyp_mode(NULL);
-		break;
+		cpu_hyp_reinit();
 	}
 
 	return NOTIFY_OK;
@@ -1084,9 +1096,8 @@ static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
 				    unsigned long cmd,
 				    void *v)
 {
-	if (cmd == CPU_PM_EXIT &&
-	    __hyp_get_vectors() == hyp_default_vectors) {
-		cpu_init_hyp_mode(NULL);
+	if (cmd == CPU_PM_EXIT) {
+		cpu_hyp_reinit();
 		return NOTIFY_OK;
 	}
 
@@ -1128,6 +1139,22 @@ static int init_subsystems(void)
 	int err;
 
 	/*
+	 * Register CPU Hotplug notifier
+	 */
+	cpu_notifier_register_begin();
+	err = __register_cpu_notifier(&hyp_init_cpu_nb);
+	cpu_notifier_register_done();
+	if (err) {
+		kvm_err("Cannot register KVM init CPU notifier (%d)\n", err);
+		return err;
+	}
+
+	/*
+	 * Register CPU lower-power notifier
+	 */
+	hyp_cpu_pm_init();
+
+	/*
 	 * Init HYP view of VGIC
 	 */
 	err = kvm_vgic_hyp_init();
@@ -1270,19 +1297,6 @@ static int init_hyp_mode(void)
 	free_boot_hyp_pgd();
 #endif
 
-	cpu_notifier_register_begin();
-
-	err = __register_cpu_notifier(&hyp_init_cpu_nb);
-
-	cpu_notifier_register_done();
-
-	if (err) {
-		kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
-		goto out_err;
-	}
-
-	hyp_cpu_pm_init();
-
 	/* set size of VMID supported by CPU */
 	kvm_vmid_bits = kvm_get_vmid_bits();
 	kvm_info("%d-bit VMID\n", kvm_vmid_bits);
-- 
2.8.0.rc3

* [PATCH v7 02/16] arm64: Fold proc-macros.S into assembler.h
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
  2016-04-01 16:53 ` [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-18 16:11   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 03/16] arm64: Cleanup SCTLR flags James Morse
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

From: Geoff Levand <geoff@infradead.org>

To allow the assembler macros defined in arch/arm64/mm/proc-macros.S to
be used outside the mm code, move the contents of proc-macros.S to
asm/assembler.h.  Also delete proc-macros.S and fix up all references
to it.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
[rebased, included dcache_by_line_op]
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/assembler.h | 82 ++++++++++++++++++++++++++++++-
 arch/arm64/mm/cache.S              |  2 -
 arch/arm64/mm/proc-macros.S        | 98 --------------------------------------
 arch/arm64/mm/proc.S               |  3 --
 4 files changed, 81 insertions(+), 104 deletions(-)
 delete mode 100644 arch/arm64/mm/proc-macros.S

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 70f7b9e04598..c72474c2d4dc 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -1,5 +1,5 @@
 /*
- * Based on arch/arm/include/asm/assembler.h
+ * Based on arch/arm/include/asm/assembler.h, arch/arm/mm/proc-macros.S
  *
  * Copyright (C) 1996-2000 Russell King
  * Copyright (C) 2012 ARM Ltd.
@@ -23,6 +23,8 @@
 #ifndef __ASM_ASSEMBLER_H
 #define __ASM_ASSEMBLER_H
 
+#include <asm/asm-offsets.h>
+#include <asm/pgtable-hwdef.h>
 #include <asm/ptrace.h>
 #include <asm/thread_info.h>
 
@@ -212,6 +214,84 @@ lr	.req	x30		// link register
 	.endm
 
 /*
+ * vma_vm_mm - get mm pointer from vma pointer (vma->vm_mm)
+ */
+	.macro	vma_vm_mm, rd, rn
+	ldr	\rd, [\rn, #VMA_VM_MM]
+	.endm
+
+/*
+ * mmid - get context id from mm pointer (mm->context.id)
+ */
+	.macro	mmid, rd, rn
+	ldr	\rd, [\rn, #MM_CONTEXT_ID]
+	.endm
+
+/*
+ * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ */
+	.macro	dcache_line_size, reg, tmp
+	mrs	\tmp, ctr_el0			// read CTR
+	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding
+	mov	\reg, #4			// bytes per word
+	lsl	\reg, \reg, \tmp		// actual cache line size
+	.endm
+
+/*
+ * icache_line_size - get the minimum I-cache line size from the CTR register.
+ */
+	.macro	icache_line_size, reg, tmp
+	mrs	\tmp, ctr_el0			// read CTR
+	and	\tmp, \tmp, #0xf		// cache line size encoding
+	mov	\reg, #4			// bytes per word
+	lsl	\reg, \reg, \tmp		// actual cache line size
+	.endm
+
+/*
+ * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
+ */
+	.macro	tcr_set_idmap_t0sz, valreg, tmpreg
+#ifndef CONFIG_ARM64_VA_BITS_48
+	ldr_l	\tmpreg, idmap_t0sz
+	bfi	\valreg, \tmpreg, #TCR_T0SZ_OFFSET, #TCR_TxSZ_WIDTH
+#endif
+	.endm
+
+/*
+ * Macro to perform a data cache maintenance for the interval
+ * [kaddr, kaddr + size)
+ *
+ * 	op:		operation passed to dc instruction
+ * 	domain:		domain used in dsb instruciton
+ * 	kaddr:		starting virtual address of the region
+ * 	size:		size of the region
+ * 	Corrupts:	kaddr, size, tmp1, tmp2
+ */
+	.macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
+	dcache_line_size \tmp1, \tmp2
+	add	\size, \kaddr, \size
+	sub	\tmp2, \tmp1, #1
+	bic	\kaddr, \kaddr, \tmp2
+9998:	dc	\op, \kaddr
+	add	\kaddr, \kaddr, \tmp1
+	cmp	\kaddr, \size
+	b.lo	9998b
+	dsb	\domain
+	.endm
+
+/*
+ * reset_pmuserenr_el0 - reset PMUSERENR_EL0 if PMUv3 present
+ */
+	.macro	reset_pmuserenr_el0, tmpreg
+	mrs	\tmpreg, id_aa64dfr0_el1	// Check ID_AA64DFR0_EL1 PMUVer
+	sbfx	\tmpreg, \tmpreg, #8, #4
+	cmp	\tmpreg, #1			// Skip if no PMU present
+	b.lt	9000f
+	msr	pmuserenr_el0, xzr		// Disable PMU access from EL0
+9000:
+	.endm
+
+/*
  * Annotate a function as position independent, i.e., safe to be called before
  * the kernel virtual mapping is activated.
  */
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 6df07069a025..50ff9ba3a236 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -24,8 +24,6 @@
 #include <asm/cpufeature.h>
 #include <asm/alternative.h>
 
-#include "proc-macros.S"
-
 /*
  *	flush_icache_range(start,end)
  *
diff --git a/arch/arm64/mm/proc-macros.S b/arch/arm64/mm/proc-macros.S
deleted file mode 100644
index e6a30e1268a8..000000000000
--- a/arch/arm64/mm/proc-macros.S
+++ /dev/null
@@ -1,98 +0,0 @@
-/*
- * Based on arch/arm/mm/proc-macros.S
- *
- * Copyright (C) 2012 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include <asm/asm-offsets.h>
-#include <asm/thread_info.h>
-
-/*
- * vma_vm_mm - get mm pointer from vma pointer (vma->vm_mm)
- */
-	.macro	vma_vm_mm, rd, rn
-	ldr	\rd, [\rn, #VMA_VM_MM]
-	.endm
-
-/*
- * mmid - get context id from mm pointer (mm->context.id)
- */
-	.macro	mmid, rd, rn
-	ldr	\rd, [\rn, #MM_CONTEXT_ID]
-	.endm
-
-/*
- * dcache_line_size - get the minimum D-cache line size from the CTR register.
- */
-	.macro	dcache_line_size, reg, tmp
-	mrs	\tmp, ctr_el0			// read CTR
-	ubfm	\tmp, \tmp, #16, #19		// cache line size encoding
-	mov	\reg, #4			// bytes per word
-	lsl	\reg, \reg, \tmp		// actual cache line size
-	.endm
-
-/*
- * icache_line_size - get the minimum I-cache line size from the CTR register.
- */
-	.macro	icache_line_size, reg, tmp
-	mrs	\tmp, ctr_el0			// read CTR
-	and	\tmp, \tmp, #0xf		// cache line size encoding
-	mov	\reg, #4			// bytes per word
-	lsl	\reg, \reg, \tmp		// actual cache line size
-	.endm
-
-/*
- * tcr_set_idmap_t0sz - update TCR.T0SZ so that we can load the ID map
- */
-	.macro	tcr_set_idmap_t0sz, valreg, tmpreg
-#ifndef CONFIG_ARM64_VA_BITS_48
-	ldr_l	\tmpreg, idmap_t0sz
-	bfi	\valreg, \tmpreg, #TCR_T0SZ_OFFSET, #TCR_TxSZ_WIDTH
-#endif
-	.endm
-
-/*
- * Macro to perform a data cache maintenance for the interval
- * [kaddr, kaddr + size)
- *
- * 	op:		operation passed to dc instruction
- * 	domain:		domain used in dsb instruciton
- * 	kaddr:		starting virtual address of the region
- * 	size:		size of the region
- * 	Corrupts: 	kaddr, size, tmp1, tmp2
- */
-	.macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
-	dcache_line_size \tmp1, \tmp2
-	add	\size, \kaddr, \size
-	sub	\tmp2, \tmp1, #1
-	bic	\kaddr, \kaddr, \tmp2
-9998:	dc	\op, \kaddr
-	add	\kaddr, \kaddr, \tmp1
-	cmp	\kaddr, \size
-	b.lo	9998b
-	dsb	\domain
-	.endm
-
-/*
- * reset_pmuserenr_el0 - reset PMUSERENR_EL0 if PMUv3 present
- */
-	.macro	reset_pmuserenr_el0, tmpreg
-	mrs	\tmpreg, id_aa64dfr0_el1	// Check ID_AA64DFR0_EL1 PMUVer
-	sbfx	\tmpreg, \tmpreg, #8, #4
-	cmp	\tmpreg, #1			// Skip if no PMU present
-	b.lt	9000f
-	msr	pmuserenr_el0, xzr		// Disable PMU access from EL0
-9000:
-	.endm
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 543f5198005a..9f6deacf41d2 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -23,13 +23,10 @@
 #include <asm/assembler.h>
 #include <asm/asm-offsets.h>
 #include <asm/hwcap.h>
-#include <asm/pgtable-hwdef.h>
 #include <asm/pgtable.h>
 #include <asm/cpufeature.h>
 #include <asm/alternative.h>
 
-#include "proc-macros.S"
-
 #ifdef CONFIG_ARM64_64K_PAGES
 #define TCR_TG_FLAGS	TCR_TG0_64K | TCR_TG1_64K
 #elif defined(CONFIG_ARM64_16K_PAGES)
-- 
2.8.0.rc3

* [PATCH v7 03/16] arm64: Cleanup SCTLR flags
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
  2016-04-01 16:53 ` [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP James Morse
  2016-04-01 16:53 ` [PATCH v7 02/16] arm64: Fold proc-macros.S into assembler.h James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-19 14:44   ` Marc Zyngier
  2016-04-01 16:53 ` [PATCH v7 04/16] arm64: kvm: Move the do_el2_call macro to a header file James Morse
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

From: Geoff Levand <geoff@infradead.org>

We currently have macros defining flags for the arm64 sctlr registers in
both kvm_arm.h and sysreg.h.  To clean things up and simplify, move the
definitions of the SCTLR_EL2 flags from kvm_arm.h to sysreg.h, rename any
SCTLR_EL1 or SCTLR_EL2 flags that are common to both registers to
SCTLR_ELx, with 'x' indicating a common flag, and fix up all files to
include the proper header or to use the new macro names.

Signed-off-by: Geoff Levand <geoff@infradead.org>
[Restored pgtable-hwdef.h include]
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h | 11 -----------
 arch/arm64/include/asm/sysreg.h  | 19 +++++++++++++++----
 arch/arm64/kvm/hyp-init.S        |  5 +++--
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 0e391dbfc420..6f8e62a65c9a 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -84,17 +84,6 @@
 #define HCR_INT_OVERRIDE   (HCR_FMO | HCR_IMO)
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
-/* Hyp System Control Register (SCTLR_EL2) bits */
-#define SCTLR_EL2_EE	(1 << 25)
-#define SCTLR_EL2_WXN	(1 << 19)
-#define SCTLR_EL2_I	(1 << 12)
-#define SCTLR_EL2_SA	(1 << 3)
-#define SCTLR_EL2_C	(1 << 2)
-#define SCTLR_EL2_A	(1 << 1)
-#define SCTLR_EL2_M	1
-#define SCTLR_EL2_FLAGS	(SCTLR_EL2_M | SCTLR_EL2_A | SCTLR_EL2_C |	\
-			 SCTLR_EL2_SA | SCTLR_EL2_I)
-
 /* TCR_EL2 Registers bits */
 #define TCR_EL2_RES1	((1 << 31) | (1 << 23))
 #define TCR_EL2_TBI	(1 << 20)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 1a78d6e2a78b..0961a24e8d48 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -86,10 +86,21 @@
 #define SET_PSTATE_UAO(x) __inst_arm(0xd5000000 | REG_PSTATE_UAO_IMM |\
 				     (!!x)<<8 | 0x1f)
 
-/* SCTLR_EL1 */
-#define SCTLR_EL1_CP15BEN	(0x1 << 5)
-#define SCTLR_EL1_SED		(0x1 << 8)
-#define SCTLR_EL1_SPAN		(0x1 << 23)
+/* Common SCTLR_ELx flags. */
+#define SCTLR_ELx_EE    (1 << 25)
+#define SCTLR_ELx_I	(1 << 12)
+#define SCTLR_ELx_SA	(1 << 3)
+#define SCTLR_ELx_C	(1 << 2)
+#define SCTLR_ELx_A	(1 << 1)
+#define SCTLR_ELx_M	1
+
+#define SCTLR_ELx_FLAGS	(SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | \
+			 SCTLR_ELx_SA | SCTLR_ELx_I)
+
+/* SCTLR_EL1 specific flags. */
+#define SCTLR_EL1_SPAN		(1 << 23)
+#define SCTLR_EL1_SED		(1 << 8)
+#define SCTLR_EL1_CP15BEN	(1 << 5)
 
 
 /* id_aa64isar0 */
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index 7d8747c6427c..5ce1b47ef770 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -21,6 +21,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/pgtable-hwdef.h>
+#include <asm/sysreg.h>
 
 	.text
 	.pushsection	.hyp.idmap.text, "ax"
@@ -103,8 +104,8 @@ __do_hyp_init:
 	dsb	sy
 
 	mrs	x4, sctlr_el2
-	and	x4, x4, #SCTLR_EL2_EE	// preserve endianness of EL2
-	ldr	x5, =SCTLR_EL2_FLAGS
+	and	x4, x4, #SCTLR_ELx_EE	// preserve endianness of EL2
+	ldr	x5, =SCTLR_ELx_FLAGS
 	orr	x4, x4, x5
 	msr	sctlr_el2, x4
 	isb
-- 
2.8.0.rc3

* [PATCH v7 04/16] arm64: kvm: Move the do_el2_call macro to a header file
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (2 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 03/16] arm64: Cleanup SCTLR flags James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-19 15:02   ` Marc Zyngier
  2016-04-01 16:53 ` [PATCH v7 05/16] arm64: kvm: Move lr save/restore from do_el2_call into EL1 James Morse
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

The hyp-stub could make use of do_el2_call(), so move it to a header file.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/virt.h  | 18 +++++++++++++++++-
 arch/arm64/kvm/hyp/hyp-entry.S | 17 +----------------
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 9f22dd607958..b8fdddeca71b 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -21,7 +21,23 @@
 #define BOOT_CPU_MODE_EL1	(0xe11)
 #define BOOT_CPU_MODE_EL2	(0xe12)
 
-#ifndef __ASSEMBLY__
+#ifdef __ASSEMBLY__
+.macro do_el2_call
+	/*
+	 * Shuffle the parameters before calling the function
+	 * pointed to in x0. Assumes parameters in x[1,2,3].
+	 */
+	sub	sp, sp, #16
+	str	lr, [sp]
+	mov	lr, x0
+	mov	x0, x1
+	mov	x1, x2
+	mov	x2, x3
+	blr	lr
+	ldr	lr, [sp]
+	add	sp, sp, #16
+.endm
+#else
 
 #include <asm/ptrace.h>
 
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 3488894397ff..358f27a1edc2 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -23,6 +23,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/virt.h>
 
 	.text
 	.pushsection	.hyp.text, "ax"
@@ -37,22 +38,6 @@
 	ldp	x0, x1, [sp], #16
 .endm
 
-.macro do_el2_call
-	/*
-	 * Shuffle the parameters before calling the function
-	 * pointed to in x0. Assumes parameters in x[1,2,3].
-	 */
-	sub	sp, sp, #16
-	str	lr, [sp]
-	mov	lr, x0
-	mov	x0, x1
-	mov	x1, x2
-	mov	x2, x3
-	blr	lr
-	ldr	lr, [sp]
-	add	sp, sp, #16
-.endm
-
 ENTRY(__vhe_hyp_call)
 	do_el2_call
 	/*
-- 
2.8.0.rc3

* [PATCH v7 05/16] arm64: kvm: Move lr save/restore from do_el2_call into EL1
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (3 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 04/16] arm64: kvm: Move the do_el2_call macro to a header file James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-19 15:11   ` Marc Zyngier
  2016-04-01 16:53 ` [PATCH v7 06/16] arm64: hyp/kvm: Extend hyp-stub API to allow function calls at EL2 James Morse
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

Today an 'hvc' call into KVM or the hyp-stub is expected to preserve all
registers. KVM saves/restores the registers it needs on the EL2 stack using
do_el2_call. The hyp-stub has no stack, and later patches need at least one
register they can use.

Allow these calls to clobber the link register, and add code to
save/restore this register at the call sites.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/virt.h  |  4 ----
 arch/arm64/kernel/hyp-stub.S   | 10 ++++++++--
 arch/arm64/kvm/hyp.S           |  7 ++++++-
 arch/arm64/kvm/hyp/hyp-entry.S |  2 ++
 4 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index b8fdddeca71b..23e9c6f68b8c 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -27,15 +27,11 @@
 	 * Shuffle the parameters before calling the function
 	 * pointed to in x0. Assumes parameters in x[1,2,3].
 	 */
-	sub	sp, sp, #16
-	str	lr, [sp]
 	mov	lr, x0
 	mov	x0, x1
 	mov	x1, x2
 	mov	x2, x3
 	blr	lr
-	ldr	lr, [sp]
-	add	sp, sp, #16
 .endm
 #else
 
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index a272f335c289..7eab8acbbbd9 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -101,10 +101,16 @@ ENDPROC(\label)
  */
 
 ENTRY(__hyp_get_vectors)
+	str	lr, [sp, #-16]!
 	mov	x0, xzr
-	// fall through
-ENTRY(__hyp_set_vectors)
 	hvc	#0
+	ldr	lr, [sp], #16
 	ret
 ENDPROC(__hyp_get_vectors)
+
+ENTRY(__hyp_set_vectors)
+	str	lr, [sp, #-16]!
+	hvc	#0
+	ldr	lr, [sp], #16
+	ret
 ENDPROC(__hyp_set_vectors)
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 48f19a37b3df..4ee5612f43ea 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -38,13 +38,18 @@
  * A function pointer with a value of 0 has a special meaning, and is
  * used to implement __hyp_get_vectors in the same way as in
  * arch/arm64/kernel/hyp_stub.S.
+ * HVC behaves as a 'bl' call and will clobber lr.
  */
 ENTRY(__kvm_call_hyp)
-alternative_if_not ARM64_HAS_VIRT_HOST_EXTN	
+alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
+	str     lr, [sp, #-16]!
 	hvc	#0
+	ldr     lr, [sp], #16
 	ret
 alternative_else
 	b	__vhe_hyp_call
 	nop
+	nop
+	nop
 alternative_endif
 ENDPROC(__kvm_call_hyp)
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 358f27a1edc2..2cee779ce910 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -39,7 +39,9 @@
 .endm
 
 ENTRY(__vhe_hyp_call)
+	str	lr, [sp, #-16]!
 	do_el2_call
+	ldr	lr, [sp], #16
 	/*
 	 * We used to rely on having an exception return to get
 	 * an implicit isb. In the E2H case, we don't have it anymore.
-- 
2.8.0.rc3

* [PATCH v7 06/16] arm64: hyp/kvm: Extend hyp-stub API to allow function calls at EL2
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (4 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 05/16] arm64: kvm: Move lr save/restore from do_el2_call into EL1 James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-19 15:22   ` Marc Zyngier
  2016-04-01 16:53 ` [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug James Morse
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

From: Geoff Levand <geoff@infradead.org>

The existing arm64 hcall implementations are limited in that they only
allow for two distinct hcalls, with the x0 register either zero or not
zero.  Also, the API of the hyp-stub exception vector routines and the
KVM exception vector routines differ; hyp-stub uses a non-zero value in
x0 to implement __hyp_set_vectors, whereas KVM uses it to implement
kvm_call_hyp.

To allow for additional hcalls to be defined and to make the arm64 hcall
API more consistent across exception vector routines, change the hcall
implementations to reserve all x0 values below PAGE_SIZE for hcalls such
as {s,g}et_vectors().

Define two new preprocessor macros, HVC_GET_VECTORS and HVC_SET_VECTORS,
to be used as hcall type specifiers, and convert the existing
__hyp_get_vectors() and __hyp_set_vectors() routines to use these new
macros when executing an HVC call.  Also, change the corresponding
hyp-stub and KVM el1_sync exception vector routines to use these new
macros.
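
In C pseudo-code, the dispatch el1_sync now performs on the value arriving
in x0 looks roughly like the sketch below. It is illustrative only: the real
code is the assembly in this patch, hvc_dispatch() does not exist, and
read_vbar_el2()/write_vbar_el2() are stand-ins for the mrs/msr instructions.

  typedef unsigned long (*hyp_fn_t)(unsigned long, unsigned long,
                                    unsigned long);

  /* Model of el1_sync: x0..x3 hold whatever the caller put in those
   * registers before issuing 'hvc #0'. */
  static unsigned long hvc_dispatch(unsigned long x0, unsigned long x1,
                                    unsigned long x2, unsigned long x3)
  {
          if (x0 == HVC_GET_VECTORS)              /* 0 */
                  return read_vbar_el2();         /* stand-in for 'mrs' */

          if (x0 == HVC_SET_VECTORS) {            /* 1 */
                  write_vbar_el2(x1);             /* stand-in for 'msr' */
                  return 0;
          }

          /* Any other value (>= PAGE_SIZE) is a function pointer, called
           * via do_el2_call with x1-x3 as its arguments. */
          return ((hyp_fn_t)x0)(x1, x2, x3);
  }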

Signed-off-by: Geoff Levand <geoff@infradead.org>
[Merged two hcall patches, moved immediate value from esr to x0, use lr
 as a scratch register]
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/virt.h  | 16 ++++++++++++++++
 arch/arm64/kernel/hyp-stub.S   | 33 +++++++++++++++++++++++----------
 arch/arm64/kvm/hyp.S           |  4 ++--
 arch/arm64/kvm/hyp/hyp-entry.S |  4 ++--
 4 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 23e9c6f68b8c..d406d2ebe80d 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -18,6 +18,22 @@
 #ifndef __ASM__VIRT_H
 #define __ASM__VIRT_H
 
+/*
+ * The arm64 hcall implementation uses x0 to specify the hcall type. A value
+ * less than PAGE_SIZE indicates a special hcall, such as get/set vector.
+ * Any other value is used as a pointer to the function to call.
+ */
+
+/* HVC_GET_VECTORS - Return the value of the vbar_el2 register. */
+#define HVC_GET_VECTORS 0
+
+/*
+ * HVC_SET_VECTORS - Set the value of the vbar_el2 register.
+ *
+ * @x0: Physical address of the new vector table.
+ */
+#define HVC_SET_VECTORS 1
+
 #define BOOT_CPU_MODE_EL1	(0xe11)
 #define BOOT_CPU_MODE_EL2	(0xe12)
 
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 7eab8acbbbd9..6bba25cf7d15 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -22,6 +22,7 @@
 #include <linux/irqchip/arm-gic-v3.h>
 
 #include <asm/assembler.h>
+#include <asm/kvm_arm.h>
 #include <asm/ptrace.h>
 #include <asm/virt.h>
 
@@ -53,15 +54,25 @@ ENDPROC(__hyp_stub_vectors)
 	.align 11
 
 el1_sync:
-	mrs	x1, esr_el2
-	lsr	x1, x1, #26
-	cmp	x1, #0x16
-	b.ne	2f				// Not an HVC trap
-	cbz	x0, 1f
-	msr	vbar_el2, x0			// Set vbar_el2
-	b	2f
-1:	mrs	x0, vbar_el2			// Return vbar_el2
-2:	eret
+	mrs	x30, esr_el2
+	lsr	x30, x30, #ESR_ELx_EC_SHIFT
+
+	cmp	x30, #ESR_ELx_EC_HVC64
+	b.ne	9f				// Not an HVC trap
+
+	cmp	x0, #HVC_GET_VECTORS
+	b.ne	1f
+	mrs	x0, vbar_el2
+	b	9f
+
+1:	cmp	x0, #HVC_SET_VECTORS
+	b.ne	2f
+	msr	vbar_el2, x1
+	b	9f
+
+2:	do_el2_call
+
+9:	eret
 ENDPROC(el1_sync)
 
 .macro invalid_vector	label
@@ -102,7 +113,7 @@ ENDPROC(\label)
 
 ENTRY(__hyp_get_vectors)
 	str	lr, [sp, #-16]!
-	mov	x0, xzr
+	mov	x0, #HVC_GET_VECTORS
 	hvc	#0
 	ldr	lr, [sp], #16
 	ret
@@ -110,6 +121,8 @@ ENDPROC(__hyp_get_vectors)
 
 ENTRY(__hyp_set_vectors)
 	str	lr, [sp, #-16]!
+	mov	x1, x0
+	mov	x0, #HVC_SET_VECTORS
 	hvc	#0
 	ldr	lr, [sp], #16
 	ret
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 4ee5612f43ea..ba344942f0a8 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -35,8 +35,8 @@
  * in Hyp mode (see init_hyp_mode in arch/arm/kvm/arm.c).  Return values are
  * passed in x0.
  *
- * A function pointer with a value of 0 has a special meaning, and is
- * used to implement __hyp_get_vectors in the same way as in
+ * A function pointer with a value less than PAGE_SIZE has a special meaning,
+ * and is used to implement __hyp_get_vectors in the same way as in
  * arch/arm64/kernel/hyp_stub.S.
  * HVC behaves as a 'bl' call and will clobber lr.
  */
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 2cee779ce910..c8ad21c78d3f 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -71,8 +71,8 @@ alternative_endif
 	/* Here, we're pretty sure the host called HVC. */
 	restore_x0_to_x3
 
-	/* Check for __hyp_get_vectors */
-	cbnz	x0, 1f
+	cmp	x0, #HVC_GET_VECTORS
+	b.ne	1f
 	mrs	x0, vbar_el2
 	b	2f
 
-- 
2.8.0.rc3

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (5 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 06/16] arm64: hyp/kvm: Extend hyp-stub API to allow function calls at EL2 James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-19 16:03   ` Marc Zyngier
  2016-04-01 16:53 ` [PATCH v7 08/16] arm64: kernel: Rework finisher callback out of __cpu_suspend_enter() James Morse
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

From: AKASHI Takahiro <takahiro.akashi@linaro.org>

The current KVM implementation on arm64 does cpu-specific initialization
at system boot, and has no way to gracefully shut down a core in terms of
KVM. This prevents kexec from rebooting the system at EL2.

This patch adds a cpu tear-down function and moves the existing cpu-init
code into a separate function, kvm_arch_hardware_disable() and
kvm_arch_hardware_enable() respectively.
We don't need the arm64-specific cpu hotplug hook any more.

Since this patch modifies common code between arm and arm64, one stub
definition, __cpu_reset_hyp_mode(), is added on the arm side to avoid
compilation errors.
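
Condensed from the diff below (folding in the _kvm_arch_hardware_enable()/
_disable() wrappers used with on_each_cpu()), the new hooks are guarded by a
per-cpu flag. Generic KVM calls these hooks per-cpu when the first VM is
created or the last one is destroyed, on CPU hotplug, and at reboot, which
is what lets kexec hand each core back to the hyp-stub:

  static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);

  int kvm_arch_hardware_enable(void)
  {
          if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
                  cpu_hyp_reinit();       /* VHE: stage2 setup; !VHE: install HYP */
                  __this_cpu_write(kvm_arm_hardware_enabled, 1);
          }
          return 0;
  }

  void kvm_arch_hardware_disable(void)
  {
          if (__this_cpu_read(kvm_arm_hardware_enabled)) {
                  cpu_hyp_reset();        /* !VHE: back to the hyp-stub vectors */
                  __this_cpu_write(kvm_arm_hardware_enabled, 0);
          }
  }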

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[Rebase, added separate VHE init/exit path, changed reset's use of
 kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
 added icache maintenance to __kvm_hyp_reset() and removed lr restore]
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm/include/asm/kvm_host.h   |  10 ++-
 arch/arm/include/asm/kvm_mmu.h    |   1 +
 arch/arm/kvm/arm.c                | 128 +++++++++++++++++++++++---------------
 arch/arm/kvm/mmu.c                |   5 ++
 arch/arm64/include/asm/kvm_asm.h  |   1 +
 arch/arm64/include/asm/kvm_host.h |  13 +++-
 arch/arm64/include/asm/kvm_mmu.h  |   1 +
 arch/arm64/kvm/hyp-init.S         |  38 +++++++++++
 arch/arm64/kvm/reset.c            |  14 +++++
 9 files changed, 158 insertions(+), 53 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 385070180c25..738d5eee91de 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -265,6 +265,15 @@ static inline void __cpu_init_stage2(void)
 	kvm_call_hyp(__init_stage2_translation);
 }
 
+static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
+					phys_addr_t phys_idmap_start)
+{
+	/*
+	 * TODO
+	 * kvm_call_reset(boot_pgd_ptr, phys_idmap_start);
+	 */
+}
+
 static inline int kvm_arch_dev_ioctl_check_extension(long ext)
 {
 	return 0;
@@ -277,7 +286,6 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
-static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index da44be9db4fa..f17a8d41822c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -66,6 +66,7 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
+phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index b5384311dec4..962904a443be 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -16,7 +16,6 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
-#include <linux/cpu.h>
 #include <linux/cpu_pm.h>
 #include <linux/errno.h>
 #include <linux/err.h>
@@ -66,6 +65,8 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);
 
 static bool vgic_present;
 
+static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
+
 static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
 {
 	BUG_ON(preemptible());
@@ -90,11 +91,6 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
 	return &kvm_arm_running_vcpu;
 }
 
-int kvm_arch_hardware_enable(void)
-{
-	return 0;
-}
-
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
@@ -591,7 +587,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		/*
 		 * Re-check atomic conditions
 		 */
-		if (signal_pending(current)) {
+		if (unlikely(!__this_cpu_read(kvm_arm_hardware_enabled))) {
+			/* cpu has been torn down */
+			ret = 0;
+			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+			run->fail_entry.hardware_entry_failure_reason
+					= (u64)-ENOEXEC;
+		} else if (signal_pending(current)) {
 			ret = -EINTR;
 			run->exit_reason = KVM_EXIT_INTR;
 		}
@@ -1033,11 +1035,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 }
 
-static void cpu_init_stage2(void *dummy)
-{
-	__cpu_init_stage2();
-}
-
 static void cpu_init_hyp_mode(void *dummy)
 {
 	phys_addr_t boot_pgd_ptr;
@@ -1065,43 +1062,87 @@ static void cpu_hyp_reinit(void)
 {
 	if (is_kernel_in_hyp_mode()) {
 		/*
-		 * cpu_init_stage2() is safe to call even if the PM
+		 * __cpu_init_stage2() is safe to call even if the PM
 		 * event was cancelled before the CPU was reset.
 		 */
-		cpu_init_stage2(NULL);
+		__cpu_init_stage2();
 	} else {
 		if (__hyp_get_vectors() == hyp_default_vectors)
 			cpu_init_hyp_mode(NULL);
 	}
 }
 
-static int hyp_init_cpu_notify(struct notifier_block *self,
-			       unsigned long action, void *cpu)
+static void cpu_hyp_reset(void)
 {
-	switch (action) {
-	case CPU_STARTING:
-	case CPU_STARTING_FROZEN:
+	phys_addr_t boot_pgd_ptr;
+	phys_addr_t phys_idmap_start;
+
+	if (!is_kernel_in_hyp_mode()) {
+		boot_pgd_ptr = kvm_mmu_get_boot_httbr();
+		phys_idmap_start = kvm_get_idmap_start();
+
+		__cpu_reset_hyp_mode(boot_pgd_ptr, phys_idmap_start);
+	}
+}
+
+static void _kvm_arch_hardware_enable(void *discard)
+{
+	if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
 		cpu_hyp_reinit();
+		__this_cpu_write(kvm_arm_hardware_enabled, 1);
 	}
+}
 
-	return NOTIFY_OK;
+int kvm_arch_hardware_enable(void)
+{
+	_kvm_arch_hardware_enable(NULL);
+	return 0;
 }
 
-static struct notifier_block hyp_init_cpu_nb = {
-	.notifier_call = hyp_init_cpu_notify,
-};
+static void _kvm_arch_hardware_disable(void *discard)
+{
+	if (__this_cpu_read(kvm_arm_hardware_enabled)) {
+		cpu_hyp_reset();
+		__this_cpu_write(kvm_arm_hardware_enabled, 0);
+	}
+}
+
+void kvm_arch_hardware_disable(void)
+{
+	_kvm_arch_hardware_disable(NULL);
+}
 
 #ifdef CONFIG_CPU_PM
 static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
 				    unsigned long cmd,
 				    void *v)
 {
-	if (cmd == CPU_PM_EXIT) {
-		cpu_hyp_reinit();
+	/*
+	 * kvm_arm_hardware_enabled is left with its old value over
+	 * PM_ENTER->PM_EXIT. It is used to indicate PM_EXIT should
+	 * re-enable hyp.
+	 */
+	switch (cmd) {
+	case CPU_PM_ENTER:
+		if (__this_cpu_read(kvm_arm_hardware_enabled))
+			/*
+			 * don't update kvm_arm_hardware_enabled here
+			 * so that the hardware will be re-enabled
+			 * when we resume. See below.
+			 */
+			cpu_hyp_reset();
+
 		return NOTIFY_OK;
-	}
+	case CPU_PM_EXIT:
+		if (__this_cpu_read(kvm_arm_hardware_enabled))
+			/* The hardware was enabled before suspend. */
+			cpu_hyp_reinit();
 
-	return NOTIFY_DONE;
+		return NOTIFY_OK;
+
+	default:
+		return NOTIFY_DONE;
+	}
 }
 
 static struct notifier_block hyp_init_cpu_pm_nb = {
@@ -1136,18 +1177,9 @@ static int init_common_resources(void)
 
 static int init_subsystems(void)
 {
-	int err;
+	int err = 0;
 
-	/*
-	 * Register CPU Hotplug notifier
-	 */
-	cpu_notifier_register_begin();
-	err = __register_cpu_notifier(&hyp_init_cpu_nb);
-	cpu_notifier_register_done();
-	if (err) {
-		kvm_err("Cannot register KVM init CPU notifier (%d)\n", err);
-		return err;
-	}
+	on_each_cpu(_kvm_arch_hardware_enable, NULL, 1);
 
 	/*
 	 * Register CPU lower-power notifier
@@ -1165,9 +1197,10 @@ static int init_subsystems(void)
 	case -ENODEV:
 	case -ENXIO:
 		vgic_present = false;
+		err = 0;
 		break;
 	default:
-		return err;
+		goto out;
 	}
 
 	/*
@@ -1175,12 +1208,15 @@ static int init_subsystems(void)
 	 */
 	err = kvm_timer_hyp_init();
 	if (err)
-		return err;
+		goto out;
 
 	kvm_perf_init();
 	kvm_coproc_table_init();
 
-	return 0;
+out:
+	on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
+
+	return err;
 }
 
 static void teardown_hyp_mode(void)
@@ -1197,11 +1233,6 @@ static void teardown_hyp_mode(void)
 
 static int init_vhe_mode(void)
 {
-	/*
-	 * Execute the init code on each CPU.
-	 */
-	on_each_cpu(cpu_init_stage2, NULL, 1);
-
 	/* set size of VMID supported by CPU */
 	kvm_vmid_bits = kvm_get_vmid_bits();
 	kvm_info("%d-bit VMID\n", kvm_vmid_bits);
@@ -1288,11 +1319,6 @@ static int init_hyp_mode(void)
 		}
 	}
 
-	/*
-	 * Execute the init code on each CPU.
-	 */
-	on_each_cpu(cpu_init_hyp_mode, NULL, 1);
-
 #ifndef CONFIG_HOTPLUG_CPU
 	free_boot_hyp_pgd();
 #endif
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 58dbd5c439df..7d86e15b5d85 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1666,6 +1666,11 @@ phys_addr_t kvm_get_idmap_vector(void)
 	return hyp_idmap_vector;
 }
 
+phys_addr_t kvm_get_idmap_start(void)
+{
+	return hyp_idmap_start;
+}
+
 int kvm_mmu_init(void)
 {
 	int err;
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index eb7490d232a0..ebc8d0ee1901 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -40,6 +40,7 @@ struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
+extern char __kvm_hyp_reset[];
 
 extern char __kvm_hyp_vector[];
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 227ed475dbd3..3d991fa5c0d8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -47,6 +47,7 @@
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 int kvm_arch_dev_ioctl_check_extension(long ext);
+phys_addr_t kvm_hyp_reset_entry(void);
 
 struct kvm_arch {
 	/* The VMID generation used for the virt. memory system */
@@ -353,7 +354,17 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 		       hyp_stack_ptr, vector_ptr);
 }
 
-static inline void kvm_arch_hardware_disable(void) {}
+static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
+					phys_addr_t phys_idmap_start)
+{
+	/*
+	 * Call reset code, and switch back to stub hyp vectors.
+	 * Uses __kvm_call_hyp() to avoid kaslr's kvm_ksym_ref() translation.
+	 */
+	__kvm_call_hyp((void *)kvm_hyp_reset_entry(),
+		       boot_pgd_ptr, phys_idmap_start);
+}
+
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 22732a5e3119..e8d39d4f86b6 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -109,6 +109,7 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_mmu_get_boot_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
+phys_addr_t kvm_get_idmap_start(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
 
diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
index 5ce1b47ef770..44ec4cb23ae7 100644
--- a/arch/arm64/kvm/hyp-init.S
+++ b/arch/arm64/kvm/hyp-init.S
@@ -139,6 +139,44 @@ merged:
 	eret
 ENDPROC(__kvm_hyp_init)
 
+	/*
+	 * x0: HYP boot pgd
+	 * x1: HYP phys_idmap_start
+	 */
+ENTRY(__kvm_hyp_reset)
+	/* We're in trampoline code in VA, switch back to boot page tables */
+	msr	ttbr0_el2, x0
+	isb
+
+	/* Ensure the PA branch doesn't find a stale tlb entry or stale code. */
+	ic	iallu
+	tlbi	alle2
+	dsb	sy
+	isb
+
+	/* Branch into PA space */
+	adr	x0, 1f
+	bfi	x1, x0, #0, #PAGE_SHIFT
+	br	x1
+
+	/* We're now in idmap, disable MMU */
+1:	mrs	x0, sctlr_el2
+	ldr	x1, =SCTLR_ELx_FLAGS
+	bic	x0, x0, x1		// Clear SCTL_M and etc
+	msr	sctlr_el2, x0
+	isb
+
+	/* Invalidate the old TLBs */
+	tlbi	alle2
+	dsb	sy
+
+	/* Install stub vectors */
+	adr_l	x0, __hyp_stub_vectors
+	msr	vbar_el2, x0
+
+	eret
+ENDPROC(__kvm_hyp_reset)
+
 	.ltorg
 
 	.popsection
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 9677bf069bcc..4062e6dd4cc1 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -29,7 +29,9 @@
 #include <asm/cputype.h>
 #include <asm/ptrace.h>
 #include <asm/kvm_arm.h>
+#include <asm/kvm_asm.h>
 #include <asm/kvm_coproc.h>
+#include <asm/kvm_mmu.h>
 
 /*
  * ARMv8 Reset Values
@@ -130,3 +132,15 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	/* Reset timer */
 	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 }
+
+extern char __hyp_idmap_text_start[];
+
+phys_addr_t kvm_hyp_reset_entry(void)
+{
+	unsigned long offset;
+
+	offset = (unsigned long)__kvm_hyp_reset
+		 - ((unsigned long)__hyp_idmap_text_start & PAGE_MASK);
+
+	return TRAMPOLINE_VA + offset;
+}
-- 
2.8.0.rc3

* [PATCH v7 08/16] arm64: kernel: Rework finisher callback out of __cpu_suspend_enter().
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (6 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-18 17:20   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 09/16] arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va James Morse
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hibernate could make use of the cpu_suspend() code to save/restore cpu
state; however, it needs to be able to return '0' from the 'finisher'.

Rework cpu_suspend() so that the finisher is called from C code,
independently of the save/restore of cpu state. Space to save the context
is allocated in the caller's stack frame and passed into
__cpu_suspend_enter().

Hibernate's use of this API will look like a copy of the cpu_suspend()
function.
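
For illustration, that caller is expected to be little more than the sketch
below (a sketch, not the actual later patch; error handling and ancillary
state saving are abbreviated). swsusp_save() is the core hibernate routine
that writes the memory snapshot.

  int swsusp_arch_suspend(void)
  {
          int ret = 0;
          struct sleep_stack_data state;

          if (__cpu_suspend_enter(&state)) {
                  /* CPU state saved: run the 'finisher' to build the image */
                  ret = swsusp_save();
          } else {
                  /* Reached via cpu_resume() once the image has been restored */
                  __cpu_suspend_exit();
          }

          return ret;
  }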

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm64/include/asm/suspend.h | 20 +++++++++
 arch/arm64/kernel/asm-offsets.c  |  2 +
 arch/arm64/kernel/sleep.S        | 93 ++++++++++++++--------------------------
 arch/arm64/kernel/suspend.c      | 66 +++++++++++++++++-----------
 4 files changed, 97 insertions(+), 84 deletions(-)

diff --git a/arch/arm64/include/asm/suspend.h b/arch/arm64/include/asm/suspend.h
index 59a5b0f1e81c..365d8cdd0073 100644
--- a/arch/arm64/include/asm/suspend.h
+++ b/arch/arm64/include/asm/suspend.h
@@ -2,6 +2,7 @@
 #define __ASM_SUSPEND_H
 
 #define NR_CTX_REGS 11
+#define NR_CALLEE_SAVED_REGS 12
 
 /*
  * struct cpu_suspend_ctx must be 16-byte aligned since it is allocated on
@@ -21,6 +22,25 @@ struct sleep_save_sp {
 	phys_addr_t save_ptr_stash_phys;
 };
 
+/*
+ * Memory to save the cpu state is allocated on the stack by
+ * __cpu_suspend_enter()'s caller, and populated by __cpu_suspend_enter().
+ * This data must survive until cpu_resume() is called.
+ *
+ * This struct describes the size and the layout of the saved cpu state.
+ * The layout of the callee_saved_regs is defined by the implementation
+ * of __cpu_suspend_enter(), and cpu_resume(). This struct must be passed
+ * in by the caller as __cpu_suspend_enter()'s stack-frame is gone once it
+ * returns, and the data would be subsequently corrupted by the call to the
+ * finisher.
+ */
+struct sleep_stack_data {
+	struct cpu_suspend_ctx	system_regs;
+	unsigned long		callee_saved_regs[NR_CALLEE_SAVED_REGS];
+};
+
 extern int cpu_suspend(unsigned long arg, int (*fn)(unsigned long));
 extern void cpu_resume(void);
+int __cpu_suspend_enter(struct sleep_stack_data *state);
+void __cpu_suspend_exit(void);
 #endif
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 3ae6b310ac9b..0902acb7afe5 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -122,6 +122,8 @@ int main(void)
   DEFINE(SLEEP_SAVE_SP_SZ,	sizeof(struct sleep_save_sp));
   DEFINE(SLEEP_SAVE_SP_PHYS,	offsetof(struct sleep_save_sp, save_ptr_stash_phys));
   DEFINE(SLEEP_SAVE_SP_VIRT,	offsetof(struct sleep_save_sp, save_ptr_stash));
+  DEFINE(SLEEP_STACK_DATA_SYSTEM_REGS,	offsetof(struct sleep_stack_data, system_regs));
+  DEFINE(SLEEP_STACK_DATA_CALLEE_REGS,	offsetof(struct sleep_stack_data, callee_saved_regs));
 #endif
   DEFINE(ARM_SMCCC_RES_X0_OFFS,	offsetof(struct arm_smccc_res, a0));
   DEFINE(ARM_SMCCC_RES_X2_OFFS,	offsetof(struct arm_smccc_res, a2));
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index fd10eb663868..17d4d864c96e 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -49,37 +49,30 @@
 	orr	\dst, \dst, \mask		// dst|=(aff3>>rs3)
 	.endm
 /*
- * Save CPU state for a suspend and execute the suspend finisher.
- * On success it will return 0 through cpu_resume - ie through a CPU
- * soft/hard reboot from the reset vector.
- * On failure it returns the suspend finisher return value or force
- * -EOPNOTSUPP if the finisher erroneously returns 0 (the suspend finisher
- * is not allowed to return, if it does this must be considered failure).
- * It saves callee registers, and allocates space on the kernel stack
- * to save the CPU specific registers + some other data for resume.
+ * Save CPU state in the provided sleep_stack_data area, and publish its
+ * location for cpu_resume()'s use in sleep_save_stash.
  *
- *  x0 = suspend finisher argument
- *  x1 = suspend finisher function pointer
+ * cpu_resume() will restore this saved state, and return. Because the
+ * link-register is saved and restored, it will appear to return from this
+ * function. So that the caller can tell the suspend/resume paths apart,
+ * __cpu_suspend_enter() will always return a non-zero value, whereas the
+ * path through cpu_resume() will return 0.
+ *
+ *  x0 = struct sleep_stack_data area
  */
 ENTRY(__cpu_suspend_enter)
-	stp	x29, lr, [sp, #-96]!
-	stp	x19, x20, [sp,#16]
-	stp	x21, x22, [sp,#32]
-	stp	x23, x24, [sp,#48]
-	stp	x25, x26, [sp,#64]
-	stp	x27, x28, [sp,#80]
-	/*
-	 * Stash suspend finisher and its argument in x20 and x19
-	 */
-	mov	x19, x0
-	mov	x20, x1
+	stp	x29, lr, [x0, #SLEEP_STACK_DATA_CALLEE_REGS]
+	stp	x19, x20, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+16]
+	stp	x21, x22, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+32]
+	stp	x23, x24, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+48]
+	stp	x25, x26, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+64]
+	stp	x27, x28, [x0,#SLEEP_STACK_DATA_CALLEE_REGS+80]
+
+	/* save the sp in cpu_suspend_ctx */
 	mov	x2, sp
-	sub	sp, sp, #CPU_SUSPEND_SZ	// allocate cpu_suspend_ctx
-	mov	x0, sp
-	/*
-	 * x0 now points to struct cpu_suspend_ctx allocated on the stack
-	 */
-	str	x2, [x0, #CPU_CTX_SP]
+	str	x2, [x0, #SLEEP_STACK_DATA_SYSTEM_REGS + CPU_CTX_SP]
+
+	/* find the mpidr_hash */
 	ldr	x1, =sleep_save_sp
 	ldr	x1, [x1, #SLEEP_SAVE_SP_VIRT]
 	mrs	x7, mpidr_el1
@@ -93,34 +86,11 @@ ENTRY(__cpu_suspend_enter)
 	ldp	w5, w6, [x9, #(MPIDR_HASH_SHIFTS + 8)]
 	compute_mpidr_hash x8, x3, x4, x5, x6, x7, x10
 	add	x1, x1, x8, lsl #3
+
+	push	x29, lr
 	bl	__cpu_suspend_save
-	/*
-	 * Grab suspend finisher in x20 and its argument in x19
-	 */
-	mov	x0, x19
-	mov	x1, x20
-	/*
-	 * We are ready for power down, fire off the suspend finisher
-	 * in x1, with argument in x0
-	 */
-	blr	x1
-        /*
-	 * Never gets here, unless suspend finisher fails.
-	 * Successful cpu_suspend should return from cpu_resume, returning
-	 * through this code path is considered an error
-	 * If the return value is set to 0 force x0 = -EOPNOTSUPP
-	 * to make sure a proper error condition is propagated
-	 */
-	cmp	x0, #0
-	mov	x3, #-EOPNOTSUPP
-	csel	x0, x3, x0, eq
-	add	sp, sp, #CPU_SUSPEND_SZ	// rewind stack pointer
-	ldp	x19, x20, [sp, #16]
-	ldp	x21, x22, [sp, #32]
-	ldp	x23, x24, [sp, #48]
-	ldp	x25, x26, [sp, #64]
-	ldp	x27, x28, [sp, #80]
-	ldp	x29, lr, [sp], #96
+	pop	x29, lr
+	mov	x0, #1
 	ret
 ENDPROC(__cpu_suspend_enter)
 	.ltorg
@@ -150,12 +120,6 @@ cpu_resume_after_mmu:
 	bl	kasan_unpoison_remaining_stack
 #endif
 	mov	x0, #0			// return zero on success
-	ldp	x19, x20, [sp, #16]
-	ldp	x21, x22, [sp, #32]
-	ldp	x23, x24, [sp, #48]
-	ldp	x25, x26, [sp, #64]
-	ldp	x27, x28, [sp, #80]
-	ldp	x29, lr, [sp], #96
 	ret
 ENDPROC(cpu_resume_after_mmu)
 
@@ -172,6 +136,8 @@ ENTRY(cpu_resume)
         /* x7 contains hash index, let's use it to grab context pointer */
 	ldr_l	x0, sleep_save_sp + SLEEP_SAVE_SP_PHYS
 	ldr	x0, [x0, x7, lsl #3]
+	add	x29, x0, #SLEEP_STACK_DATA_CALLEE_REGS
+	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
 	/* load sp from context */
 	ldr	x2, [x0, #CPU_CTX_SP]
 	/* load physical address of identity map page table in x1 */
@@ -185,5 +151,12 @@ ENTRY(cpu_resume)
 	 * pointer and x1 to contain physical address of 1:1 page tables
 	 */
 	bl	cpu_do_resume		// PC relative jump, MMU off
+	/* Can't access these by physical address once the MMU is on */
+	ldp	x19, x20, [x29, #16]
+	ldp	x21, x22, [x29, #32]
+	ldp	x23, x24, [x29, #48]
+	ldp	x25, x26, [x29, #64]
+	ldp	x27, x28, [x29, #80]
+	ldp	x29, lr, [x29]
 	b	cpu_resume_mmu		// Resume MMU, never returns
 ENDPROC(cpu_resume)
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index 66055392f445..6fe46100685a 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -10,22 +10,22 @@
 #include <asm/suspend.h>
 #include <asm/tlbflush.h>
 
-extern int __cpu_suspend_enter(unsigned long arg, int (*fn)(unsigned long));
+
 /*
  * This is called by __cpu_suspend_enter() to save the state, and do whatever
  * flushing is required to ensure that when the CPU goes to sleep we have
  * the necessary data available when the caches are not searched.
  *
- * ptr: CPU context virtual address
+ * ptr: sleep_stack_data containing cpu state virtual address.
  * save_ptr: address of the location where the context physical address
  *           must be saved
  */
-void notrace __cpu_suspend_save(struct cpu_suspend_ctx *ptr,
+void notrace __cpu_suspend_save(struct sleep_stack_data *ptr,
 				phys_addr_t *save_ptr)
 {
 	*save_ptr = virt_to_phys(ptr);
 
-	cpu_do_suspend(ptr);
+	cpu_do_suspend(&ptr->system_regs);
 	/*
 	 * Only flush the context that must be retrieved with the MMU
 	 * off. VA primitives ensure the flush is applied to all
@@ -51,6 +51,30 @@ void __init cpu_suspend_set_dbg_restorer(void (*hw_bp_restore)(void *))
 	hw_breakpoint_restore = hw_bp_restore;
 }
 
+void notrace __cpu_suspend_exit(void)
+{
+	/*
+	 * We are resuming from reset with the idmap active in TTBR0_EL1.
+	 * We must uninstall the idmap and restore the expected MMU
+	 * state before we can possibly return to userspace.
+	 */
+	cpu_uninstall_idmap();
+
+	/*
+	 * Restore per-cpu offset before any kernel
+	 * subsystem relying on it has a chance to run.
+	 */
+	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
+
+	/*
+	 * Restore HW breakpoint registers to sane values
+	 * before debug exceptions are possibly reenabled
+	 * through local_dbg_restore.
+	 */
+	if (hw_breakpoint_restore)
+		hw_breakpoint_restore(NULL);
+}
+
 /*
  * cpu_suspend
  *
@@ -60,8 +84,9 @@ void __init cpu_suspend_set_dbg_restorer(void (*hw_bp_restore)(void *))
  */
 int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 {
-	int ret;
+	int ret = 0;
 	unsigned long flags;
+	struct sleep_stack_data state;
 
 	/*
 	 * From this point debug exceptions are disabled to prevent
@@ -83,28 +108,21 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 	 * page tables, so that the thread address space is properly
 	 * set-up on function return.
 	 */
-	ret = __cpu_suspend_enter(arg, fn);
-	if (ret == 0) {
-		/*
-		 * We are resuming from reset with the idmap active in TTBR0_EL1.
-		 * We must uninstall the idmap and restore the expected MMU
-		 * state before we can possibly return to userspace.
-		 */
-		cpu_uninstall_idmap();
-
-		/*
-		 * Restore per-cpu offset before any kernel
-		 * subsystem relying on it has a chance to run.
-		 */
-		set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
+	if (__cpu_suspend_enter(&state)) {
+		/* Call the suspend finisher */
+		ret = fn(arg);
 
 		/*
-		 * Restore HW breakpoint registers to sane values
-		 * before debug exceptions are possibly reenabled
-		 * through local_dbg_restore.
+		 * Never gets here, unless the suspend finisher fails.
+		 * Successful cpu_suspend() should return from cpu_resume(),
+		 * returning through this code path is considered an error.
+		 * If the return value is set to 0, force ret = -EOPNOTSUPP
+		 * to make sure a proper error condition is propagated.
 		 */
-		if (hw_breakpoint_restore)
-			hw_breakpoint_restore(NULL);
+		if (!ret)
+			ret = -EOPNOTSUPP;
+	} else {
+		__cpu_suspend_exit();
 	}
 
 	unpause_graph_tracing();
-- 
2.8.0.rc3


* [PATCH v7 09/16] arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (7 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 08/16] arm64: kernel: Rework finisher callback out of __cpu_suspend_enter() James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-20 16:24   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 10/16] arm64: kernel: Include _AC definition in page.h James Morse
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

By enabling the MMU early in cpu_resume(), the sleep_save_sp and stack can
be accessed by VA, which avoids the need to convert addresses and clean them
to the PoC on the suspend path.

MMU setup is shared with the boot path, meaning the swapper_pg_dir is
restored directly: ttbr1_el1 is no longer saved/restored.

struct sleep_save_sp is removed and replaced with a single array of
pointers.

cpu_do_{suspend,resume} could be further reduced to not restore: cpacr_el1,
mdscr_el1, tcr_el1, vbar_el1 and sctlr_el1, all of which are set by
__cpu_setup(). However, these registers all contain res0 bits that may be
used to enable future features.
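
As a rough sketch of the new bookkeeping (not part of the patch; mpidr_hash_index()
is a made-up stand-in for the compute_mpidr_hash asm macro), each cpu records the
address of its on-stack sleep_stack_data in the slot of sleep_save_stash that
matches its MPIDR hash:

  /* Illustrative sketch only. */
  unsigned long *sleep_save_stash;        /* kcalloc()'d, one slot per mpidr hash bucket */

  static void stash_cpu_context(struct sleep_stack_data *state)
  {
          unsigned int slot = mpidr_hash_index(read_cpuid_mpidr());

          /* A plain virtual address: cpu_resume() enables the MMU before reading it. */
          sleep_save_stash[slot] = (unsigned long)state;
  }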

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm64/include/asm/suspend.h |  9 ++----
 arch/arm64/kernel/asm-offsets.c  |  3 --
 arch/arm64/kernel/head.S         |  2 +-
 arch/arm64/kernel/setup.c        |  1 -
 arch/arm64/kernel/sleep.S        | 63 +++++++++++++++-------------------------
 arch/arm64/kernel/suspend.c      | 38 ++++--------------------
 arch/arm64/mm/proc.S             | 53 ++++++++++++++-------------------
 7 files changed, 55 insertions(+), 114 deletions(-)

diff --git a/arch/arm64/include/asm/suspend.h b/arch/arm64/include/asm/suspend.h
index 365d8cdd0073..29d3c71433e1 100644
--- a/arch/arm64/include/asm/suspend.h
+++ b/arch/arm64/include/asm/suspend.h
@@ -1,7 +1,7 @@
 #ifndef __ASM_SUSPEND_H
 #define __ASM_SUSPEND_H
 
-#define NR_CTX_REGS 11
+#define NR_CTX_REGS 10
 #define NR_CALLEE_SAVED_REGS 12
 
 /*
@@ -17,11 +17,6 @@ struct cpu_suspend_ctx {
 	u64 sp;
 } __aligned(16);
 
-struct sleep_save_sp {
-	phys_addr_t *save_ptr_stash;
-	phys_addr_t save_ptr_stash_phys;
-};
-
 /*
  * Memory to save the cpu state is allocated on the stack by
  * __cpu_suspend_enter()'s caller, and populated by __cpu_suspend_enter().
@@ -39,6 +34,8 @@ struct sleep_stack_data {
 	unsigned long		callee_saved_regs[NR_CALLEE_SAVED_REGS];
 };
 
+extern unsigned long *sleep_save_stash;
+
 extern int cpu_suspend(unsigned long arg, int (*fn)(unsigned long));
 extern void cpu_resume(void);
 int __cpu_suspend_enter(struct sleep_stack_data *state);
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 0902acb7afe5..ac742ef0fde0 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -119,9 +119,6 @@ int main(void)
   DEFINE(CPU_CTX_SP,		offsetof(struct cpu_suspend_ctx, sp));
   DEFINE(MPIDR_HASH_MASK,	offsetof(struct mpidr_hash, mask));
   DEFINE(MPIDR_HASH_SHIFTS,	offsetof(struct mpidr_hash, shift_aff));
-  DEFINE(SLEEP_SAVE_SP_SZ,	sizeof(struct sleep_save_sp));
-  DEFINE(SLEEP_SAVE_SP_PHYS,	offsetof(struct sleep_save_sp, save_ptr_stash_phys));
-  DEFINE(SLEEP_SAVE_SP_VIRT,	offsetof(struct sleep_save_sp, save_ptr_stash));
   DEFINE(SLEEP_STACK_DATA_SYSTEM_REGS,	offsetof(struct sleep_stack_data, system_regs));
   DEFINE(SLEEP_STACK_DATA_CALLEE_REGS,	offsetof(struct sleep_stack_data, callee_saved_regs));
 #endif
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 4203d5f257bc..c024b5e7fc25 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -757,7 +757,7 @@ ENTRY(__early_cpu_boot_status)
  * If it isn't, park the CPU
  */
 	.section	".idmap.text", "ax"
-__enable_mmu:
+ENTRY(__enable_mmu)
 	mrs	x22, sctlr_el1			// preserve old SCTLR_EL1 value
 	mrs	x1, ID_AA64MMFR0_EL1
 	ubfx	x2, x1, #ID_AA64MMFR0_TGRAN_SHIFT, 4
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 9dc67769b6a4..2325d489b8f0 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -175,7 +175,6 @@ static void __init smp_build_mpidr_hash(void)
 	 */
 	if (mpidr_hash_size() > 4 * num_possible_cpus())
 		pr_warn("Large number of MPIDR hash buckets detected\n");
-	__flush_dcache_area(&mpidr_hash, sizeof(struct mpidr_hash));
 }
 
 static void __init setup_machine_fdt(phys_addr_t dt_phys)
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 17d4d864c96e..7916facff5e7 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -73,8 +73,8 @@ ENTRY(__cpu_suspend_enter)
 	str	x2, [x0, #SLEEP_STACK_DATA_SYSTEM_REGS + CPU_CTX_SP]
 
 	/* find the mpidr_hash */
-	ldr	x1, =sleep_save_sp
-	ldr	x1, [x1, #SLEEP_SAVE_SP_VIRT]
+	ldr	x1, =sleep_save_stash
+	ldr	x1, [x1]
 	mrs	x7, mpidr_el1
 	ldr	x9, =mpidr_hash
 	ldr	x10, [x9, #MPIDR_HASH_MASK]
@@ -87,44 +87,26 @@ ENTRY(__cpu_suspend_enter)
 	compute_mpidr_hash x8, x3, x4, x5, x6, x7, x10
 	add	x1, x1, x8, lsl #3
 
+	str	x0, [x1]
+	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
 	push	x29, lr
-	bl	__cpu_suspend_save
+	bl	cpu_do_suspend
 	pop	x29, lr
 	mov	x0, #1
 	ret
 ENDPROC(__cpu_suspend_enter)
 	.ltorg
 
-/*
- * x0 must contain the sctlr value retrieved from restored context
- */
-	.pushsection	".idmap.text", "ax"
-ENTRY(cpu_resume_mmu)
-	ldr	x3, =cpu_resume_after_mmu
-	msr	sctlr_el1, x0		// restore sctlr_el1
-	isb
-	/*
-	 * Invalidate the local I-cache so that any instructions fetched
-	 * speculatively from the PoC are discarded, since they may have
-	 * been dynamically patched at the PoU.
-	 */
-	ic	iallu
-	dsb	nsh
-	isb
-	br	x3			// global jump to virtual address
-ENDPROC(cpu_resume_mmu)
-	.popsection
-cpu_resume_after_mmu:
-#ifdef CONFIG_KASAN
-	mov	x0, sp
-	bl	kasan_unpoison_remaining_stack
-#endif
-	mov	x0, #0			// return zero on success
-	ret
-ENDPROC(cpu_resume_after_mmu)
-
 ENTRY(cpu_resume)
 	bl	el2_setup		// if in EL2 drop to EL1 cleanly
+	/* enable the MMU early - so we can access sleep_save_stash by va */
+	adr_l	lr, __enable_mmu	/* __cpu_setup will return here */
+	ldr	x27, =_cpu_resume	/* __enable_mmu will branch here */
+	adrp	x25, idmap_pg_dir
+	adrp	x26, swapper_pg_dir
+	b	__cpu_setup
+
+ENTRY(_cpu_resume)
 	mrs	x1, mpidr_el1
 	adrp	x8, mpidr_hash
 	add x8, x8, #:lo12:mpidr_hash // x8 = struct mpidr_hash phys address
@@ -134,29 +116,32 @@ ENTRY(cpu_resume)
 	ldp	w5, w6, [x8, #(MPIDR_HASH_SHIFTS + 8)]
 	compute_mpidr_hash x7, x3, x4, x5, x6, x1, x2
         /* x7 contains hash index, let's use it to grab context pointer */
-	ldr_l	x0, sleep_save_sp + SLEEP_SAVE_SP_PHYS
+	ldr_l	x0, sleep_save_stash
 	ldr	x0, [x0, x7, lsl #3]
 	add	x29, x0, #SLEEP_STACK_DATA_CALLEE_REGS
 	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
 	/* load sp from context */
 	ldr	x2, [x0, #CPU_CTX_SP]
-	/* load physical address of identity map page table in x1 */
-	adrp	x1, idmap_pg_dir
 	mov	sp, x2
 	/* save thread_info */
 	and	x2, x2, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x2
 	/*
-	 * cpu_do_resume expects x0 to contain context physical address
-	 * pointer and x1 to contain physical address of 1:1 page tables
+	 * cpu_do_resume expects x0 to contain context address pointer
 	 */
-	bl	cpu_do_resume		// PC relative jump, MMU off
-	/* Can't access these by physical address once the MMU is on */
+	bl	cpu_do_resume
+
+#ifdef CONFIG_KASAN
+	mov	x0, sp
+	bl	kasan_unpoison_remaining_stack
+#endif
+
 	ldp	x19, x20, [x29, #16]
 	ldp	x21, x22, [x29, #32]
 	ldp	x23, x24, [x29, #48]
 	ldp	x25, x26, [x29, #64]
 	ldp	x27, x28, [x29, #80]
 	ldp	x29, lr, [x29]
-	b	cpu_resume_mmu		// Resume MMU, never returns
+	mov	x0, #0
+	ret
 ENDPROC(cpu_resume)
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index 6fe46100685a..c6457cedd664 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -10,30 +10,11 @@
 #include <asm/suspend.h>
 #include <asm/tlbflush.h>
 
-
 /*
- * This is called by __cpu_suspend_enter() to save the state, and do whatever
- * flushing is required to ensure that when the CPU goes to sleep we have
- * the necessary data available when the caches are not searched.
- *
- * ptr: sleep_stack_data containing cpu state virtual address.
- * save_ptr: address of the location where the context physical address
- *           must be saved
+ * This is allocated by cpu_suspend_init(), and used to store a pointer to
+ * the 'struct sleep_stack_data' that contains a particular CPU's state.
  */
-void notrace __cpu_suspend_save(struct sleep_stack_data *ptr,
-				phys_addr_t *save_ptr)
-{
-	*save_ptr = virt_to_phys(ptr);
-
-	cpu_do_suspend(&ptr->system_regs);
-	/*
-	 * Only flush the context that must be retrieved with the MMU
-	 * off. VA primitives ensure the flush is applied to all
-	 * cache levels so context is pushed to DRAM.
-	 */
-	__flush_dcache_area(ptr, sizeof(*ptr));
-	__flush_dcache_area(save_ptr, sizeof(*save_ptr));
-}
+unsigned long *sleep_save_stash;
 
 /*
  * This hook is provided so that cpu_suspend code can restore HW
@@ -137,22 +118,15 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 	return ret;
 }
 
-struct sleep_save_sp sleep_save_sp;
-
 static int __init cpu_suspend_init(void)
 {
-	void *ctx_ptr;
-
 	/* ctx_ptr is an array of physical addresses */
-	ctx_ptr = kcalloc(mpidr_hash_size(), sizeof(phys_addr_t), GFP_KERNEL);
+	sleep_save_stash = kcalloc(mpidr_hash_size(), sizeof(*sleep_save_stash),
+				   GFP_KERNEL);
 
-	if (WARN_ON(!ctx_ptr))
+	if (WARN_ON(!sleep_save_stash))
 		return -ENOMEM;
 
-	sleep_save_sp.save_ptr_stash = ctx_ptr;
-	sleep_save_sp.save_ptr_stash_phys = virt_to_phys(ctx_ptr);
-	__flush_dcache_area(&sleep_save_sp, sizeof(struct sleep_save_sp));
-
 	return 0;
 }
 early_initcall(cpu_suspend_init);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 9f6deacf41d2..c4317879b938 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -24,6 +24,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/hwcap.h>
 #include <asm/pgtable.h>
+#include <asm/pgtable-hwdef.h>
 #include <asm/cpufeature.h>
 #include <asm/alternative.h>
 
@@ -63,62 +64,50 @@ ENTRY(cpu_do_suspend)
 	mrs	x2, tpidr_el0
 	mrs	x3, tpidrro_el0
 	mrs	x4, contextidr_el1
-	mrs	x5, mair_el1
-	mrs	x6, cpacr_el1
-	mrs	x7, ttbr1_el1
-	mrs	x8, tcr_el1
-	mrs	x9, vbar_el1
-	mrs	x10, mdscr_el1
-	mrs	x11, oslsr_el1
-	mrs	x12, sctlr_el1
+	mrs	x5, cpacr_el1
+	mrs	x6, tcr_el1
+	mrs	x7, vbar_el1
+	mrs	x8, mdscr_el1
+	mrs	x9, oslsr_el1
+	mrs	x10, sctlr_el1
 	stp	x2, x3, [x0]
-	stp	x4, x5, [x0, #16]
-	stp	x6, x7, [x0, #32]
-	stp	x8, x9, [x0, #48]
-	stp	x10, x11, [x0, #64]
-	str	x12, [x0, #80]
+	stp	x4, xzr, [x0, #16]
+	stp	x5, x6, [x0, #32]
+	stp	x7, x8, [x0, #48]
+	stp	x9, x10, [x0, #64]
 	ret
 ENDPROC(cpu_do_suspend)
 
 /**
  * cpu_do_resume - restore CPU register context
  *
- * x0: Physical address of context pointer
- * x1: ttbr0_el1 to be restored
- *
- * Returns:
- *	sctlr_el1 value in x0
+ * x0: Address of context pointer
  */
 ENTRY(cpu_do_resume)
-	/*
-	 * Invalidate local tlb entries before turning on MMU
-	 */
-	tlbi	vmalle1
 	ldp	x2, x3, [x0]
 	ldp	x4, x5, [x0, #16]
-	ldp	x6, x7, [x0, #32]
-	ldp	x8, x9, [x0, #48]
-	ldp	x10, x11, [x0, #64]
-	ldr	x12, [x0, #80]
+	ldp	x6, x8, [x0, #32]
+	ldp	x9, x10, [x0, #48]
+	ldp	x11, x12, [x0, #64]
 	msr	tpidr_el0, x2
 	msr	tpidrro_el0, x3
 	msr	contextidr_el1, x4
-	msr	mair_el1, x5
 	msr	cpacr_el1, x6
-	msr	ttbr0_el1, x1
-	msr	ttbr1_el1, x7
-	tcr_set_idmap_t0sz x8, x7
+
+	/* Don't change t0sz here, mask those bits when restoring */
+	mrs	x5, tcr_el1
+	bfi	x8, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
+
 	msr	tcr_el1, x8
 	msr	vbar_el1, x9
 	msr	mdscr_el1, x10
+	msr	sctlr_el1, x12
 	/*
 	 * Restore oslsr_el1 by writing oslar_el1
 	 */
 	ubfx	x11, x11, #1, #1
 	msr	oslar_el1, x11
 	reset_pmuserenr_el0 x0			// Disable PMU access from EL0
-	mov	x0, x12
-	dsb	nsh		// Make sure local tlb invalidation completed
 	isb
 	ret
 ENDPROC(cpu_do_resume)
-- 
2.8.0.rc3


* [PATCH v7 10/16] arm64: kernel: Include _AC definition in page.h
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (8 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 09/16] arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-20 16:25   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 11/16] arm64: Promote KERNEL_START/KERNEL_END definitions to a header file James Morse
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

page.h uses '_AC' in the definition of PAGE_SIZE, but doesn't include
linux/const.h where this is defined. This produces build warnings when only
asm/page.h is included by asm code.
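
For reference, a simplified sketch of what _AC() does (the real definition lives in
include/uapi/linux/const.h; this is illustration only, and EXAMPLE_PAGE_SIZE is a
made-up name):

  #ifdef __ASSEMBLY__
  #define _AC(X, Y)       X               /* assembler: bare constant */
  #else
  #define _AC(X, Y)       (X##Y)          /* C: paste the type suffix, e.g. 1UL */
  #endif

  #define EXAMPLE_PAGE_SIZE       (_AC(1, UL) << 12)      /* 4096UL in C, 4096 in asm */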

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/include/asm/page.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index ae615b9d9a55..17b45f7d96d3 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -19,6 +19,8 @@
 #ifndef __ASM_PAGE_H
 #define __ASM_PAGE_H
 
+#include <linux/const.h>
+
 /* PAGE_SHIFT determines the page size */
 /* CONT_SHIFT determines the number of pages which can be tracked together  */
 #ifdef CONFIG_ARM64_64K_PAGES
-- 
2.8.0.rc3


* [PATCH v7 11/16] arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (9 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 10/16] arm64: kernel: Include _AC definition in page.h James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-20 16:26   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 12/16] arm64: Add new asm macro copy_page James Morse
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

KERNEL_START and KERNEL_END are useful outside head.S; move them to a
header file.
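
A minimal sketch of how the promoted definitions get used (kernel_image_size() is a
hypothetical helper; later patches in this series do the equivalent arithmetic when
cleaning the kernel image to the PoC):

  #include <asm/memory.h>         /* KERNEL_START / KERNEL_END */
  #include <asm/sections.h>       /* _text[], _end[] */

  static size_t kernel_image_size(void)
  {
          return KERNEL_END - KERNEL_START;       /* i.e. _end - _text */
  }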

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/memory.h | 3 +++
 arch/arm64/kernel/head.S        | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 12f8a00fb3f1..d776037d199f 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -71,6 +71,9 @@
 
 #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 4))
 
+#define KERNEL_START      _text
+#define KERNEL_END        _end
+
 /*
  * The size of the KASAN shadow region. This should be 1/8th of the
  * size of the entire kernel virtual address space.
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index c024b5e7fc25..1217262b5210 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -51,9 +51,6 @@
 #error TEXT_OFFSET must be less than 2MB
 #endif
 
-#define KERNEL_START	_text
-#define KERNEL_END	_end
-
 /*
  * Kernel startup entry point.
  * ---------------------------
-- 
2.8.0.rc3


* [PATCH v7 12/16] arm64: Add new asm macro copy_page
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (10 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 11/16] arm64: Promote KERNEL_START/KERNEL_END definitions to a header file James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-20 16:38   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 13/16] arm64: head.S: el2_setup() to accept sctlr_el1 as an argument James Morse
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

From: Geoff Levand <geoff@infradead.org>

Kexec and hibernate need to copy pages of memory, but may not have all
of the kernel mapped, and are unable to call copy_page().

Add a simplistic copy_page() macro that can be inlined in these
situations. lib/copy_page.S provides a bigger, better version, but it
uses more registers.
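
In C terms the macro's loop is roughly the following sketch (illustration only; the
real macro works purely in registers, using ldp loads and stnp non-temporal stores,
and assumes dest/src are page aligned):

  static void copy_page_sketch(u64 *dest, const u64 *src)
  {
          for (unsigned int copied = 0; copied < PAGE_SIZE; copied += 8 * sizeof(u64)) {
                  /* eight 64-bit loads and stores per iteration, 64 bytes at a time */
                  for (int i = 0; i < 8; i++)
                          *dest++ = *src++;
          }
  }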

Signed-off-by: Geoff Levand <geoff@infradead.org>
[Changed asm label to 9998, added commit message]
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/assembler.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index c72474c2d4dc..a04fd7a9c102 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -24,6 +24,7 @@
 #define __ASM_ASSEMBLER_H
 
 #include <asm/asm-offsets.h>
+#include <asm/page.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/ptrace.h>
 #include <asm/thread_info.h>
@@ -292,6 +293,24 @@ lr	.req	x30		// link register
 	.endm
 
 /*
+ * copy_page - copy src to dest using temp registers t1-t8
+ */
+	.macro copy_page dest:req src:req t1:req t2:req t3:req t4:req t5:req t6:req t7:req t8:req
+9998:	ldp	\t1, \t2, [\src]
+	ldp	\t3, \t4, [\src, #16]
+	ldp	\t5, \t6, [\src, #32]
+	ldp	\t7, \t8, [\src, #48]
+	add	\src, \src, #64
+	stnp	\t1, \t2, [\dest]
+	stnp	\t3, \t4, [\dest, #16]
+	stnp	\t5, \t6, [\dest, #32]
+	stnp	\t7, \t8, [\dest, #48]
+	add	\dest, \dest, #64
+	tst	\src, #(PAGE_SIZE - 1)
+	b.ne	9998b
+	.endm
+
+/*
  * Annotate a function as position independent, i.e., safe to be called before
  * the kernel virtual mapping is activated.
  */
-- 
2.8.0.rc3


* [PATCH v7 13/16] arm64: head.S: el2_setup() to accept sctlr_el1 as an argument
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (11 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 12/16] arm64: Add new asm macro copy_page James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-20 17:12   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 14/16] PM / Hibernate: Call flush_icache_range() on pages restored in-place James Morse
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

el2_setup() doesn't just configure el2; it configures el1 too. This
means we can't use it to re-configure el2 after resume from hibernate,
as we will be returned to el1 with the MMU turned off.

Split the sctlr_el1 setting code up, so that el2_setup() accepts an
initial value as an argument. This value will be ignored if el2_setup()
is called at el1: the running value will be preserved with endian
correction.

Hibernate can now call el2_setup() to re-configure el2, passing the
current sctlr_el1 as an argument.
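
The new contract, as pseudo-C (a sketch only: current_el(), endian_correct() and the
sctlr accessors are made-up helpers standing in for the head.S assembly):

  unsigned long el2_setup_sketch(unsigned long initial_sctlr_el1)
  {
          if (current_el() != 2) {
                  /* Booted at EL1: the argument is ignored, keep the live value. */
                  write_sctlr_el1(endian_correct(read_sctlr_el1()));
                  return BOOT_CPU_MODE_EL1;
          }

          /* Booted at EL2: install the caller-provided initial value. */
          write_sctlr_el1(endian_correct(initial_sctlr_el1));
          /* ... the rest of the EL2 configuration (hcr_el2, traps, vectors) ... */
          return BOOT_CPU_MODE_EL2;
  }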

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/assembler.h | 10 ++++++++++
 arch/arm64/kernel/head.S           | 19 ++++++++++++-------
 arch/arm64/kernel/sleep.S          |  1 +
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index a04fd7a9c102..627d66efec68 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -311,6 +311,16 @@ lr	.req	x30		// link register
 	.endm
 
 /*
+ * Generate the initial sctlr_el1 value for el2_setup to set if we boot at EL2.
+ */
+	.macro init_sctlr_el1 reg
+		mov	\reg, #0x0800		// Set/clear RES{1,0} bits
+CPU_BE(		movk	\reg, #0x33d0, lsl #16)	// Set EE and E0E on BE systems
+CPU_LE(		movk	\reg, #0x30d0, lsl #16)	// Clear EE and E0E on LE systems
+	.endm
+
+/*
+
  * Annotate a function as position independent, i.e., safe to be called before
  * the kernel virtual mapping is activated.
  */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 1217262b5210..097986152203 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -208,6 +208,7 @@ section_table:
 
 ENTRY(stext)
 	bl	preserve_boot_args
+	init_sctlr_el1	x0
 	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
 	mov	x23, xzr			// KASLR offset, defaults to 0
 	adrp	x24, __PHYS_OFFSET
@@ -514,8 +515,12 @@ ENTRY(kimage_vaddr)
  *
  * Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in x20 if
  * booted in EL1 or EL2 respectively.
+ *
+ * If booted in EL2, SCTLR_EL1 will be initialised with the value in x0
+ * (otherwise the existing value will be preserved, with endian correction).
  */
 ENTRY(el2_setup)
+	mov	x1, x0				// preserve passed-in sctlr_el1
 	mrs	x0, CurrentEL
 	cmp	x0, #CurrentEL_EL2
 	b.ne	1f
@@ -524,7 +529,7 @@ CPU_BE(	orr	x0, x0, #(1 << 25)	)	// Set the EE bit for EL2
 CPU_LE(	bic	x0, x0, #(1 << 25)	)	// Clear the EE bit for EL2
 	msr	sctlr_el2, x0
 	b	2f
-1:	mrs	x0, sctlr_el1
+1:	mrs	x0, sctlr_el1			// ignore passed-in sctlr_el1
 CPU_BE(	orr	x0, x0, #(3 << 24)	)	// Set the EE and E0E bits for EL1
 CPU_LE(	bic	x0, x0, #(3 << 24)	)	// Clear the EE and E0E bits for EL1
 	msr	sctlr_el1, x0
@@ -578,6 +583,10 @@ set_hcr:
 
 3:
 #endif
+	/* use sctlr_el1 value we were provided with */
+CPU_BE(	orr	x1, x1, #(3 << 24)	)	// Set the EE and E0E bits for EL1
+CPU_LE(	bic	x1, x1, #(3 << 24)	)	// Clear the EE and E0E bits for EL1
+	msr	sctlr_el1, x1
 
 	/* Populate ID registers. */
 	mrs	x0, midr_el1
@@ -585,12 +594,6 @@ set_hcr:
 	msr	vpidr_el2, x0
 	msr	vmpidr_el2, x1
 
-	/* sctlr_el1 */
-	mov	x0, #0x0800			// Set/clear RES{1,0} bits
-CPU_BE(	movk	x0, #0x33d0, lsl #16	)	// Set EE and E0E on BE systems
-CPU_LE(	movk	x0, #0x30d0, lsl #16	)	// Clear EE and E0E on LE systems
-	msr	sctlr_el1, x0
-
 	/* Coprocessor traps. */
 	mov	x0, #0x33ff
 	msr	cptr_el2, x0			// Disable copro. traps to EL2
@@ -667,6 +670,7 @@ ENTRY(__boot_cpu_mode)
 	 * cores are held until we're ready for them to initialise.
 	 */
 ENTRY(secondary_holding_pen)
+	init_sctlr_el1	x0
 	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
 	bl	set_cpu_boot_mode_flag
 	mrs	x0, mpidr_el1
@@ -685,6 +689,7 @@ ENDPROC(secondary_holding_pen)
 	 * be used where CPUs are brought online dynamically by the kernel.
 	 */
 ENTRY(secondary_entry)
+	init_sctlr_el1	x0
 	bl	el2_setup			// Drop to EL1
 	bl	set_cpu_boot_mode_flag
 	b	secondary_startup
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 7916facff5e7..386a270a9998 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -98,6 +98,7 @@ ENDPROC(__cpu_suspend_enter)
 	.ltorg
 
 ENTRY(cpu_resume)
+	init_sctlr_el1	x0
 	bl	el2_setup		// if in EL2 drop to EL1 cleanly
 	/* enable the MMU early - so we can access sleep_save_stash by va */
 	adr_l	lr, __enable_mmu	/* __cpu_setup will return here */
-- 
2.8.0.rc3


* [PATCH v7 14/16] PM / Hibernate: Call flush_icache_range() on pages restored in-place
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (12 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 13/16] arm64: head.S: el2_setup() to accept sctlr_el1 as an argument James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-20 17:16   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 15/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

Some architectures require code that has been written to memory as if it were
data to be 'cleaned' from the data caches before the processor can fetch it as
new instructions.

During resume from hibernate, the snapshot code copies some pages directly,
meaning these architectures do not get a chance to perform their cache
maintenance. Modify the read and decompress code to call
flush_icache_range() on all pages that are restored, so that the restored
in-place pages are guaranteed to be executable on these architectures.
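
The pattern being added is simply the following (a sketch; restore_page_in_place()
is a made-up name, flush_icache_range() is the real interface):

  static void restore_page_in_place(void *dst, const void *src)
  {
          memcpy(dst, src, PAGE_SIZE);
          /*
           * On architectures with non-coherent I/D caches the copied bytes must
           * be cleaned before the CPU can fetch them as instructions.
           */
          flush_icache_range((unsigned long)dst, (unsigned long)dst + PAGE_SIZE);
  }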

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
---
 kernel/power/swap.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 12cd989dadf6..a30645d2e93f 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -37,6 +37,14 @@
 #define HIBERNATE_SIG	"S1SUSPEND"
 
 /*
+ * When reading an {un,}compressed image, we may restore pages in place,
+ * in which case some architectures need these pages cleaning before they
+ * can be executed. We don't know which pages these may be, so clean the lot.
+ */
+bool clean_pages_on_read = false;
+bool clean_pages_on_decompress = false;
+
+/*
  *	The swap map is a data structure used for keeping track of each page
  *	written to a swap partition.  It consists of many swap_map_page
  *	structures that contain each an array of MAP_PAGE_ENTRIES swap entries.
@@ -241,6 +249,9 @@ static void hib_end_io(struct bio *bio)
 
 	if (bio_data_dir(bio) == WRITE)
 		put_page(page);
+	else if (clean_pages_on_read)
+		flush_icache_range((unsigned long)page_address(page),
+				   (unsigned long)page_address(page) + PAGE_SIZE);
 
 	if (bio->bi_error && !hb->error)
 		hb->error = bio->bi_error;
@@ -1049,6 +1060,7 @@ static int load_image(struct swap_map_handle *handle,
 
 	hib_init_batch(&hb);
 
+	clean_pages_on_read = true;
 	printk(KERN_INFO "PM: Loading image data pages (%u pages)...\n",
 		nr_to_read);
 	m = nr_to_read / 10;
@@ -1124,6 +1136,10 @@ static int lzo_decompress_threadfn(void *data)
 		d->unc_len = LZO_UNC_SIZE;
 		d->ret = lzo1x_decompress_safe(d->cmp + LZO_HEADER, d->cmp_len,
 		                               d->unc, &d->unc_len);
+		if (clean_pages_on_decompress)
+			flush_icache_range((unsigned long)d->unc,
+					   (unsigned long)d->unc + d->unc_len);
+
 		atomic_set(&d->stop, 1);
 		wake_up(&d->done);
 	}
@@ -1189,6 +1205,8 @@ static int load_image_lzo(struct swap_map_handle *handle,
 	}
 	memset(crc, 0, offsetof(struct crc_data, go));
 
+	clean_pages_on_decompress = true;
+
 	/*
 	 * Start the decompression threads.
 	 */
-- 
2.8.0.rc3


* [PATCH v7 15/16] arm64: kernel: Add support for hibernate/suspend-to-disk
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (13 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 14/16] PM / Hibernate: Call flush_icache_range() on pages restored in-place James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-22 10:29   ` Catalin Marinas
  2016-04-01 16:53 ` [PATCH v7 16/16] arm64: hibernate: Prevent resume from a different kernel version James Morse
  2016-04-13 16:31 ` [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline James Morse
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

Add support for hibernate/suspend-to-disk.

Suspend borrows code from cpu_suspend() to write cpu state onto the stack,
before calling swsusp_save() to save the memory image.

Restore creates a set of temporary page tables, covering only the
linear map, copies the restore code to a 'safe' page, then uses the copy to
restore the memory image. The copied code executes in the lower half of the
address space, and once complete, restores the original kernel's page
tables. It then calls into cpu_resume(), and follows the normal
cpu_suspend() path back into the suspend code.

To restore a kernel built with KASLR, the addresses of the page tables and
cpu_resume() are stored in the hibernate arch-header, and the el2
vectors are pivoted via the 'safe' page in low memory. This also permits
us to resume using a different version of the kernel from the version that
hibernated, but because the MMU isn't turned off during resume, the
MMU settings must be the same in both kernels. To ensure this, the
value of the translation control register (TCR_EL1) is also included in the
hibernate arch-header; this means the resume kernel must have the same
page size and virtual address space size.
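
For orientation, the information carried across hibernate in the arch header is
roughly the following (field names follow the patch below; the struct here is an
illustrative sketch, not the on-disk layout):

  struct arch_hibernate_hdr_sketch {
          unsigned long   version;                /* refuse incompatible images */
          unsigned long   tcr_el1;                /* page size / VA size must match */
          phys_addr_t     ttbr1_el1;              /* the restored kernel's swapper_pg_dir */
          void            (*reenter_kernel)(void);        /* where to re-enter: _cpu_resume() */
          phys_addr_t     el2_setup_phys;         /* re-configure el2, 0 if we booted at el1 */
  };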

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Kevin Hilman <khilman@baylibre.com> # Tested on Juno R2
---
 arch/arm64/Kconfig                |   7 +
 arch/arm64/include/asm/suspend.h  |   7 +
 arch/arm64/kernel/Makefile        |   1 +
 arch/arm64/kernel/asm-offsets.c   |   5 +
 arch/arm64/kernel/hibernate-asm.S | 166 +++++++++++++
 arch/arm64/kernel/hibernate.c     | 503 ++++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/vmlinux.lds.S   |  15 ++
 7 files changed, 704 insertions(+)
 create mode 100644 arch/arm64/kernel/hibernate-asm.S
 create mode 100644 arch/arm64/kernel/hibernate.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4f436220384f..7f4ad0075b97 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -953,6 +953,13 @@ menu "Power management options"
 
 source "kernel/power/Kconfig"
 
+config ARCH_HIBERNATION_POSSIBLE
+	def_bool y
+
+config ARCH_HIBERNATION_HEADER
+	def_bool y
+	depends on HIBERNATION
+
 config ARCH_SUSPEND_POSSIBLE
 	def_bool y
 
diff --git a/arch/arm64/include/asm/suspend.h b/arch/arm64/include/asm/suspend.h
index 29d3c71433e1..024d623f662e 100644
--- a/arch/arm64/include/asm/suspend.h
+++ b/arch/arm64/include/asm/suspend.h
@@ -40,4 +40,11 @@ extern int cpu_suspend(unsigned long arg, int (*fn)(unsigned long));
 extern void cpu_resume(void);
 int __cpu_suspend_enter(struct sleep_stack_data *state);
 void __cpu_suspend_exit(void);
+void _cpu_resume(void);
+
+int swsusp_arch_suspend(void);
+int swsusp_arch_resume(void);
+int arch_hibernation_header_save(void *addr, unsigned int max_size);
+int arch_hibernation_header_restore(void *addr);
+
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 3793003e16a2..2173149d8954 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -45,6 +45,7 @@ arm64-obj-$(CONFIG_ACPI)		+= acpi.o
 arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 arm64-obj-$(CONFIG_PARAVIRT)		+= paravirt.o
 arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
+arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 
 obj-y					+= $(arm64-obj-y) vdso/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index ac742ef0fde0..f8e5d47f0880 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -22,6 +22,7 @@
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
 #include <linux/kvm_host.h>
+#include <linux/suspend.h>
 #include <asm/thread_info.h>
 #include <asm/memory.h>
 #include <asm/smp_plat.h>
@@ -124,5 +125,9 @@ int main(void)
 #endif
   DEFINE(ARM_SMCCC_RES_X0_OFFS,	offsetof(struct arm_smccc_res, a0));
   DEFINE(ARM_SMCCC_RES_X2_OFFS,	offsetof(struct arm_smccc_res, a2));
+  BLANK();
+  DEFINE(HIBERN_PBE_ORIG,	offsetof(struct pbe, orig_address));
+  DEFINE(HIBERN_PBE_ADDR,	offsetof(struct pbe, address));
+  DEFINE(HIBERN_PBE_NEXT,	offsetof(struct pbe, next));
   return 0;
 }
diff --git a/arch/arm64/kernel/hibernate-asm.S b/arch/arm64/kernel/hibernate-asm.S
new file mode 100644
index 000000000000..28c814aee608
--- /dev/null
+++ b/arch/arm64/kernel/hibernate-asm.S
@@ -0,0 +1,166 @@
+#include <linux/linkage.h>
+#include <linux/errno.h>
+
+#include <asm/asm-offsets.h>
+#include <asm/assembler.h>
+#include <asm/cputype.h>
+#include <asm/memory.h>
+#include <asm/page.h>
+#include <asm/virt.h>
+
+/*
+ * Corrupt memory.
+ *
+ * Loads temporary page tables then restores the memory image.
+ * Finally branches to cpu_resume() to restore the state saved by
+ * swsusp_arch_suspend().
+ *
+ * Because this code has to be copied to a safe_page, it can't call out to
+ * other functions by PC-relative address. Also remember that it may be
+ * mid-way through over-writing other functions. For this reason it contains
+ * code from flush_icache_range() and uses the copy_page() macro.
+ *
+ * All of memory gets written to, including code. We need to clean the kernel
+ * text to the Point of Coherence (PoC) before secondary cores can be booted.
+ * Because the kernel modules and executable pages mapped to user space are
+ * also written as data, we clean all pages we touch to the Point of
+ * Unification (PoU).
+ *
+ * We use el2_setup() to reconfigure el2. This code needs cleaning to PoC by VA,
+ * but is called via its physical address, as we left el2 with the MMU turned off.
+ *
+ * x0: physical address of temporary page tables
+ * x1: physical address of swapper page tables
+ * x2: address of cpu_resume
+ * x3: linear map address of restore_pblist in the current kernel
+ * x4: virtual address of the page containing el2_setup, used to clean to PoC
+ * x5: physical address of el2_setup, used to execute at el2
+ */
+.pushsection    ".hibernate_exit.text", "ax"
+ENTRY(swsusp_arch_suspend_exit)
+	/* Temporary page tables are a copy, so no need for a trampoline here */
+	msr	ttbr1_el1, x0
+	isb
+	tlbi	vmalle1is	/* invalidate intermediate caching entries */
+	dsb	ish
+
+	mov	x21, x1
+	mov	x30, x2		/* el2_setup() will eret to the lr directly */
+	mov	x24, x4
+	mov	x25, x5
+
+	/* walk the restore_pblist and use copy_page() to over-write memory */
+	mov	x19, x3
+
+1:	ldr	x10, [x19, #HIBERN_PBE_ORIG]
+	mov	x0, x10
+	ldr	x1, [x19, #HIBERN_PBE_ADDR]
+
+	copy_page	x0, x1, x2, x3, x4, x5, x6, x7, x8, x9
+
+	add	x1, x10, #PAGE_SIZE
+	/* Clean the copied page to PoU - based on flush_icache_range() */
+	dcache_line_size x2, x3
+	sub	x3, x2, #1
+	bic	x4, x10, x3
+2:	dc	cvau, x4	/* clean D line / unified line */
+	add	x4, x4, x2
+	cmp	x4, x1
+	b.lo	2b
+
+	ldr	x19, [x19, #HIBERN_PBE_NEXT]
+	cbnz	x19, 1b
+
+
+	/* switch to the restored kernels page tables, to reconfigure el2 */
+	msr	ttbr1_el1, x21  /* physical address of swapper page tables */
+	isb
+	tlbi	vmalle1is	/* invalidate intermediate caching entries */
+	ic	ialluis
+	dsb	ish		/* also waits for PoU cleaning to finish */
+	isb
+
+
+	cbz	x24, 4f		/* Did we boot at el1? */
+	/* Clean el2_setup's page to PoC */
+	mov	x0, x24
+	/*
+	 * We don't know if el2_setup() overlaps a page boundary, clean two
+	 * pages, just in case.
+	 */
+	add	x1, x0, #2*PAGE_SIZE
+	dcache_line_size x2, x3
+	sub	x3, x2, #1
+	bic	x4, x0, x3
+3:	dc	cvac, x4
+	add	x4, x4, x2
+	cmp	x4, x1
+	b.lo	3b
+
+	/* reconfigure el2 */
+	mrs	x0, sctlr_el1
+	hvc	#0
+
+	/*
+	 * el2_setup() will eret to the location in x30, so we
+	 * only get here if we booted at el1.
+	 */
+
+4:	ret
+
+	.ltorg
+ENDPROC(swsusp_arch_suspend_exit)
+
+/*
+ * Restore the hyp stub. Once we know where in memory el2_setup is, we
+ * can use it to re-initialise el2. This must be done before the hibernate
+ * page is unmapped.
+ *
+ * x0: The current sctlr_el1 value, to be re-loaded
+ * x25: The physical address of el2_setup __pa(el2_setup)
+ * x30: Where el2_setup() should eret to
+ */
+el1_sync:
+	br	x25
+ENDPROC(el1_sync)
+
+.macro invalid_vector	label
+\label:
+	b \label
+ENDPROC(\label)
+.endm
+
+	invalid_vector	el2_sync_invalid
+	invalid_vector	el2_irq_invalid
+	invalid_vector	el2_fiq_invalid
+	invalid_vector	el2_error_invalid
+	invalid_vector	el1_sync_invalid
+	invalid_vector	el1_irq_invalid
+	invalid_vector	el1_fiq_invalid
+	invalid_vector	el1_error_invalid
+
+/* el2 vectors - switch el2 here while we restore the memory image. */
+	.align 11
+ENTRY(hibernate_el2_vectors)
+	ventry	el2_sync_invalid		// Synchronous EL2t
+	ventry	el2_irq_invalid			// IRQ EL2t
+	ventry	el2_fiq_invalid			// FIQ EL2t
+	ventry	el2_error_invalid		// Error EL2t
+
+	ventry	el2_sync_invalid		// Synchronous EL2h
+	ventry	el2_irq_invalid			// IRQ EL2h
+	ventry	el2_fiq_invalid			// FIQ EL2h
+	ventry	el2_error_invalid		// Error EL2h
+
+	ventry	el1_sync			// Synchronous 64-bit EL1
+	ventry	el1_irq_invalid			// IRQ 64-bit EL1
+	ventry	el1_fiq_invalid			// FIQ 64-bit EL1
+	ventry	el1_error_invalid		// Error 64-bit EL1
+
+	ventry	el1_sync_invalid		// Synchronous 32-bit EL1
+	ventry	el1_irq_invalid			// IRQ 32-bit EL1
+	ventry	el1_fiq_invalid			// FIQ 32-bit EL1
+	ventry	el1_error_invalid		// Error 32-bit EL1
+END(hibernate_el2_vectors)
+
+.popsection
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
new file mode 100644
index 000000000000..279c556ee24b
--- /dev/null
+++ b/arch/arm64/kernel/hibernate.c
@@ -0,0 +1,503 @@
+/*:
+ * Hibernate support specific for ARM64
+ *
+ * Derived from work on ARM hibernation support by:
+ *
+ * Ubuntu project, hibernation support for mach-dove
+ * Copyright (C) 2010 Nokia Corporation (Hiroshi Doyu)
+ * Copyright (C) 2010 Texas Instruments, Inc. (Teerth Reddy et al.)
+ *  https://lkml.org/lkml/2010/6/18/4
+ *  https://lists.linux-foundation.org/pipermail/linux-pm/2010-June/027422.html
+ *  https://patchwork.kernel.org/patch/96442/
+ *
+ * Copyright (C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
+ *
+ * License terms: GNU General Public License (GPL) version 2
+ */
+#define pr_fmt(x) "hibernate: " x
+#include <linux/kvm_host.h>
+#include <linux/mm.h>
+#include <linux/pm.h>
+#include <linux/sched.h>
+#include <linux/suspend.h>
+#include <linux/version.h>
+
+#include <asm/barrier.h>
+#include <asm/cacheflush.h>
+#include <asm/irqflags.h>
+#include <asm/memory.h>
+#include <asm/mmu_context.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/pgtable-hwdef.h>
+#include <asm/sections.h>
+#include <asm/suspend.h>
+#include <asm/virt.h>
+
+/* These are necessary to build without ifdefery */
+#ifndef pmd_index
+#define pmd_index(x)	0
+#endif
+#ifndef pud_index
+#define pud_index(x)	0
+#endif
+
+#define TCR_IPS_BITS (0x7UL<<32)
+
+/*
+ * Hibernate core relies on this value being 0 on resume, and marks it
+ * __nosavedata assuming it will keep the resume kernel's '0' value. This
+ * doesn't happen with either KASLR or resuming with a different kernel.
+ *
+ * defined as "__visible int in_suspend __nosavedata" in
+ * kernel/power/hibernate.c
+ */
+extern int in_suspend;
+
+/*
+ * This value is written to the hibernate arch header, and prevents resuming
+ * from a hibernate image produced by an incompatible kernel. If you change
+ * a value that isn't saved/restored by hibernate, you should change this value.
+ *
+ * For example, if the mair_el1 values used by the kernel are changed, you
+ * should prevent resuming from a kernel with incompatible attributes, as these
+ * aren't saved/restored.
+ */
+#define HIBERNATE_VERSION	KERNEL_VERSION(4, 6, 0)
+
+/* Find a symbol's alias in the linear map */
+#define LMADDR(x)	phys_to_virt(virt_to_phys(x))
+
+/*
+ * Start/end of the hibernate exit code, this must be copied to a 'safe'
+ * location in memory, and executed from there.
+ */
+extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
+
+/* temporary el2 vectors in the __hibernate_exit_text section. */
+extern char hibernate_el2_vectors[];
+
+/* el2_setup(), in head.S, used to re-configure el2 */
+extern char el2_setup[];
+
+struct arch_hibernate_hdr_invariants {
+	unsigned long	version;
+	unsigned long	tcr_el1;	/* page_size, va bit etc */
+};
+
+/* These values need to be known across a hibernate/restore. */
+static struct arch_hibernate_hdr {
+	struct arch_hibernate_hdr_invariants invariants;
+
+	/* These are needed to find the relocated kernel if built with kaslr */
+	phys_addr_t	ttbr1_el1;
+	void		(*reenter_kernel)(void);
+
+	/*
+	 * We need to know where el2_setup() is after restore to re-configure
+	 * el2. But first, we need to clean it to PoC by va, as we left el2 with
+	 * the MMU off. Both values will be 0 if we booted at el1.
+	 */
+	phys_addr_t	el2_setup_phys;
+	unsigned long	el2_setup_page;
+} resume_hdr;
+
+static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
+{
+	i->version = HIBERNATE_VERSION;
+	asm volatile("mrs	%0, tcr_el1" : "=r"(i->tcr_el1));
+
+	/* IPS bits vary on big/little systems, mask them out */
+	i->tcr_el1 &= ~TCR_IPS_BITS;
+}
+
+int pfn_is_nosave(unsigned long pfn)
+{
+	unsigned long nosave_begin_pfn = virt_to_pfn(&__nosave_begin);
+	unsigned long nosave_end_pfn = virt_to_pfn(&__nosave_end - 1);
+
+	return (pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn);
+}
+
+void notrace save_processor_state(void)
+{
+	WARN_ON(num_online_cpus() != 1);
+}
+
+void notrace restore_processor_state(void)
+{
+}
+
+int arch_hibernation_header_save(void *addr, unsigned int max_size)
+{
+	struct arch_hibernate_hdr *hdr = addr;
+
+	if (max_size < sizeof(*hdr))
+		return -EOVERFLOW;
+
+	arch_hdr_invariants(&hdr->invariants);
+	hdr->ttbr1_el1		= virt_to_phys(swapper_pg_dir);
+	hdr->reenter_kernel	= _cpu_resume;
+	if (is_hyp_mode_available()) {
+		hdr->el2_setup_page	= (unsigned long)el2_setup & PAGE_MASK;
+		hdr->el2_setup_phys	= virt_to_phys(el2_setup);
+	} else {
+		hdr->el2_setup_page = 0;
+		hdr->el2_setup_phys = 0;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(arch_hibernation_header_save);
+
+int arch_hibernation_header_restore(void *addr)
+{
+	struct arch_hibernate_hdr_invariants invariants;
+	struct arch_hibernate_hdr *hdr = addr;
+
+	/*
+	 * If this header is ancient, it may be smaller than we expect.
+	 * Test the version first.
+	 */
+	if (hdr->invariants.version != HIBERNATE_VERSION) {
+		pr_crit("Hibernate image not compatible with this kernel version!\n");
+		return -EINVAL;
+	}
+
+	arch_hdr_invariants(&invariants);
+	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
+		pr_crit("Hibernate image not compatible with this kernel configuration!\n");
+		return -EINVAL;
+	}
+
+	if ((is_hyp_mode_available() && !hdr->el2_setup_page) ||
+	    (!is_hyp_mode_available() && hdr->el2_setup_page)) {
+		pr_crit("Hibernate/resume kernels booted with differing virtualisation support. (EL1/EL2) ");
+		return -EINVAL;
+	}
+
+	resume_hdr = *hdr;
+
+	return 0;
+}
+EXPORT_SYMBOL(arch_hibernation_header_restore);
+
+/*
+ * Copies length bytes, starting at src_start, into a new page,
+ * performs cache maintenance, then maps it (nearly) at the bottom of memory
+ * as executable.
+ *
+ * This is used by hibernate to copy the code it needs to execute when
+ * overwriting the kernel text. This function generates a new set of page
+ * tables, which it loads into ttbr0.
+ *
+ * Length is provided as we probably only want 4K of data, even on a 64K
+ * page system. We don't use the very bottom page, so that dereferencing
+ * NULL continues to have the expected behaviour.
+ */
+static int create_safe_exec_page(void *src_start, size_t length,
+				 void **dst_addr, phys_addr_t *phys_dst_addr,
+				 unsigned long (*allocator)(gfp_t mask),
+				 gfp_t mask)
+{
+	int rc = 0;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long dst = allocator(mask);
+
+	if (!dst) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	memcpy((void *)dst, src_start, length);
+	flush_icache_range(dst, dst + length);
+
+	pgd = (pgd_t *)allocator(mask) + pgd_index(PAGE_SIZE);
+	if (PTRS_PER_PGD > 1) {
+		pud = (pud_t *)allocator(mask);
+		if (!pud) {
+			rc = -ENOMEM;
+			goto out;
+		}
+		set_pgd(pgd, __pgd(virt_to_phys(pud) | PUD_TYPE_TABLE));
+	}
+
+	pud = pud_offset(pgd, PAGE_SIZE);
+	if (PTRS_PER_PUD > 1) {
+		pmd = (pmd_t *)allocator(mask);
+		if (!pmd) {
+			rc = -ENOMEM;
+			goto out;
+		}
+		set_pud(pud, __pud(virt_to_phys(pmd) | PUD_TYPE_TABLE));
+	}
+
+	pmd = pmd_offset(pud, PAGE_SIZE);
+	if (PTRS_PER_PMD > 1) {
+		pte = (pte_t *)allocator(mask);
+		if (!pte) {
+			rc = -ENOMEM;
+			goto out;
+		}
+		set_pmd(pmd, __pmd(virt_to_phys(pte) | PMD_TYPE_TABLE));
+	}
+
+	pte = pte_offset_kernel(pmd, PAGE_SIZE);
+	set_pte_at(&init_mm, dst, pte,
+		   __pte(virt_to_phys((void *)dst) |
+			 pgprot_val(PAGE_KERNEL_EXEC)));
+
+	/* Load our new page tables */
+	asm volatile("msr	ttbr0_el1, %0;"
+		     "isb;"
+		     "tlbi	vmalle1is;"
+		     "dsb	ish" : : "r"(virt_to_phys(pgd)));
+
+	*dst_addr = (void *)(PAGE_SIZE);
+	*phys_dst_addr = virt_to_phys((void *)dst);
+
+out:
+	return rc;
+}
+
+
+int swsusp_arch_suspend(void)
+{
+	int ret = 0;
+	unsigned long flags;
+	struct sleep_stack_data state;
+
+	local_dbg_save(flags);
+
+	if (__cpu_suspend_enter(&state)) {
+		ret = swsusp_save();
+	} else {
+		void *lm_kernel_start;
+
+		/* Clean kernel to PoC for secondary core startup */
+		lm_kernel_start = LMADDR(KERNEL_START);
+		__flush_dcache_area(lm_kernel_start, KERNEL_END - KERNEL_START);
+
+		/*
+		 * Tell the hibernation core that we've just restored
+		 * the memory
+		 */
+		in_suspend = 0;
+
+		__cpu_suspend_exit();
+	}
+
+	local_dbg_restore(flags);
+
+	return ret;
+}
+
+static int copy_pte(pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long start,
+		    unsigned long end)
+{
+	unsigned long next;
+	unsigned long addr = start;
+	pte_t *src_pte = pte_offset_kernel(src_pmd, start);
+	pte_t *dst_pte = pte_offset_kernel(dst_pmd, start);
+
+	do {
+		next = addr + PAGE_SIZE;
+		if (pte_val(*src_pte))
+			set_pte(dst_pte,
+				__pte(pte_val(*src_pte) & ~PTE_RDONLY));
+	} while (dst_pte++, src_pte++, addr = next, addr != end);
+
+	return 0;
+}
+
+static int copy_pmd(pud_t *dst_pud, pud_t *src_pud, unsigned long start,
+		    unsigned long end)
+{
+	int rc = 0;
+	pte_t *dst_pte;
+	unsigned long next;
+	unsigned long addr = start;
+	pmd_t *src_pmd = pmd_offset(src_pud, start);
+	pmd_t *dst_pmd = pmd_offset(dst_pud, start);
+
+	do {
+		next = pmd_addr_end(addr, end);
+		if (!pmd_val(*src_pmd))
+			continue;
+
+		if (pmd_table(*(src_pmd))) {
+			dst_pte = (pte_t *)get_safe_page(GFP_ATOMIC);
+			if (!dst_pte) {
+				rc = -ENOMEM;
+				break;
+			}
+
+			set_pmd(dst_pmd, __pmd(virt_to_phys(dst_pte)
+					       | PMD_TYPE_TABLE));
+
+			rc = copy_pte(dst_pmd, src_pmd, addr, next);
+			if (rc)
+				break;
+		} else {
+			set_pmd(dst_pmd,
+				__pmd(pmd_val(*src_pmd) & ~PMD_SECT_RDONLY));
+		}
+	} while (dst_pmd++, src_pmd++, addr = next, addr != end);
+
+	return rc;
+}
+
+static int copy_pud(pgd_t *dst_pgd, pgd_t *src_pgd, unsigned long start,
+		    unsigned long end)
+{
+	int rc = 0;
+	pmd_t *dst_pmd;
+	unsigned long next;
+	unsigned long addr = start;
+	pud_t *src_pud = pud_offset(src_pgd, start);
+	pud_t *dst_pud = pud_offset(dst_pgd, start);
+
+	do {
+		next = pud_addr_end(addr, end);
+		if (!pud_val(*src_pud))
+			continue;
+
+		if (pud_table(*(src_pud))) {
+			if (PTRS_PER_PMD != 1) {
+				dst_pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
+				if (!dst_pmd) {
+					rc = -ENOMEM;
+					break;
+				}
+
+				set_pud(dst_pud, __pud(virt_to_phys(dst_pmd)
+						       | PUD_TYPE_TABLE));
+			}
+
+			rc = copy_pmd(dst_pud, src_pud, addr, next);
+			if (rc)
+				break;
+		} else {
+			set_pud(dst_pud,
+				__pud(pud_val(*src_pud) & ~PMD_SECT_RDONLY));
+		}
+	} while (dst_pud++, src_pud++, addr = next, addr != end);
+
+	return rc;
+}
+
+static int copy_page_tables(pgd_t *dst_pgd, unsigned long start,
+			    unsigned long end)
+{
+	int rc = 0;
+	pud_t *dst_pud;
+	unsigned long next;
+	unsigned long addr = start;
+	pgd_t *src_pgd = pgd_offset_k(start);
+
+	dst_pgd += pgd_index(start);
+
+	do {
+		next = pgd_addr_end(addr, end);
+		if (!pgd_val(*src_pgd))
+			continue;
+
+		if (PTRS_PER_PUD != 1) {
+			dst_pud = (pud_t *)get_safe_page(GFP_ATOMIC);
+			if (!dst_pud) {
+				rc = -ENOMEM;
+				break;
+			}
+
+			set_pgd(dst_pgd, __pgd(virt_to_phys(dst_pud)
+					       | PUD_TYPE_TABLE));
+		}
+
+		rc = copy_pud(dst_pgd, src_pgd, addr, next);
+		if (rc)
+			break;
+	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
+
+	return rc;
+}
+
+/*
+ * Setup then Resume from the hibernate image using swsusp_arch_suspend_exit().
+ *
+ * Memory allocated by get_safe_page() will be dealt with by the hibernate code,
+ * we don't need to free it here.
+ */
+int swsusp_arch_resume(void)
+{
+	int rc = 0;
+	size_t exit_size;
+	pgd_t *tmp_pg_dir;
+	void *lm_restore_pblist;
+	phys_addr_t phys_hibernate_exit;
+	void __noreturn (*hibernate_exit)(phys_addr_t, phys_addr_t, void *,
+					  void *, unsigned long, phys_addr_t);
+
+	/*
+	 * Copy swsusp_arch_suspend_exit() to a safe page. This will generate
+	 * a new set of ttbr0 page tables and load them.
+	 */
+	exit_size = __hibernate_exit_text_end - __hibernate_exit_text_start;
+	rc = create_safe_exec_page(__hibernate_exit_text_start, exit_size,
+				   (void **)&hibernate_exit,
+				   &phys_hibernate_exit,
+				   get_safe_page, GFP_ATOMIC);
+	if (rc) {
+		pr_err("Failed to create safe executable page for hibernate_exit code.");
+		goto out;
+	}
+
+	/*
+	 * The hibernate exit text contains a set of el2 vectors, that will
+	 * be executed at el2 with the mmu off in order to reload hyp-stub.
+	 */
+	__flush_dcache_area(hibernate_exit, exit_size);
+
+	/*
+	 * Restoring the memory image will overwrite the ttbr1 page tables.
+	 * Create a second copy of just the linear map, and use this when
+	 * restoring.
+	 */
+	tmp_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
+	if (!tmp_pg_dir) {
+		pr_err("Failed to allocate memory for temporary page tables.");
+		rc = -ENOMEM;
+		goto out;
+	}
+	rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0);
+	if (rc)
+		goto out;
+
+	/*
+	 * Since we only copied the linear map, we need to find restore_pblist's
+	 * linear map address.
+	 */
+	lm_restore_pblist = LMADDR(restore_pblist);
+
+	/*
+	 * Both KASLR and restoring with a different kernel version will cause
+	 * the el2 vectors to be in a different location in the resumed kernel.
+	 * Load hibernate's temporary copy into el2.
+	 */
+	if (is_hyp_mode_available()) {
+		phys_addr_t el2_vectors = phys_hibernate_exit;  /* base */
+		el2_vectors += hibernate_el2_vectors -
+			       __hibernate_exit_text_start;     /* offset */
+
+		__hyp_set_vectors(el2_vectors);
+	}
+
+	hibernate_exit(virt_to_phys(tmp_pg_dir), resume_hdr.ttbr1_el1,
+		       resume_hdr.reenter_kernel, lm_restore_pblist,
+		       resume_hdr.el2_setup_page, resume_hdr.el2_setup_phys);
+
+out:
+	return rc;
+}
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 5a1939a74ff3..48fab0553872 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -46,6 +46,16 @@ jiffies = jiffies_64;
 	*(.idmap.text)					\
 	VMLINUX_SYMBOL(__idmap_text_end) = .;
 
+#ifdef CONFIG_HIBERNATION
+#define HIBERNATE_TEXT					\
+	. = ALIGN(SZ_4K);				\
+	VMLINUX_SYMBOL(__hibernate_exit_text_start) = .;\
+	*(.hibernate_exit.text)				\
+	VMLINUX_SYMBOL(__hibernate_exit_text_end) = .;
+#else
+#define HIBERNATE_TEXT
+#endif
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from stext to _edata, must be a round multiple of the PE/COFF
@@ -109,6 +119,7 @@ SECTIONS
 			LOCK_TEXT
 			HYPERVISOR_TEXT
 			IDMAP_TEXT
+			HIBERNATE_TEXT
 			*(.fixup)
 			*(.gnu.warning)
 		. = ALIGN(16);
@@ -201,6 +212,10 @@ ASSERT(__hyp_idmap_text_end - (__hyp_idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
 	"HYP init code too big or misaligned")
 ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
 	"ID map text too big or misaligned")
+#ifdef CONFIG_HIBERNATION
+ASSERT(__hibernate_exit_text_end - (__hibernate_exit_text_start & ~(SZ_4K - 1))
+	<= SZ_4K, "Hibernate exit text too big or misaligned")
+#endif
 
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
-- 
2.8.0.rc3


* [PATCH v7 16/16] arm64: hibernate: Prevent resume from a different kernel version
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (14 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 15/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
@ 2016-04-01 16:53 ` James Morse
  2016-04-10 12:16   ` Ard Biesheuvel
  2016-04-13 16:31 ` [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline James Morse
  16 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-01 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

Resuming with a different kernel version is fragile: while there are
sufficient details in the hibernate arch-header to perform the restore,
changes in the boot process can have a long-lasting impact on the system.
In particular, if the EFI stub causes more runtime services memory to be
allocated, the amount of memory left for Linux is reduced. If we are
lucky, this will cause restore to fail with the message:
> PM: Image mismatch: memory size
If we are unlucky, the system will explode sometime later when an EFI
runtime services call is made.

Prevent resuming with a different kernel build, by storing UTS_VERSION
in the hibernate header. This also ensures the page size and va bits
configuration remain the same.
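
The resume-time check reduces to the following comparison (a sketch; the real code
is in arch_hibernation_header_restore() below):

  /* UTS_VERSION (version string plus build timestamp) must match exactly. */
  static int check_image_kernel_sketch(const struct arch_hibernate_hdr *hdr)
  {
          if (memcmp(hdr->invariants.uts_version, init_utsname()->version,
                     sizeof(hdr->invariants.uts_version)))
                  return -EINVAL; /* image was produced by a different build */

          return 0;
  }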

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/kernel/hibernate.c | 35 +++++------------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 279c556ee24b..486315249f2a 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -20,6 +20,7 @@
 #include <linux/pm.h>
 #include <linux/sched.h>
 #include <linux/suspend.h>
+#include <linux/utsname.h>
 #include <linux/version.h>
 
 #include <asm/barrier.h>
@@ -42,8 +43,6 @@
 #define pud_index(x)	0
 #endif
 
-#define TCR_IPS_BITS (0x7UL<<32)
-
 /*
  * Hibernate core relies on this value being 0 on resume, and marks it
  * __nosavedata assuming it will keep the resume kernel's '0' value. This
@@ -54,17 +53,6 @@
  */
 extern int in_suspend;
 
-/*
- * This value is written to the hibernate arch header, and prevents resuming
- * from a hibernate image produced by an incompatible kernel. If you change
- * a value that isn't saved/restored by hibernate, you should change this value.
- *
- * For example, if the mair_el1 values used by the kernel are changed, you
- * should prevent resuming from a kernel with incompatible attributes, as these
- * aren't saved/restored.
- */
-#define HIBERNATE_VERSION	KERNEL_VERSION(4, 6, 0)
-
 /* Find a symbols alias in the linear map */
 #define LMADDR(x)	phys_to_virt(virt_to_phys(x))
 
@@ -81,8 +69,7 @@ extern char hibernate_el2_vectors[];
 extern char el2_setup[];
 
 struct arch_hibernate_hdr_invariants {
-	unsigned long	version;
-	unsigned long	tcr_el1;	/* page_size, va bit etc */
+	char		uts_version[__NEW_UTS_LEN + 1];
 };
 
 /* These values need to be know across a hibernate/restore. */
@@ -104,11 +91,8 @@ static struct arch_hibernate_hdr {
 
 static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
 {
-	i->version = HIBERNATE_VERSION;
-	asm volatile("mrs	%0, tcr_el1" : "=r"(i->tcr_el1));
-
-	/* IPS bits vary on big/little systems, mask them out */
-	i->tcr_el1 &= ~TCR_IPS_BITS;
+	memset(i, 0, sizeof(*i));
+	memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
 }
 
 int pfn_is_nosave(unsigned long pfn)
@@ -155,18 +139,9 @@ int arch_hibernation_header_restore(void *addr)
 	struct arch_hibernate_hdr_invariants invariants;
 	struct arch_hibernate_hdr *hdr = addr;
 
-	/*
-	 * If this header is ancient, it may be smaller than we expect.
-	 * Test the version first.
-	 */
-	if (hdr->invariants.version != HIBERNATE_VERSION) {
-		pr_crit("Hibernate image not compatible with this kernel version!\n");
-		return -EINVAL;
-	}
-
 	arch_hdr_invariants(&invariants);
 	if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
-		pr_crit("Hibernate image not compatible with this kernel configuration!\n");
+		pr_crit("Hibernate image not generated by this kernel!\n");
 		return -EINVAL;
 	}
 
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v7 16/16] arm64: hibernate: Prevent resume from a different kernel version
  2016-04-01 16:53 ` [PATCH v7 16/16] arm64: hibernate: Prevent resume from a different kernel version James Morse
@ 2016-04-10 12:16   ` Ard Biesheuvel
  2016-04-13 16:35     ` James Morse
  0 siblings, 1 reply; 60+ messages in thread
From: Ard Biesheuvel @ 2016-04-10 12:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi James,

On 1 April 2016 at 18:53, James Morse <james.morse@arm.com> wrote:
> Resuming using a different kernel version is fragile, while there are
> sufficient details in the hibernate arch-header to perform the restore,
> changes in the boot process can have a long-lasting impact on the system.
> In particular, if the EFI stub causes more runtime services memory to be
> allocated,

Actually, the UEFI stub does not allocate any runtime service code or
data memory; that is all done by the firmware itself before even
invoking the kernel or the stub. The allocations performed by the stub
are for the command line, the UEFI memory map, the kernel image and
optionally an initrd, none of which have any significance to the
firmware itself, so allocating runtime services memory for them does
not make a lot of sense.

So I think that means that the UEFI memory map can simply be restored
from the hibernation image wherever it resided originally. The only
thing that does need to happen is validating that the memory map still
matches our expectations, by comparing the hibernated version to the
memory map supplied by the current boot. Note that this is completely
independent from firmware or kernel versions: anything that may
influence the behavior of the boot loader stage (such as having a USB
thumb drive plugged in) could result in a different memory map when
booting the same firmware version.
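
To make that concrete, the check could be as simple as the sketch below.
This is an illustration only: the struct and helper names are invented for
the example, the real code would walk the EFI memory descriptors the kernel
already keeps.

====================================================
/*
 * Sketch: compare the memory map saved in the hibernate image against the
 * map supplied by the current boot. Names here are made up for the example.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mmap_entry {
	uint32_t type;		/* runtime code/data, conventional, ... */
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

static bool memory_map_matches(const struct mmap_entry *saved, size_t saved_n,
			       const struct mmap_entry *cur, size_t cur_n)
{
	if (saved_n != cur_n)
		return false;

	/*
	 * A byte-wise compare is the strictest check; a real implementation
	 * might only care about the runtime-services regions.
	 */
	return memcmp(saved, cur, saved_n * sizeof(*saved)) == 0;
}
====================================================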

> the amount of memory left for linux is reduced. If we are
> lucky, this will cause restore to fail with the message:
>> PM: Image mismatch: memory size
> If we are unlucky, the system will explode sometime later when an EFI
> runtime services call is made.
>
> Prevent resuming with a different kernel build, by storing UTS_VERSION
> in the hibernate header. This also ensures the page size and va bits
> configuration remain the same.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/arm64/kernel/hibernate.c | 35 +++++------------------------------
>  1 file changed, 5 insertions(+), 30 deletions(-)
>
> diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
> index 279c556ee24b..486315249f2a 100644
> --- a/arch/arm64/kernel/hibernate.c
> +++ b/arch/arm64/kernel/hibernate.c
> @@ -20,6 +20,7 @@
>  #include <linux/pm.h>
>  #include <linux/sched.h>
>  #include <linux/suspend.h>
> +#include <linux/utsname.h>
>  #include <linux/version.h>
>
>  #include <asm/barrier.h>
> @@ -42,8 +43,6 @@
>  #define pud_index(x)   0
>  #endif
>
> -#define TCR_IPS_BITS (0x7UL<<32)
> -
>  /*
>   * Hibernate core relies on this value being 0 on resume, and marks it
>   * __nosavedata assuming it will keep the resume kernel's '0' value. This
> @@ -54,17 +53,6 @@
>   */
>  extern int in_suspend;
>
> -/*
> - * This value is written to the hibernate arch header, and prevents resuming
> - * from a hibernate image produced by an incompatible kernel. If you change
> - * a value that isn't saved/restored by hibernate, you should change this value.
> - *
> - * For example, if the mair_el1 values used by the kernel are changed, you
> - * should prevent resuming from a kernel with incompatible attributes, as these
> - * aren't saved/restored.
> - */
> -#define HIBERNATE_VERSION      KERNEL_VERSION(4, 6, 0)
> -
>  /* Find a symbols alias in the linear map */
>  #define LMADDR(x)      phys_to_virt(virt_to_phys(x))
>
> @@ -81,8 +69,7 @@ extern char hibernate_el2_vectors[];
>  extern char el2_setup[];
>
>  struct arch_hibernate_hdr_invariants {
> -       unsigned long   version;
> -       unsigned long   tcr_el1;        /* page_size, va bit etc */
> +       char            uts_version[__NEW_UTS_LEN + 1];
>  };
>
>  /* These values need to be know across a hibernate/restore. */
> @@ -104,11 +91,8 @@ static struct arch_hibernate_hdr {
>
>  static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
>  {
> -       i->version = HIBERNATE_VERSION;
> -       asm volatile("mrs       %0, tcr_el1" : "=r"(i->tcr_el1));
> -
> -       /* IPS bits vary on big/little systems, mask them out */
> -       i->tcr_el1 &= ~TCR_IPS_BITS;
> +       memset(i, 0, sizeof(*i));
> +       memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
>  }
>
>  int pfn_is_nosave(unsigned long pfn)
> @@ -155,18 +139,9 @@ int arch_hibernation_header_restore(void *addr)
>         struct arch_hibernate_hdr_invariants invariants;
>         struct arch_hibernate_hdr *hdr = addr;
>
> -       /*
> -        * If this header is ancient, it may be smaller than we expect.
> -        * Test the version first.
> -        */
> -       if (hdr->invariants.version != HIBERNATE_VERSION) {
> -               pr_crit("Hibernate image not compatible with this kernel version!\n");
> -               return -EINVAL;
> -       }
> -
>         arch_hdr_invariants(&invariants);
>         if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> -               pr_crit("Hibernate image not compatible with this kernel configuration!\n");
> +               pr_crit("Hibernate image not generated by this kernel!\n");
>                 return -EINVAL;
>         }
>
> --
> 2.8.0.rc3
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
                   ` (15 preceding siblings ...)
  2016-04-01 16:53 ` [PATCH v7 16/16] arm64: hibernate: Prevent resume from a different kernel version James Morse
@ 2016-04-13 16:31 ` James Morse
  2016-04-21 11:33   ` Lorenzo Pieralisi
                     ` (2 more replies)
  16 siblings, 3 replies; 60+ messages in thread
From: James Morse @ 2016-04-13 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

It is important to hibernate/resume on the same CPU; otherwise we may
change the cpu order or restore a big cpu's register state on a little
cpu.

We know cpu 0 is the cpu the firmware booted us on last time, so refuse to
hibernate if it has been hotplugged out. Follow x86's example by registering
a pm notifier that is called before processes are frozen and devices are
stopped.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/kernel/hibernate.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 486315249f2a..1ef4bf2207a5 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -17,6 +17,7 @@
 #define pr_fmt(x) "hibernate: " x
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
+#include <linux/notifier.h>
 #include <linux/pm.h>
 #include <linux/sched.h>
 #include <linux/suspend.h>
@@ -476,3 +477,28 @@ int swsusp_arch_resume(void)
 out:
 	return rc;
 }
+
+static int check_boot_cpu_online_pm_callback(struct notifier_block *nb,
+					     unsigned long action, void *ptr)
+{
+	if (action == PM_HIBERNATION_PREPARE &&
+	     cpumask_first(cpu_online_mask) != 0) {
+		pr_warn("CPU0 is offline.\n");
+		return notifier_from_errno(-ENODEV);
+	}
+
+	return NOTIFY_OK;
+}
+
+static int __init check_boot_cpu_online_init(void)
+{
+	/*
+	 * Set this pm_notifier callback with a lower priority than
+	 * cpu_hotplug_pm_callback, so that cpu_hotplug_pm_callback will be
+	 * called earlier to disable cpu hotplug before the cpu online check.
+	 */
+	pm_notifier(check_boot_cpu_online_pm_callback, -INT_MAX);
+
+	return 0;
+}
+core_initcall(check_boot_cpu_online_init);
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v7 16/16] arm64: hibernate: Prevent resume from a different kernel version
  2016-04-10 12:16   ` Ard Biesheuvel
@ 2016-04-13 16:35     ` James Morse
  0 siblings, 0 replies; 60+ messages in thread
From: James Morse @ 2016-04-13 16:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On 10/04/16 13:16, Ard Biesheuvel wrote:
> On 1 April 2016 at 18:53, James Morse <james.morse@arm.com> wrote:
>> Resuming using a different kernel version is fragile, while there are
>> sufficient details in the hibernate arch-header to perform the restore,
>> changes in the boot process can have a long-lasting impact on the system.
>> In particular, if the EFI stub causes more runtime services memory to be
>> allocated,
> 
> Actually, the UEFI stub does not allocate any runtime service code or
> data memory; that is all done by the firmware itself before even
> invoking the kernel or the stub. The allocations performed by the stub
> are for the command line, the UEFI memory map, the kernel image and
> optionally an initrd, none of which have any significance to the
> firmware itself, so allocating runtime services memory for them does
> not make a lot of sense.

Thanks for putting me straight on this. I got these weird failures when
hibernate/restoring over your kaslr series, and I mistakenly believed the
addition of locate_protocol/get_random was causing some extra initialisation.
In hindsight it was probably the kernel's new ability to use the memory below
the kernel.


Thanks,

James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP
  2016-04-01 16:53 ` [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP James Morse
@ 2016-04-18 16:10   ` Catalin Marinas
  2016-04-19  8:58     ` James Morse
  0 siblings, 1 reply; 60+ messages in thread
From: Catalin Marinas @ 2016-04-18 16:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:25PM +0100, James Morse wrote:
> When the kernel is running at EL2, it doesn't need init_hyp_mode() to
> configure page tables for HYP. This function also registers the CPU
> hotplug and lower power notifiers that cause HYP to be re-initialised
> after the CPU has been reset.
> 
> To avoid losing the register state that controls stage2 translation, move
> the registering of these notifiers into init_subsystems(), and add a
> is_kernel_in_hyp_mode() path to each callback.
> 
> Fixes: 1e947bad0b6 ("arm64: KVM: Skip HYP setup when already running in HYP")
> Signed-off-by: James Morse <james.morse@arm.com>

To the best of my knowledge of the hyp code:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Since it has a Fixes tag, does this need to be picked for 4.6 or a Cc:
stable if it goes in later?

(cc'ing Marc and Christoffer; patch below for reference)

> ---
>  arch/arm/kvm/arm.c | 52 +++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 6accd66d26f0..b5384311dec4 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -1061,15 +1061,27 @@ static void cpu_init_hyp_mode(void *dummy)
>  	kvm_arm_init_debug();
>  }
>  
> +static void cpu_hyp_reinit(void)
> +{
> +	if (is_kernel_in_hyp_mode()) {
> +		/*
> +		 * cpu_init_stage2() is safe to call even if the PM
> +		 * event was cancelled before the CPU was reset.
> +		 */
> +		cpu_init_stage2(NULL);
> +	} else {
> +		if (__hyp_get_vectors() == hyp_default_vectors)
> +			cpu_init_hyp_mode(NULL);
> +	}
> +}
> +
>  static int hyp_init_cpu_notify(struct notifier_block *self,
>  			       unsigned long action, void *cpu)
>  {
>  	switch (action) {
>  	case CPU_STARTING:
>  	case CPU_STARTING_FROZEN:
> -		if (__hyp_get_vectors() == hyp_default_vectors)
> -			cpu_init_hyp_mode(NULL);
> -		break;
> +		cpu_hyp_reinit();
>  	}
>  
>  	return NOTIFY_OK;
> @@ -1084,9 +1096,8 @@ static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
>  				    unsigned long cmd,
>  				    void *v)
>  {
> -	if (cmd == CPU_PM_EXIT &&
> -	    __hyp_get_vectors() == hyp_default_vectors) {
> -		cpu_init_hyp_mode(NULL);
> +	if (cmd == CPU_PM_EXIT) {
> +		cpu_hyp_reinit();
>  		return NOTIFY_OK;
>  	}
>  
> @@ -1128,6 +1139,22 @@ static int init_subsystems(void)
>  	int err;
>  
>  	/*
> +	 * Register CPU Hotplug notifier
> +	 */
> +	cpu_notifier_register_begin();
> +	err = __register_cpu_notifier(&hyp_init_cpu_nb);
> +	cpu_notifier_register_done();
> +	if (err) {
> +		kvm_err("Cannot register KVM init CPU notifier (%d)\n", err);
> +		return err;
> +	}
> +
> +	/*
> +	 * Register CPU lower-power notifier
> +	 */
> +	hyp_cpu_pm_init();
> +
> +	/*
>  	 * Init HYP view of VGIC
>  	 */
>  	err = kvm_vgic_hyp_init();
> @@ -1270,19 +1297,6 @@ static int init_hyp_mode(void)
>  	free_boot_hyp_pgd();
>  #endif
>  
> -	cpu_notifier_register_begin();
> -
> -	err = __register_cpu_notifier(&hyp_init_cpu_nb);
> -
> -	cpu_notifier_register_done();
> -
> -	if (err) {
> -		kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
> -		goto out_err;
> -	}
> -
> -	hyp_cpu_pm_init();
> -
>  	/* set size of VMID supported by CPU */
>  	kvm_vmid_bits = kvm_get_vmid_bits();
>  	kvm_info("%d-bit VMID\n", kvm_vmid_bits);
> -- 
> 2.8.0.rc3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 02/16] arm64: Fold proc-macros.S into assembler.h
  2016-04-01 16:53 ` [PATCH v7 02/16] arm64: Fold proc-macros.S into assembler.h James Morse
@ 2016-04-18 16:11   ` Catalin Marinas
  0 siblings, 0 replies; 60+ messages in thread
From: Catalin Marinas @ 2016-04-18 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:26PM +0100, James Morse wrote:
> From: Geoff Levand <geoff@infradead.org>
> 
> To allow the assembler macros defined in arch/arm64/mm/proc-macros.S to
> be used outside the mm code, move the contents of proc-macros.S to
> asm/assembler.h.  Also, delete proc-macros.S, and fix up all references
> to proc-macros.S.
> 
> Signed-off-by: Geoff Levand <geoff@infradead.org>
> Acked-by: Pavel Machek <pavel@ucw.cz>
> [rebased, included dcache_by_line_op]
> Signed-off-by: James Morse <james.morse@arm.com>

The patch makes sense:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 08/16] arm64: kernel: Rework finisher callback out of __cpu_suspend_enter().
  2016-04-01 16:53 ` [PATCH v7 08/16] arm64: kernel: Rework finisher callback out of __cpu_suspend_enter() James Morse
@ 2016-04-18 17:20   ` Catalin Marinas
  0 siblings, 0 replies; 60+ messages in thread
From: Catalin Marinas @ 2016-04-18 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:32PM +0100, James Morse wrote:
> +/*
> + * Memory to save the cpu state is allocated on the stack by
> + * __cpu_suspend_enter()'s caller, and populated by __cpu_suspend_enter().
> + * This data must survive until cpu_resume() is called.
> + *
> + * This struct desribes the size and the layout of the saved cpu state.

s/desribes/describes/

> @@ -93,34 +86,11 @@ ENTRY(__cpu_suspend_enter)
>  	ldp	w5, w6, [x9, #(MPIDR_HASH_SHIFTS + 8)]
>  	compute_mpidr_hash x8, x3, x4, x5, x6, x7, x10
>  	add	x1, x1, x8, lsl #3
> +
> +	push	x29, lr
>  	bl	__cpu_suspend_save
> -	/*
> -	 * Grab suspend finisher in x20 and its argument in x19
> -	 */
> -	mov	x0, x19
> -	mov	x1, x20
> -	/*
> -	 * We are ready for power down, fire off the suspend finisher
> -	 * in x1, with argument in x0
> -	 */
> -	blr	x1
> -        /*
> -	 * Never gets here, unless suspend finisher fails.
> -	 * Successful cpu_suspend should return from cpu_resume, returning
> -	 * through this code path is considered an error
> -	 * If the return value is set to 0 force x0 = -EOPNOTSUPP
> -	 * to make sure a proper error condition is propagated
> -	 */
> -	cmp	x0, #0
> -	mov	x3, #-EOPNOTSUPP
> -	csel	x0, x3, x0, eq
> -	add	sp, sp, #CPU_SUSPEND_SZ	// rewind stack pointer
> -	ldp	x19, x20, [sp, #16]
> -	ldp	x21, x22, [sp, #32]
> -	ldp	x23, x24, [sp, #48]
> -	ldp	x25, x26, [sp, #64]
> -	ldp	x27, x28, [sp, #80]
> -	ldp	x29, lr, [sp], #96
> +	pop	x29, lr
> +	mov	x0, #1

Please do not use the push/pop macros. I was planning to remove them but
there was still some KVM code using them. Now it seems that the code is
gone, so feel free to remove them from assembler.h (maybe as part of the
second patch moving proc-macros.S into assembler.h).

-- 
Catalin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP
  2016-04-18 16:10   ` Catalin Marinas
@ 2016-04-19  8:58     ` James Morse
  2016-04-19 14:39       ` Marc Zyngier
  0 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-19  8:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Catalin,

On 18/04/16 17:10, Catalin Marinas wrote:
> On Fri, Apr 01, 2016 at 05:53:25PM +0100, James Morse wrote:
>> When the kernel is running at EL2, it doesn't need init_hyp_mode() to
>> configure page tables for HYP. This function also registers the CPU
>> hotplug and lower power notifiers that cause HYP to be re-initialised
>> after the CPU has been reset.
>>
>> To avoid losing the register state that controls stage2 translation, move
>> the registering of these notifiers into init_subsystems(), and add a
>> is_kernel_in_hyp_mode() path to each callback.
>>
>> Fixes: 1e947bad0b6 ("arm64: KVM: Skip HYP setup when already running in HYP")
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> To the best of my knowledge of the hyp code:
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks,

> Since it has a Fixes tag, does this need to be picked for 4.6 or a Cc:
> stable if it goes in later?

This patch was sent as a separate fix for rc1; it's here too so that this series
can be applied directly from the list without conflicts after rc2.



James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP
  2016-04-19  8:58     ` James Morse
@ 2016-04-19 14:39       ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-04-19 14:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 19/04/16 09:58, James Morse wrote:
> Hi Catalin,
> 
> On 18/04/16 17:10, Catalin Marinas wrote:
>> On Fri, Apr 01, 2016 at 05:53:25PM +0100, James Morse wrote:
>>> When the kernel is running at EL2, it doesn't need init_hyp_mode() to
>>> configure page tables for HYP. This function also registers the CPU
>>> hotplug and lower power notifiers that cause HYP to be re-initialised
>>> after the CPU has been reset.
>>>
>>> To avoid losing the register state that controls stage2 translation, move
>>> the registering of these notifiers into init_subsystems(), and add a
>>> is_kernel_in_hyp_mode() path to each callback.
>>>
>>> Fixes: 1e947bad0b6 ("arm64: KVM: Skip HYP setup when already running in HYP")
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>
>> To the best of my knowledge of the hyp code:
>>
>> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> Thanks,
> 
>> Since it has a Fixes tag, does this need to be picked for 4.6 or a Cc:
>> stable if it goes in later?
> 
> This patch was sent as a separate fix for rc1; it's here too so that this series
> can be applied directly from the list without conflicts after rc2.

For the record, this has been merged in 4.6-rc3.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 03/16] arm64: Cleanup SCTLR flags
  2016-04-01 16:53 ` [PATCH v7 03/16] arm64: Cleanup SCTLR flags James Morse
@ 2016-04-19 14:44   ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-04-19 14:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/04/16 17:53, James Morse wrote:
> From: Geoff Levand <geoff@infradead.org>
> 
> We currently have macros defining flags for the arm64 sctlr registers in
> both kvm_arm.h and sysreg.h.  To clean things up and simplify move the
> definitions of the SCTLR_EL2 flags from kvm_arm.h to sysreg.h, rename any
> SCTLR_EL1 or SCTLR_EL2 flags that are common to both registers to be
> SCTLR_ELx, with 'x' indicating a common flag, and fixup all files to
> include the proper header or to use the new macro names.
> 
> Signed-off-by: Geoff Levand <geoff@infradead.org>
> [Restored pgtable-hwdef.h include]
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 04/16] arm64: kvm: Move the do_el2_call macro to a header file
  2016-04-01 16:53 ` [PATCH v7 04/16] arm64: kvm: Move the do_el2_call macro to a header file James Morse
@ 2016-04-19 15:02   ` Marc Zyngier
  2016-04-19 15:05     ` James Morse
  0 siblings, 1 reply; 60+ messages in thread
From: Marc Zyngier @ 2016-04-19 15:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/04/16 17:53, James Morse wrote:
> The hyp-stub could make use of the do_el2_call(), move it to a header file.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/arm64/include/asm/virt.h  | 18 +++++++++++++++++-
>  arch/arm64/kvm/hyp/hyp-entry.S | 17 +----------------
>  2 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index 9f22dd607958..b8fdddeca71b 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -21,7 +21,23 @@
>  #define BOOT_CPU_MODE_EL1	(0xe11)
>  #define BOOT_CPU_MODE_EL2	(0xe12)
>  
> -#ifndef __ASSEMBLY__
> +#ifdef __ASSEMBLY__
> +.macro do_el2_call
> +	/*
> +	 * Shuffle the parameters before calling the function
> +	 * pointed to in x0. Assumes parameters in x[1,2,3].
> +	 */
> +	sub	sp, sp, #16
> +	str	lr, [sp]
> +	mov	lr, x0
> +	mov	x0, x1
> +	mov	x1, x2
> +	mov	x2, x3
> +	blr	lr
> +	ldr	lr, [sp]
> +	add	sp, sp, #16
> +.endm
> +#else

So while I'm not opposed to this macro being reused, the name is
slightly misleading out of the original context. This macro doesn't take
you to EL2, but implements a very peculiar calling convention that only
the HYP code is expecting.

Could we have a disclaimer saying something along the lines of "Don't
you dare using this macro!"?

Otherwise:

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 04/16] arm64: kvm: Move the do_el2_call macro to a header file
  2016-04-19 15:02   ` Marc Zyngier
@ 2016-04-19 15:05     ` James Morse
  2016-04-19 15:10       ` Marc Zyngier
  0 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-19 15:05 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On 19/04/16 16:02, Marc Zyngier wrote:
> On 01/04/16 17:53, James Morse wrote:
>> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
>> index 9f22dd607958..b8fdddeca71b 100644
>> --- a/arch/arm64/include/asm/virt.h
>> +++ b/arch/arm64/include/asm/virt.h
>> @@ -21,7 +21,23 @@
>>  #define BOOT_CPU_MODE_EL1	(0xe11)
>>  #define BOOT_CPU_MODE_EL2	(0xe12)
>>  
>> -#ifndef __ASSEMBLY__
>> +#ifdef __ASSEMBLY__
>> +.macro do_el2_call
>> +	/*
>> +	 * Shuffle the parameters before calling the function
>> +	 * pointed to in x0. Assumes parameters in x[1,2,3].
>> +	 */
>> +	sub	sp, sp, #16
>> +	str	lr, [sp]
>> +	mov	lr, x0
>> +	mov	x0, x1
>> +	mov	x1, x2
>> +	mov	x2, x3
>> +	blr	lr
>> +	ldr	lr, [sp]
>> +	add	sp, sp, #16
>> +.endm
>> +#else
> 
> So while I'm not opposed to this macro being reused, the name is
> slightly misleading out of the original context. This macro doesn't take
> you to EL2, but implements a very peculiar calling convention that only
> the HYP code is expecting.

Ah, I hadn't thought of it like that!


> Could we have a disclaimer saying something along the lines of "Don't
> you dare using this macro!"?

Or rename it? Something like 'unpack_el2_call'.



Thanks,

James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 04/16] arm64: kvm: Move the do_el2_call macro to a header file
  2016-04-19 15:05     ` James Morse
@ 2016-04-19 15:10       ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-04-19 15:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 19/04/16 16:05, James Morse wrote:
> Hi Marc,
> 
> On 19/04/16 16:02, Marc Zyngier wrote:
>> On 01/04/16 17:53, James Morse wrote:
>>> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
>>> index 9f22dd607958..b8fdddeca71b 100644
>>> --- a/arch/arm64/include/asm/virt.h
>>> +++ b/arch/arm64/include/asm/virt.h
>>> @@ -21,7 +21,23 @@
>>>  #define BOOT_CPU_MODE_EL1	(0xe11)
>>>  #define BOOT_CPU_MODE_EL2	(0xe12)
>>>  
>>> -#ifndef __ASSEMBLY__
>>> +#ifdef __ASSEMBLY__
>>> +.macro do_el2_call
>>> +	/*
>>> +	 * Shuffle the parameters before calling the function
>>> +	 * pointed to in x0. Assumes parameters in x[1,2,3].
>>> +	 */
>>> +	sub	sp, sp, #16
>>> +	str	lr, [sp]
>>> +	mov	lr, x0
>>> +	mov	x0, x1
>>> +	mov	x1, x2
>>> +	mov	x2, x3
>>> +	blr	lr
>>> +	ldr	lr, [sp]
>>> +	add	sp, sp, #16
>>> +.endm
>>> +#else
>>
>> So while I'm not opposed to this macro being reused, the name is
>> slightly misleading out of the original context. This macro doesn't take
>> you to EL2, but implements a very peculiar calling convention that only
>> the HYP code is expecting.
> 
> Ah, I hadn't thought of it like that!
> 
> 
>> Could we have a disclaimer saying something along the lines of "Don't
>> you dare using this macro!"?
> 
> Or rename it? Something like 'unpack_el2_call'.

Works for me!

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 05/16] arm64: kvm: Move lr save/restore from do_el2_call into EL1
  2016-04-01 16:53 ` [PATCH v7 05/16] arm64: kvm: Move lr save/restore from do_el2_call into EL1 James Morse
@ 2016-04-19 15:11   ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-04-19 15:11 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/04/16 17:53, James Morse wrote:
> Today an 'hvc' call into kvm or the hyp-stub is expected to preserve all
> registers. Kvm saves/restores the registers it needs on the EL2 stack using
> do_el2_call. The hyp-stub has no stack; later patches need at least one
> register they can use.
> 
> Allow these calls to clobber the link register, and add code to
> save/restore this register at the call sites.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 06/16] arm64: hyp/kvm: Extend hyp-stub API to allow function calls at EL2
  2016-04-01 16:53 ` [PATCH v7 06/16] arm64: hyp/kvm: Extend hyp-stub API to allow function calls at EL2 James Morse
@ 2016-04-19 15:22   ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-04-19 15:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/04/16 17:53, James Morse wrote:
> From: Geoff Levand <geoff@infradead.org>
> 
> The existing arm64 hcall implementations are limited in that they only
> allow for two distinct hcalls, with the x0 register either zero or not
> zero.  Also, the API of the hyp-stub exception vector routines and the
> KVM exception vector routines differ; hyp-stub uses a non-zero value in
> x0 to implement __hyp_set_vectors, whereas KVM uses it to implement
> kvm_call_hyp.
> 
> To allow for additional hcalls to be defined and to make the arm64 hcall
> API more consistent across exception vector routines, change the hcall
> implementations to reserve all x0 values below PAGE_SIZE for hcalls such
> as {s,g}et_vectors().

<pedantic>
Given that we already have 3 possible page sizes, defining an API in
terms of PAGE_SIZE is a bit risky. I don't anticipate someone coming up
with a 4097th host hypercall anytime soon, but just to be on the safe
side, let's cap it to 0xfff.
</pedantic>
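
To make the convention concrete, here is a rough C rendering of the dispatch
being described (the real handlers are the el1_sync vector routines in
assembly; the cap constant, the builtin call values and the stub register
accessors are all invented for this sketch):

====================================================
#include <stdint.h>

#define HVC_GET_VECTORS		0	/* illustrative values only */
#define HVC_SET_VECTORS		1
#define HVC_CALL_CAP		0xfff	/* invented name for the proposed cap */

typedef uint64_t (*hyp_fn_t)(uint64_t, uint64_t, uint64_t);

/* Stand-ins for 'mrs x0, vbar_el2' / 'msr vbar_el2, x0'. */
static uint64_t read_vbar_el2(void) { return 0; }
static void write_vbar_el2(uint64_t v) { (void)v; }

/* Roughly what the el1_sync handler does with x0..x3. */
static uint64_t handle_hvc(uint64_t x0, uint64_t x1, uint64_t x2, uint64_t x3)
{
	if (x0 < HVC_CALL_CAP) {
		/* Small values are 'builtin' hcalls... */
		switch (x0) {
		case HVC_GET_VECTORS:
			return read_vbar_el2();
		case HVC_SET_VECTORS:
			write_vbar_el2(x1);
			return 0;
		}
		return (uint64_t)-1;
	}

	/* ...anything else is treated as a function pointer to call at EL2. */
	return ((hyp_fn_t)x0)(x1, x2, x3);
}
====================================================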

> 
> Define two new preprocessor macros HVC_GET_VECTORS, and HVC_SET_VECTORS
> to be used as hcall type specifiers and convert the existing
> __hyp_get_vectors() and __hyp_set_vectors() routines to use these new
> macros when executing an HVC call.  Also, change the corresponding
> hyp-stub and KVM el1_sync exception vector routines to use these new
> macros.
> 
> Signed-off-by: Geoff Levand <geoff@infradead.org>
> [Merged two hcall patches, moved immediate value from esr to x0, use lr
>  as a scratch register]
> Signed-off-by: James Morse <james.morse@arm.com>

Otherwise:

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-01 16:53 ` [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug James Morse
@ 2016-04-19 16:03   ` Marc Zyngier
  2016-04-19 17:37     ` James Morse
  0 siblings, 1 reply; 60+ messages in thread
From: Marc Zyngier @ 2016-04-19 16:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/04/16 17:53, James Morse wrote:
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> 
> The current kvm implementation on arm64 does cpu-specific initialization
> at system boot, and has no way to gracefully shut down a core in terms of
> kvm. This prevents kexec from rebooting the system at EL2.
> 
> This patch adds a cpu tear-down function and also puts the existing cpu-init
> code into a separate function, kvm_arch_hardware_disable() and
> kvm_arch_hardware_enable() respectively.
> We don't need the arm64 specific cpu hotplug hook any more.
> 
> Since this patch modifies common code between arm and arm64, one stub
> definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
> compilation errors.
> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> [Rebase, added separate VHE init/exit path, changed resets use of
>  kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
>  added icache maintenance to __kvm_hyp_reset() and removed lr restore]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  10 ++-
>  arch/arm/include/asm/kvm_mmu.h    |   1 +
>  arch/arm/kvm/arm.c                | 128 +++++++++++++++++++++++---------------
>  arch/arm/kvm/mmu.c                |   5 ++
>  arch/arm64/include/asm/kvm_asm.h  |   1 +
>  arch/arm64/include/asm/kvm_host.h |  13 +++-
>  arch/arm64/include/asm/kvm_mmu.h  |   1 +
>  arch/arm64/kvm/hyp-init.S         |  38 +++++++++++
>  arch/arm64/kvm/reset.c            |  14 +++++
>  9 files changed, 158 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 385070180c25..738d5eee91de 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -265,6 +265,15 @@ static inline void __cpu_init_stage2(void)
>  	kvm_call_hyp(__init_stage2_translation);
>  }
>  
> +static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
> +					phys_addr_t phys_idmap_start)
> +{
> +	/*
> +	 * TODO
> +	 * kvm_call_reset(boot_pgd_ptr, phys_idmap_start);
> +	 */
> +}
> +
>  static inline int kvm_arch_dev_ioctl_check_extension(long ext)
>  {
>  	return 0;
> @@ -277,7 +286,6 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>  
> -static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index da44be9db4fa..f17a8d41822c 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -66,6 +66,7 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
>  phys_addr_t kvm_mmu_get_httbr(void);
>  phys_addr_t kvm_mmu_get_boot_httbr(void);
>  phys_addr_t kvm_get_idmap_vector(void);
> +phys_addr_t kvm_get_idmap_start(void);
>  int kvm_mmu_init(void);
>  void kvm_clear_hyp_idmap(void);
>  
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index b5384311dec4..962904a443be 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -16,7 +16,6 @@
>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>   */
>  
> -#include <linux/cpu.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/errno.h>
>  #include <linux/err.h>
> @@ -66,6 +65,8 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);
>  
>  static bool vgic_present;
>  
> +static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
> +
>  static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	BUG_ON(preemptible());
> @@ -90,11 +91,6 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
>  	return &kvm_arm_running_vcpu;
>  }
>  
> -int kvm_arch_hardware_enable(void)
> -{
> -	return 0;
> -}
> -
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>  {
>  	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> @@ -591,7 +587,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		/*
>  		 * Re-check atomic conditions
>  		 */
> -		if (signal_pending(current)) {
> +		if (unlikely(!__this_cpu_read(kvm_arm_hardware_enabled))) {
> +			/* cpu has been torn down */
> +			ret = 0;
> +			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
> +			run->fail_entry.hardware_entry_failure_reason
> +					= (u64)-ENOEXEC;

This hunk makes me feel a bit uneasy. Having to check something that
critical on the entry path is at least a bit weird. If we've reset EL2
already, it means that we must have forced an exit on the guest to do so.

So why do we hand the control back to KVM (or anything else) once we've
nuked a CPU? I'd expect it to be put on some back-burner, never to
return in this lifetime...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-19 16:03   ` Marc Zyngier
@ 2016-04-19 17:37     ` James Morse
  2016-04-20 10:29       ` AKASHI Takahiro
  2016-04-20 10:37       ` Marc Zyngier
  0 siblings, 2 replies; 60+ messages in thread
From: James Morse @ 2016-04-19 17:37 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc, Takahiro,

On 19/04/16 17:03, Marc Zyngier wrote:
> On 01/04/16 17:53, James Morse wrote:
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index b5384311dec4..962904a443be 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -591,7 +587,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  		/*
>>  		 * Re-check atomic conditions
>>  		 */
>> -		if (signal_pending(current)) {
>> +		if (unlikely(!__this_cpu_read(kvm_arm_hardware_enabled))) {
>> +			/* cpu has been torn down */
>> +			ret = 0;
>> +			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
>> +			run->fail_entry.hardware_entry_failure_reason
>> +					= (u64)-ENOEXEC;
> 
> This hunk makes me feel a bit uneasy. Having to check something that
> critical on the entry path is at least a bit weird. If we've reset EL2
> already, it means that we must have forced an exit on the guest to do so.

(To save anyone else digging: the story comes from v12 of the kexec copy of this
patch [0])


> So why do we hand the control back to KVM (or anything else) once we've
> nuked a CPU? I'd expect it to be put on some back-burner, never to
> return in this lifetime...

This looks like the normal reboot code being called in the middle of a running
system. Kexec calls kernel_restart_prepare(), which kicks each reboot notifier,
one of which is kvm_reboot(), which calls:
> on_each_cpu(hardware_disable_nolock, NULL, 1);

We have to give the CPU back as there may be other reboot notifiers, and kexec
hasn't yet shuffled onto the boot cpu.


How about moving this check into handle_exit()[1]?
Currently this sees ARM_EXCEPTION_IRQ, and tries to re-enter the guest, we can
test for kvm_rebooting, which is set by kvm's reboot notifier .... but this
would still break if another vcpu wakes from cond_resched() and sprints towards
__kvm_vcpu_run(). Can we move cond_resched() to immediately before handle_exit()?

I can't see a reason why this doesn't happen on the normal reboot path,
presumably we rely on user space to kill any running guests.



It looks like x86 uses the extable to work around this, their vmx_vcpu_run() has:
> 		__ex(ASM_VMX_VMLAUNCH) "\n\t"
Where __ex ends up calling ____kvm_handle_fault_on_reboot(), with a nearby comment:
> * Hardware virtualization extension instructions may fault if a
> * reboot turns off virtualization while processes are running.
> * Trap the fault and ignore the instruction if that happens.


Takahiro, any ideas/wisdom on this?


Thanks,

James

[0] http://lists.infradead.org/pipermail/kexec/2015-December/014953.html
[1] Untested(!) alternative.
====================================================
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0e63047a9530..dfa3cc42ec89 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -562,11 +562,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
        ret = 1;
        run->exit_reason = KVM_EXIT_UNKNOWN;
        while (ret > 0) {
-               /*
-                * Check conditions before entering the guest
-                */
-               cond_resched();
-
                update_vttbr(vcpu->kvm);

                if (vcpu->arch.power_off || vcpu->arch.pause)
@@ -662,6 +657,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)

                preempt_enable();

+               cond_resched();
+
                ret = handle_exit(vcpu, run, ret);
        }

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index eba89e42f0ed..cc562d9ff905 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -170,6 +170,12 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 {
        exit_handle_fn exit_handler;

+       if (kvm_rebooting) {
+               run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+               run->fail_entry.hardware_entry_failure_reason = (u64)-ENOEXEC;
+               return 0;
+       }
+
        switch (exception_index) {
        case ARM_EXCEPTION_IRQ:
                return 1;
====================================================

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-19 17:37     ` James Morse
@ 2016-04-20 10:29       ` AKASHI Takahiro
  2016-04-20 11:19         ` James Morse
  2016-04-20 10:37       ` Marc Zyngier
  1 sibling, 1 reply; 60+ messages in thread
From: AKASHI Takahiro @ 2016-04-20 10:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 19, 2016 at 06:37:13PM +0100, James Morse wrote:
> Hi Marc, Takahiro,
> 
> On 19/04/16 17:03, Marc Zyngier wrote:
> > On 01/04/16 17:53, James Morse wrote:
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index b5384311dec4..962904a443be 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -591,7 +587,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>  		/*
> >>  		 * Re-check atomic conditions
> >>  		 */
> >> -		if (signal_pending(current)) {
> >> +		if (unlikely(!__this_cpu_read(kvm_arm_hardware_enabled))) {
> >> +			/* cpu has been torn down */
> >> +			ret = 0;
> >> +			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
> >> +			run->fail_entry.hardware_entry_failure_reason
> >> +					= (u64)-ENOEXEC;
> > 
> > This hunk makes me feel a bit uneasy. Having to check something that
> > critical on the entry path is at least a bit weird. If we've reset EL2
> > already, it means that we must have forced an exit on the guest to do so.
> 
> (To save anyone else digging: the story comes from v12 of the kexec copy of this
> patch [0])
> 
> 
> > So why do we hand the control back to KVM (or anything else) once we've
> > nuked a CPU? I'd expect it to be put on some back-burner, never to
> > return in this lifetime...
> 
> This looks like the normal reboot code being called in the middle of a running
> system. Kexec calls kernel_restart_prepare(), which kicks each reboot notifier,
> one of which is kvm_reboot(), which calls:
> > on_each_cpu(hardware_disable_nolock, NULL, 1);
> 
> We have to give the CPU back as there may be other reboot notifiers, and kexec
> hasn't yet shuffled onto the boot cpu.

Right, and this kvm reboot notifier can be executed via IPI at any time
while interrupts are enabled, and so the check must be done between
local_irq_disable() and local_irq_enable().
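
A trimmed-down sketch of that loop shape (the stand-in functions are there
only so the example compiles; it models the real kvm_arch_vcpu_ioctl_run()
very loosely):

====================================================
#include <stdbool.h>

/* Stand-ins for the real kernel helpers. */
static void local_irq_disable(void) { }
static void local_irq_enable(void) { }
static bool hardware_enabled(void) { return true; }
static int enter_guest(void) { return 0; }
static int handle_exit(int r) { return r; }

static int run_loop(void)
{
	int ret = 1;

	while (ret > 0) {
		/* preparation that may sleep or reschedule */

		local_irq_disable();

		/*
		 * The check has to sit in this irqs-off window: the reboot
		 * notifier's IPI cannot land between the check and the guest
		 * entry while interrupts are masked.
		 */
		if (!hardware_enabled()) {
			local_irq_enable();
			return -1;	/* report KVM_EXIT_FAIL_ENTRY */
		}

		ret = enter_guest();
		local_irq_enable();
		ret = handle_exit(ret);
	}

	return ret;
}
====================================================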

> How about moving this check into handle_exit()[1]?
> Currently this sees ARM_EXCEPTION_IRQ, and tries to re-enter the guest, we can
> test for kvm_rebooting, which is set by kvm's reboot notifier .... but this
> would still break if another vcpu wakes from cond_resched() and sprints towards
> __kvm_vcpu_run(). Can we move cond_resched() to immediately before handle_exit()?

I don't think that it would break as reboot process is *one-directional*,
and any cpu should be torn down sooner or later.

I'm not quite sure about Marc's point, but if he has concern on overhead
of checking per-cpu kvm_arm_hardware_enabled, we may, instead, check on
kvm_rebooting.
(And this check won't make sense for VHE-enabled platform.)

Thanks,
-Takahiro AKASHI

> I can't see a reason why this doesn't happen on the normal reboot path,
> presumably we rely on user space to kill any running guests.
> 
> 
> 
> It looks like x86 uses the extable to work around this, their vmx_vcpu_run() has:
> > 		__ex(ASM_VMX_VMLAUNCH) "\n\t"
> Where __ex ends up calling ____kvm_handle_fault_on_reboot(), with a nearby comment:
> > * Hardware virtualization extension instructions may fault if a
> > * reboot turns off virtualization while processes are running.
> > * Trap the fault and ignore the instruction if that happens.
> 
> 
> Takahiro, any ideas/wisdom on this?
> 
> 
> Thanks,
> 
> James
> 
> [0] http://lists.infradead.org/pipermail/kexec/2015-December/014953.html
> [1] Untested(!) alternative.
> ====================================================
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 0e63047a9530..dfa3cc42ec89 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -562,11 +562,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>         ret = 1;
>         run->exit_reason = KVM_EXIT_UNKNOWN;
>         while (ret > 0) {
> -               /*
> -                * Check conditions before entering the guest
> -                */
> -               cond_resched();
> -
>                 update_vttbr(vcpu->kvm);
> 
>                 if (vcpu->arch.power_off || vcpu->arch.pause)
> @@ -662,6 +657,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> 
>                 preempt_enable();
> 
> +               cond_resched();
> +
>                 ret = handle_exit(vcpu, run, ret);
>         }
> 
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index eba89e42f0ed..cc562d9ff905 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -170,6 +170,12 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  {
>         exit_handle_fn exit_handler;
> 
> +       if (kvm_rebooting) {
> +               run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> +               run->fail_entry.hardware_entry_failure_reason = (u64)-ENOEXEC;
> +               return 0;
> +       }
> +
>         switch (exception_index) {
>         case ARM_EXCEPTION_IRQ:
>                 return 1;
> ====================================================
> 
> 

-- 
Thanks,
-Takahiro AKASHI

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-19 17:37     ` James Morse
  2016-04-20 10:29       ` AKASHI Takahiro
@ 2016-04-20 10:37       ` Marc Zyngier
  2016-04-20 11:19         ` James Morse
  1 sibling, 1 reply; 60+ messages in thread
From: Marc Zyngier @ 2016-04-20 10:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 19/04/16 18:37, James Morse wrote:
> Hi Marc, Takahiro,
> 

[...]

> It looks like x86 uses the extable to work around this, their vmx_vcpu_run() has:
>> 		__ex(ASM_VMX_VMLAUNCH) "\n\t"
> Where __ex ends up calling ____kvm_handle_fault_on_reboot(), with a nearby comment:
>> * Hardware virtualization extension instructions may fault if a
>> * reboot turns off virtualization while processes are running.
>> * Trap the fault and ignore the instruction if that happens.

I very much like that approach, to be honest. Tearing down a CPU is
something exceptional, so let's make it an actual exception.

It is now pretty easy to discriminate between KVM functions and stub
functions thanks to your earlier patch, so if we end up calling the
hyp-stub because we've torn down KVM's EL2, let's just return an
appropriate error code (ARM_EXCEPTION_HYP_GONE), and handle it at EL1.
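
Something like the fragment below is what I have in mind (sketch only:
ARM_EXCEPTION_HYP_GONE and its handling are a proposal, and the exit reason
chosen just mirrors the existing fail_entry plumbing):

====================================================
/* arch/arm64/kvm/handle_exit.c -- fragment, not a complete patch */
int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
		int exception_index)
{
	switch (exception_index) {
	case ARM_EXCEPTION_HYP_GONE:
		/* EL2 has already been torn down for reboot/kexec. */
		run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		run->fail_entry.hardware_entry_failure_reason = (u64)-ENOEXEC;
		return 0;
	/* ... existing cases ... */
	}

	return 1;
}
====================================================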

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-20 10:29       ` AKASHI Takahiro
@ 2016-04-20 11:19         ` James Morse
  0 siblings, 0 replies; 60+ messages in thread
From: James Morse @ 2016-04-20 11:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 20/04/16 11:29, AKASHI Takahiro wrote:
> On Tue, Apr 19, 2016 at 06:37:13PM +0100, James Morse wrote:
>> On 19/04/16 17:03, Marc Zyngier wrote:
>>> On 01/04/16 17:53, James Morse wrote:
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index b5384311dec4..962904a443be 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -591,7 +587,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>  		/*
>>>>  		 * Re-check atomic conditions
>>>>  		 */
>>>> -		if (signal_pending(current)) {
>>>> +		if (unlikely(!__this_cpu_read(kvm_arm_hardware_enabled))) {
>>>> +			/* cpu has been torn down */
>>>> +			ret = 0;
>>>> +			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
>>>> +			run->fail_entry.hardware_entry_failure_reason
>>>> +					= (u64)-ENOEXEC;
>>>
>>> This hunk makes me feel a bit uneasy. Having to check something that
>>> critical on the entry path is at least a bit weird. If we've reset EL2
>>> already, it means that we must have forced an exit on the guest to do so.
>>
>> (To save anyone else digging: the story comes from v12 of the kexec copy of this
>> patch [0])
>>
>>
>>> So why do we hand the control back to KVM (or anything else) once we've
>>> nuked a CPU? I'd expect it to be put on some back-burner, never to
>>> return in this lifetime...
>>
>> This looks like the normal reboot code being called in the middle of a running
>> system. Kexec calls kernel_restart_prepare(), which kicks each reboot notifier,
>> one of which is kvm_reboot(), which calls:
>>> on_each_cpu(hardware_disable_nolock, NULL, 1);
>>
>> We have to give the CPU back as there may be other reboot notifiers, and kexec
>> hasn't yet shuffled onto the boot cpu.
> 
> Right, and this kvm reboot notifier can be executed via IPI at any time
> while interrupts are enabled, and so the check must be done between
> local_irq_disable() and local_irq_enable().

Good point, this makes it really nasty!


>> How about moving this check into handle_exit()[1]?
>> Currently this sees ARM_EXCEPTION_IRQ, and tries to re-enter the guest, we can
>> test for kvm_rebooting, which is set by kvm's reboot notifier .... but this
>> would still break if another vcpu wakes from cond_resched() and sprints towards
>> __kvm_vcpu_run(). Can we move cond_resched() to immediately before handle_exit()?
> 
> I don't think that it would break as reboot process is *one-directional*,
> and any cpu should be torn down sooner or later.

I think we can schedule quite a few vcpus in the meantime though:
The CPU that called kvm_reboot() will wait until each cpu has returned from
hardware_disable_nolock(); eventually it will call disable_non_boot_cpus(), and
in the meantime the scheduler is free to pick threads to run.


> I'm not quite sure about Marc's point, but if he has concern on overhead
> of checking per-cpu kvm_arm_hardware_enabled, we may, instead, check on
> kvm_rebooting.

> (And this check won't make sense for VHE-enabled platform.)

Gah, VHE. I think Marc's suggestion to make it an exception returned from the
hyp-stub is best. I need to switch over to kexec to test it though....


Thanks,

James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-20 10:37       ` Marc Zyngier
@ 2016-04-20 11:19         ` James Morse
  2016-04-20 11:46           ` Marc Zyngier
  2016-04-25  8:41           ` AKASHI Takahiro
  0 siblings, 2 replies; 60+ messages in thread
From: James Morse @ 2016-04-20 11:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On 20/04/16 11:37, Marc Zyngier wrote:
> On 19/04/16 18:37, James Morse wrote:
>> It looks like x86 uses the extable to work around this, their vmx_vcpu_run() has:
>>> 		__ex(ASM_VMX_VMLAUNCH) "\n\t"
>> Where __ex ends up calling ____kvm_handle_fault_on_reboot(), with a nearby comment:
>>> * Hardware virtualization extension instructions may fault if a
>>> * reboot turns off virtualization while processes are running.
>>> * Trap the fault and ignore the instruction if that happens.
> 
> I very much like that approach, to be honest. Tearing down a CPU is
> something exceptional, so let's make it an actual exception.
>
> It is now pretty easy to discriminate between KVM functions and stub
> functions thanks to your earlier patch, so if we end up calling the
> hyp-stub because we've torn down KVM's EL2, let's just return an
> appropriate error code (ARM_EXCEPTION_HYP_GONE), and handle it at EL1.

Okay. kexec uses kvm_call_hyp() against the hyp-stub to do the kernel-copy and
hand over to purgatory, but we could change that to a new 'special' builtin
call, something like HVC_KEXEC_CALL_HYP. It never calls it with kvm loaded, so
there is no reason the calls have to be the same.

Given hibernate doesn't hit this issue, I will drop this hunk from this version
of the patch, and repost hibernate incorporating the feedback so far. I will
provide a patch for kexec to do the above.



Thanks,

James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-20 11:19         ` James Morse
@ 2016-04-20 11:46           ` Marc Zyngier
  2016-04-25  8:41           ` AKASHI Takahiro
  1 sibling, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-04-20 11:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/04/16 12:19, James Morse wrote:
> Hi Marc,
> 
> On 20/04/16 11:37, Marc Zyngier wrote:
>> On 19/04/16 18:37, James Morse wrote:
>>> It looks like x86 uses the extable to work around this, their vmx_vcpu_run() has:
>>>> 		__ex(ASM_VMX_VMLAUNCH) "\n\t"
>>> Where __ex ends up calling ____kvm_handle_fault_on_reboot(), with a nearby comment:
>>>> * Hardware virtualization extension instructions may fault if a
>>>> * reboot turns off virtualization while processes are running.
>>>> * Trap the fault and ignore the instruction if that happens.
>>
>> I very much like that approach, to be honest. Tearing down a CPU is
>> something exceptional, so let's make it an actual exception.
>>
>> It is now pretty easy to discriminate between KVM functions and stub
>> functions thanks to your earlier patch, so if we end up calling the
>> hyp-stub because we've torn down KVM's EL2, let's just return an
>> appropriate error code (ARM_EXCEPTION_HYP_GONE), and handle it at EL1.
> 
> Okay. kexec uses kvm_call_hyp() against the hyp-stub to do the kernel-copy and
> hand over to purgatory, but we could change that to a new 'special' builtin
> call, something like HVC_KEXEC_CALL_HYP. It never calls it with kvm loaded, so
> there is no reason the calls have to be the same.
> 
> Given hibernate doesn't hit this issue, I will drop this hunk from this version
> of the patch, and repost hibernate incorporating the feedback so far. I will
> provide a patch for kexec to do the above.

Awesome. Thanks James.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 09/16] arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va
  2016-04-01 16:53 ` [PATCH v7 09/16] arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va James Morse
@ 2016-04-20 16:24   ` Catalin Marinas
  0 siblings, 0 replies; 60+ messages in thread
From: Catalin Marinas @ 2016-04-20 16:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:33PM +0100, James Morse wrote:
>  ENTRY(cpu_resume)
>  	bl	el2_setup		// if in EL2 drop to EL1 cleanly
> +	/* enable the MMU early - so we can access sleep_save_stash by va */
> +	adr_l	lr, __enable_mmu	/* __cpu_setup will return here */
> +	ldr	x27, =_cpu_resume	/* __enable_mmu will branch here */
> +	adrp	x25, idmap_pg_dir
> +	adrp	x26, swapper_pg_dir
> +	b	__cpu_setup

You need an ENDPROC(cpu_resume) here.

> +
> +ENTRY(_cpu_resume)
>  	mrs	x1, mpidr_el1
>  	adrp	x8, mpidr_hash
>  	add x8, x8, #:lo12:mpidr_hash // x8 = struct mpidr_hash phys address
> @@ -134,29 +116,32 @@ ENTRY(cpu_resume)
>  	ldp	w5, w6, [x8, #(MPIDR_HASH_SHIFTS + 8)]
>  	compute_mpidr_hash x7, x3, x4, x5, x6, x1, x2
>          /* x7 contains hash index, let's use it to grab context pointer */
> -	ldr_l	x0, sleep_save_sp + SLEEP_SAVE_SP_PHYS
> +	ldr_l	x0, sleep_save_stash
>  	ldr	x0, [x0, x7, lsl #3]
>  	add	x29, x0, #SLEEP_STACK_DATA_CALLEE_REGS
>  	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
>  	/* load sp from context */
>  	ldr	x2, [x0, #CPU_CTX_SP]
> -	/* load physical address of identity map page table in x1 */
> -	adrp	x1, idmap_pg_dir
>  	mov	sp, x2
>  	/* save thread_info */
>  	and	x2, x2, #~(THREAD_SIZE - 1)
>  	msr	sp_el0, x2
>  	/*
> -	 * cpu_do_resume expects x0 to contain context physical address
> -	 * pointer and x1 to contain physical address of 1:1 page tables
> +	 * cpu_do_resume expects x0 to contain context address pointer
>  	 */
> -	bl	cpu_do_resume		// PC relative jump, MMU off
> -	/* Can't access these by physical address once the MMU is on */
> +	bl	cpu_do_resume
> +
> +#ifdef CONFIG_KASAN
> +	mov	x0, sp
> +	bl	kasan_unpoison_remaining_stack
> +#endif
> +
>  	ldp	x19, x20, [x29, #16]
>  	ldp	x21, x22, [x29, #32]
>  	ldp	x23, x24, [x29, #48]
>  	ldp	x25, x26, [x29, #64]
>  	ldp	x27, x28, [x29, #80]
>  	ldp	x29, lr, [x29]
> -	b	cpu_resume_mmu		// Resume MMU, never returns
> +	mov	x0, #0
> +	ret
>  ENDPROC(cpu_resume)

and ENDPROC(_cpu_resume) here.

Otherwise it looks fine.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 10/16] arm64: kernel: Include _AC definition in page.h
  2016-04-01 16:53 ` [PATCH v7 10/16] arm64: kernel: Include _AC definition in page.h James Morse
@ 2016-04-20 16:25   ` Catalin Marinas
  0 siblings, 0 replies; 60+ messages in thread
From: Catalin Marinas @ 2016-04-20 16:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:34PM +0100, James Morse wrote:
> page.h uses '_AC' in the definition of PAGE_SIZE, but doesn't include
> linux/const.h where this is defined. This produces build warnings when only
> asm/page.h is included by asm code.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 11/16] arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
  2016-04-01 16:53 ` [PATCH v7 11/16] arm64: Promote KERNEL_START/KERNEL_END definitions to a header file James Morse
@ 2016-04-20 16:26   ` Catalin Marinas
  0 siblings, 0 replies; 60+ messages in thread
From: Catalin Marinas @ 2016-04-20 16:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:35PM +0100, James Morse wrote:
> KERNEL_START and KERNEL_END are useful outside head.S, move them to a
> header file.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 12/16] arm64: Add new asm macro copy_page
  2016-04-01 16:53 ` [PATCH v7 12/16] arm64: Add new asm macro copy_page James Morse
@ 2016-04-20 16:38   ` Catalin Marinas
  2016-04-20 16:56     ` James Morse
  0 siblings, 1 reply; 60+ messages in thread
From: Catalin Marinas @ 2016-04-20 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:36PM +0100, James Morse wrote:
> From: Geoff Levand <geoff@infradead.org>
> 
> Kexec and hibernate need to copy pages of memory, but may not have all
> of the kernel mapped, and are unable to call copy_page().
> 
> Add a simplistic copy_page() macro, that can be inlined in these
> situations. lib/copy_page.S provides a bigger better version, but
> uses more registers.
> 
> Signed-off-by: Geoff Levand <geoff@infradead.org>
> [Changed asm label to 9998, added commit message]
> Signed-off-by: James Morse <james.morse@arm.com>

Good enough for the brief usage:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Would there be another user of this macro in the future (kexec)?

-- 
Catalin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 12/16] arm64: Add new asm macro copy_page
  2016-04-20 16:38   ` Catalin Marinas
@ 2016-04-20 16:56     ` James Morse
  0 siblings, 0 replies; 60+ messages in thread
From: James Morse @ 2016-04-20 16:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/04/16 17:38, Catalin Marinas wrote:
> On Fri, Apr 01, 2016 at 05:53:36PM +0100, James Morse wrote:
>> From: Geoff Levand <geoff@infradead.org>
>>
>> Kexec and hibernate need to copy pages of memory, but may not have all
>> of the kernel mapped, and are unable to call copy_page().
>>
>> Add a simplistic copy_page() macro, that can be inlined in these
>> situations. lib/copy_page.S provides a bigger better version, but
>> uses more registers.
>>
>> Signed-off-by: Geoff Levand <geoff@infradead.org>
>> [Changed asm label to 9998, added commit message]
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Good enough for the brief usage:
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks,

> Would there be another user of this macro in the future (kexec)?

Yes, this is one of the patches common to the two. Adding this macro was some
review feedback from kexec v12 [0].


Thanks,
James

[0] http://lists.infradead.org/pipermail/kexec/2015-December/014959.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 13/16] arm64: head.S: el2_setup() to accept sctlr_el1 as an argument
  2016-04-01 16:53 ` [PATCH v7 13/16] arm64: head.S: el2_setup() to accept sctlr_el1 as an argument James Morse
@ 2016-04-20 17:12   ` Catalin Marinas
  2016-04-20 17:35     ` James Morse
  0 siblings, 1 reply; 60+ messages in thread
From: Catalin Marinas @ 2016-04-20 17:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:37PM +0100, James Morse wrote:
> el2_setup() doesn't just configure el2, it configures el1 too. This
> means we can't use it to re-configure el2 after resume from hibernate,
> as we will be returned to el1 with the MMU turned off.
> 
> Split the sctlr_el1 setting code up, so that el2_setup() accepts an
> initial value as an argument. This value will be ignored if el2_setup()
> is called at el1: the running value will be preserved with endian
> correction.
> 
> Hibernate can now call el2_setup() to re-configure el2, passing the
> current sctlr_el1 as an argument.

I don't fully understand the description. The first paragraph says we
can't use el2_setup to re-configure el2 after resume from hibernate
while the last paragraph says that we can.

> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index a04fd7a9c102..627d66efec68 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -311,6 +311,16 @@ lr	.req	x30		// link register
>  	.endm
>  
>  /*
> + * Generate the initial sctlr_el1 value for el2_setup to set if we boot at EL2.
> + */
> +	.macro init_sctlr_el1 reg
> +		mov	\reg, #0x0800		// Set/clear RES{1,0} bits
> +CPU_BE(		movk	\reg, #0x33d0, lsl #16)	// Set EE and E0E on BE systems
> +CPU_LE(		movk	\reg, #0x30d0, lsl #16)	// Clear EE and E0E on LE systems
> +	.endm

I can see, at least in this patch, that all places where this macro is
invoked are immediately followed by a call to el2_setup. Can we not just
place this at the beginning of el2_setup?

> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 1217262b5210..097986152203 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -208,6 +208,7 @@ section_table:
>  
>  ENTRY(stext)
>  	bl	preserve_boot_args
> +	init_sctlr_el1	x0
>  	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
>  	mov	x23, xzr			// KASLR offset, defaults to 0
>  	adrp	x24, __PHYS_OFFSET
> @@ -514,8 +515,12 @@ ENTRY(kimage_vaddr)
>   *
>   * Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in x20 if
>   * booted in EL1 or EL2 respectively.
> + *
> + * If booted in EL2, SCTLR_EL1 will be initialised with the value in x0
> + * (otherwise the existing value will be preserved, with endian correction).
>   */
>  ENTRY(el2_setup)
> +	mov	x1, x0				// preserve passed-in sctlr_el1
>  	mrs	x0, CurrentEL
>  	cmp	x0, #CurrentEL_EL2
>  	b.ne	1f
> @@ -524,7 +529,7 @@ CPU_BE(	orr	x0, x0, #(1 << 25)	)	// Set the EE bit for EL2
>  CPU_LE(	bic	x0, x0, #(1 << 25)	)	// Clear the EE bit for EL2
>  	msr	sctlr_el2, x0
>  	b	2f
> -1:	mrs	x0, sctlr_el1
> +1:	mrs	x0, sctlr_el1			// ignore passed-in sctlr_el1
>  CPU_BE(	orr	x0, x0, #(3 << 24)	)	// Set the EE and E0E bits for EL1
>  CPU_LE(	bic	x0, x0, #(3 << 24)	)	// Clear the EE and E0E bits for EL1
>  	msr	sctlr_el1, x0

BTW, I wonder whether we need this read-modify-write sequence at all
rather than always setting a pre-set sane value (as per the
init_sctlr_el1 macro). This way we can always do this at the beginning
of el2_setup independent of whether we run in EL1 or EL2.

> @@ -578,6 +583,10 @@ set_hcr:
>  
>  3:
>  #endif
> +	/* use sctlr_el1 value we were provided with */
> +CPU_BE(	orr	x1, x1, #(3 << 24)	)	// Set the EE and E0E bits for EL1
> +CPU_LE(	bic	x1, x1, #(3 << 24)	)	// Clear the EE and E0E bits for EL1
> +	msr	sctlr_el1, x1

I don't think we need this here, the value passed already has the EE/E0E
bits set accordingly by the init_sctlr_el1 macro.

-- 
Catalin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 14/16] PM / Hibernate: Call flush_icache_range() on pages restored in-place
  2016-04-01 16:53 ` [PATCH v7 14/16] PM / Hibernate: Call flush_icache_range() on pages restored in-place James Morse
@ 2016-04-20 17:16   ` Catalin Marinas
  0 siblings, 0 replies; 60+ messages in thread
From: Catalin Marinas @ 2016-04-20 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 01, 2016 at 05:53:38PM +0100, James Morse wrote:
> Some architectures require code written to memory as if it were data to be
> 'cleaned' from any data caches before the processor can fetch them as new
> instructions.
> 
> During resume from hibernate, the snapshot code copies some pages directly,
> meaning these architectures do not get a chance to perform their cache
> maintenance. Modify the read and decompress code to call
> flush_icache_range() on all pages that are restored, so that the restored
> in-place pages are guaranteed to be executable on these architectures.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Acked-by: Pavel Machek <pavel@ucw.cz>
> Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 13/16] arm64: head.S: el2_setup() to accept sctlr_el1 as an argument
  2016-04-20 17:12   ` Catalin Marinas
@ 2016-04-20 17:35     ` James Morse
  2016-04-22 10:36       ` Catalin Marinas
  0 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-20 17:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Catalin,

On 20/04/16 18:12, Catalin Marinas wrote:
> On Fri, Apr 01, 2016 at 05:53:37PM +0100, James Morse wrote:
>> el2_setup() doesn't just configure el2, it configures el1 too. This
>> means we can't use it to re-configure el2 after resume from hibernate,
>> as we will be returned to el1 with the MMU turned off.
>>
>> Split the sctlr_el1 setting code up, so that el2_setup() accepts an
>> initial value as an argument. This value will be ignored if el2_setup()
>> is called at el1: the running value will be preserved with endian
>> correction.
>>
>> Hibernate can now call el2_setup() to re-configure el2, passing the
>> current sctlr_el1 as an argument.
> 
> I don't fully understand the description. The first paragraph says we
> can't use el2_setup to re-configure el2 after resume from hibernate
> while the last paragraph says that we can.

Then I need to rephrase the middle paragraph. Is this any clearer:
This happens because el2_setup() generates a sctlr_el1 value needed for early
boot, (mmu off etc). Change el2_setup() to accept this as an argument, so that
hibernate can pass the current sctlr_el1 (with the mmu on), to el2_setup(), and
have it re-configure el2 without changing el1.

The motivation for this patch was resuming with a different kernel version,
which may have left el2 in an incompatible state. Running the original
el2_setup() was the only guaranteed way to know everything was set correctly.
If we definitely don't want to support this, then this patch can go, and the
minimum el2-reset code can be put into hibernate-asm.S.


>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> index a04fd7a9c102..627d66efec68 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -311,6 +311,16 @@ lr	.req	x30		// link register
>>  	.endm
>>  
>>  /*
>> + * Generate the initial sctlr_el1 value for el2_setup to set if we boot at EL2.
>> + */
>> +	.macro init_sctlr_el1 reg
>> +		mov	\reg, #0x0800		// Set/clear RES{1,0} bits
>> +CPU_BE(		movk	\reg, #0x33d0, lsl #16)	// Set EE and E0E on BE systems
>> +CPU_LE(		movk	\reg, #0x30d0, lsl #16)	// Clear EE and E0E on LE systems
>> +	.endm
> 
> I can see, at least in this patch, that all places where this macro is
> invoked are immediately followed by a call to el2_setup. Can we not just
> place this at the beginning of el2_setup?

Hibernate's resume path is added as a later caller, it doesn't expect to be
returned to with the MMU off.


>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index 1217262b5210..097986152203 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -208,6 +208,7 @@ section_table:
>>  
>>  ENTRY(stext)
>>  	bl	preserve_boot_args
>> +	init_sctlr_el1	x0
>>  	bl	el2_setup			// Drop to EL1, w20=cpu_boot_mode
>>  	mov	x23, xzr			// KASLR offset, defaults to 0
>>  	adrp	x24, __PHYS_OFFSET
>> @@ -514,8 +515,12 @@ ENTRY(kimage_vaddr)
>>   *
>>   * Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in x20 if
>>   * booted in EL1 or EL2 respectively.
>> + *
>> + * If booted in EL2, SCTLR_EL1 will be initialised with the value in x0
>> + * (otherwise the existing value will be preserved, with endian correction).
>>   */
>>  ENTRY(el2_setup)
>> +	mov	x1, x0				// preserve passed-in sctlr_el1
>>  	mrs	x0, CurrentEL
>>  	cmp	x0, #CurrentEL_EL2
>>  	b.ne	1f
>> @@ -524,7 +529,7 @@ CPU_BE(	orr	x0, x0, #(1 << 25)	)	// Set the EE bit for EL2
>>  CPU_LE(	bic	x0, x0, #(1 << 25)	)	// Clear the EE bit for EL2
>>  	msr	sctlr_el2, x0
>>  	b	2f
>> -1:	mrs	x0, sctlr_el1
>> +1:	mrs	x0, sctlr_el1			// ignore passed-in sctlr_el1
>>  CPU_BE(	orr	x0, x0, #(3 << 24)	)	// Set the EE and E0E bits for EL1
>>  CPU_LE(	bic	x0, x0, #(3 << 24)	)	// Clear the EE and E0E bits for EL1
>>  	msr	sctlr_el1, x0
> 
> BTW, I wonder whether we need this read-modify-write sequence at all
> rather than always setting a pre-set sane value (as per the
> init_sctlr_el1 macro). This way we can always do this at the beginning
> of el2_setup independent of whether we run in EL1 or EL2.

In this case el2_setup() is preserving the existing sctlr_el1 value because it was
run at el1; it just changes the EE/E0E bits. I guess it's being clever in not
overwriting it with the value it uses later (if run at el2): presumably the boot
loader can leave some bit set that we don't want to lose.


> 
>> @@ -578,6 +583,10 @@ set_hcr:
>>  
>>  3:
>>  #endif
>> +	/* use sctlr_el1 value we were provided with */
>> +CPU_BE(	orr	x1, x1, #(3 << 24)	)	// Set the EE and E0E bits for EL1
>> +CPU_LE(	bic	x1, x1, #(3 << 24)	)	// Clear the EE and E0E bits for EL1
>> +	msr	sctlr_el1, x1
> 
> I don't think we need this here, the value passed already has the EE/E0E
> bits set accordingly by the init_sctlr_el1 macro.

I've removed this one locally...


Thanks!

James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-13 16:31 ` [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline James Morse
@ 2016-04-21 11:33   ` Lorenzo Pieralisi
  2016-04-21 11:44   ` Mark Rutland
  2016-04-22 10:41   ` Catalin Marinas
  2 siblings, 0 replies; 60+ messages in thread
From: Lorenzo Pieralisi @ 2016-04-21 11:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 13, 2016 at 05:31:50PM +0100, James Morse wrote:
> It is important to hibernate/resume on the same CPU, otherwise we may
> change the cpu order or restore a big cpu's register state on a little
> cpu.

I think the problem is that we would end up having no context to
resume to altogether given how cpu_suspend() is implemented (or
context that is simply stale).

> We know cpu 0 is the cpu the firmware booted us on last time, refuse to
> hibernate if it has been hotplugged out. Follow x86's example by registering
> a pm notifier that is called before processes are frozen and devices are
> stopped.

"Hibernation represents a system state save/restore through
a system reboot; this implies that the logical cpus carrying
out hibernation/thawing must be the same, so that the context
saved in the snapshot image on hibernation is consistent with
the state of the system on resume. If resume from hibernation
is driven through kernel command line parameter, the cpu responsible
for thawing the system will be whatever CPU firmware boots the system
on upon cold-boot (ie logical cpu 0); this means that in order to
keep system context consistent between the hibernate snapshot image
and system state on kernel resume from hibernate, logical cpu 0 must
be online on hibernation and must be the logical cpu that creates
the snapshot image.

This patch adds a PM notifier that enforces logical cpu 0 is online
when the hibernation is started (and prevents hibernation if it is
not), which is sufficient to guarantee it will be the one creating
the snapshot image therefore providing the resume cpu a consistent
snapshot of the system to resume to."

A tad verbose, feel free to modify it as you deem fit.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/arm64/kernel/hibernate.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
> index 486315249f2a..1ef4bf2207a5 100644
> --- a/arch/arm64/kernel/hibernate.c
> +++ b/arch/arm64/kernel/hibernate.c
> @@ -17,6 +17,7 @@
>  #define pr_fmt(x) "hibernate: " x
>  #include <linux/kvm_host.h>
>  #include <linux/mm.h>
> +#include <linux/notifier.h>
>  #include <linux/pm.h>
>  #include <linux/sched.h>
>  #include <linux/suspend.h>
> @@ -476,3 +477,28 @@ int swsusp_arch_resume(void)
>  out:
>  	return rc;
>  }
> +
> +static int check_boot_cpu_online_pm_callback(struct notifier_block *nb,
> +					     unsigned long action, void *ptr)
> +{
> +	if (action == PM_HIBERNATION_PREPARE &&
> +	     cpumask_first(cpu_online_mask) != 0) {
> +		pr_warn("CPU0 is offline.\n");
> +		return notifier_from_errno(-ENODEV);
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
> +static int __init check_boot_cpu_online_init(void)
> +{
> +	/*
> +	 * Set this pm_notifier callback with a lower priority than
> +	 * cpu_hotplug_pm_callback, so that cpu_hotplug_pm_callback will be
> +	 * called earlier to disable cpu hotplug before the cpu online check.
> +	 */
> +	pm_notifier(check_boot_cpu_online_pm_callback, -INT_MAX);
> +
> +	return 0;
> +}
> +core_initcall(check_boot_cpu_online_init);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-13 16:31 ` [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline James Morse
  2016-04-21 11:33   ` Lorenzo Pieralisi
@ 2016-04-21 11:44   ` Mark Rutland
  2016-04-21 12:33     ` Mark Rutland
  2016-04-22 10:41   ` Catalin Marinas
  2 siblings, 1 reply; 60+ messages in thread
From: Mark Rutland @ 2016-04-21 11:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 13, 2016 at 05:31:50PM +0100, James Morse wrote:
> It is important to hibernate/resume on the same CPU, otherwise we may
> change the cpu order or restore a big cpu's register state on a little
> cpu.
> 
> We know cpu 0 is the cpu the firmware booted us on last time, 

This assumes that we only kexec from CPU0 also, which we will have to
enforce. For example, disable_nonboot_cpus() does not enforce this if
CPU0 has been hotplugged out.

Otherwise, this kernel's CPU0 is not necessarily the CPU the FW booted
a kernel on.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-21 11:44   ` Mark Rutland
@ 2016-04-21 12:33     ` Mark Rutland
  2016-04-21 16:28       ` Lorenzo Pieralisi
  0 siblings, 1 reply; 60+ messages in thread
From: Mark Rutland @ 2016-04-21 12:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 21, 2016 at 12:44:16PM +0100, Mark Rutland wrote:
> On Wed, Apr 13, 2016 at 05:31:50PM +0100, James Morse wrote:
> > It is important to hibernate/resume on the same CPU, otherwise we may
> > change the cpu order or restore a big cpu's register state on a little
> > cpu.
> > 
> > We know cpu 0 is the cpu the firmware booted us on last time, 
> 
> This assumes that we only kexec from CPU0 also, which we will have to
> enforce. For example, disable_nonboot_cpus() does not enforce this if
> CPU0 has been hotplugged out.
> 
> Otherwise, this kernel's CPU0 is not necessarily the CPU the FW booted
> a kernel on.

A better approach might be:

* When going down for hibernate, store the physical CPU ID (e.g.
  MPIDR_EL1.Aff*) in the header for the hibernate image.

* When restoring a hibernate image, first switch over to the CPU
  described in the header (rather than assuming CPU0). 

I think this is a matter of adding a new disable_non_hibernate_cpus()
function (defaulting to disable_nonboot_cpus()), and overriding that in
the arch code. Then that can be called in resume_target_kernel.

Though there are likely caveats I've missed.
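
For illustration, a minimal sketch of that shape; the hook name is the one
suggested above, 'hibernate_mpidr' is a hypothetical variable filled in from the
image's arch header, and disable_nonboot_cpus() would still need to be taught to
keep a CPU other than the first online one alive:

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <asm/smp_plat.h>

/* kernel/power: hypothetical weak default, keeps today's behaviour */
int __weak disable_non_hibernate_cpus(void)
{
	return disable_nonboot_cpus();
}

/* arch/arm64: sketch of an override. 'hibernate_mpidr' is assumed to have
 * been read back from the hibernate image's arch header.
 */
extern u64 hibernate_mpidr;

int disable_non_hibernate_cpus(void)
{
	int cpu = get_logical_index(hibernate_mpidr);

	if (cpu < 0)
		return -ENODEV;

	/* move to the CPU that wrote the snapshot before parking the others */
	set_cpus_allowed_ptr(current, cpumask_of(cpu));

	/* disable_nonboot_cpus() would also need to keep 'cpu', not CPU0 */
	return disable_nonboot_cpus();
}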

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-21 12:33     ` Mark Rutland
@ 2016-04-21 16:28       ` Lorenzo Pieralisi
  2016-04-22 10:41         ` Mark Rutland
  0 siblings, 1 reply; 60+ messages in thread
From: Lorenzo Pieralisi @ 2016-04-21 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 21, 2016 at 01:33:35PM +0100, Mark Rutland wrote:
> On Thu, Apr 21, 2016 at 12:44:16PM +0100, Mark Rutland wrote:
> > On Wed, Apr 13, 2016 at 05:31:50PM +0100, James Morse wrote:
> > > It is important to hibernate/resume on the same CPU, otherwise we may
> > > change the cpu order or restore a big cpu's register state on a little
> > > cpu.
> > > 
> > > We know cpu 0 is the cpu the firmware booted us on last time, 
> > 
> > This assumes that we only kexec from CPU0 also, which we will have to
> > enforce. For example, disable_nonboot_cpus() does not enforce this if
> > CPU0 has been hotplugged out.
> > 
> > Otherwise, this kernel's CPU0 is not necessarily the CPU the FW booted
> > a kernel on.
> 
> A better approach might be:
> 
> * When going down for hibernate, store the physical CPU ID (e.g.
>   MPIDR_EL1.Aff*) in the header for the hibernate image.
> 
> * When restoring a hibernate image, first switch over to the CPU
>   described in the header (rather than assuming CPU0). 
> 
> I think this is a matter of adding a new disable_non_hibernate_cpus()
> function (defaulting to disable_nonboot_cpus()), and overriding that in
> the arch code. Then that can be called in resume_target_kernel.

Yes, it looks feasible, at least by code inspection. Given that it requires
changes in core code, I wonder whether it is a restriction that we can remove
later or whether we want it in from the beginning (given that the issue with
kexec you mention above is not present in the current kernel, hopefully not
for long :)).

> Though there are likely caveats I've missed.

Same here. It *should* just be a matter of rescheduling on the cpu whose
MPIDR corresponds to the cpu that created the snapshot image, instead of the
first cpu online. That can be done, as you said, with a generalized
disable_nonboot_cpus() that looks up the MPIDR and gets the corresponding
cpu index in the *current* logical map, which I expect to be identical to
the one in the hibernated kernel (actually I am not even sure that's a
requirement).

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 15/16] arm64: kernel: Add support for hibernate/suspend-to-disk
  2016-04-01 16:53 ` [PATCH v7 15/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
@ 2016-04-22 10:29   ` Catalin Marinas
  2016-04-25  9:19     ` James Morse
  0 siblings, 1 reply; 60+ messages in thread
From: Catalin Marinas @ 2016-04-22 10:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hi James,

On Fri, Apr 01, 2016 at 05:53:39PM +0100, James Morse wrote:
> --- /dev/null
> +++ b/arch/arm64/kernel/hibernate-asm.S

[...]

> +ENTRY(swsusp_arch_suspend_exit)
> +	/* Temporary page tables are a copy, so no need for a trampoline here */
> +	msr	ttbr1_el1, x0
> +	isb
> +	tlbi	vmalle1is	/* invalidate intermediate caching entries */
> +	dsb	ish
> +
> +	mov	x21, x1
> +	mov	x30, x2		/* el2_setup() will eret to the lr directly */
> +	mov	x24, x4
> +	mov	x25, x5
> +
> +	/* walk the restore_pblist and use copy_page() to over-write memory */
> +	mov	x19, x3
> +
> +1:	ldr	x10, [x19, #HIBERN_PBE_ORIG]
> +	mov	x0, x10
> +	ldr	x1, [x19, #HIBERN_PBE_ADDR]
> +
> +	copy_page	x0, x1, x2, x3, x4, x5, x6, x7, x8, x9
> +
> +	add	x1, x10, #PAGE_SIZE
> +	/* Clean the copied page to PoU - based on flush_icache_range() */
> +	dcache_line_size x2, x3
> +	sub	x3, x2, #1
> +	bic	x4, x10, x3
> +2:	dc	cvau, x4	/* clean D line / unified line */
> +	add	x4, x4, x2
> +	cmp	x4, x1
> +	b.lo	2b
> +
> +	ldr	x19, [x19, #HIBERN_PBE_NEXT]
> +	cbnz	x19, 1b
> +
> +
> +	/* switch to the restored kernels page tables, to reconfigure el2 */
> +	msr	ttbr1_el1, x21  /* physical address of swapper page tables */
> +	isb
> +	tlbi	vmalle1is	/* invalidate intermediate caching entries */
> +	ic	ialluis
> +	dsb	ish		/* also waits for PoU cleaning to finish */
> +	isb

The waiting for PoU cleaning needs to happen before the IC instruction.

> +
> +
> +	cbz	x24, 4f		/* Did we boot at el1? */
> +	/* Clean el2_setup's page to PoC */
> +	mov	x0, x24
> +	/*
> +	 * We don't know if el2_setup() overlaps a page boundary, clean two
> +	 * pages, just in case.
> +	 */
> +	add	x1, x0, #2*PAGE_SIZE
> +	dcache_line_size x2, x3
> +	sub	x3, x2, #1
> +	bic	x4, x0, x3
> +3:	dc	cvac, x4
> +	add	x4, x4, x2
> +	cmp	x4, x1
> +	b.lo	3b
> +
> +	/* reconfigure el2 */
> +	mrs	x0, sctlr_el1
> +	hvc	#0
> +
> +	/*
> +	 * el2_setup() will eret to the location in x30, so we
> +	 * only get here if we booted at el1.
> +	 */
> +
> +4:	ret
> +
> +	.ltorg
> +ENDPROC(swsusp_arch_suspend_exit)

[...]

> --- /dev/null
> +++ b/arch/arm64/kernel/hibernate.c
> @@ -0,0 +1,503 @@

[...]

> +/* Find a symbol's alias in the linear map */
> +#define LMADDR(x)	phys_to_virt(virt_to_phys(x))

IIRC Ard was looking to add a specific macro in his subsequent KASLR
clean-up patches but I haven't checked the latest status.

[...]

> +/*
> + * Copies length bytes, starting at src_start, into a new page,
> + * performs cache maintenance, then maps it (nearly) at the bottom of memory
> + * as executable.
> + *
> + * This is used by hibernate to copy the code it needs to execute when
> + * overwriting the kernel text. This function generates a new set of page
> + * tables, which it loads into ttbr0.
> + *
> + * Length is provided as we probably only want 4K of data, even on a 64K
> + * page system. We don't use the very bottom page, so that dereferencing
> + * NULL continues to have the expected behaviour.
> + */
> +static int create_safe_exec_page(void *src_start, size_t length,
> +				 void **dst_addr, phys_addr_t *phys_dst_addr,
> +				 unsigned long (*allocator)(gfp_t mask),
> +				 gfp_t mask)
> +{
> +	int rc = 0;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	unsigned long dst = allocator(mask);
> +
> +	if (!dst) {
> +		rc = -ENOMEM;
> +		goto out;
> +	}
> +
> +	memcpy((void *)dst, src_start, length);
> +	flush_icache_range(dst, dst + length);
> +
> +	pgd = (pgd_t *)allocator(mask) + pgd_index(PAGE_SIZE);

You could use pgd_offset_raw((pgd_t *)allocator(mask), PAGE_SIZE) (or
use two separate lines for allocation and offset in case of -ENOMEM).

BTW, can we have the allocator return type (void *) to avoid extra
casting?

> +	if (PTRS_PER_PGD > 1) {

I think you can use pgd_none(*pgd) here. In the nopud case, this would
be 0. I'm also not sure PTRS_PER_PGD is even correct here in the nopud
case where we don't need to allocate a pud but PTRS_PER_PGD is always
greater than 1.

> +		pud = (pud_t *)allocator(mask);
> +		if (!pud) {
> +			rc = -ENOMEM;
> +			goto out;
> +		}
> +		set_pgd(pgd, __pgd(virt_to_phys(pud) | PUD_TYPE_TABLE));
> +	}
> +
> +	pud = pud_offset(pgd, PAGE_SIZE);
> +	if (PTRS_PER_PUD > 1) {

pud_none()

> +		pmd = (pmd_t *)allocator(mask);
> +		if (!pmd) {
> +			rc = -ENOMEM;
> +			goto out;
> +		}
> +		set_pud(pud, __pud(virt_to_phys(pmd) | PUD_TYPE_TABLE));
> +	}
> +
> +	pmd = pmd_offset(pud, PAGE_SIZE);
> +	if (PTRS_PER_PMD > 1) {

pmd_none()

> +		pte = (pte_t *)allocator(mask);
> +		if (!pte) {
> +			rc = -ENOMEM;
> +			goto out;
> +		}
> +		set_pmd(pmd, __pmd(virt_to_phys(pte) | PMD_TYPE_TABLE));
> +	}
> +
> +	pte = pte_offset_kernel(pmd, PAGE_SIZE);
> +	set_pte_at(&init_mm, dst, pte,
> +		   __pte(virt_to_phys((void *)dst) |
> +			 pgprot_val(PAGE_KERNEL_EXEC)));

I would use set_pte() rather than the *_at variant here as that one has
some side-effects. The alloc_init_pte() and fixmap code only use
set_pte().

> +
> +	/* Load our new page tables */
> +	asm volatile("msr	ttbr0_el1, %0;"
> +		     "isb;"
> +		     "tlbi	vmalle1is;"
> +		     "dsb	ish" : : "r"(virt_to_phys(pgd)));

Do we expect anything to have used ttbr0_el1 at this point? I think the
TLB for the low addresses should have already been invalidated
immediately after boot and we wouldn't run any user space at this point.

> +
> +	*dst_addr = (void *)(PAGE_SIZE);
> +	*phys_dst_addr = virt_to_phys((void *)dst);

More of a nitpick, but you could set *dst_addr or a local variable (e.g.
vaddr) to PAGE_SIZE at the beginning of this function to make it clear
that this is the VA we picked for it. Seeing PAGE_SIZE in several places
makes me wonder why we pass a "size" argument to those pte/pmd/pud
macros.

Even better, can we not just set this address in the caller of this
function (swsusp_arch_resume)? I find this **dst_addr argument passing
unnecessary since everything is contained in this file.

> +
> +out:
> +	return rc;
> +}
> +
> +
> +int swsusp_arch_suspend(void)
> +{
> +	int ret = 0;
> +	unsigned long flags;
> +	struct sleep_stack_data state;
> +
> +	local_dbg_save(flags);
> +
> +	if (__cpu_suspend_enter(&state)) {
> +		ret = swsusp_save();
> +	} else {
> +		void *lm_kernel_start;
> +
> +		/* Clean kernel to PoC for secondary core startup */
> +		lm_kernel_start = LMADDR(KERNEL_START);
> +		__flush_dcache_area(lm_kernel_start, KERNEL_END - KERNEL_START);

We don't need to use LMADDR here. The KERNEL_START is already mapped and
the caches are PIPT(-like), so flushing any of the aliases would do.

But I'm not sure we even need to flush the whole kernel. The secondary
cores would only execute certain areas before they enable the MMU, at
which point they have visibility over the whole cache. Is this needed
for secondary core startup on resume from hibernate?

[...]

> +static int copy_pud(pgd_t *dst_pgd, pgd_t *src_pgd, unsigned long start,
> +		    unsigned long end)
> +{
> +	int rc = 0;
> +	pmd_t *dst_pmd;
> +	unsigned long next;
> +	unsigned long addr = start;
> +	pud_t *src_pud = pud_offset(src_pgd, start);
> +	pud_t *dst_pud = pud_offset(dst_pgd, start);
> +
> +	do {
> +		next = pud_addr_end(addr, end);
> +		if (!pud_val(*src_pud))
> +			continue;
> +
> +		if (pud_table(*(src_pud))) {
> +			if (PTRS_PER_PMD != 1) {
> +				dst_pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
> +				if (!dst_pmd) {
> +					rc = -ENOMEM;
> +					break;
> +				}
> +
> +				set_pud(dst_pud, __pud(virt_to_phys(dst_pmd)
> +						       | PUD_TYPE_TABLE));
> +			}
> +
> +			rc = copy_pmd(dst_pud, src_pud, addr, next);
> +			if (rc)
> +				break;
> +		} else {
> +			set_pud(dst_pud,
> +				__pud(pud_val(*src_pud) & ~PMD_SECT_RDONLY));
> +		}
> +	} while (dst_pud++, src_pud++, addr = next, addr != end);
> +
> +	return rc;
> +}
> +
> +static int copy_page_tables(pgd_t *dst_pgd, unsigned long start,
> +			    unsigned long end)
> +{
> +	int rc = 0;
> +	pud_t *dst_pud;
> +	unsigned long next;
> +	unsigned long addr = start;
> +	pgd_t *src_pgd = pgd_offset_k(start);
> +
> +	dst_pgd += pgd_index(start);
> +
> +	do {
> +		next = pgd_addr_end(addr, end);
> +		if (!pgd_val(*src_pgd))
> +			continue;
> +
> +		if (PTRS_PER_PUD != 1) {
> +			dst_pud = (pud_t *)get_safe_page(GFP_ATOMIC);
> +			if (!dst_pud) {
> +				rc = -ENOMEM;
> +				break;
> +			}
> +
> +			set_pgd(dst_pgd, __pgd(virt_to_phys(dst_pud)
> +					       | PUD_TYPE_TABLE));
> +		}
> +
> +		rc = copy_pud(dst_pgd, src_pgd, addr, next);
> +		if (rc)
> +			break;
> +	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
> +
> +	return rc;
> +}

We have a few similar page table walking routines in the kernel, though
none of them seems close enough to be easily reusable. Most of them
require a vm_area_struct and mm_struct. But we could use them as
inspiration and the closest to what we need is copy_page_range(). The
main differences from what you have:

- using p*d_offset() instead of p*d_index(). The former is already
  defined in the pgtable-nop*d.h etc. files
- the pud allocation happens in copy_pud rather than copy_page_tables
  (similarly for pmd)
- using p*d_none() instead of !p*d_val(). Again, the former is already
  defined in pgtable-nop*d.h and I don't think we'll need the
  PTRS_PER_P*D checks
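
For illustration, a rough and untested sketch of what copy_pud() might look
like with those suggestions applied (the pud allocation pulled down into
copy_pud(), and p*d_none()/p*d_offset() doing the folding work instead of the
PTRS_PER_* checks); all names are the ones used in the patch above:

static int copy_pud(pgd_t *dst_pgd, pgd_t *src_pgd, unsigned long start,
		    unsigned long end)
{
	unsigned long next;
	unsigned long addr = start;
	pud_t *src_pud = pud_offset(src_pgd, start);
	pud_t *dst_pud;

	if (pgd_none(*dst_pgd)) {
		dst_pud = (pud_t *)get_safe_page(GFP_ATOMIC);
		if (!dst_pud)
			return -ENOMEM;
		set_pgd(dst_pgd, __pgd(virt_to_phys(dst_pud) | PUD_TYPE_TABLE));
	}
	dst_pud = pud_offset(dst_pgd, start);

	do {
		next = pud_addr_end(addr, end);
		if (pud_none(*src_pud))
			continue;

		if (pud_table(*src_pud)) {
			int rc = copy_pmd(dst_pud, src_pud, addr, next);

			if (rc)
				return rc;
		} else {
			set_pud(dst_pud,
				__pud(pud_val(*src_pud) & ~PMD_SECT_RDONLY));
		}
	} while (dst_pud++, src_pud++, addr = next, addr != end);

	return 0;
}

copy_pmd() and copy_pte() would move their allocations down in the same way.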

> +
> +/*
> + * Setup then Resume from the hibernate image using swsusp_arch_suspend_exit().
> + *
> + * Memory allocated by get_safe_page() will be dealt with by the hibernate code,
> + * we don't need to free it here.
> + */
> +int swsusp_arch_resume(void)
> +{
> +	int rc = 0;
> +	size_t exit_size;
> +	pgd_t *tmp_pg_dir;
> +	void *lm_restore_pblist;
> +	phys_addr_t phys_hibernate_exit;
> +	void __noreturn (*hibernate_exit)(phys_addr_t, phys_addr_t, void *,
> +					  void *, unsigned long, phys_addr_t);
> +
> +	/*
> +	 * Copy swsusp_arch_suspend_exit() to a safe page. This will generate
> +	 * a new set of ttbr0 page tables and load them.
> +	 */
> +	exit_size = __hibernate_exit_text_end - __hibernate_exit_text_start;
> +	rc = create_safe_exec_page(__hibernate_exit_text_start, exit_size,
> +				   (void **)&hibernate_exit,
> +				   &phys_hibernate_exit,
> +				   get_safe_page, GFP_ATOMIC);

What I suggested above, just set hibernate_exit VA to PAGE_SIZE here and
pass it directly to create_safe_exec_page().

-- 
Catalin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 13/16] arm64: head.S: el2_setup() to accept sctlr_el1 as an argument
  2016-04-20 17:35     ` James Morse
@ 2016-04-22 10:36       ` Catalin Marinas
  0 siblings, 0 replies; 60+ messages in thread
From: Catalin Marinas @ 2016-04-22 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 20, 2016 at 06:35:57PM +0100, James Morse wrote:
> Hi Catalin,
> 
> On 20/04/16 18:12, Catalin Marinas wrote:
> > On Fri, Apr 01, 2016 at 05:53:37PM +0100, James Morse wrote:
> >> el2_setup() doesn't just configure el2, it configures el1 too. This
> >> means we can't use it to re-configure el2 after resume from hibernate,
> >> as we will be returned to el1 with the MMU turned off.
> >>
> >> Split the sctlr_el1 setting code up, so that el2_setup() accepts an
> >> initial value as an argument. This value will be ignored if el2_setup()
> >> is called at el1: the running value will be preserved with endian
> >> correction.
> >>
> >> Hibernate can now call el2_setup() to re-configure el2, passing the
> >> current sctlr_el1 as an argument.
> > 
> > I don't fully understand the description. The first paragraph says we
> > can't use el2_setup to re-configure el2 after resume from hibernate
> > while the last paragraph says that we can.
> 
> Then I need to rephrase the middle paragraph. Is this any clearer:
> This happens because el2_setup() generates a sctlr_el1 value needed for early
> boot, (mmu off etc). Change el2_setup() to accept this as an argument, so that
> hibernate can pass the current sctlr_el1 (with the mmu on), to el2_setup(), and
> have it re-configure el2 without changing el1.

I think I understand it now, it makes sense.

> The motivation for this patch was resuming with a different kernel version,
> which may have left el2 in an incompatible state. Running the original
> el2_setup() was the only guaranteed way to know everything was set correctly.
> If we definitely don't want to support this, then this patch can go, and the
> minimum el2-reset code can be put into hibernate-asm.S.

I suggest we leave this patch as it is for now; I just didn't understand
why it was needed.

With the slightly updated log:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-21 16:28       ` Lorenzo Pieralisi
@ 2016-04-22 10:41         ` Mark Rutland
  2016-04-22 15:32           ` James Morse
  0 siblings, 1 reply; 60+ messages in thread
From: Mark Rutland @ 2016-04-22 10:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 21, 2016 at 05:28:52PM +0100, Lorenzo Pieralisi wrote:
> On Thu, Apr 21, 2016 at 01:33:35PM +0100, Mark Rutland wrote:
> > On Thu, Apr 21, 2016 at 12:44:16PM +0100, Mark Rutland wrote:
> > > On Wed, Apr 13, 2016 at 05:31:50PM +0100, James Morse wrote:
> > > > It is important to hibernate/resume on the same CPU, otherwise we may
> > > > change the cpu order or restore a big cpu's register state on a little
> > > > cpu.
> > > > 
> > > > We know cpu 0 is the cpu the firmware booted us on last time, 
> > > 
> > > This assumes that we only kexec from CPU0 also, which we will have to
> > > enforce. For example, disable_nonboot_cpus() does not enforce this if
> > > CPU0 has been hotplugged out.
> > > 
> > > Otherwise, this kernel's CPU0 is not necessarily the CPU the FW booted
> > > a kernel on.
> > 
> > A better approach might be:
> > 
> > * When going down for hibernate, store the physical CPU ID (e.g.
> >   MPIDR_EL1.Aff*) in the header for the hibernate image.
> > 
> > * When restoring a hibernate image, first switch over to the CPU
> >   described in the header (rather than assuming CPU0). 
> > 
> > I think this is a matter of adding a new disable_non_hibernate_cpus()
> > function (defaulting to disable_nonboot_cpus()), and overriding that in
> > the arch code. Then that can be called in resume_target_kernel.
> 
> Yes, it looks feasible at least by code inspection, given that it
> requires changes in core code I wonder whether it is a restriction
> that we can remove later or we want it in from the beginning (given
> that the issue with kexec you mention above is not present in the
> current kernel, hopefully not for long :)).

So long as we're not tied to a specific ABI for the hibernate image then
we should be fine to change that later; I agree that this can be a
subsequent improvement.

So long as we have a mechanism to detect that the hibernate image format
changed, we should be OK to add stuff to it.

> > Though there are likely caveats I've missed.
> 
> Same here, it *should* just be a matter of rescheduling on the
> cpu whose MPIDR corresponds to the cpu that created the snapshot
> image instead of the first cpu online, that can be done as you
> said with a generalized disable_nonboot_cpus() that looks up
> the MPIDR and get the corresponding cpu index, in the *current*
> logical map that I expect to be identical to the one in the
> hibernated kernel (actually I am not even sure that's a requirement).

That can't be a requirement. The logical ID may be arbitrarily
different.

Imagine a two CPU system (0x000 and 0x100, say):

* Kernel A boots (CPU0: 0x000, CPU1: 0x100);
* Kernel A hotplugs out CPU0 (0x000)
* Kernel A kexecs to kernel B on CPU1 (0x100)
* Kernel B boots (CPU0: 0x100, CPU1: 0x000)
* Kernel B hibernates on CPU0 (0x100)
* Kernel C boots (CPU0: 0x000, CPU1: 0x100)
* Kernel C switches to CPU1 (0x100)
* Kernel C resumes Kernel B, CPU1 becomes CPU0 (0x100)

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-13 16:31 ` [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline James Morse
  2016-04-21 11:33   ` Lorenzo Pieralisi
  2016-04-21 11:44   ` Mark Rutland
@ 2016-04-22 10:41   ` Catalin Marinas
  2016-04-22 15:32     ` James Morse
  2 siblings, 1 reply; 60+ messages in thread
From: Catalin Marinas @ 2016-04-22 10:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 13, 2016 at 05:31:50PM +0100, James Morse wrote:
> It is important to hibernate/resume on the same CPU, otherwise we may
> change the cpu order or restore a big cpu's register state on a little
> cpu.
> 
> We know cpu 0 is the cpu the firmware booted us on last time, refuse to
> hibernate if it has been hotplugged out. Follow x86's example by registering
> a pm notifier that is called before processes are frozen and devices are
> stopped.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

I think for the time being we should just go ahead with this patch (with
Lorenzo's suggestion for the commit log):

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Subsequently we could look at what Mark suggested on using MPIDR to
hibernate/resume on CPU other than 0.

-- 
Catalin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-22 10:41         ` Mark Rutland
@ 2016-04-22 15:32           ` James Morse
  0 siblings, 0 replies; 60+ messages in thread
From: James Morse @ 2016-04-22 15:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark, Lorenzo,

On 22/04/16 11:41, Mark Rutland wrote:
> On Thu, Apr 21, 2016 at 05:28:52PM +0100, Lorenzo Pieralisi wrote:
>> On Thu, Apr 21, 2016 at 01:33:35PM +0100, Mark Rutland wrote:
>>> On Thu, Apr 21, 2016 at 12:44:16PM +0100, Mark Rutland wrote:
>>>> On Wed, Apr 13, 2016 at 05:31:50PM +0100, James Morse wrote:
>>>>> It is important to hibernate/resume on the same CPU, otherwise we may
>>>>> change the cpu order or restore a big cpu's register state on a little
>>>>> cpu.
>>>>>
>>>>> We know cpu 0 is the cpu the firmware booted us on last time, 
>>>>
>>>> This assumes that we only kexec from CPU0 also, which we will have to
>>>> enforce. For example, disable_nonboot_cpus() does not enforce this if
>>>> CPU0 has been hotplugged out.
>>>>
>>>> Otherwise, this kernel's CPU0 is not necessarily the CPU the FW booted
>>>> a kernel on.
>>>
>>> A better approach might be:
>>>
>>> * When going down for hibernate, store the physical CPU ID (e.g.
>>>   MPIDR_EL1.Aff*) in the header for the hibernate image.
>>>
>>> * When restoring a hibernate image, first switch over to the CPU
>>>   described in the header (rather than assuming CPU0). 
>>>
>>> I think this is a matter of adding a new disable_non_hibernate_cpus()
>>> function (defaulting to disable_nonboot_cpus()), and overriding that in
>>> the arch code. Then that can be called in resume_target_kernel.
>>
>> Yes, it looks feasible at least by code inspection, given that it
>> requires changes in core code I wonder whether it is a restriction
>> that we can remove later or we want it in from the beginning (given
>> that the issue with kexec you mention above is not present in the
>> current kernel, hopefully not for long :)).

I think this will work: the arch-header read/write happens in the right order
for it to influence which cores get plugged out.

Curiously, migrate_to_reboot_cpu() would allow us to specify which cpu we want
to run on, but disable_nonboot_cpus() doesn't...


> So long as we're not tied to a specific ABI for the hibernate image then
> we should be fine to change that later; I agree that this can be a
> subsequent improvement.
> 
> So long as we have a mechanism to detect that the hibernate image format
> changed, we should be OK to add stuff to it.

We only need to worry about this if we support resuming with a different kernel
version. I don't think we should yet, so patch 16 puts some build specific data
in the header so that even a rebuild of the same kernel will refuse to
hibernate/restore. This is roughly what the core code does if the arch doesn't
need the header. With this we are free to extend the header as we see fit.
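
For illustration, a minimal sketch of the kind of check described above, built
on the generic arch_hibernation_header_save/restore() hooks; the struct name
and layout here are hypothetical, not the actual patch:

#include <linux/errno.h>
#include <linux/string.h>
#include <linux/suspend.h>
#include <linux/utsname.h>

/* hypothetical arch header layout */
struct arch_hibernate_hdr {
	char	uts_version[__NEW_UTS_LEN + 1];
	/* ... resume entry point, page table address, etc. ... */
};

int arch_hibernation_header_save(void *addr, unsigned int max_size)
{
	struct arch_hibernate_hdr *hdr = addr;

	if (max_size < sizeof(*hdr))
		return -EOVERFLOW;

	memcpy(hdr->uts_version, init_utsname()->version,
	       sizeof(hdr->uts_version));
	return 0;
}

int arch_hibernation_header_restore(void *addr)
{
	struct arch_hibernate_hdr *hdr = addr;

	/* refuse to resume an image written by a different build */
	if (memcmp(hdr->uts_version, init_utsname()->version,
		   sizeof(hdr->uts_version)))
		return -EINVAL;
	return 0;
}

Dropping (or relaxing) the memcmp() would be the one-line change mentioned below.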

(If a distribution wants to support different-kernel-resume, it's a one-line
change, and an awful lot of testing!)

One thing that really bugs me about different-kernel-resume is that if the
combination you try is not supported, the first hint you get is a regular login
screen, when you were expecting your precious in-memory data.


Thanks,

James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline
  2016-04-22 10:41   ` Catalin Marinas
@ 2016-04-22 15:32     ` James Morse
  0 siblings, 0 replies; 60+ messages in thread
From: James Morse @ 2016-04-22 15:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 22/04/16 11:41, Catalin Marinas wrote:
> On Wed, Apr 13, 2016 at 05:31:50PM +0100, James Morse wrote:
>> It is important to hibernate/resume on the same CPU, otherwise we may
>> change the cpu order or restore a big cpu's register state on a little
>> cpu.
>>
>> We know cpu 0 is the cpu the firmware booted us on last time, refuse to
>> hibernate if it has been hotplugged out. Follow x86's example by registering
>> a pm notifier that is called before processes are frozen and devices are
>> stopped.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> I think for the time being we should just go ahead with this patch (with
> Lorenzo's suggestion for the commit log):
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks,


> Subsequently we could look at what Mark suggested on using MPIDR to
> hibernate/resume on CPU other than 0.

I was going to squash this into one of the previous patches, but as there is a
better-way(tm), I will leave it as it is so we can revert it later.


James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-20 11:19         ` James Morse
  2016-04-20 11:46           ` Marc Zyngier
@ 2016-04-25  8:41           ` AKASHI Takahiro
  2016-04-25  9:16             ` James Morse
  1 sibling, 1 reply; 60+ messages in thread
From: AKASHI Takahiro @ 2016-04-25  8:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 20, 2016 at 12:19:45PM +0100, James Morse wrote:
> Hi Marc,
> 
> On 20/04/16 11:37, Marc Zyngier wrote:
> > On 19/04/16 18:37, James Morse wrote:
> >> It looks like x86 uses the extable to work around this, their vmx_vcpu_run() has:
> >>> 		__ex(ASM_VMX_VMLAUNCH) "\n\t"
> >> Where __ex ends up calling ____kvm_handle_fault_on_reboot(), with a nearby comment:
> >>> * Hardware virtualization extension instructions may fault if a
> >>> * reboot turns off virtualization while processes are running.
> >>> * Trap the fault and ignore the instruction if that happens.
> > 
> > I very much like that approach, to be honest. Tearing down a CPU is
> > something exceptional, so let's make it an actual exception.
> >
> > It is now pretty easy to discriminate between KVM functions and stub
> > functions thanks to your earlier patch, so if we end up calling the
> > hyp-stub because we've torn down KVM's EL2, let's just return an
> > appropriate error code (ARM_EXCEPTION_HYP_GONE), and handle it at EL1.
> 
> Okay. kexec uses kvm_call_hyp() against the hyp-stub to do the kernel-copy and
> hand over to purgatory, but we could change that to a new 'special' builtin
> call, something like HVC_KEXEC_CALL_HYP. It never calls it with kvm loaded, so
> > there is no reason the calls have to be the same.
> 
> Given hibernate doesn't hit this issue, I will drop this hunk from this version
> of the patch, and repost hibernate incorporating the feedback so far. I will
> provide a patch for kexec to do the above.

Thanks, but you don't have to.
If the fix below is acceptable, we will merge it to our next kexec/kdump
patch series.

-Takahiro AKASHI

> Thanks,
> 
> James


From d70cf5d3202d819bd7f8c9072c61da783bf07b40 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Date: Mon, 25 Apr 2016 17:29:55 +0900
Subject: [PATCH] arm64: kvm: (experimental) fix kexec exception due to kexec
 reboot

---
 arch/arm/kvm/arm.c               |  8 +-------
 arch/arm64/include/asm/kvm_asm.h |  1 +
 arch/arm64/include/asm/virt.h    |  5 +++++
 arch/arm64/kernel/cpu-reset.S    |  5 +----
 arch/arm64/kernel/hyp-stub.S     | 14 ++++++++++++--
 arch/arm64/kvm/handle_exit.c     |  4 ++++
 6 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 962904a..0e92787 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -587,13 +587,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		/*
 		 * Re-check atomic conditions
 		 */
-		if (unlikely(!__this_cpu_read(kvm_arm_hardware_enabled))) {
-			/* cpu has been torn down */
-			ret = 0;
-			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
-			run->fail_entry.hardware_entry_failure_reason
-					= (u64)-ENOEXEC;
-		} else if (signal_pending(current)) {
+		if (signal_pending(current)) {
 			ret = -EINTR;
 			run->exit_reason = KVM_EXIT_INTR;
 		}
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ebc8d0e..7f653ad 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -22,6 +22,7 @@
 
 #define ARM_EXCEPTION_IRQ	  0
 #define ARM_EXCEPTION_TRAP	  1
+#define ARM_EXCEPTION_HYP_GONE	  2
 
 #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
 #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index d406d2e..9079661 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -34,6 +34,11 @@
  */
 #define HVC_SET_VECTORS 1
 
+/*
+ * HVC_KEXEC_RESTART
+ */
+#define HVC_KEXEC_RESTART 2
+
 #define BOOT_CPU_MODE_EL1	(0xe11)
 #define BOOT_CPU_MODE_EL2	(0xe12)
 
diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 11a5bc6..4d35331 100644
--- a/arch/arm64/kernel/cpu-reset.S
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -41,10 +41,7 @@ ENTRY(__cpu_soft_restart)
 	isb
 
 	cbz	x0, 1f				// el2_switch?
-	mov	x0, x1				// entry
-	mov	x1, x2				// arg0
-	mov	x2, x3				// arg1
-	mov	x3, x4				// arg2
+	mov	x0, #HVC_KEXEC_RESTART
 	hvc	#0				// no return
 
 1:	mov	x18, x1				// entry
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 6bba25c..7fad129 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -22,7 +22,8 @@
 #include <linux/irqchip/arm-gic-v3.h>
 
 #include <asm/assembler.h>
-#include <asm/kvm_arm.h>
+#include <asm/esr.h>
+#include <asm/kvm_asm.h>
 #include <asm/ptrace.h>
 #include <asm/virt.h>
 
@@ -70,7 +71,16 @@ el1_sync:
 	msr	vbar_el2, x1
 	b	9f
 
-2:	do_el2_call
+2:	cmp	x0, #HVC_KEXEC_RESTART
+	b.eq	3f
+	mov	x0, #ARM_EXCEPTION_HYP_GONE
+	b	9f
+
+3:	mov	lr, x1
+	mov	x0, x2
+	mov	x1, x3
+	mov	x2, x4
+	blr	lr
 
 9:	eret
 ENDPROC(el1_sync)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index eba89e4..31b5224 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -186,6 +186,10 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		exit_handler = kvm_get_exit_handler(vcpu);
 
 		return exit_handler(vcpu, run);
+	case ARM_EXCEPTION_HYP_GONE:
+		/* due to kexec reboot */
+		run->exit_reason = KVM_EXIT_SHUTDOWN;
+		return 0;
 	default:
 		kvm_pr_unimpl("Unsupported exception type: %d",
 			      exception_index);
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-25  8:41           ` AKASHI Takahiro
@ 2016-04-25  9:16             ` James Morse
  2016-04-25  9:28               ` Marc Zyngier
  0 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2016-04-25  9:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 25/04/16 09:41, AKASHI Takahiro wrote:
> On Wed, Apr 20, 2016 at 12:19:45PM +0100, James Morse wrote:
>> On 20/04/16 11:37, Marc Zyngier wrote:
>>> On 19/04/16 18:37, James Morse wrote:
>>>> It looks like x86 uses the extable to work around this, their vmx_vcpu_run() has:
>>>>> 		__ex(ASM_VMX_VMLAUNCH) "\n\t"
>>>> Where __ex ends up calling ____kvm_handle_fault_on_reboot(), with a nearby comment:
>>>>> * Hardware virtualization extension instructions may fault if a
>>>>> * reboot turns off virtualization while processes are running.
>>>>> * Trap the fault and ignore the instruction if that happens.
>>>
>>> I very much like that approach, to be honest. Tearing down a CPU is
>>> something exceptional, so let's make it an actual exception.
>>>
>>> It is now pretty easy to discriminate between KVM functions and stub
>>> functions thanks to your earlier patch, so if we end up calling the
>>> hyp-stub because we've torn down KVM's EL2, let's just return an
>>> appropriate error code (ARM_EXCEPTION_HYP_GONE), and handle it at EL1.
>>
>> Okay. kexec uses kvm_call_hyp() against the hyp-stub to do the kernel-copy and
>> hand over to purgatory, but we could change that to a new 'special' builtin
>> call, something like HVC_KEXEC_CALL_HYP. It never calls it with kvm loaded, so
>> there is no reason the calls have to be the same.
>>
>> Given hibernate doesn't hit this issue, I will drop this hunk from this version
>> of the patch, and repost hibernate incorporating the feedback so far. I will
>> provide a patch for kexec to do the above.
> 
> Thanks, but you don't have to.

I was wrong about 'hibernate doesn't hit this issue': with this patch we
re-install the hyp-stub during system reboot and race with the scheduler.
('reboot -f' while running a guest).


> If the fix below is acceptable, we will merge it to our next kexec/kdump
> patch series.

I'm testing something that looks very similar at the moment.


> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index eba89e4..31b5224 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -186,6 +186,10 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  		exit_handler = kvm_get_exit_handler(vcpu);
>  
>  		return exit_handler(vcpu, run);
> +	case ARM_EXCEPTION_HYP_GONE:
> +		/* due to kexec reboot */
> +		run->exit_reason = KVM_EXIT_SHUTDOWN;
> +		return 0;

Is it fair to throw this back out to user space? While the hypervisor doesn't
have long to live, it may not be expecting this exit_reason. I couldn't see a
value for 'surprise cpu removal', and it looks like the x86 code causes the vcpu
to spin round and re-enter the guest...


Thanks,

James

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v7 15/16] arm64: kernel: Add support for hibernate/suspend-to-disk
  2016-04-22 10:29   ` Catalin Marinas
@ 2016-04-25  9:19     ` James Morse
  0 siblings, 0 replies; 60+ messages in thread
From: James Morse @ 2016-04-25  9:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Catalin,

Thanks for your comments,

On 22/04/16 11:29, Catalin Marinas wrote:
> On Fri, Apr 01, 2016 at 05:53:39PM +0100, James Morse wrote:
>> --- /dev/null
>> +++ b/arch/arm64/kernel/hibernate-asm.S

>> +	copy_page	x0, x1, x2, x3, x4, x5, x6, x7, x8, x9
>> +
>> +	add	x1, x10, #PAGE_SIZE
>> +	/* Clean the copied page to PoU - based on flush_icache_range() */
>> +	dcache_line_size x2, x3
>> +	sub	x3, x2, #1
>> +	bic	x4, x10, x3
>> +2:	dc	cvau, x4	/* clean D line / unified line */
>> +	add	x4, x4, x2
>> +	cmp	x4, x1
>> +	b.lo	2b
>> +
>> +	ldr	x19, [x19, #HIBERN_PBE_NEXT]
>> +	cbnz	x19, 1b
>> +
>> +
>> +	/* switch to the restored kernels page tables, to reconfigure el2 */
>> +	msr	ttbr1_el1, x21  /* physical address of swapper page tables */
>> +	isb
>> +	tlbi	vmalle1is	/* invalidate intermediate caching entries */
>> +	ic	ialluis
>> +	dsb	ish		/* also waits for PoU cleaning to finish */
>> +	isb
> 
> The waiting for PoU cleaning needs to happen before the IC instruction.

Done. To check I understand why:
The 'ic ialluis' may finish before the PoU cleaning, so if they share a barrier
we may speculatively load stale values back into the icache while we wait for
the cleaning to finish.

[ ... ]

>> +
>> +	/* Load our new page tables */
>> +	asm volatile("msr	ttbr0_el1, %0;"
>> +		     "isb;"
>> +		     "tlbi	vmalle1is;"
>> +		     "dsb	ish" : : "r"(virt_to_phys(pgd)));
> 
> Do we expect anything to have used ttbr0_el1 at this point?

EFI, for the virt_efi_get_time() call when we set up the rtc. There may also be
device drivers out there that try to load firmware before the
late_initcall_sync() call that triggers resume.

[ ... ]

>> +int swsusp_arch_suspend(void)
>> +{
>> +	int ret = 0;
>> +	unsigned long flags;
>> +	struct sleep_stack_data state;
>> +
>> +	local_dbg_save(flags);
>> +
>> +	if (__cpu_suspend_enter(&state)) {
>> +		ret = swsusp_save();
>> +	} else {
>> +		void *lm_kernel_start;
>> +
>> +		/* Clean kernel to PoC for secondary core startup */
>> +		lm_kernel_start = LMADDR(KERNEL_START);
>> +		__flush_dcache_area(lm_kernel_start, KERNEL_END - KERNEL_START);
> 
> We don't need to use LMADDR here. The KERNEL_START is already mapped at
> the caches are PIPT (-like), so flushing any of the aliases would do.

With kaslr the range KERNEL_START -> KERNEL_END has holes in it. I think this is
where the __init text or alternatives used to be. Cleaning the corresponding
range in the linear map avoids the fault...


> But I'm not sure we even need to flush the whole kernel. The secondary
> cores would only execute certain areas before they enable the MMU, at
> which point they have visibility over the whole cache. Is this needed
> for secondary core startup on resume from hibernate?

I haven't hit this as an issue, but I think it's needed for any mmu-off code.
The list is:
*  secondary startup after resume
*  hyp-stub and kvm's el2-init code,
*  and cpu_resume() (if a core goes into idle soon after resume).

I agree cleaning the whole kernel is excessive. I guess the right thing to do is
to collect all these functions into a single section and clean that.
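
As a rough illustration of that idea (the section name and the start/end
symbols below are made up for this sketch, nothing this series actually
defines), it could look something like:

/* Hypothetical: tag every function that has to run with the MMU off. */
#define __mmuoff_text	__attribute__((__section__(".mmuoff.text")))

/*
 * vmlinux.lds.S would then collect the section and export its bounds:
 *
 *	.mmuoff.text : {
 *		__mmuoff_text_start = .;
 *		*(.mmuoff.text)
 *		__mmuoff_text_end = .;
 *	}
 */
extern char __mmuoff_text_start[], __mmuoff_text_end[];

static void clean_mmuoff_text_to_poc(void)
{
	/* Clean just the MMU-off entry points, not the whole kernel image. */
	__flush_dcache_area(__mmuoff_text_start,
			    __mmuoff_text_end - __mmuoff_text_start);
}

That would keep the PoC cleaning bounded, wherever kaslr places the kernel.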

[ ... ]

Thanks for the detailed comments!


James


* [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug
  2016-04-25  9:16             ` James Morse
@ 2016-04-25  9:28               ` Marc Zyngier
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Zyngier @ 2016-04-25  9:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 25/04/16 10:16, James Morse wrote:
> Hi,
> 
> On 25/04/16 09:41, AKASHI Takahiro wrote:
>> On Wed, Apr 20, 2016 at 12:19:45PM +0100, James Morse wrote:
>>> On 20/04/16 11:37, Marc Zyngier wrote:
>>>> On 19/04/16 18:37, James Morse wrote:
>>>>> It looks like x86 uses the extable to work around this, their vmx_vcpu_run() has:
>>>>>> 		__ex(ASM_VMX_VMLAUNCH) "\n\t"
>>>>> Where __ex ends up calling ____kvm_handle_fault_on_reboot(), with a nearby comment:
>>>>>> * Hardware virtualization extension instructions may fault if a
>>>>>> * reboot turns off virtualization while processes are running.
>>>>>> * Trap the fault and ignore the instruction if that happens.
>>>>
>>>> I very much like that approach, to be honest. Tearing down a CPU is
>>>> something exceptional, so let's make it an actual exception.
>>>>
>>>> It is now pretty easy to discriminate between KVM functions and stub
>>>> functions thanks to your earlier patch, so if we end up calling the
>>>> hyp-stub because we've torn down KVM's EL2, let's just return an
>>>> appropriate error code (ARM_EXCEPTION_HYP_GONE), and handle it at EL1.
>>>
>>> Okay. kexec uses kvm_call_hyp() against the hyp-stub to do the kernel-copy and
>>> hand over to purgatory, but we could change that to a new 'special' builtin
>>> call, something like HVC_KEXEC_CALL_HYP. It never calls it with kvm loaded, so
>>> there is no reason the calls have to be the same.
>>>
>>> Given hibernate doesn't hit this issue, I will drop this hunk from this version
>>> of the patch, and repost hibernate incorporating the feedback so far. I will
>>> provide a patch for kexec to do the above.
>>
>> Thanks, but you don't have to.
> 
> I was wrong about 'hibernate doesn't hit this issue': with this patch we
> re-install the hyp-stub during system reboot and race with the scheduler
> ('reboot -f' while running a guest).
> 
> 
>> If the fix below is acceptable, we will merge it to our next kexec/kdump
>> patch series.
> 
> I'm testing something that looks very similar at the moment.
> 
> 
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index eba89e4..31b5224 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -186,6 +186,10 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>  		exit_handler = kvm_get_exit_handler(vcpu);
>>  
>>  		return exit_handler(vcpu, run);
>> +	case ARM_EXCEPTION_HYP_GONE:
>> +		/* due to kexec reboot */
>> +		run->exit_reason = KVM_EXIT_SHUTDOWN;
>> +		return 0;
> 
> Is it fair to throw this back out to user space? While the hypervisor doesn't
> have long to live, it may not be expecting this exit_reason. I couldn't see a
> value for 'surprise cpu removal', and it looks like the x86 code causes the vcpu
> to spin round and re-enter the guest...

Yeah, KVM_EXIT_SHUTDOWN is a vcpu reset (which on x86 is caused by a
triple fault). KVM_EXIT_FAIL_ENTRY seems slightly better.

As for getting back to userspace, I don't see that as a problem (though
the documentation on that part of the API is... sparse).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


end of thread

Thread overview: 60+ messages
2016-04-01 16:53 [PATCH v7 00/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
2016-04-01 16:53 ` [PATCH v7 01/16] arm64: KVM: Register CPU notifiers when the kernel runs at HYP James Morse
2016-04-18 16:10   ` Catalin Marinas
2016-04-19  8:58     ` James Morse
2016-04-19 14:39       ` Marc Zyngier
2016-04-01 16:53 ` [PATCH v7 02/16] arm64: Fold proc-macros.S into assembler.h James Morse
2016-04-18 16:11   ` Catalin Marinas
2016-04-01 16:53 ` [PATCH v7 03/16] arm64: Cleanup SCTLR flags James Morse
2016-04-19 14:44   ` Marc Zyngier
2016-04-01 16:53 ` [PATCH v7 04/16] arm64: kvm: Move the do_el2_call macro to a header file James Morse
2016-04-19 15:02   ` Marc Zyngier
2016-04-19 15:05     ` James Morse
2016-04-19 15:10       ` Marc Zyngier
2016-04-01 16:53 ` [PATCH v7 05/16] arm64: kvm: Move lr save/restore from do_el2_call into EL1 James Morse
2016-04-19 15:11   ` Marc Zyngier
2016-04-01 16:53 ` [PATCH v7 06/16] arm64: hyp/kvm: Extend hyp-stub API to allow function calls at EL2 James Morse
2016-04-19 15:22   ` Marc Zyngier
2016-04-01 16:53 ` [PATCH v7 07/16] arm64: kvm: allows kvm cpu hotplug James Morse
2016-04-19 16:03   ` Marc Zyngier
2016-04-19 17:37     ` James Morse
2016-04-20 10:29       ` AKASHI Takahiro
2016-04-20 11:19         ` James Morse
2016-04-20 10:37       ` Marc Zyngier
2016-04-20 11:19         ` James Morse
2016-04-20 11:46           ` Marc Zyngier
2016-04-25  8:41           ` AKASHI Takahiro
2016-04-25  9:16             ` James Morse
2016-04-25  9:28               ` Marc Zyngier
2016-04-01 16:53 ` [PATCH v7 08/16] arm64: kernel: Rework finisher callback out of __cpu_suspend_enter() James Morse
2016-04-18 17:20   ` Catalin Marinas
2016-04-01 16:53 ` [PATCH v7 09/16] arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va James Morse
2016-04-20 16:24   ` Catalin Marinas
2016-04-01 16:53 ` [PATCH v7 10/16] arm64: kernel: Include _AC definition in page.h James Morse
2016-04-20 16:25   ` Catalin Marinas
2016-04-01 16:53 ` [PATCH v7 11/16] arm64: Promote KERNEL_START/KERNEL_END definitions to a header file James Morse
2016-04-20 16:26   ` Catalin Marinas
2016-04-01 16:53 ` [PATCH v7 12/16] arm64: Add new asm macro copy_page James Morse
2016-04-20 16:38   ` Catalin Marinas
2016-04-20 16:56     ` James Morse
2016-04-01 16:53 ` [PATCH v7 13/16] arm64: head.S: el2_setup() to accept sctlr_el1 as an argument James Morse
2016-04-20 17:12   ` Catalin Marinas
2016-04-20 17:35     ` James Morse
2016-04-22 10:36       ` Catalin Marinas
2016-04-01 16:53 ` [PATCH v7 14/16] PM / Hibernate: Call flush_icache_range() on pages restored in-place James Morse
2016-04-20 17:16   ` Catalin Marinas
2016-04-01 16:53 ` [PATCH v7 15/16] arm64: kernel: Add support for hibernate/suspend-to-disk James Morse
2016-04-22 10:29   ` Catalin Marinas
2016-04-25  9:19     ` James Morse
2016-04-01 16:53 ` [PATCH v7 16/16] arm64: hibernate: Prevent resume from a different kernel version James Morse
2016-04-10 12:16   ` Ard Biesheuvel
2016-04-13 16:35     ` James Morse
2016-04-13 16:31 ` [PATCH v7 17/16] arm64: hibernate: Refuse to hibernate if the boot cpu is offline James Morse
2016-04-21 11:33   ` Lorenzo Pieralisi
2016-04-21 11:44   ` Mark Rutland
2016-04-21 12:33     ` Mark Rutland
2016-04-21 16:28       ` Lorenzo Pieralisi
2016-04-22 10:41         ` Mark Rutland
2016-04-22 15:32           ` James Morse
2016-04-22 10:41   ` Catalin Marinas
2016-04-22 15:32     ` James Morse
