* Kexec and KVM not working gracefully together
@ 2015-01-27 15:07 Frediano Ziglio
  2015-01-27 15:25 ` Marc Zyngier
  0 siblings, 1 reply; 16+ messages in thread
From: Frediano Ziglio @ 2015-01-27 15:07 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,
  I was trying to make kexec (software reset) work on an ARM platform
and I realized that a kernel launched with kexec cannot use KVM.
Judging from the kernel messages and the code, the situation is this:
1- the kernel starts in HYP mode, but then the KVM code switches to SVC
mode and initializes HYP mode with its own code;
2- kexec calls reboot with LINUX_REBOOT_CMD_KEXEC;
3- the kernel calls kernel_kexec;
4- kernel_kexec calls machine_kexec;
5- machine_kexec calls soft_restart, passing the physical entry point of
the next (already in memory) kernel;
6- soft_restart calls __soft_restart, changing stack;
7- __soft_restart calls cpu_reset (in my case defined as cpu_v7_reset);
8- cpu_v7_reset just disables the MMU (it runs from an identity mapping)
and jumps to the next kernel's entry point.

From point 3 to 8 the kernel is always in SVC mode, so the next kernel
is launched in SVC mode too, even though the initial kernel was started
in HYP mode.
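
For reference, the relevant path in code, as a condensed paraphrase of
kernel/kexec.c and arch/arm/kernel/process.c from that era (locking,
IRQ/outer-cache disabling and the cache maintenance elided here):

typedef void (*phys_reset_t)(unsigned long);

/* step 6: reboot(LINUX_REBOOT_CMD_KEXEC) ends up here via
 * kernel_kexec() -> machine_kexec() -> soft_restart() */
void soft_restart(unsigned long addr)
{
	/* Change to the static reset stack and continue. */
	call_with_stack(__soft_restart, (void *)addr,
			(void *)stack /* static soft_restart stack */);
}

/* steps 7-8: all of this runs in SVC mode; HYP is never re-entered */
static void __soft_restart(void *addr)
{
	phys_reset_t phys_reset;

	/* Take out a flat (identity) memory mapping. */
	setup_mm_for_reboot();

	/* Switch to the identity mapping and jump to the new kernel. */
	phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
	phys_reset((unsigned long)addr);	/* cpu_v7_reset */

	/* Should never get here. */
	BUG();
}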

I used kernel 3.14, but looking at the 3.19-rc code the same issue is
there (the code didn't change).

Using the hvc instruction you can execute arbitrary functions; however,
these functions must live in a very restricted range, as the HYP MMU
has very few pages configured, and cpu_v7_reset is not among them.
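
To illustrate (this is a hypothetical call, not code from any patch):
kvm_call_hyp() traps to HYP via hvc and branches to the function pointer
passed as its first argument, so something like

	kvm_call_hyp((void *)cpu_v7_reset, addr);	/* would fault */

aborts as soon as HYP fetches from an address that its own (very sparse)
page tables don't map.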

My idea for fixing the issue is, before calling cpu_reset, to call a new
kvm_exit or similar that switches back to HYP mode with the MMU
configured as in SVC mode.

Is this a known issue? Should I try to fix the problem myself, or can
somebody fix it easily?

Regards,
  Frediano


* Kexec and KVM not working gracefully together
  2015-01-27 15:07 Kexec and KVM not working gracefully together Frediano Ziglio
@ 2015-01-27 15:25 ` Marc Zyngier
  2015-02-05  6:17   ` AKASHI Takahiro
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2015-01-27 15:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Frediano,

On 27/01/15 15:07, Frediano Ziglio wrote:
> Hi,
>   I was trying to make kexec (software reset) work on an ARM platform
> and I realized that a kernel launched with kexec cannot use KVM.
> Judging from the kernel messages and the code, the situation is this:
> 1- the kernel starts in HYP mode, but then the KVM code switches to SVC
> mode and initializes HYP mode with its own code;
> 2- kexec calls reboot with LINUX_REBOOT_CMD_KEXEC;
> 3- the kernel calls kernel_kexec;
> 4- kernel_kexec calls machine_kexec;
> 5- machine_kexec calls soft_restart, passing the physical entry point of
> the next (already in memory) kernel;
> 6- soft_restart calls __soft_restart, changing stack;
> 7- __soft_restart calls cpu_reset (in my case defined as cpu_v7_reset);
> 8- cpu_v7_reset just disables the MMU (it runs from an identity mapping)
> and jumps to the next kernel's entry point.
> 
> From point 3 to 8 the kernel is always in SVC mode, so the next kernel
> is launched in SVC mode too, even though the initial kernel was started
> in HYP mode.
> 
> I used kernel 3.14, but looking at the 3.19-rc code the same issue is
> there (the code didn't change).
> 
> Using the hvc instruction you can execute arbitrary functions; however,
> these functions must live in a very restricted range, as the HYP MMU
> has very few pages configured, and cpu_v7_reset is not among them.
> 
> My idea for fixing the issue is, before calling cpu_reset, to call a new
> kvm_exit or similar that switches back to HYP mode with the MMU
> configured as in SVC mode.
> 
> Is this a known issue? Should I try to fix the problem myself, or can
> somebody fix it easily?

This has been known for a while, and so far people dealing with Kexec
have preferred sidestepping the issue. This is moderately easy to fix if
you're happy dealing with page tables. What is missing is the code that
switches back to an idmap, restores the HYP stubs, and lets Kexec install
its own stubs for jumping to the next kernel.

Most of the infrastructure is already there, it is "just" a matter of
getting it right.

If you feel like giving it a go, I suggest you have a look at how we
actually install KVM (the transitions from no MMU to idmap to trampoline
page to final layout are rather entertaining). Once you understand that,
it should be rather straightforward to perform this in the reverse
order, and we can assist you in getting it right.
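
In outline, something like the following (a sketch only; these helper
names are illustrative, they are not existing kernel symbols):

	static void kvm_hyp_teardown_sketch(void)
	{
		/* Map the trampoline page back into the runtime HYP
		 * tables (it may have been freed after init). */
		remap_hyp_trampoline_page();

		/* Point HVBAR at the trampoline, then switch HTTBR back
		 * to the boot (identity) tables: the install sequence,
		 * mirrored. */
		switch_hyp_back_to_idmap();

		/* With the idmap active, turn the HYP MMU and caches
		 * off, putting HSCTLR back into a reset-like state. */
		disable_hyp_mmu_and_caches();

		/* Re-install the HYP stubs so Kexec can use hvc to
		 * enter the next kernel in HYP mode. */
		reinstall_hyp_stubs();
	}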

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Kexec and KVM not working gracefully together
  2015-01-27 15:25 ` Marc Zyngier
@ 2015-02-05  6:17   ` AKASHI Takahiro
  2015-02-05  9:43     ` Marc Zyngier
  0 siblings, 1 reply; 16+ messages in thread
From: AKASHI Takahiro @ 2015-02-05  6:17 UTC (permalink / raw)
  To: linux-arm-kernel

Frediano
Cc: Marc

Are you going to fix this issue on arm?
As I'm now working on the same issue on arm64,
we should share the idea and some code.

Hopefully I will talk to Marc at Linaro Connect next week,
then submit my own patch (or merge it into Geoff's) soon.

Thanks,
-Takahiro AKASHI

On 01/28/2015 12:25 AM, Marc Zyngier wrote:
> Hi Frediano,
>
> On 27/01/15 15:07, Frediano Ziglio wrote:
>> Hi,
>>    I was trying to make kexec (software reset) work on an ARM platform
>> and I realized that a kernel launched with kexec cannot use KVM.
>> Judging from the kernel messages and the code, the situation is this:
>> 1- the kernel starts in HYP mode, but then the KVM code switches to SVC
>> mode and initializes HYP mode with its own code;
>> 2- kexec calls reboot with LINUX_REBOOT_CMD_KEXEC;
>> 3- the kernel calls kernel_kexec;
>> 4- kernel_kexec calls machine_kexec;
>> 5- machine_kexec calls soft_restart, passing the physical entry point of
>> the next (already in memory) kernel;
>> 6- soft_restart calls __soft_restart, changing stack;
>> 7- __soft_restart calls cpu_reset (in my case defined as cpu_v7_reset);
>> 8- cpu_v7_reset just disables the MMU (it runs from an identity mapping)
>> and jumps to the next kernel's entry point.
>>
>> From point 3 to 8 the kernel is always in SVC mode, so the next kernel
>> is launched in SVC mode too, even though the initial kernel was started
>> in HYP mode.
>>
>> I used kernel 3.14, but looking at the 3.19-rc code the same issue is
>> there (the code didn't change).
>>
>> Using the hvc instruction you can execute arbitrary functions; however,
>> these functions must live in a very restricted range, as the HYP MMU
>> has very few pages configured, and cpu_v7_reset is not among them.
>>
>> My idea for fixing the issue is, before calling cpu_reset, to call a new
>> kvm_exit or similar that switches back to HYP mode with the MMU
>> configured as in SVC mode.
>>
>> Is this a known issue? Should I try to fix the problem myself, or can
>> somebody fix it easily?
>
> This has been known for a while, and so far people dealing with Kexec
> have preferred sidestepping the issue. This is moderately easy to fix if
> you're happy dealing with page tables. What is missing is the code that
> switches back to an idmap, restores the HYP stubs, and lets Kexec install
> its own stubs for jumping to the next kernel.
>
> Most of the infrastructure is already there, it is "just" a matter of
> getting it right.
>
> If you feel like giving it a go, I suggest you have a look at how we
> actually install KVM (the transitions from no MMU to idmap to trampoline
> page to final layout are rather entertaining). Once you understand that,
> it should be rather straightforward to perform this in the reverse
> order, and we can assist you in getting it right.
>
> Thanks,
>
> 	M.
>


* Kexec and KVM not working gracefully together
  2015-02-05  6:17   ` AKASHI Takahiro
@ 2015-02-05  9:43     ` Marc Zyngier
  2015-02-05 10:27       ` AKASHI Takahiro
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2015-02-05  9:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 5 Feb 2015 06:17:04 +0000
AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:

> Frediano
> Cc: Marc
> 
> Are you going to fix this issue on arm?
> As I'm now working on the same issue on arm64,
> we should share the idea and some code.

Just to be perfectly clear: KVM support for Kexec on arm and arm64
should be exactly the same, with the exception of some (very
limited) assembly code for the MMU teardown.

And I'd like to see both architectures addressed at the same time,
without statements such as "we'll do that as a next step". As such, I'd
very much recommend that Frediano and you work together to address this
problem.

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny.


* Kexec and KVM not working gracefully together
  2015-02-05  9:43     ` Marc Zyngier
@ 2015-02-05 10:27       ` AKASHI Takahiro
  2015-02-05 10:51         ` Frediano Ziglio
  0 siblings, 1 reply; 16+ messages in thread
From: AKASHI Takahiro @ 2015-02-05 10:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/05/2015 06:43 PM, Marc Zyngier wrote:
> On Thu, 5 Feb 2015 06:17:04 +0000
> AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
>
>> Frediano
>> Cc: Marc
>>
>> Are you going to fix this issue on arm?
>> As I'm now working on the same issue on arm64,
>> we should share the idea and some code.
>
> Just to be perfectly clear: KVM support for Kexec on arm and arm64
> should be exactly the same, with the exception of some (very
> limited) assembly code for the MMU teardown.

Yep, that's exactly what I meant.

> And I'd like to see both architectures addressed at the same time,
> without statements such as "we'll do that as a next step". As such, I'd
> very much recommend that Frediano and you work together to address this
> problem.

I hope so.

Thanks,
-Takahiro AKASHI

> Thanks,
>
>          M.
>


* Kexec and KVM not working gracefully together
  2015-02-05 10:27       ` AKASHI Takahiro
@ 2015-02-05 10:51         ` Frediano Ziglio
  2015-02-05 13:32           ` Marc Zyngier
  0 siblings, 1 reply; 16+ messages in thread
From: Frediano Ziglio @ 2015-02-05 10:51 UTC (permalink / raw)
  To: linux-arm-kernel

2015-02-05 10:27 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
> On 02/05/2015 06:43 PM, Marc Zyngier wrote:
>>
>> On Thu, 5 Feb 2015 06:17:04 +0000
>> AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
>>
>>> Frediano
>>> Cc: Marc
>>>
>>> Are you going to fix this issue on arm?
>>> As I'm now working on the same issue on arm64,
>>> we should share the idea and some code.
>>
>>
>> Just to be perfectly clear: KVM support for Kexec on arm and arm64
>> should be exactly the same, with the exception of some (very
>> limited) assembly code for the MMU teardown.
>
>
> Yep, that's exactly what I meant.
>
>> And I'd like to see both architectures addressed at the same time,
>> without statements such as "we'll do that as a next step". As such, I'd
>> very much recommend that Frediano and you work together to address this
>> problem.
>
>
> I hope so.
>
> Thanks,
> -Takahiro AKASHI
>
>> Thanks,
>>
>>          M.
>>



Well, let's let the code speak. I'm actually testing this on an ARM32
platform with a single CPU (the platform has 16 cores, but I'm using a
non-SMP kernel for my tests for unrelated reasons; I should be able to
test SMP in a couple of days).

The key point is the switch-back code (arm.c, kvm_cpu_reset). The idea:
- at boot time, save the HSCTLR value;
- restore the boot page tables and trampoline before the CPU reset;
- call kvm_cpu_reset instead of cpu_reset (this still calls cpu_reset if
KVM is disabled);
- set HVBAR to the trampoline (virtual mapping);
- set HVBAR to the physical trampoline and switch to the boot page
tables (with the trampoline mapped at both its physical and virtual
addresses, i.e. an identity mapping);
- flush memory while disabling caching in hypervisor mode (not doing so
caused a lot of instability; with it I have now rebooted more than 100
times in a row);
- call into hypervisor mode to do the CPU reset, restoring the boot
HSCTLR value.


Hope gmail does not mangle my patch...


diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 3a67bec..71380a1 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -97,7 +97,24 @@ extern char __kvm_hyp_code_end[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);

+extern void __kvm_set_vectors(unsigned long phys_vector_base);
+
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+
+#ifdef CONFIG_HAVE_KVM
+static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
+{
+    phys_reset(addr);
+}
+static inline int kvm_mmu_reset_prepare(void)
+{
+    return 0;
+}
+#else
+extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
+extern int kvm_mmu_reset_prepare(void);
+#endif
+
 #endif

 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 04b4ea0..c0b6c20 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -192,6 +192,8 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
         int exception_index);

+extern unsigned kvm_saved_sctlr;
+
 static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
                        phys_addr_t pgd_ptr,
                        unsigned long hyp_stack_ptr,
@@ -213,7 +215,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
      * PCS!).
      */

-    kvm_call_hyp(NULL, 0, boot_pgd_ptr);
+    kvm_saved_sctlr = (unsigned) kvm_call_hyp(NULL, 0, boot_pgd_ptr);

     kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index fdfa3a7..026a1e4 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -41,6 +41,7 @@
 #include <asm/system_misc.h>
 #include <asm/mach/time.h>
 #include <asm/tls.h>
+#include <asm/kvm_asm.h>

 #ifdef CONFIG_CC_STACKPROTECTOR
 #include <linux/stackprotector.h>
@@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
 };

 extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
-typedef void (*phys_reset_t)(unsigned long);
+typedef void (*phys_reset_t)(void *);

 /*
  * A temporary stack to use for CPU reset. This is static so that we
@@ -89,7 +90,7 @@ static void __soft_restart(void *addr)

     /* Switch to the identity mapping. */
     phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
-    phys_reset((unsigned long)addr);
+    kvm_cpu_reset(phys_reset, addr);

     /* Should never get here. */
     BUG();
@@ -107,6 +108,8 @@ void soft_restart(unsigned long addr)
     if (num_online_cpus() == 1)
         outer_disable();

+    kvm_mmu_reset_prepare();
+
     /* Change to the new stack and continue with the reset. */
     call_with_stack(__soft_restart, (void *)addr, (void *)stack);

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0b0d58a..7727678 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -63,6 +63,8 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);

 static bool vgic_present;

+unsigned kvm_saved_sctlr;
+
 static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
 {
     BUG_ON(preemptible());
@@ -1079,6 +1081,37 @@ out_err:
     return err;
 }

+void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
+{
+    unsigned long vector;
+
+    if (!is_hyp_mode_available()) {
+        phys_reset(addr);
+        return;
+    }
+
+    vector = (unsigned long) kvm_get_idmap_vector();
+
+    kvm_call_hyp(__kvm_flush_vm_context);
+
+    /* set vectors so we call initialization routines */
+    /* trampoline is still on memory, physical ram is not mapped */
+    kvm_call_hyp(__kvm_set_vectors, (unsigned long) (TRAMPOLINE_VA + (vector & (PAGE_SIZE-1))));
+
+    /* set HVBAR to physical, page table to identity to do the switch */
+    kvm_call_hyp((void*) 0x100, (unsigned long) vector, kvm_mmu_get_boot_httbr());
+
+    flush_cache_all();
+
+    /* Turn off caching on Hypervisor mode */
+    kvm_call_hyp((void*) 1);
+
+    flush_cache_all();
+
+    /* restore execution */
+    kvm_call_hyp((void*) 3, kvm_saved_sctlr, addr);
+}
+
 /* NOP: Compiling as a module not supported */
 void kvm_arch_exit(void)
 {
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 3988e72..79f435a 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -68,12 +68,21 @@ __kvm_hyp_init:
     W(b)    .

 __do_hyp_init:
+    @ check for special values, odd numbers can't be a stack pointer
+    cmp    r0, #1
+    beq    cpu_proc_fin
+    cmp    r0, #3
+    beq    restore
+
     cmp    r0, #0            @ We have a SP?
     bne    phase2            @ Yes, second stage init

     @ Set the HTTBR to point to the hypervisor PGD pointer passed
     mcrr    p15, 4, rr_lo_hi(r2, r3), c2

+    @ Save HSCR to be able to return it, save and restore
+    mrc    p15, 4, r3, c1, c0, 0    @ HSCR
+
     @ Set the HTCR and VTCR to the same shareability and cacheability
     @ settings as the non-secure TTBCR and with T0SZ == 0.
     mrc    p15, 4, r0, c2, c0, 2    @ HTCR
@@ -125,6 +134,9 @@ __do_hyp_init:
     isb
     mcr    p15, 4, r0, c1, c0, 0    @ HSCR

+    @ Return initial HSCR register
+    mov    r0, r3
+
     @ End of init phase-1
     eret

@@ -151,6 +163,29 @@ target:    @ We're now in the trampoline code, switch page tables

     eret

+cpu_proc_fin:
+    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
+    bic    r0, r0, #0x1000            @ ...i............
+    bic    r0, r0, #0x0006            @ .............ca.
+    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
+    eret
+
+restore:
+    isb
+    mcr    p15, 4, r1, c1, c0, 0    @ HSCR
+    isb
+    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
+    isb
+    dsb    ish
+
+    @ Disable MMU, caches and Thumb in SCTLR
+    mrc    p15, 0, r1, c1, c0, 0   @ SCTLR
+    ldr    r0, =0x40003807
+    bic    r1, r1, r0
+    mcr    p15, 0, r1, c1, c0, 0
+
+    bx    r2
+
     .ltorg

     .globl __kvm_hyp_init_end
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 01dcb0e..449e7dd 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)


 /********************************************************************
+ * Set HVBAR
+ *
+ * void __kvm_set_vectors(unsigned long phys_vector_base);
+ */
+ENTRY(__kvm_set_vectors)
+    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
+    dsb    ish
+    isb
+    bx    lr
+ENDPROC(__kvm_set_vectors)
+
+/********************************************************************
  *  Hypervisor world-switch code
  *
  *
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1366625..9149ae5 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -46,6 +46,8 @@ static phys_addr_t hyp_idmap_vector;

 #define kvm_pmd_huge(_x)    (pmd_huge(_x) || pmd_trans_huge(_x))

+static int kvm_mmu_boot_init(void);
+
 static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 {
     /*
@@ -369,6 +371,11 @@ void free_boot_hyp_pgd(void)
     free_page((unsigned long)init_bounce_page);
     init_bounce_page = NULL;

+    /* avoid to reuse possibly invalid values if bounce page is freed */
+    hyp_idmap_start = 0;
+    hyp_idmap_end = 0;
+    hyp_idmap_vector = 0;
+
     mutex_unlock(&kvm_hyp_pgd_mutex);
 }

@@ -1252,7 +1259,28 @@ phys_addr_t kvm_get_idmap_vector(void)
     return hyp_idmap_vector;
 }

-int kvm_mmu_init(void)
+/* assure we have MMU setup correctly */
+int kvm_mmu_reset_prepare(void)
+{
+    int err = 0;
+
+    if (boot_hyp_pgd)
+        goto out;
+
+    err = kvm_mmu_boot_init();
+    if (err)
+        goto out;
+
+    err = __create_hyp_mappings(hyp_pgd,
+                      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
+                      __phys_to_pfn(hyp_idmap_start),
+                      PAGE_HYP);
+
+out:
+    return err;
+}
+
+static int kvm_mmu_boot_init(void)
 {
     int err;

@@ -1294,10 +1322,9 @@ int kvm_mmu_init(void)
              (unsigned long)phys_base);
     }

-    hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
     boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);

-    if (!hyp_pgd || !boot_hyp_pgd) {
+    if (!boot_hyp_pgd) {
         kvm_err("Hyp mode PGD not allocated\n");
         err = -ENOMEM;
         goto out;
@@ -1326,6 +1353,27 @@ int kvm_mmu_init(void)
         goto out;
     }

+    return 0;
+out:
+    free_boot_hyp_pgd();
+    return err;
+}
+
+int kvm_mmu_init(void)
+{
+    int err;
+
+    err = kvm_mmu_boot_init();
+    if (err)
+        goto out0;
+
+    hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+    if (!hyp_pgd) {
+        kvm_err("Hyp mode PGD not allocated\n");
+        err = -ENOMEM;
+        goto out;
+    }
+
     /* Map the same page again into the runtime page tables */
     err = __create_hyp_mappings(hyp_pgd,
                       TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
@@ -1340,6 +1388,7 @@ int kvm_mmu_init(void)
     return 0;
 out:
     free_hyp_pgds();
+out0:
     return err;
 }


Frediano


* Kexec and KVM not working gracefully together
  2015-02-05 10:51         ` Frediano Ziglio
@ 2015-02-05 13:32           ` Marc Zyngier
  2015-02-05 14:27             ` Frediano Ziglio
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2015-02-05 13:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/02/15 10:51, Frediano Ziglio wrote:
> 2015-02-05 10:27 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
>> On 02/05/2015 06:43 PM, Marc Zyngier wrote:
>>>
>>> On Thu, 5 Feb 2015 06:17:04 +0000
>>> AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
>>>
>>>> Frediano
>>>> Cc: Marc
>>>>
>>>> Are you going to fix this issue on arm?
>>>> As I'm now working on the same issue on arm64,
>>>> we should share the idea and some code.
>>>
>>>
>>> Just to be perfectly clear: KVM support for Kexec on arm and arm64
>>> should be exactly the same, with the exception of some (very
>>> limited) assembly code for the MMU teardown.
>>
>>
>> Yep, that's exactly what I meant.
>>
>>> And I'd like to see both architectures addressed at the same time,
>>> without statements such as "we'll do that as a next step". As such, I'd
>>> very much recommend that Frediano and you work together to address this
>>> problem.
>>
>>
>> I hope so.
>>
>> Thanks,
>> -Takahiro AKASHI
>>
>>> Thanks,
>>>
>>>          M.
>>>
> 
> 
> 
> Well, let's let the code speak. I'm actually testing this on an ARM32
> platform with a single CPU (the platform has 16 cores, but I'm using a
> non-SMP kernel for my tests for unrelated reasons; I should be able to
> test SMP in a couple of days).
> 
> The key point is the switch-back code (arm.c, kvm_cpu_reset). The idea:
> - at boot time, save the HSCTLR value;

Why do you need this? The content of that register is known at any point
of the lifetime of the kernel, and so is its value at reboot time
(basically MMU and caches disabled, ARM mode).

This should be part of your MMU teardown process.

> - restore the boot page tables and trampoline before the CPU reset;
> - call kvm_cpu_reset instead of cpu_reset (this still calls cpu_reset if
> KVM is disabled);
> - set HVBAR to the trampoline (virtual mapping);
> - set HVBAR to the physical trampoline and switch to the boot page
> tables (with the trampoline mapped at both its physical and virtual
> addresses, i.e. an identity mapping);
> - flush memory while disabling caching in hypervisor mode (not doing so
> caused a lot of instability; with it I have now rebooted more than 100
> times in a row);
> - call into hypervisor mode to do the CPU reset, restoring the boot
> HSCTLR value.
> 
> 
> Hope gmail does not mangle my patch...
> 
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 3a67bec..71380a1 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -97,7 +97,24 @@ extern char __kvm_hyp_code_end[];
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> 
> +extern void __kvm_set_vectors(unsigned long phys_vector_base);
> +
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +
> +#ifdef CONFIG_HAVE_KVM

That doesn't seem right. Shouldn't that be #ifndef?

> +static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
> +{
> +    phys_reset(addr);
> +}
> +static inline int kvm_mmu_reset_prepare(void)
> +{
> +    return 0;
> +}
> +#else
> +extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
> +extern int kvm_mmu_reset_prepare(void);
> +#endif
> +
>  #endif
> 
>  #endif /* __ARM_KVM_ASM_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 04b4ea0..c0b6c20 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -192,6 +192,8 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>          int exception_index);
> 
> +extern unsigned kvm_saved_sctlr;
> +
>  static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>                         phys_addr_t pgd_ptr,
>                         unsigned long hyp_stack_ptr,
> @@ -213,7 +215,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>       * PCS!).
>       */
> 
> -    kvm_call_hyp(NULL, 0, boot_pgd_ptr);
> +    kvm_saved_sctlr = (unsigned) kvm_call_hyp(NULL, 0, boot_pgd_ptr);
> 
>      kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
>  }
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index fdfa3a7..026a1e4 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -41,6 +41,7 @@
>  #include <asm/system_misc.h>
>  #include <asm/mach/time.h>
>  #include <asm/tls.h>
> +#include <asm/kvm_asm.h>
> 
>  #ifdef CONFIG_CC_STACKPROTECTOR
>  #include <linux/stackprotector.h>
> @@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
>  };
> 
>  extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
> -typedef void (*phys_reset_t)(unsigned long);
> +typedef void (*phys_reset_t)(void *);
> 
>  /*
>   * A temporary stack to use for CPU reset. This is static so that we
> @@ -89,7 +90,7 @@ static void __soft_restart(void *addr)
> 
>      /* Switch to the identity mapping. */
>      phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
> -    phys_reset((unsigned long)addr);
> +    kvm_cpu_reset(phys_reset, addr);
> 
>      /* Should never get here. */
>      BUG();
> @@ -107,6 +108,8 @@ void soft_restart(unsigned long addr)
>      if (num_online_cpus() == 1)
>          outer_disable();
> 
> +    kvm_mmu_reset_prepare();
> +
>      /* Change to the new stack and continue with the reset. */
>      call_with_stack(__soft_restart, (void *)addr, (void *)stack);
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 0b0d58a..7727678 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -63,6 +63,8 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);
> 
>  static bool vgic_present;
> 
> +unsigned kvm_saved_sctlr;
> +
>  static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
>  {
>      BUG_ON(preemptible());
> @@ -1079,6 +1081,37 @@ out_err:
>      return err;
>  }
> 
> +void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
> +{
> +    unsigned long vector;
> +
> +    if (!is_hyp_mode_available()) {
> +        phys_reset(addr);
> +        return;
> +    }
> +
> +    vector = (unsigned long) kvm_get_idmap_vector();
> +
> +    kvm_call_hyp(__kvm_flush_vm_context);

What are you trying to achieve here? This only makes sense if you're
reassigning VMIDs. I would certainly hope that, by the time you're
tearing KVM down for reboot, no VM is up and running.

> +
> +    /* set vectors so we call initialization routines */
> +    /* trampoline is still on memory, physical ram is not mapped */

Comment is incredibly misleading. Physical memory *IS* mapped. What you
have is a kernel mapping of that page.

> +    kvm_call_hyp(__kvm_set_vectors, (unsigned long) (TRAMPOLINE_VA + (vector & (PAGE_SIZE-1))));

That really deserves a comment as to *why* you're doing that offsetting.

> +
> +    /* set HVBAR to physical, page table to identity to do the switch */
> +    kvm_call_hyp((void*) 0x100, (unsigned long) vector, kvm_mmu_get_boot_httbr());

Where is that 0x100 (stack?) coming from?

> +    flush_cache_all();

Why do you need a flush_all? Why can't you do it by VA?
> +
> +    /* Turn off caching on Hypervisor mode */
> +    kvm_call_hyp((void*) 1);
> +
> +    flush_cache_all();

The fact that you have to do it twice is indicative of other problems.

> +    /* restore execution */
> +    kvm_call_hyp((void*) 3, kvm_saved_sctlr, addr);

That seems useless to me (as mentioned above). The whole sequence can
probably be reduced to a couple of HYP calls, and made much more
straightforward.

> +}
> +
>  /* NOP: Compiling as a module not supported */
>  void kvm_arch_exit(void)
>  {
> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index 3988e72..79f435a 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -68,12 +68,21 @@ __kvm_hyp_init:
>      W(b)    .
> 
>  __do_hyp_init:
> +    @ check for special values, odd numbers can't be a stack pointer
> +    cmp    r0, #1
> +    beq    cpu_proc_fin
> +    cmp    r0, #3
> +    beq    restore
> +
>      cmp    r0, #0            @ We have a SP?
>      bne    phase2            @ Yes, second stage init
> 
>      @ Set the HTTBR to point to the hypervisor PGD pointer passed
>      mcrr    p15, 4, rr_lo_hi(r2, r3), c2
> 
> +    @ Save HSCR to be able to return it, save and restore
> +    mrc    p15, 4, r3, c1, c0, 0    @ HSCR
> +
>      @ Set the HTCR and VTCR to the same shareability and cacheability
>      @ settings as the non-secure TTBCR and with T0SZ == 0.
>      mrc    p15, 4, r0, c2, c0, 2    @ HTCR
> @@ -125,6 +134,9 @@ __do_hyp_init:
>      isb
>      mcr    p15, 4, r0, c1, c0, 0    @ HSCR
> 
> +    @ Return initial HSCR register
> +    mov    r0, r3
> +
>      @ End of init phase-1
>      eret
> 
> @@ -151,6 +163,29 @@ target:    @ We're now in the trampoline code, switch page tables
> 
>      eret
> 
> +cpu_proc_fin:
> +    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
> +    bic    r0, r0, #0x1000            @ ...i............
> +    bic    r0, r0, #0x0006            @ .............ca.
> +    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
> +    eret
> +
> +restore:
> +    isb
> +    mcr    p15, 4, r1, c1, c0, 0    @ HSCR
> +    isb
> +    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
> +    isb
> +    dsb    ish
> +
> +    @ Disable MMU, caches and Thumb in SCTLR
> +    mrc    p15, 0, r1, c1, c0, 0   @ SCTLR
> +    ldr    r0, =0x40003807
> +    bic    r1, r1, r0
> +    mcr    p15, 0, r1, c1, c0, 0
> +
> +    bx    r2
> +
>      .ltorg
> 
>      .globl __kvm_hyp_init_end
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 01dcb0e..449e7dd 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)
> 
> 
>  /********************************************************************
> + * Set HVBAR
> + *
> + * void __kvm_set_vectors(unsigned long phys_vector_base);
> + */
> +ENTRY(__kvm_set_vectors)
> +    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
> +    dsb    ish
> +    isb
> +    bx    lr
> +ENDPROC(__kvm_set_vectors)
> +
> +/********************************************************************
>   *  Hypervisor world-switch code
>   *
>   *
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 1366625..9149ae5 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -46,6 +46,8 @@ static phys_addr_t hyp_idmap_vector;
> 
>  #define kvm_pmd_huge(_x)    (pmd_huge(_x) || pmd_trans_huge(_x))
> 
> +static int kvm_mmu_boot_init(void);
> +
>  static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  {
>      /*
> @@ -369,6 +371,11 @@ void free_boot_hyp_pgd(void)
>      free_page((unsigned long)init_bounce_page);
>      init_bounce_page = NULL;
> 
> +    /* avoid to reuse possibly invalid values if bounce page is freed */
> +    hyp_idmap_start = 0;
> +    hyp_idmap_end = 0;
> +    hyp_idmap_vector = 0;
> +

Well, in this case, why do we free the page? We'd better keep it around
if we have Kexec...

>      mutex_unlock(&kvm_hyp_pgd_mutex);
>  }
> 
> @@ -1252,7 +1259,28 @@ phys_addr_t kvm_get_idmap_vector(void)
>      return hyp_idmap_vector;
>  }
> 
> -int kvm_mmu_init(void)
> +/* assure we have MMU setup correctly */
> +int kvm_mmu_reset_prepare(void)
> +{
> +    int err = 0;
> +
> +    if (boot_hyp_pgd)
> +        goto out;
> +
> +    err = kvm_mmu_boot_init();
> +    if (err)
> +        goto out;
> +
> +    err = __create_hyp_mappings(hyp_pgd,
> +                      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
> +                      __phys_to_pfn(hyp_idmap_start),
> +                      PAGE_HYP);
> +
> +out:
> +    return err;
> +}
> +

... and this code can then go away.

> +static int kvm_mmu_boot_init(void)
>  {
>      int err;
> 
> @@ -1294,10 +1322,9 @@ int kvm_mmu_init(void)
>               (unsigned long)phys_base);
>      }
> 
> -    hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>      boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
> 
> -    if (!hyp_pgd || !boot_hyp_pgd) {
> +    if (!boot_hyp_pgd) {
>          kvm_err("Hyp mode PGD not allocated\n");
>          err = -ENOMEM;
>          goto out;
> @@ -1326,6 +1353,27 @@ int kvm_mmu_init(void)
>          goto out;
>      }
> 
> +    return 0;
> +out:
> +    free_boot_hyp_pgd();
> +    return err;
> +}
> +
> +int kvm_mmu_init(void)
> +{
> +    int err;
> +
> +    err = kvm_mmu_boot_init();
> +    if (err)
> +        goto out0;
> +
> +    hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
> +    if (!hyp_pgd) {
> +        kvm_err("Hyp mode PGD not allocated\n");
> +        err = -ENOMEM;
> +        goto out;
> +    }
> +
>      /* Map the same page again into the runtime page tables */
>      err = __create_hyp_mappings(hyp_pgd,
>                        TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
> @@ -1340,6 +1388,7 @@ int kvm_mmu_init(void)
>      return 0;
>  out:
>      free_hyp_pgds();
> +out0:
>      return err;
>  }
> 
> 
> Frediano
> 

Thanks for having a look into this, as you're the first one to actually
do something. That still requires a lot of work, and some careful
thought as to how to do this on arm64.

Looking forward to your next patch,

	M.
-- 
Jazz is not dead. It just smells funny...


* Kexec and KVM not working gracefully together
  2015-02-05 13:32           ` Marc Zyngier
@ 2015-02-05 14:27             ` Frediano Ziglio
  2015-02-05 16:56               ` Frediano Ziglio
  0 siblings, 1 reply; 16+ messages in thread
From: Frediano Ziglio @ 2015-02-05 14:27 UTC (permalink / raw)
  To: linux-arm-kernel

2015-02-05 13:32 GMT+00:00 Marc Zyngier <marc.zyngier@arm.com>:
> On 05/02/15 10:51, Frediano Ziglio wrote:
>> 2015-02-05 10:27 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
>>> On 02/05/2015 06:43 PM, Marc Zyngier wrote:
>>>>
>>>> On Thu, 5 Feb 2015 06:17:04 +0000
>>>> AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
>>>>
>>>>> Frediano
>>>>> Cc: Marc
>>>>>
>>>>> Are you going to fix this issue on arm?
>>>>> As I'm now working on the same issue on arm64,
>>>>> we should share the idea and some code.
>>>>
>>>>
>>>> Just to be perfectly clear: KVM support for Kexec on arm and arm64
>>>> should be exactly the same, with the exception of some (very
>>>> limited) assembly code for the MMU teardown.
>>>
>>>
>>> Yep, that's exactly what I meant.
>>>
>>>> And I'd like to see both architectures addressed at the same time,
>>>> without statements such as "we'll do that as a next step". As such, I'd
>>>> very much recommend that Frediano and you work together to address this
>>>> problem.
>>>
>>>
>>> I hope so.
>>>
>>> Thanks,
>>> -Takahiro AKASHI
>>>
>>>> Thanks,
>>>>
>>>>          M.
>>>>
>>
>>
>>
>> Well, let's let the code speak. I'm actually testing this on an ARM32
>> platform with a single CPU (the platform has 16 cores, but I'm using a
>> non-SMP kernel for my tests for unrelated reasons; I should be able to
>> test SMP in a couple of days).
>>
>> The key point is the switch-back code (arm.c, kvm_cpu_reset). The idea:
>> - at boot time, save the HSCTLR value;
>
> Why do you need this? The content of that register is known at any point
> of the lifetime of the kernel, and so is its value at reboot time
> (basically MMU and caches disabled, ARM mode).
>
> This should be part of your MMU teardown process.
>

I should have put an RFC in the subject. I can probably just adjust the
value without saving the boot one.

>> - restore the boot page tables and trampoline before the CPU reset;
>> - call kvm_cpu_reset instead of cpu_reset (this still calls cpu_reset if
>> KVM is disabled);
>> - set HVBAR to the trampoline (virtual mapping);
>> - set HVBAR to the physical trampoline and switch to the boot page
>> tables (with the trampoline mapped at both its physical and virtual
>> addresses, i.e. an identity mapping);
>> - flush memory while disabling caching in hypervisor mode (not doing so
>> caused a lot of instability; with it I have now rebooted more than 100
>> times in a row);
>> - call into hypervisor mode to do the CPU reset, restoring the boot
>> HSCTLR value.
>>
>>
>> Hope gmail does not mangle my patch...
>>
>>
>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> index 3a67bec..71380a1 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -97,7 +97,24 @@ extern char __kvm_hyp_code_end[];
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>
>> +extern void __kvm_set_vectors(unsigned long phys_vector_base);
>> +
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>> +
>> +#ifdef CONFIG_HAVE_KVM
>
> That doesn't seem right. Shouldn't that be #ifndef?
>

Well... I don't know where I got that define; the name is wrong too.
However, I think the kernel should switch back to hypervisor mode
whether KVM is enabled or not. head.S tests for CONFIG_ARM_VIRT_EXT,
so I should probably use the same test here. I should also try to
compile with KVM both enabled and disabled, just to check that the
kernel links correctly.
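
A corrected guard might look like this (a sketch; whether to key it off
CONFIG_ARM_VIRT_EXT or a KVM config symbol is exactly the open question
above):

#ifdef CONFIG_ARM_VIRT_EXT
extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
extern int kvm_mmu_reset_prepare(void);
#else
/* No virt extensions: fall straight through to the plain reset. */
static inline void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
{
	phys_reset(addr);
}
static inline int kvm_mmu_reset_prepare(void)
{
	return 0;
}
#endif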

>> +static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>> +{
>> +    phys_reset(addr);
>> +}
>> +static inline int kvm_mmu_reset_prepare(void)
>> +{
>> +    return 0;
>> +}
>> +#else
>> +extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
>> +extern int kvm_mmu_reset_prepare(void);
>> +#endif
>> +
>>  #endif
>>
>>  #endif /* __ARM_KVM_ASM_H__ */
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 04b4ea0..c0b6c20 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -192,6 +192,8 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
>>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>          int exception_index);
>>
>> +extern unsigned kvm_saved_sctlr;
>> +
>>  static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>>                         phys_addr_t pgd_ptr,
>>                         unsigned long hyp_stack_ptr,
>> @@ -213,7 +215,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>>       * PCS!).
>>       */
>>
>> -    kvm_call_hyp(NULL, 0, boot_pgd_ptr);
>> +    kvm_saved_sctlr = (unsigned) kvm_call_hyp(NULL, 0, boot_pgd_ptr);
>>
>>      kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
>>  }
>> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
>> index fdfa3a7..026a1e4 100644
>> --- a/arch/arm/kernel/process.c
>> +++ b/arch/arm/kernel/process.c
>> @@ -41,6 +41,7 @@
>>  #include <asm/system_misc.h>
>>  #include <asm/mach/time.h>
>>  #include <asm/tls.h>
>> +#include <asm/kvm_asm.h>
>>
>>  #ifdef CONFIG_CC_STACKPROTECTOR
>>  #include <linux/stackprotector.h>
>> @@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
>>  };
>>
>>  extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
>> -typedef void (*phys_reset_t)(unsigned long);
>> +typedef void (*phys_reset_t)(void *);
>>
>>  /*
>>   * A temporary stack to use for CPU reset. This is static so that we
>> @@ -89,7 +90,7 @@ static void __soft_restart(void *addr)
>>
>>      /* Switch to the identity mapping. */
>>      phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
>> -    phys_reset((unsigned long)addr);
>> +    kvm_cpu_reset(phys_reset, addr);
>>
>>      /* Should never get here. */
>>      BUG();
>> @@ -107,6 +108,8 @@ void soft_restart(unsigned long addr)
>>      if (num_online_cpus() == 1)
>>          outer_disable();
>>
>> +    kvm_mmu_reset_prepare();
>> +
>>      /* Change to the new stack and continue with the reset. */
>>      call_with_stack(__soft_restart, (void *)addr, (void *)stack);
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 0b0d58a..7727678 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -63,6 +63,8 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);
>>
>>  static bool vgic_present;
>>
>> +unsigned kvm_saved_sctlr;
>> +
>>  static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
>>  {
>>      BUG_ON(preemptible());
>> @@ -1079,6 +1081,37 @@ out_err:
>>      return err;
>>  }
>>
>> +void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>> +{
>> +    unsigned long vector;
>> +
>> +    if (!is_hyp_mode_available()) {
>> +        phys_reset(addr);
>> +        return;
>> +    }
>> +
>> +    vector = (unsigned long) kvm_get_idmap_vector();
>> +
>> +    kvm_call_hyp(__kvm_flush_vm_context);
>
> What are you trying to achieve here? This only makes sense if you're
> reassigning VMIDs. I would certainly hope that, by the time you're
> tearing KVM down for reboot, no VM is up and running.
>

Too many attempts at flushing the cache... I can remove this flush.

>> +
>> +    /* set vectors so we call initialization routines */
>> +    /* trampoline is still on memory, physical ram is not mapped */
>
> Comment is incredibly misleading. Physical memory *IS* mapped. What you
> have is a kernel mapping of that page.
>

Right, it should be: "the trampoline is mapped at TRAMPOLINE_VA but not
at its physical address, so we don't have an identity mapping that would
let us disable the MMU".

>> +    kvm_call_hyp(__kvm_set_vectors, (unsigned long) (TRAMPOLINE_VA + (vector & (PAGE_SIZE-1))));
>
> That really deserves a comment as to *why* you're doing that offsetting.
>

I think "Set exception vectors at trampoline at TRAMPOLINE_VA address
which is mapped".

>> +
>> +    /* set HVBAR to physical, page table to identity to do the switch */
>> +    kvm_call_hyp((void*) 0x100, (unsigned long) vector, kvm_mmu_get_boot_httbr());
>
> Where is that 0x100 (stack?) coming from?
>

It could be 0x4 as well; it just needs to look like a valid stack
pointer. I don't use the stack, I only pass it so that I can reuse the
phase-2 init code in the identity map.

>> +    flush_cache_all();
>
> Why do you need a flush_all? Why can't you do it by VA?
>> +
>> +    /* Turn off caching on Hypervisor mode */
>> +    kvm_call_hyp((void*) 1);
>> +
>> +    flush_cache_all();
>
> The fact that you have to do it twice is indicative of other problems.
>

It's the same pattern as in __soft_restart. The first flush makes sure
that the code running after the caches are disabled sees up-to-date
values; the second makes sure that code doesn't see stale cached values
once the caches are re-enabled (which happens when the second kernel
enables them). Without these flushes you get memory corruption. Yes, I
could probably do it by VA, but we are on a slow code path restarting
the system with a single CPU, so better safe than sorry.
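
For reference, the corresponding sequence in __soft_restart() (condensed
from arch/arm/kernel/process.c of that era) is:

	/* Clean and invalidate caches */
	flush_cache_all();

	/* Turn off caching */
	cpu_proc_fin();

	/* Push out any further dirty data, and ensure cache is empty */
	flush_cache_all();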

>> +    /* restore execution */
>> +    kvm_call_hyp((void*) 3, kvm_saved_sctlr, addr);
>
> That seems useless to me (as mentioned above). The whole sequence can
> probably be reduced to a couple of HYP calls, and made much more
> straightforward.
>

Yes, it could be. However, it's better to call the flush functions from
supervisor mode, as all kernel code is mainly written with that mode in
mind. Besides restoring the saved HSCTLR, this call also jumps to the
code that sets up the new kernel, so I would say it's quite useful.

>> +}
>> +
>>  /* NOP: Compiling as a module not supported */
>>  void kvm_arch_exit(void)
>>  {
>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>> index 3988e72..79f435a 100644
>> --- a/arch/arm/kvm/init.S
>> +++ b/arch/arm/kvm/init.S
>> @@ -68,12 +68,21 @@ __kvm_hyp_init:
>>      W(b)    .
>>
>>  __do_hyp_init:
>> +    @ check for special values, odd numbers can't be a stack pointer
>> +    cmp    r0, #1
>> +    beq    cpu_proc_fin
>> +    cmp    r0, #3
>> +    beq    restore
>> +
>>      cmp    r0, #0            @ We have a SP?
>>      bne    phase2            @ Yes, second stage init
>>
>>      @ Set the HTTBR to point to the hypervisor PGD pointer passed
>>      mcrr    p15, 4, rr_lo_hi(r2, r3), c2
>>
>> +    @ Save HSCR to be able to return it, save and restore
>> +    mrc    p15, 4, r3, c1, c0, 0    @ HSCR
>> +
>>      @ Set the HTCR and VTCR to the same shareability and cacheability
>>      @ settings as the non-secure TTBCR and with T0SZ == 0.
>>      mrc    p15, 4, r0, c2, c0, 2    @ HTCR
>> @@ -125,6 +134,9 @@ __do_hyp_init:
>>      isb
>>      mcr    p15, 4, r0, c1, c0, 0    @ HSCR
>>
>> +    @ Return initial HSCR register
>> +    mov    r0, r3
>> +
>>      @ End of init phase-1
>>      eret
>>
>> @@ -151,6 +163,29 @@ target:    @ We're now in the trampoline code, switch page tables
>>
>>      eret
>>
>> +cpu_proc_fin:
>> +    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
>> +    bic    r0, r0, #0x1000            @ ...i............
>> +    bic    r0, r0, #0x0006            @ .............ca.
>> +    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
>> +    eret
>> +
>> +restore:
>> +    isb
>> +    mcr    p15, 4, r1, c1, c0, 0    @ HSCR
>> +    isb
>> +    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
>> +    isb
>> +    dsb    ish
>> +
>> +    @ Disable MMU, caches and Thumb in SCTLR
>> +    mrc    p15, 0, r1, c1, c0, 0   @ SCTLR
>> +    ldr    r0, =0x40003807
>> +    bic    r1, r1, r0
>> +    mcr    p15, 0, r1, c1, c0, 0
>> +
>> +    bx    r2
>> +
>>      .ltorg
>>
>>      .globl __kvm_hyp_init_end
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 01dcb0e..449e7dd 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)
>>
>>
>>  /********************************************************************
>> + * Set HVBAR
>> + *
>> + * void __kvm_set_vectors(unsigned long phys_vector_base);
>> + */
>> +ENTRY(__kvm_set_vectors)
>> +    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
>> +    dsb    ish
>> +    isb
>> +    bx    lr
>> +ENDPROC(__kvm_set_vectors)
>> +
>> +/********************************************************************
>>   *  Hypervisor world-switch code
>>   *
>>   *
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 1366625..9149ae5 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -46,6 +46,8 @@ static phys_addr_t hyp_idmap_vector;
>>
>>  #define kvm_pmd_huge(_x)    (pmd_huge(_x) || pmd_trans_huge(_x))
>>
>> +static int kvm_mmu_boot_init(void);
>> +
>>  static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>>  {
>>      /*
>> @@ -369,6 +371,11 @@ void free_boot_hyp_pgd(void)
>>      free_page((unsigned long)init_bounce_page);
>>      init_bounce_page = NULL;
>>
>> +    /* avoid to reuse possibly invalid values if bounce page is freed */
>> +    hyp_idmap_start = 0;
>> +    hyp_idmap_end = 0;
>> +    hyp_idmap_vector = 0;
>> +
>
> Well, in this case, why do we free the page? We'd better keep it around
> if we have Kexec...
>

Well... yes, only I thought that people who freed those (supposedly)
16kB of memory wanted them back :)
Keeping the trampoline and boot PGD around is indeed much easier.

>>      mutex_unlock(&kvm_hyp_pgd_mutex);
>>  }
>>
>> @@ -1252,7 +1259,28 @@ phys_addr_t kvm_get_idmap_vector(void)
>>      return hyp_idmap_vector;
>>  }
>>
>> -int kvm_mmu_init(void)
>> +/* assure we have MMU setup correctly */
>> +int kvm_mmu_reset_prepare(void)
>> +{
>> +    int err = 0;
>> +
>> +    if (boot_hyp_pgd)
>> +        goto out;
>> +
>> +    err = kvm_mmu_boot_init();
>> +    if (err)
>> +        goto out;
>> +
>> +    err = __create_hyp_mappings(hyp_pgd,
>> +                      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>> +                      __phys_to_pfn(hyp_idmap_start),
>> +                      PAGE_HYP);
>> +
>> +out:
>> +    return err;
>> +}
>> +
>
> ... and this code can then go away.
>

Ok, I'll remove it.

>> +static int kvm_mmu_boot_init(void)
>>  {
>>      int err;
>>
>> @@ -1294,10 +1322,9 @@ int kvm_mmu_init(void)
>>               (unsigned long)phys_base);
>>      }
>>
>> -    hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>>      boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
>>
>> -    if (!hyp_pgd || !boot_hyp_pgd) {
>> +    if (!boot_hyp_pgd) {
>>          kvm_err("Hyp mode PGD not allocated\n");
>>          err = -ENOMEM;
>>          goto out;
>> @@ -1326,6 +1353,27 @@ int kvm_mmu_init(void)
>>          goto out;
>>      }
>>
>> +    return 0;
>> +out:
>> +    free_boot_hyp_pgd();
>> +    return err;
>> +}
>> +
>> +int kvm_mmu_init(void)
>> +{
>> +    int err;
>> +
>> +    err = kvm_mmu_boot_init();
>> +    if (err)
>> +        goto out0;
>> +
>> +    hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
>> +    if (!hyp_pgd) {
>> +        kvm_err("Hyp mode PGD not allocated\n");
>> +        err = -ENOMEM;
>> +        goto out;
>> +    }
>> +
>>      /* Map the same page again into the runtime page tables */
>>      err = __create_hyp_mappings(hyp_pgd,
>>                        TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>> @@ -1340,6 +1388,7 @@ int kvm_mmu_init(void)
>>      return 0;
>>  out:
>>      free_hyp_pgds();
>> +out0:
>>      return err;
>>  }
>>
>>
>> Frediano
>>
>
> Thanks for having a look into this, as you're the first one to actually
> do something. That still requires a lot of work, and some careful
> thoughts as to how to do this on arm64.
>
> Looking forward to your next patch,
>
>         M.


So, my todo list:
- remove the saved value and test again
- remove the VM flush, it's useless
- correct the compile-time check in the header
- try with KVM both enabled and disabled: compile, link and test
- rewrite comments, add more where necessary
- remove the code that frees and re-allocates the boot PGD and friends;
keep them in memory if KEXEC is enabled
- try SMP
Apart from the last step, I can probably get something done before next
week.

Frediano


* Kexec and KVM not working gracefully together
  2015-02-05 14:27             ` Frediano Ziglio
@ 2015-02-05 16:56               ` Frediano Ziglio
  2015-02-06  4:13                 ` AKASHI Takahiro
  0 siblings, 1 reply; 16+ messages in thread
From: Frediano Ziglio @ 2015-02-05 16:56 UTC (permalink / raw)
  To: linux-arm-kernel

2015-02-05 14:27 GMT+00:00 Frediano Ziglio <freddy77@gmail.com>:
> 2015-02-05 13:32 GMT+00:00 Marc Zyngier <marc.zyngier@arm.com>:
>> On 05/02/15 10:51, Frediano Ziglio wrote:
>>> 2015-02-05 10:27 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
>>>> On 02/05/2015 06:43 PM, Marc Zyngier wrote:
>>>>>
>>>>> On Thu, 5 Feb 2015 06:17:04 +0000
>>>>> AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
>>>>>
>>>>>> Frediano
>>>>>> Cc: Marc
>>>>>>
>>>>>> Are you going to fix this issue on arm?
>>>>>> As I'm now working on the same issue on arm64,
>>>>>> we should share the idea and some code.
>>>>>
>>>>>
>>>>> Just to be perfectly clear: KVM support for Kexec on arm and arm64
>>>>> should be exactly the same, with the exception of some (very
>>>>> limited) assembly code for the MMU teardown.
>>>>
>>>>
>>>> Yep, that's exactly what I meant.
>>>>
>>>>> And I'd like to see both architectures addressed at the same time,
>>>>> without statements such as "we'll do that as a next step". As such, I'd
>>>>> very much recommend that Frediano and you work together to address this
>>>>> problem.
>>>>
>>>>
>>>> I hope so.
>>>>
>>>> Thanks,
>>>> -Takahiro AKASHI
>>>>
>>>>> Thanks,
>>>>>
>>>>>          M.
>>>>>
>>>
>>>
>>>
>>> Well, let's let the code speak. I'm actually testing this on an ARM32
>>> platform with a single CPU (the platform has 16 cores, but I'm using a
>>> non-SMP kernel for my tests for unrelated reasons; I should be able to
>>> test SMP in a couple of days).
>>>
>>> The key point is the switch-back code (arm.c, kvm_cpu_reset). The idea:
>>> - at boot time, save the HSCTLR value;
>>
>> Why do you need this? The content of that register is known at any point
>> of the lifetime of the kernel, and so is its value at reboot time
>> (basically MMU and caches disabled, ARM mode).
>>
>> This should be part of your MMU teardown process.
>>
>
> I should have put an RFC in the subject. I can probably just adjust the
> value without saving the boot one.
>
>>> - restore the boot page tables and trampoline before the CPU reset;
>>> - call kvm_cpu_reset instead of cpu_reset (this still calls cpu_reset if
>>> KVM is disabled);
>>> - set HVBAR to the trampoline (virtual mapping);
>>> - set HVBAR to the physical trampoline and switch to the boot page
>>> tables (with the trampoline mapped at both its physical and virtual
>>> addresses, i.e. an identity mapping);
>>> - flush memory while disabling caching in hypervisor mode (not doing so
>>> caused a lot of instability; with it I have now rebooted more than 100
>>> times in a row);
>>> - call into hypervisor mode to do the CPU reset, restoring the boot
>>> HSCTLR value.
>>>
>>>
>>> Hope gmail does not mangle my patch...
>>>
>>>
>>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>>> index 3a67bec..71380a1 100644
>>> --- a/arch/arm/include/asm/kvm_asm.h
>>> +++ b/arch/arm/include/asm/kvm_asm.h
>>> @@ -97,7 +97,24 @@ extern char __kvm_hyp_code_end[];
>>>  extern void __kvm_flush_vm_context(void);
>>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>>
>>> +extern void __kvm_set_vectors(unsigned long phys_vector_base);
>>> +
>>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>> +
>>> +#ifdef CONFIG_HAVE_KVM
>>
>> That doesn't seem right. Shouldn't that be #ifndef?
>>
>
> Well... I don't know where I got that define; the name is wrong too.
> However, I think the kernel should switch back to hypervisor mode
> whether KVM is enabled or not. head.S tests for CONFIG_ARM_VIRT_EXT,
> so I should probably use the same test here. I should also try to
> compile with KVM both enabled and disabled, just to check that the
> kernel links correctly.
>
>>> +static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>>> +{
>>> +    phys_reset(addr);
>>> +}
>>> +static inline int kvm_mmu_reset_prepare(void)
>>> +{
>>> +    return 0;
>>> +}
>>> +#else
>>> +extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
>>> +extern int kvm_mmu_reset_prepare(void);
>>> +#endif
>>> +
>>>  #endif
>>>
>>>  #endif /* __ARM_KVM_ASM_H__ */
>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>> index 04b4ea0..c0b6c20 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -192,6 +192,8 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
>>>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>>          int exception_index);
>>>
>>> +extern unsigned kvm_saved_sctlr;
>>> +
>>>  static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>>>                         phys_addr_t pgd_ptr,
>>>                         unsigned long hyp_stack_ptr,
>>> @@ -213,7 +215,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>>>       * PCS!).
>>>       */
>>>
>>> -    kvm_call_hyp(NULL, 0, boot_pgd_ptr);
>>> +    kvm_saved_sctlr = (unsigned) kvm_call_hyp(NULL, 0, boot_pgd_ptr);
>>>
>>>      kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
>>>  }
>>> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
>>> index fdfa3a7..026a1e4 100644
>>> --- a/arch/arm/kernel/process.c
>>> +++ b/arch/arm/kernel/process.c
>>> @@ -41,6 +41,7 @@
>>>  #include <asm/system_misc.h>
>>>  #include <asm/mach/time.h>
>>>  #include <asm/tls.h>
>>> +#include <asm/kvm_asm.h>
>>>
>>>  #ifdef CONFIG_CC_STACKPROTECTOR
>>>  #include <linux/stackprotector.h>
>>> @@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
>>>  };
>>>
>>>  extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
>>> -typedef void (*phys_reset_t)(unsigned long);
>>> +typedef void (*phys_reset_t)(void *);
>>>
>>>  /*
>>>   * A temporary stack to use for CPU reset. This is static so that we
>>> @@ -89,7 +90,7 @@ static void __soft_restart(void *addr)
>>>
>>>      /* Switch to the identity mapping. */
>>>      phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
>>> -    phys_reset((unsigned long)addr);
>>> +    kvm_cpu_reset(phys_reset, addr);
>>>
>>>      /* Should never get here. */
>>>      BUG();
>>> @@ -107,6 +108,8 @@ void soft_restart(unsigned long addr)
>>>      if (num_online_cpus() == 1)
>>>          outer_disable();
>>>
>>> +    kvm_mmu_reset_prepare();
>>> +
>>>      /* Change to the new stack and continue with the reset. */
>>>      call_with_stack(__soft_restart, (void *)addr, (void *)stack);
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index 0b0d58a..7727678 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -63,6 +63,8 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);
>>>
>>>  static bool vgic_present;
>>>
>>> +unsigned kvm_saved_sctlr;
>>> +
>>>  static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
>>>  {
>>>      BUG_ON(preemptible());
>>> @@ -1079,6 +1081,37 @@ out_err:
>>>      return err;
>>>  }
>>>
>>> +void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>>> +{
>>> +    unsigned long vector;
>>> +
>>> +    if (!is_hyp_mode_available()) {
>>> +        phys_reset(addr);
>>> +        return;
>>> +    }
>>> +
>>> +    vector = (unsigned long) kvm_get_idmap_vector();
>>> +
>>> +    kvm_call_hyp(__kvm_flush_vm_context);
>>
>> What are you trying to achieve here? This only makes sense if you're
>> reassigning VMIDs. I would certainly hope that, by the time you're
>> tearing KVM down for reboot, no VM is up and running.
>>
>
> Too many attempts at flushing the cache... I can remove this flush.
>
>>> +
>>> +    /* set vectors so we call initialization routines */
>>> +    /* trampoline is still on memory, physical ram is not mapped */
>>
>> Comment is incredibly misleading. Physical memory *IS* mapped. What you
>> have is a kernel mapping of that page.
>>
>
> Right, it should be "the trampoline is mapped at TRAMPOLINE_VA but not
> at its physical address, so we don't have an identity map that lets us
> disable the MMU".
>
>>> +    kvm_call_hyp(__kvm_set_vectors,(unsigned long) (TRAMPOLINE_VA +
>>> (vector & (PAGE_SIZE-1))));
>>
>> That really deserves a comment as to *why* you're doing that offsetting.
>>
>
> I think "Set the exception vectors to the trampoline through its
> TRAMPOLINE_VA mapping, which is what we actually have mapped".
>
>>> +
>>> +    /* set HVBAR to physical, page table to identity to do the switch */
>>> +    kvm_call_hyp((void*) 0x100, (unsigned long) vector,
>>> kvm_mmu_get_boot_httbr());
>>
>> Where is that 0x100 (stack?) coming from?
>>
>
> It could be 0x4; it just needs to look like a valid stack pointer. I
> don't actually use the stack, I only reuse the phase-2 path in the
> identity map.
>
>>> +    flush_cache_all();
>>
>> Why do you need a flush_all? Why can't you do it by VA?
>>> +
>>> +    /* Turn off caching on Hypervisor mode */
>>> +    kvm_call_hyp((void*) 1);
>>> +
>>> +    flush_cache_all();
>>
>> The fact that you have to do it twice is indicative of other problems.
>>
>
> It's the same code as in __soft_restart. The first flush makes sure
> that code running after the caches are disabled sees updated values;
> the second makes sure that code doesn't see stale cached values after
> the caches are re-enabled (this happens when the second kernel enables
> them). Without these flushes you get memory corruption. Yes, I could
> probably do it by VA, but we are on a slow code path restarting the
> system with a single CPU, so better safe than sorry.
>
>>> +    /* restore execution */
>>> +    kvm_call_hyp((void*) 3, kvm_saved_sctlr, addr);
>>
>> That seems useless to me (as mentioned above). The whole sequence can
>> probably be reduced to a couple of HYP calls, and made much more
>> straightforward.
>>
>
> Yes, it could be. However, it's better to call the flush functions
> from supervisor mode, as all kernel code is mainly written for that
> mode. This call, besides restoring the saved HSCTLR, also jumps to the
> code that sets up the new kernel, so I would say it's quite useful.
>
>>> +}
>>> +
>>>  /* NOP: Compiling as a module not supported */
>>>  void kvm_arch_exit(void)
>>>  {
>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>> index 3988e72..79f435a 100644
>>> --- a/arch/arm/kvm/init.S
>>> +++ b/arch/arm/kvm/init.S
>>> @@ -68,12 +68,21 @@ __kvm_hyp_init:
>>>      W(b)    .
>>>
>>>  __do_hyp_init:
>>> +    @ check for special values, odd numbers can't be a stack pointer
>>> +    cmp    r0, #1
>>> +    beq    cpu_proc_fin
>>> +    cmp    r0, #3
>>> +    beq    restore
>>> +
>>>      cmp    r0, #0            @ We have a SP?
>>>      bne    phase2            @ Yes, second stage init
>>>
>>>      @ Set the HTTBR to point to the hypervisor PGD pointer passed
>>>      mcrr    p15, 4, rr_lo_hi(r2, r3), c2
>>>
>>> +    @ Save HSCR to be able to return it, save and restore
>>> +    mrc    p15, 4, r3, c1, c0, 0    @ HSCR
>>> +
>>>      @ Set the HTCR and VTCR to the same shareability and cacheability
>>>      @ settings as the non-secure TTBCR and with T0SZ == 0.
>>>      mrc    p15, 4, r0, c2, c0, 2    @ HTCR
>>> @@ -125,6 +134,9 @@ __do_hyp_init:
>>>      isb
>>>      mcr    p15, 4, r0, c1, c0, 0    @ HSCR
>>>
>>> +    @ Return initial HSCR register
>>> +    mov    r0, r3
>>> +
>>>      @ End of init phase-1
>>>      eret
>>>
>>> @@ -151,6 +163,29 @@ target:    @ We're now in the trampoline code,
>>> switch page tables
>>>
>>>      eret
>>>
>>> +cpu_proc_fin:
>>> +    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
>>> +    bic    r0, r0, #0x1000            @ ...i............
>>> +    bic    r0, r0, #0x0006            @ .............ca.
>>> +    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
>>> +    eret
>>> +
>>> +restore:
>>> +    isb
>>> +    mcr    p15, 4, r1, c1, c0, 0    @ HSCR
>>> +    isb
>>> +    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
>>> +    isb
>>> +    dsb    ish
>>> +
>>> +    @ Disable MMU, caches and Thumb in SCTLR
>>> +    mrc    p15, 0, r1, c1, c0, 0   @ SCTLR
>>> +    ldr    r0, =0x40003807
>>> +    bic    r1, r1, r0
>>> +    mcr    p15, 0, r1, c1, c0, 0
>>> +
>>> +    bx    r2
>>> +
>>>      .ltorg
>>>
>>>      .globl __kvm_hyp_init_end
>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>> index 01dcb0e..449e7dd 100644
>>> --- a/arch/arm/kvm/interrupts.S
>>> +++ b/arch/arm/kvm/interrupts.S
>>> @@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)
>>>
>>>
>>>  /********************************************************************
>>> + * Set HVBAR
>>> + *
>>> + * void __kvm_set_vectors(unsigned long phys_vector_base);
>>> + */
>>> +ENTRY(__kvm_set_vectors)
>>> +    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
>>> +    dsb    ish
>>> +    isb
>>> +    bx    lr
>>> +ENDPROC(__kvm_set_vectors)
>>> +
>>> +/********************************************************************
>>>   *  Hypervisor world-switch code
>>>   *
>>>   *
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 1366625..9149ae5 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -46,6 +46,8 @@ static phys_addr_t hyp_idmap_vector;
>>>
>>>  #define kvm_pmd_huge(_x)    (pmd_huge(_x) || pmd_trans_huge(_x))
>>>
>>> +static int kvm_mmu_boot_init(void);
>>> +
>>>  static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>>>  {
>>>      /*
>>> @@ -369,6 +371,11 @@ void free_boot_hyp_pgd(void)
>>>      free_page((unsigned long)init_bounce_page);
>>>      init_bounce_page = NULL;
>>>
>>> +    /* avoid to reuse possibly invalid values if bounce page is freed */
>>> +    hyp_idmap_start = 0;
>>> +    hyp_idmap_end = 0;
>>> +    hyp_idmap_vector = 0;
>>> +
>>
>> Well, in this case, why do we free the page? We'd better keep it around
>> if we have Kexec...
>>
>
> Well... yes, only I thought that people who freed those 16kb of
> memory actually want those 16kb back :)
> Keeping the trampoline and boot PGD available is much easier indeed.
>
>>>      mutex_unlock(&kvm_hyp_pgd_mutex);
>>>  }
>>>
>>> @@ -1252,7 +1259,28 @@ phys_addr_t kvm_get_idmap_vector(void)
>>>      return hyp_idmap_vector;
>>>  }
>>>
>>> -int kvm_mmu_init(void)
>>> +/* assure we have MMU setup correctly */
>>> +int kvm_mmu_reset_prepare(void)
>>> +{
>>> +    int err = 0;
>>> +
>>> +    if (boot_hyp_pgd)
>>> +        goto out;
>>> +
>>> +    err = kvm_mmu_boot_init();
>>> +    if (err)
>>> +        goto out;
>>> +
>>> +    err =     __create_hyp_mappings(hyp_pgd,
>>> +                      TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>>> +                      __phys_to_pfn(hyp_idmap_start),
>>> +                      PAGE_HYP);
>>> +
>>> +out:
>>> +    return err;
>>> +}
>>> +
>>
>> ... and this code can then go away.
>>
>
> Ok, I'll remove it.
>
>>> +static int kvm_mmu_boot_init(void)
>>>  {
>>>      int err;
>>>
>>> @@ -1294,10 +1322,9 @@ int kvm_mmu_init(void)
>>>               (unsigned long)phys_base);
>>>      }
>>>
>>> -    hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
>>> hyp_pgd_order);
>>>      boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
>>> hyp_pgd_order);
>>>
>>> -    if (!hyp_pgd || !boot_hyp_pgd) {
>>> +    if (!boot_hyp_pgd) {
>>>          kvm_err("Hyp mode PGD not allocated\n");
>>>          err = -ENOMEM;
>>>          goto out;
>>> @@ -1326,6 +1353,27 @@ int kvm_mmu_init(void)
>>>          goto out;
>>>      }
>>>
>>> +    return 0;
>>> +out:
>>> +    free_boot_hyp_pgd();
>>> +    return err;
>>> +}
>>> +
>>> +int kvm_mmu_init(void)
>>> +{
>>> +    int err;
>>> +
>>> +    err = kvm_mmu_boot_init();
>>> +    if (err)
>>> +        goto out0;
>>> +
>>> +    hyp_pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
>>> +    if (!hyp_pgd) {
>>> +        kvm_err("Hyp mode PGD not allocated\n");
>>> +        err = -ENOMEM;
>>> +        goto out;
>>> +    }
>>> +
>>>      /* Map the same page again into the runtime page tables */
>>>      err =     __create_hyp_mappings(hyp_pgd,
>>>                        TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
>>> @@ -1340,6 +1388,7 @@ int kvm_mmu_init(void)
>>>      return 0;
>>>  out:
>>>      free_hyp_pgds();
>>> +out0:
>>>      return err;
>>>  }
>>>
>>>
>>> Frediano
>>>
>>
>> Thanks for having a look into this, as you're the first one to actually
>> do something. That still requires a lot of work, and some careful
>> thoughts as to how to do this on arm64.
>>
>> Looking forward to your next patch,
>>
>>         M.
>
>
> So, my todo list:
> - remove the saved value and test again
> - remove vm flush, useless
> - correct compile check on header
> - try KVM enabled and not, compile link and tests
> - rewrite comments, add more where necessary
> - remove code to free and allocate again boot PGD and related, keep in
> memory if KEXEC is enabled
> - try SMP
> Beside last step I probably could do something before next week.
>
> Frediano

New version. Changes:
- removed the saved value and tested again
- removed the vm flush, it was useless
- corrected the compile check in the header
- tried with KVM enabled and disabled: compile, link and tests
- rewrote comments, added more where necessary
- removed the code that frees and re-allocates the boot PGD and related
structures; they are kept in memory if KEXEC is enabled

Still not trying SMP; this is still to be considered RFC. I tried
different compile options (KEXEC and KVM turned on/off). I realized that
I should also test that the code still works correctly when the kernel
is started in SVC mode. ARM_VIRT_EXT is always turned on for V7 CPUs.


diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 3a67bec..985f0a3 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -97,7 +97,19 @@ extern char __kvm_hyp_code_end[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);

+extern void __kvm_set_vectors(unsigned long phys_vector_base);
+
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+
+#ifndef CONFIG_ARM_VIRT_EXT
+static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
+{
+    phys_reset(addr);
+}
+#else
+extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
+#endif
+
 #endif

 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
index 2a55373..30339ad 100644
--- a/arch/arm/kernel/hyp-stub.S
+++ b/arch/arm/kernel/hyp-stub.S
@@ -164,13 +164,63 @@ ARM_BE8(orr    r7, r7, #(1 << 25))     @ HSCTLR.EE
     bx    lr            @ The boot CPU mode is left in r4.
 ENDPROC(__hyp_stub_install_secondary)

+#undef CPU_RESET
+#if defined(CONFIG_ARM_VIRT_EXT) && !defined(CONFIG_KVM) && !defined(ZIMAGE)
+#  define CPU_RESET 1
+#endif
+
 __hyp_stub_do_trap:
+#ifdef CPU_RESET
+    cmp    r0, #-2
+    bne    1f
+
+    mrc    p15, 4, r0, c1, c0, 0    @ HSCTLR
+    ldr    r2, =0x40003807
+    bic    r0, r0, r2
+    mcr    p15, 4, r0, c1, c0, 0    @ HSCTLR
+    isb
+
+    @ Disable MMU, caches and Thumb in SCTLR
+    mrc    p15, 0, r0, c1, c0, 0    @ SCTLR
+    bic    r0, r0, r2
+    mcr    p15, 0, r0, c1, c0, 0
+
+    bx    r1
+    .ltorg
+1:
+#endif
     cmp    r0, #-1
     mrceq    p15, 4, r0, c12, c0, 0    @ get HVBAR
     mcrne    p15, 4, r0, c12, c0, 0    @ set HVBAR
     __ERET
 ENDPROC(__hyp_stub_do_trap)

+#ifdef CPU_RESET
+/*
+ * r0 cpu_reset function
+ * r1 address to jump to
+ */
+ENTRY(kvm_cpu_reset)
+    mov    r2, r0
+
+    @ __boot_cpu_mode should be HYP_MODE
+    adr    r0, .L__boot_cpu_mode_offset
+    ldr    r3, [r0]
+    ldr    r0, [r0, r3]
+    cmp    r0, #HYP_MODE
+    beq    reset_to_hyp
+
+    @ Call SVC mode cpu_reset
+    mov    r0, r1
+THUMB(    orr    r2, r2, 1    )
+    bx    r2
+
+reset_to_hyp:
+    mov    r0, #-2
+    b    __hyp_set_vectors
+ENDPROC(kvm_cpu_reset)
+#endif
+
 /*
  * __hyp_set_vectors: Call this after boot to set the initial hypervisor
  * vectors as part of hypervisor installation.  On an SMP system, this should
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index fdfa3a7..c018719 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -41,6 +41,7 @@
 #include <asm/system_misc.h>
 #include <asm/mach/time.h>
 #include <asm/tls.h>
+#include <asm/kvm_asm.h>

 #ifdef CONFIG_CC_STACKPROTECTOR
 #include <linux/stackprotector.h>
@@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
 };

 extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
-typedef void (*phys_reset_t)(unsigned long);
+typedef void (*phys_reset_t)(void *);

 /*
  * A temporary stack to use for CPU reset. This is static so that we
@@ -89,7 +90,8 @@ static void __soft_restart(void *addr)

     /* Switch to the identity mapping. */
     phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
-    phys_reset((unsigned long)addr);
+
+    kvm_cpu_reset(phys_reset, addr);

     /* Should never get here. */
     BUG();
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0b0d58a..e4d4a2b 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1009,7 +1009,7 @@ static int init_hyp_mode(void)
     if (err)
         goto out_free_mappings;

-#ifndef CONFIG_HOTPLUG_CPU
+#if !defined(CONFIG_HOTPLUG_CPU) && !defined(CONFIG_KEXEC)
     free_boot_hyp_pgd();
 #endif

@@ -1079,6 +1079,53 @@ out_err:
     return err;
 }

+void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
+{
+    unsigned long vector;
+
+    if (!is_hyp_mode_available()) {
+        phys_reset(addr);
+        return;
+    }
+
+    vector = (unsigned long) kvm_get_idmap_vector();
+
+    /*
+     * Set the vectors so we can call the initialization routines.
+     * The trampoline is mapped at TRAMPOLINE_VA but not at its physical
+     * address, so we don't yet have an identity map that would let us
+     * disable the MMU. Set the exception vectors to the trampoline
+     * through its TRAMPOLINE_VA mapping.
+     */
+    kvm_call_hyp(__kvm_set_vectors, (unsigned long) (TRAMPOLINE_VA + (vector & (PAGE_SIZE-1))));
+
+    /*
+     * Set HVBAR to physical address, page table to identity to do the switch
+     */
+    kvm_call_hyp((void*) 4, (unsigned long) vector, kvm_mmu_get_boot_httbr());
+
+    /*
+     * Flush to make sure code running after the caches are disabled
+     * sees updated values
+     */
+    flush_cache_all();
+
+    /*
+     * Turn off caching on Hypervisor mode
+     */
+    kvm_call_hyp((void*) 1);
+
+    /*
+     * Flush to make sure code doesn't see stale cached values after
+     * the cache is re-enabled
+     */
+    flush_cache_all();
+
+    /*
+     * Restore execution
+     */
+    kvm_call_hyp((void*) 3, addr);
+}
+
 /* NOP: Compiling as a module not supported */
 void kvm_arch_exit(void)
 {
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 3988e72..c2f1d4d 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -68,6 +68,12 @@ __kvm_hyp_init:
     W(b)    .

 __do_hyp_init:
+    @ check for special values, odd numbers can't be a stack pointer
+    cmp    r0, #1
+    beq    cpu_proc_fin
+    cmp    r0, #3
+    beq    restore
+
     cmp    r0, #0            @ We have a SP?
     bne    phase2            @ Yes, second stage init

@@ -151,6 +157,33 @@ target:    @ We're now in the trampoline code, switch page tables

     eret

+cpu_proc_fin:
+    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
+    bic    r0, r0, #0x1000            @ ...i............
+    bic    r0, r0, #0x0006            @ .............ca.
+    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
+    eret
+
+restore:
+    @ Disable MMU, caches and Thumb in HSCTLR
+    mrc    p15, 4, r0, c1, c0, 0    @ HSCTLR
+    ldr    r2, =0x40003807
+    bic    r0, r0, r2
+    mcr    p15, 4, r0, c1, c0, 0    @ HSCTLR
+    isb
+
+    @ Invalidate old TLBs
+    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
+    isb
+    dsb    ish
+
+    @ Disable MMU, caches and Thumb in SCTLR
+    mrc    p15, 0, r0, c1, c0, 0   @ SCTLR
+    bic    r0, r0, r2
+    mcr    p15, 0, r0, c1, c0, 0
+
+    bx    r1
+
     .ltorg

     .globl __kvm_hyp_init_end
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 01dcb0e..449e7dd 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)


 /********************************************************************
+ * Set HVBAR
+ *
+ * void __kvm_set_vectors(unsigned long phys_vector_base);
+ */
+ENTRY(__kvm_set_vectors)
+    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
+    dsb    ish
+    isb
+    bx    lr
+ENDPROC(__kvm_set_vectors)
+
+/********************************************************************
  *  Hypervisor world-switch code
  *
  *
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1366625..f853858 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -369,6 +369,11 @@ void free_boot_hyp_pgd(void)
     free_page((unsigned long)init_bounce_page);
     init_bounce_page = NULL;

+    /* avoid reusing possibly stale values once the bounce page is freed */
+    hyp_idmap_start = 0;
+    hyp_idmap_end = 0;
+    hyp_idmap_vector = 0;
+
     mutex_unlock(&kvm_hyp_pgd_mutex);
 }


Frediano

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Kexec and KVM not working gracefully together
  2015-02-05 16:56               ` Frediano Ziglio
@ 2015-02-06  4:13                 ` AKASHI Takahiro
  2015-02-06  9:50                   ` Frediano Ziglio
  0 siblings, 1 reply; 16+ messages in thread
From: AKASHI Takahiro @ 2015-02-06  4:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Frediano
[add Geoff to cc]

I still have to compare my code with yours, but please note that
some files in arch/arm/kvm are shared with arm64, especially
arm.c, mmu.c and related headers.

So can you please
- submit your new patch without including old e-mail threads.
   (don't forget RFC or PATCH.)
- break it up into two pieces, one for common, the other for arm

and a few more comments below:

On 02/06/2015 01:56 AM, Frediano Ziglio wrote:

[snip]

> New version. Changes:
> - removed saved value and test again
> - remove vm flush, useless
> - correct compile check on header
> - try KVM enabled and not, compile link and tests
> - rewrite comments, add more where necessary
> - remove code to free and allocate again boot PGD and related, keep in
> memory if KEXEC is enabled
>
> Still not trying SMP. Still to be considered RFC. I tried different
> compile options (KEXEC and KVM turned on/off). I realized that I
> should test if kernel is started with SVC mode code still works
> correctly. ARM_VIRT_EXT is always turned on for V7 CPU.
>
>
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 3a67bec..985f0a3 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -97,7 +97,19 @@ extern char __kvm_hyp_code_end[];
>   extern void __kvm_flush_vm_context(void);
>   extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>
> +extern void __kvm_set_vectors(unsigned long phys_vector_base);
> +
>   extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +
> +#ifndef CONFIG_ARM_VIRT_EXT

Arm64 doesn't have CONFIG_ARM_VIRT_EXT. Why not use CONFIG_KVM?
I'd rather put this conditional at the place where the function is
actually called, for a more flexible implementation. (See below.)

> +static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
> +{
> +    phys_reset(addr);
> +}
> +#else
> +extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
> +#endif
> +
>   #endif
>
>   #endif /* __ARM_KVM_ASM_H__ */
> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
> index 2a55373..30339ad 100644
> --- a/arch/arm/kernel/hyp-stub.S
> +++ b/arch/arm/kernel/hyp-stub.S
> @@ -164,13 +164,63 @@ ARM_BE8(orr    r7, r7, #(1 << 25))     @ HSCTLR.EE
>       bx    lr            @ The boot CPU mode is left in r4.
>   ENDPROC(__hyp_stub_install_secondary)
>
> +#undef CPU_RESET
> +#if defined(CONFIG_ARM_VIRT_EXT) && !defined(CONFIG_KVM) && !defined(ZIMAGE)
> +#  define CPU_RESET 1
> +#endif
> +
>   __hyp_stub_do_trap:
> +#ifdef CPU_RESET
> +    cmp    r0, #-2
> +    bne    1f
> +
> +    mrc    p15, 4, r0, c1, c0, 0    @ HSCR
> +    ldr    r2, =0x40003807
> +    bic    r0, r0, r2
> +    mcr    p15, 4, r0, c1, c0, 0    @ HSCR
> +    isb
> +
> +    @ Disable MMU, caches and Thumb in SCTLR
> +    mrc    p15, 0, r0, c1, c0, 0    @ SCTLR
> +    bic    r0, r0, r2
> +    mcr    p15, 0, r0, c1, c0, 0
> +
> +    bx    r1
> +    .ltorg
> +1:
> +#endif
>       cmp    r0, #-1
>       mrceq    p15, 4, r0, c12, c0, 0    @ get HVBAR
>       mcrne    p15, 4, r0, c12, c0, 0    @ set HVBAR
>       __ERET
>   ENDPROC(__hyp_stub_do_trap)
>
> +#ifdef CPU_RESET
> +/*
> + * r0 cpu_reset function
> + * r1 address to jump to
> + */
> +ENTRY(kvm_cpu_reset)
> +    mov    r2, r0
> +
> +    @ __boot_cpu_mode should be HYP_MODE
> +    adr    r0, .L__boot_cpu_mode_offset
> +    ldr    r3, [r0]
> +    ldr    r0, [r0, r3]
> +    cmp    r0, #HYP_MODE
> +    beq    reset_to_hyp
> +
> +    @ Call SVC mode cpu_reset
> +    mov    r0, r1
> +THUMB(    orr    r2, r2, 1    )
> +    bx    r2
> +
> +reset_to_hyp:
> +    mov    r0, #-2
> +    b    __hyp_set_vectors
> +ENDPROC(kvm_cpu_reset)
> +#endif
> +
>   /*
>    * __hyp_set_vectors: Call this after boot to set the initial hypervisor
>    * vectors as part of hypervisor installation.  On an SMP system, this should
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index fdfa3a7..c018719 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -41,6 +41,7 @@
>   #include <asm/system_misc.h>
>   #include <asm/mach/time.h>
>   #include <asm/tls.h>
> +#include <asm/kvm_asm.h>
>
>   #ifdef CONFIG_CC_STACKPROTECTOR
>   #include <linux/stackprotector.h>
> @@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
>   };
>
>   extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
> -typedef void (*phys_reset_t)(unsigned long);
> +typedef void (*phys_reset_t)(void *);
>
>   /*
>    * A temporary stack to use for CPU reset. This is static so that we
> @@ -89,7 +90,8 @@ static void __soft_restart(void *addr)
>
>       /* Switch to the identity mapping. */
>       phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
> -    phys_reset((unsigned long)addr);
> +
> +    kvm_cpu_reset(phys_reset, addr);

How about this?
#ifdef CONFIG_KVM
	kvm_cpu_reset(...);
#endif
	phys_reset(addr);

It would clarify the purpose of kvm_cpu_reset().
The name kvm_cpu_reset() might be better as cpu_*de*init_hyp_mode or
similar, given that the function would be called from hyp_init_cpu_notify()
once kvm supports cpu hotplug in the future.

>       /* Should never get here. */
>       BUG();
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 0b0d58a..e4d4a2b 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -1009,7 +1009,7 @@ static int init_hyp_mode(void)
>       if (err)
>           goto out_free_mappings;
>
> -#ifndef CONFIG_HOTPLUG_CPU
> +#if !defined(CONFIG_HOTPLUG_CPU) && !defined(CONFIG_KEXEC)
>       free_boot_hyp_pgd();
>   #endif
>
> @@ -1079,6 +1079,53 @@ out_err:
>       return err;
>   }
>
> +void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)

I'm not sure, but it might be better to put this code into arm/kernel/init.S,
as it is the counterpart of __do_hyp_init from cpu_init_hyp_mode().

> +{
> +    unsigned long vector;
> +
> +    if (!is_hyp_mode_available()) {
> +        phys_reset(addr);
> +        return;
> +    }
> +
> +    vector = (unsigned long) kvm_get_idmap_vector();
> +
> +    /*
> +     * Set vectors so we call initialization routines.
> +     * trampoline is mapped at TRAMPOLINE_VA but not at its physical
> +     * address so we don't have an identity map to be able to
> +     * disable MMU. We need to set exception vector at trampoline
> +     * at TRAMPOLINE_VA address which is mapped.
> +     */
> +    kvm_call_hyp(__kvm_set_vectors,(unsigned long) (TRAMPOLINE_VA +
> (vector & (PAGE_SIZE-1))));
> +
> +    /*
> +     * Set HVBAR to physical address, page table to identity to do the switch
> +     */
> +    kvm_call_hyp((void*) 4, (unsigned long) vector, kvm_mmu_get_boot_httbr());
> +
> +    /*
> +     * Flush to make sure code after the cache are disabled see updated values
> +     */
> +    flush_cache_all();
> +
> +    /*
> +     * Turn off caching on Hypervisor mode
> +     */
> +    kvm_call_hyp((void*) 1);
> +
> +    /*
> +     * Flush to make sude code don't see old cached values after cache is
> +     * enabled
> +     */
> +    flush_cache_all();
> +
> +    /*
> +     * Restore execution
> +     */
> +    kvm_call_hyp((void*) 3, addr);
> +}
> +

If you like kvm_call_hyp-like sequences, please document what each
kvm_call_hyp() is supposed to do.
(We could do it all in one kvm_call_hyp(), though.)
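
Something like this would make the intent explicit (a rough sketch only;
the three __kvm_* names below are invented for illustration, they are
not existing symbols):

/* Sketch: one named HYP entry per step instead of bare magic values. */
kvm_call_hyp(__kvm_set_boot_vectors, vector, kvm_mmu_get_boot_httbr());
flush_cache_all();                      /* publish dirty lines first */
kvm_call_hyp(__kvm_caches_off);         /* HYP runs uncached from here */
flush_cache_all();                      /* drop now-stale clean lines */
kvm_call_hyp(__kvm_soft_restart, addr); /* never returns */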

Thanks,
-Takahiro AKASHI

>   /* NOP: Compiling as a module not supported */
>   void kvm_arch_exit(void)
>   {
> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index 3988e72..c2f1d4d 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -68,6 +68,12 @@ __kvm_hyp_init:
>       W(b)    .
>
>   __do_hyp_init:
> +    @ check for special values, odd numbers can't be a stack pointer
> +    cmp    r0, #1
> +    beq    cpu_proc_fin
> +    cmp    r0, #3
> +    beq    restore
> +
>       cmp    r0, #0            @ We have a SP?
>       bne    phase2            @ Yes, second stage init
>
> @@ -151,6 +157,33 @@ target:    @ We're now in the trampoline code,
> switch page tables
>
>       eret
>
> +cpu_proc_fin:
> +    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
> +    bic    r0, r0, #0x1000            @ ...i............
> +    bic    r0, r0, #0x0006            @ .............ca.
> +    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
> +    eret
> +
> +restore:
> +    @ Disable MMU, caches and Thumb in HSCTLR
> +    mrc    p15, 4, r0, c1, c0, 0    @ HSCR
> +    ldr    r2, =0x40003807
> +    bic    r0, r0, r2
> +    mcr    p15, 4, r0, c1, c0, 0    @ HSCR
> +    isb
> +
> +    @ Invalidate old TLBs
> +    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
> +    isb
> +    dsb    ish
> +
> +    @ Disable MMU, caches and Thumb in SCTLR
> +    mrc    p15, 0, r0, c1, c0, 0   @ SCTLR
> +    bic    r0, r0, r2
> +    mcr    p15, 0, r0, c1, c0, 0
> +
> +    bx    r1
> +
>       .ltorg
>
>       .globl __kvm_hyp_init_end
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 01dcb0e..449e7dd 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)
>
>
>   /********************************************************************
> + * Set HVBAR
> + *
> + * void __kvm_set_vectors(unsigned long phys_vector_base);
> + */
> +ENTRY(__kvm_set_vectors)
> +    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
> +    dsb    ish
> +    isb
> +    bx    lr
> +ENDPROC(__kvm_set_vectors)
> +
> +/********************************************************************
>    *  Hypervisor world-switch code
>    *
>    *
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 1366625..f853858 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -369,6 +369,11 @@ void free_boot_hyp_pgd(void)
>       free_page((unsigned long)init_bounce_page);
>       init_bounce_page = NULL;
>
> +    /* avoid to reuse possibly invalid values if bounce page is freed */
> +    hyp_idmap_start = 0;
> +    hyp_idmap_end = 0;
> +    hyp_idmap_vector = 0;
> +
>       mutex_unlock(&kvm_hyp_pgd_mutex);
>   }
>
>
> Frediano
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Kexec and KVM not working gracefully together
  2015-02-06  4:13                 ` AKASHI Takahiro
@ 2015-02-06  9:50                   ` Frediano Ziglio
  2015-02-06 10:56                     ` AKASHI Takahiro
  0 siblings, 1 reply; 16+ messages in thread
From: Frediano Ziglio @ 2015-02-06  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

2015-02-06 4:13 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
> Hi Frediano
> [add Geoff to cc]
>
> I still have to compare my code with yours, but please note that
> some files in arch/arm/kvm are shared with arm64, especially
> arm.c, mmu.c and related headers.
>

Didn't know.

> So can you please
> - submit your new patch without including old e-mail threads.
>   (don't forget RFC or PATCH.)
> - break it up into two pieces, one for common, the other for arm
>

I'll do that for the next version.

> and a few more comments below:
>
> On 02/06/2015 01:56 AM, Frediano Ziglio wrote:
>
> [snip]
>
>> New version. Changes:
>> - removed saved value and test again
>> - remove vm flush, useless
>> - correct compile check on header
>> - try KVM enabled and not, compile link and tests
>> - rewrite comments, add more where necessary
>> - remove code to free and allocate again boot PGD and related, keep in
>> memory if KEXEC is enabled
>>
>> Still not trying SMP. Still to be considered RFC. I tried different
>> compile options (KEXEC and KVM turned on/off). I realized that I
>> should test if kernel is started with SVC mode code still works
>> correctly. ARM_VIRT_EXT is always turned on for V7 CPU.
>>
>>
>> diff --git a/arch/arm/include/asm/kvm_asm.h
>> b/arch/arm/include/asm/kvm_asm.h
>> index 3a67bec..985f0a3 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -97,7 +97,19 @@ extern char __kvm_hyp_code_end[];
>>   extern void __kvm_flush_vm_context(void);
>>   extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>
>> +extern void __kvm_set_vectors(unsigned long phys_vector_base);
>> +
>>   extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>> +
>> +#ifndef CONFIG_ARM_VIRT_EXT
>
>
> Arm64 doesn't have CONFIG_ARM_VIRT_EXT. Why don't use CONFIG_KVM?
> I'd rather put this conditional into the place where the function is
> actually called for flexible implementation. (See below)
>

In ARMv7 you can have ARM_VIRT_EXT but not KVM. ARM_VIRT_EXT means the
processor may support virtualization, while KVM means you want to
compile kvm into the kernel. So the code should be built if either is
defined.
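
So the guard in kvm_asm.h would end up something like this (a sketch of
what I mean, mirroring the patch above):

/*
 * Provide the real kvm_cpu_reset() whenever the CPU may have been
 * booted in HYP mode, i.e. when either option is enabled.
 */
#if defined(CONFIG_ARM_VIRT_EXT) || defined(CONFIG_KVM)
extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
#else
static inline void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
{
	phys_reset(addr);
}
#endif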

>
>> +static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>> +{
>> +    phys_reset(addr);
>> +}
>> +#else
>> +extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
>> +#endif
>> +
>>   #endif
>>
>>   #endif /* __ARM_KVM_ASM_H__ */
>> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
>> index 2a55373..30339ad 100644
>> --- a/arch/arm/kernel/hyp-stub.S
>> +++ b/arch/arm/kernel/hyp-stub.S
>> @@ -164,13 +164,63 @@ ARM_BE8(orr    r7, r7, #(1 << 25))     @ HSCTLR.EE
>>       bx    lr            @ The boot CPU mode is left in r4.
>>   ENDPROC(__hyp_stub_install_secondary)
>>
>> +#undef CPU_RESET
>> +#if defined(CONFIG_ARM_VIRT_EXT) && !defined(CONFIG_KVM) &&
>> !defined(ZIMAGE)
>> +#  define CPU_RESET 1
>> +#endif
>> +
>>   __hyp_stub_do_trap:
>> +#ifdef CPU_RESET
>> +    cmp    r0, #-2
>> +    bne    1f
>> +
>> +    mrc    p15, 4, r0, c1, c0, 0    @ HSCR
>> +    ldr    r2, =0x40003807
>> +    bic    r0, r0, r2
>> +    mcr    p15, 4, r0, c1, c0, 0    @ HSCR
>> +    isb
>> +
>> +    @ Disable MMU, caches and Thumb in SCTLR
>> +    mrc    p15, 0, r0, c1, c0, 0    @ SCTLR
>> +    bic    r0, r0, r2
>> +    mcr    p15, 0, r0, c1, c0, 0
>> +
>> +    bx    r1
>> +    .ltorg
>> +1:
>> +#endif
>>       cmp    r0, #-1
>>       mrceq    p15, 4, r0, c12, c0, 0    @ get HVBAR
>>       mcrne    p15, 4, r0, c12, c0, 0    @ set HVBAR
>>       __ERET
>>   ENDPROC(__hyp_stub_do_trap)
>>
>> +#ifdef CPU_RESET
>> +/*
>> + * r0 cpu_reset function
>> + * r1 address to jump to
>> + */
>> +ENTRY(kvm_cpu_reset)
>> +    mov    r2, r0
>> +
>> +    @ __boot_cpu_mode should be HYP_MODE
>> +    adr    r0, .L__boot_cpu_mode_offset
>> +    ldr    r3, [r0]
>> +    ldr    r0, [r0, r3]
>> +    cmp    r0, #HYP_MODE
>> +    beq    reset_to_hyp
>> +
>> +    @ Call SVC mode cpu_reset
>> +    mov    r0, r1
>> +THUMB(    orr    r2, r2, 1    )
>> +    bx    r2
>> +
>> +reset_to_hyp:
>> +    mov    r0, #-2
>> +    b    __hyp_set_vectors
>> +ENDPROC(kvm_cpu_reset)
>> +#endif
>> +
>>   /*
>>    * __hyp_set_vectors: Call this after boot to set the initial hypervisor
>>    * vectors as part of hypervisor installation.  On an SMP system, this
>> should
>> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
>> index fdfa3a7..c018719 100644
>> --- a/arch/arm/kernel/process.c
>> +++ b/arch/arm/kernel/process.c
>> @@ -41,6 +41,7 @@
>>   #include <asm/system_misc.h>
>>   #include <asm/mach/time.h>
>>   #include <asm/tls.h>
>> +#include <asm/kvm_asm.h>
>>
>>   #ifdef CONFIG_CC_STACKPROTECTOR
>>   #include <linux/stackprotector.h>
>> @@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
>>   };
>>
>>   extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
>> -typedef void (*phys_reset_t)(unsigned long);
>> +typedef void (*phys_reset_t)(void *);
>>
>>   /*
>>    * A temporary stack to use for CPU reset. This is static so that we
>> @@ -89,7 +90,8 @@ static void __soft_restart(void *addr)
>>
>>       /* Switch to the identity mapping. */
>>       phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
>> -    phys_reset((unsigned long)addr);
>> +
>> +    kvm_cpu_reset(phys_reset, addr);
>
>
> How about this?
> #ifdef CONFIG_KVM
>         kvm_cpu_reset(...);
> #endif
>         phys_reset(addr);
>
> It will clearify the purpose of kvm_cpu_reset().
> The name of kvm_cpu_reset() might better be cpu_*de*init_hyp_mode or similar
> given that the function would be called in hyp_init_cpu_notify()
> once kvm supports cpu hotplug in the future.
>

The purpose of kvm_cpu_reset is to reset the state and jump to the new
kernel's initialization code. It is a no-return function, like cpu_reset.
So perhaps the code could be:

#if defined(CONFIG_KVM) || defined(CONFIG_ARM_VIRT_EXT)
        kvm_cpu_reset(...);
#else
        phys_reset(addr);
#endif

>>       /* Should never get here. */
>>       BUG();
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 0b0d58a..e4d4a2b 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -1009,7 +1009,7 @@ static int init_hyp_mode(void)
>>       if (err)
>>           goto out_free_mappings;
>>
>> -#ifndef CONFIG_HOTPLUG_CPU
>> +#if !defined(CONFIG_HOTPLUG_CPU) && !defined(CONFIG_KEXEC)
>>       free_boot_hyp_pgd();
>>   #endif
>>
>> @@ -1079,6 +1079,53 @@ out_err:
>>       return err;
>>   }
>>
>> +void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>
>
> I'm not sure, but it might be better to put this code into arm/kernel/init.S
> as it is a counterpart of _do_hyp_init from cpu_init_hyp_mode().
>

init.S is written in assembly; kvm_cpu_reset for kvm is written in C.
There is another version for when kvm is disabled.

The kernel code is supposed (at least on ARMv7) to run in SVC mode,
not in HYP mode. One reason is that code that changes SCTLR while
executing in HYP instead of SVC would not change the system control
register of the mode it is running in. The function does not revert to
HYP for this reason.
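
To illustrate the banking (a sketch, not code from the patch): with the
virtualization extensions the opc1 field of the CP15 access selects the
register, so the following sequence, executed in HYP mode, turns off the
PL1 MMU without touching the environment the code itself runs under:

    @ Running in HYP: the current mode is governed by HSCTLR (opc1 = 4),
    @ so clearing SCTLR.M (opc1 = 0) only changes the SVC/PL1 view.
    mrc    p15, 0, r0, c1, c0, 0    @ read SCTLR (PL1 system control)
    bic    r0, r0, #0x1             @ clear the M bit
    mcr    p15, 0, r0, c1, c0, 0    @ PL1 MMU off; HYP still runs per HSCTLR.M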

>
>> +{
>> +    unsigned long vector;
>> +
>> +    if (!is_hyp_mode_available()) {
>> +        phys_reset(addr);
>> +        return;
>> +    }
>> +
>> +    vector = (unsigned long) kvm_get_idmap_vector();
>> +
>> +    /*
>> +     * Set vectors so we call initialization routines.
>> +     * trampoline is mapped at TRAMPOLINE_VA but not at its physical
>> +     * address so we don't have an identity map to be able to
>> +     * disable MMU. We need to set exception vector at trampoline
>> +     * at TRAMPOLINE_VA address which is mapped.
>> +     */
>> +    kvm_call_hyp(__kvm_set_vectors,(unsigned long) (TRAMPOLINE_VA +
>> (vector & (PAGE_SIZE-1))));
>> +
>> +    /*
>> +     * Set HVBAR to physical address, page table to identity to do the
>> switch
>> +     */
>> +    kvm_call_hyp((void*) 4, (unsigned long) vector,
>> kvm_mmu_get_boot_httbr());
>> +
>> +    /*
>> +     * Flush to make sure code after the cache are disabled see updated
>> values
>> +     */
>> +    flush_cache_all();
>> +
>> +    /*
>> +     * Turn off caching on Hypervisor mode
>> +     */
>> +    kvm_call_hyp((void*) 1);
>> +
>> +    /*
>> +     * Flush to make sude code don't see old cached values after cache is
>> +     * enabled
>> +     */
>> +    flush_cache_all();
>> +
>> +    /*
>> +     * Restore execution
>> +     */
>> +    kvm_call_hyp((void*) 3, addr);
>> +}
>> +
>
>
> If you like kvm_call_hyp-like sequences, please specify what each
> kvm_call_hyp() should do.
> (we can do all in one kvm_call_hyp() though.)
>

kvm_call_hyp supports a restricted number of arguments, as they must be
in registers, and I actually need more registers. I find it quite
confusing that every vector table and HVC call has its own interface:
one for the stub (-1 gets HVBAR, otherwise set it), one for kvm (-1 gets
HVBAR, otherwise a function pointer) and another for kvm initialization
(0 for the first initialization, otherwise stack pointer, HVBAR and
HTTBR). Wouldn't it be much nicer to pass an index for a function,
something like this (in C pseudocode)?

typedef unsigned (*hvc_func_t)(unsigned a1, unsigned a2, unsigned a3);

hvc_func_t hvc_funcs[MAX_HVC_FUNC];

#define HVC_FUNC_GET_HVBAR 0
#define HVC_FUNC_SET_HVBAR 1
#define HVC_FUNC_FLUSH_xxx 2
...

and in assembly we could just use these indexes? Of course, some HVC
implementations would simply return "not implemented".
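
A rough sketch of the assembly side (hvc_funcs, MAX_HVC_FUNC and the
register convention are all invented here for illustration):

    @ Sketch: r0 = function index, r1-r3 = arguments.
__hvc_dispatch:
    cmp    r0, #MAX_HVC_FUNC
    bhs    1f
    adr    ip, hvc_funcs            @ table of handler pointers
    ldr    ip, [ip, r0, lsl #2]     @ fetch handler for this index
    bx     ip                       @ the handler finishes with eret
1:  mvn    r0, #0                   @ out of range: return "not implemented"
    eret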

I could disable the cache on the first call (the one where I just set
HVBAR), do the flush, then make the second call (roughly as in the
sketch after this list) with:
- the vector's physical address (to jump to the trampoline before
disabling the MMU), with bit 0 set to distinguish it from a stack
pointer or 0;
- the address of the new kernel;
- the boot HTTBR (it takes 2 registers as it's 64 bit).
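
In pseudocode the sequence would look something like this (a sketch
only; __kvm_hvbar_caches_off is a made-up name and the bit-0 tagging is
just the convention proposed above):

    /* First call: point HVBAR at the trampoline mapping and turn the
     * HYP caches off, then flush. */
    kvm_call_hyp(__kvm_hvbar_caches_off,
                 TRAMPOLINE_VA + (vector & (PAGE_SIZE - 1)));
    flush_cache_all();
    /* Second call: physical vector with bit 0 set (so __do_hyp_init can
     * tell it apart from a stack pointer or 0), the new kernel entry
     * point, and the boot HTTBR (64 bit, so two registers). */
    kvm_call_hyp((void *)(vector | 1), addr, kvm_mmu_get_boot_httbr());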

I'll have a look at arm64 code.

Thank you,
   Frediano

> Thanks,
> -Takahiro AKASHI
>
>
>>   /* NOP: Compiling as a module not supported */
>>   void kvm_arch_exit(void)
>>   {
>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>> index 3988e72..c2f1d4d 100644
>> --- a/arch/arm/kvm/init.S
>> +++ b/arch/arm/kvm/init.S
>> @@ -68,6 +68,12 @@ __kvm_hyp_init:
>>       W(b)    .
>>
>>   __do_hyp_init:
>> +    @ check for special values, odd numbers can't be a stack pointer
>> +    cmp    r0, #1
>> +    beq    cpu_proc_fin
>> +    cmp    r0, #3
>> +    beq    restore
>> +
>>       cmp    r0, #0            @ We have a SP?
>>       bne    phase2            @ Yes, second stage init
>>
>> @@ -151,6 +157,33 @@ target:    @ We're now in the trampoline code,
>> switch page tables
>>
>>       eret
>>
>> +cpu_proc_fin:
>> +    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
>> +    bic    r0, r0, #0x1000            @ ...i............
>> +    bic    r0, r0, #0x0006            @ .............ca.
>> +    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
>> +    eret
>> +
>> +restore:
>> +    @ Disable MMU, caches and Thumb in HSCTLR
>> +    mrc    p15, 4, r0, c1, c0, 0    @ HSCR
>> +    ldr    r2, =0x40003807
>> +    bic    r0, r0, r2
>> +    mcr    p15, 4, r0, c1, c0, 0    @ HSCR
>> +    isb
>> +
>> +    @ Invalidate old TLBs
>> +    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
>> +    isb
>> +    dsb    ish
>> +
>> +    @ Disable MMU, caches and Thumb in SCTLR
>> +    mrc    p15, 0, r0, c1, c0, 0   @ SCTLR
>> +    bic    r0, r0, r2
>> +    mcr    p15, 0, r0, c1, c0, 0
>> +
>> +    bx    r1
>> +
>>       .ltorg
>>
>>       .globl __kvm_hyp_init_end
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 01dcb0e..449e7dd 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)
>>
>>
>>   /********************************************************************
>> + * Set HVBAR
>> + *
>> + * void __kvm_set_vectors(unsigned long phys_vector_base);
>> + */
>> +ENTRY(__kvm_set_vectors)
>> +    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
>> +    dsb    ish
>> +    isb
>> +    bx    lr
>> +ENDPROC(__kvm_set_vectors)
>> +
>> +/********************************************************************
>>    *  Hypervisor world-switch code
>>    *
>>    *
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 1366625..f853858 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -369,6 +369,11 @@ void free_boot_hyp_pgd(void)
>>       free_page((unsigned long)init_bounce_page);
>>       init_bounce_page = NULL;
>>
>> +    /* avoid to reuse possibly invalid values if bounce page is freed */
>> +    hyp_idmap_start = 0;
>> +    hyp_idmap_end = 0;
>> +    hyp_idmap_vector = 0;
>> +
>>       mutex_unlock(&kvm_hyp_pgd_mutex);
>>   }
>>
>>
>> Frediano
>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Kexec and KVM not working gracefully together
  2015-02-06  9:50                   ` Frediano Ziglio
@ 2015-02-06 10:56                     ` AKASHI Takahiro
  2015-02-06 12:14                       ` Frediano Ziglio
  0 siblings, 1 reply; 16+ messages in thread
From: AKASHI Takahiro @ 2015-02-06 10:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/06/2015 06:50 PM, Frediano Ziglio wrote:
> 2015-02-06 4:13 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
>> Hi Frediano
>> [add Geoff to cc]
>>
>> I still have to compare my code with yours, but please note that
>> some files in arch/arm/kvm are shared with arm64, especially
>> arm.c, mmu.c and related headers.
>>
>
> Didn't know.
>
>> So can you please
>> - submit your new patch without including old e-mail threads.
>>    (don't forget RFC or PATCH.)
>> - break it up into two pieces, one for common, the other for arm
>>
>
> I'll do for next.

Thanks

>> and a few more comments below:
>>
>> On 02/06/2015 01:56 AM, Frediano Ziglio wrote:
>>
>> [snip]
>>
>>> New version. Changes:
>>> - removed saved value and test again
>>> - remove vm flush, useless
>>> - correct compile check on header
>>> - try KVM enabled and not, compile link and tests
>>> - rewrite comments, add more where necessary
>>> - remove code to free and allocate again boot PGD and related, keep in
>>> memory if KEXEC is enabled
>>>
>>> Still not trying SMP. Still to be considered RFC. I tried different
>>> compile options (KEXEC and KVM turned on/off). I realized that I
>>> should test if kernel is started with SVC mode code still works
>>> correctly. ARM_VIRT_EXT is always turned on for V7 CPU.
>>>
>>>
>>> diff --git a/arch/arm/include/asm/kvm_asm.h
>>> b/arch/arm/include/asm/kvm_asm.h
>>> index 3a67bec..985f0a3 100644
>>> --- a/arch/arm/include/asm/kvm_asm.h
>>> +++ b/arch/arm/include/asm/kvm_asm.h
>>> @@ -97,7 +97,19 @@ extern char __kvm_hyp_code_end[];
>>>    extern void __kvm_flush_vm_context(void);
>>>    extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>>
>>> +extern void __kvm_set_vectors(unsigned long phys_vector_base);
>>> +
>>>    extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>> +
>>> +#ifndef CONFIG_ARM_VIRT_EXT
>>
>>
>> Arm64 doesn't have CONFIG_ARM_VIRT_EXT. Why don't use CONFIG_KVM?
>> I'd rather put this conditional into the place where the function is
>> actually called for flexible implementation. (See below)
>>
>
> In ARMv7 you can have ARM_VIRT_EXT but not KVM. ARM_VIRT_EXT means the
> processor can support virtualization while KVM means you want to
> compile kvm in the kernel. So code should be defined if either are
> defined.

CONFIG_KVM exists in arch/arm/kvm/Kconfig.

>>
>>> +static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>>> +{
>>> +    phys_reset(addr);
>>> +}
>>> +#else
>>> +extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
>>> +#endif
>>> +
>>>    #endif
>>>
>>>    #endif /* __ARM_KVM_ASM_H__ */
>>> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
>>> index 2a55373..30339ad 100644
>>> --- a/arch/arm/kernel/hyp-stub.S
>>> +++ b/arch/arm/kernel/hyp-stub.S
>>> @@ -164,13 +164,63 @@ ARM_BE8(orr    r7, r7, #(1 << 25))     @ HSCTLR.EE
>>>        bx    lr            @ The boot CPU mode is left in r4.
>>>    ENDPROC(__hyp_stub_install_secondary)
>>>
>>> +#undef CPU_RESET
>>> +#if defined(CONFIG_ARM_VIRT_EXT) && !defined(CONFIG_KVM) &&
>>> !defined(ZIMAGE)
>>> +#  define CPU_RESET 1
>>> +#endif
>>> +
>>>    __hyp_stub_do_trap:
>>> +#ifdef CPU_RESET
>>> +    cmp    r0, #-2
>>> +    bne    1f
>>> +
>>> +    mrc    p15, 4, r0, c1, c0, 0    @ HSCR
>>> +    ldr    r2, =0x40003807
>>> +    bic    r0, r0, r2
>>> +    mcr    p15, 4, r0, c1, c0, 0    @ HSCR
>>> +    isb
>>> +
>>> +    @ Disable MMU, caches and Thumb in SCTLR
>>> +    mrc    p15, 0, r0, c1, c0, 0    @ SCTLR
>>> +    bic    r0, r0, r2
>>> +    mcr    p15, 0, r0, c1, c0, 0
>>> +
>>> +    bx    r1
>>> +    .ltorg
>>> +1:
>>> +#endif
>>>        cmp    r0, #-1
>>>        mrceq    p15, 4, r0, c12, c0, 0    @ get HVBAR
>>>        mcrne    p15, 4, r0, c12, c0, 0    @ set HVBAR
>>>        __ERET
>>>    ENDPROC(__hyp_stub_do_trap)
>>>
>>> +#ifdef CPU_RESET
>>> +/*
>>> + * r0 cpu_reset function
>>> + * r1 address to jump to
>>> + */
>>> +ENTRY(kvm_cpu_reset)
>>> +    mov    r2, r0
>>> +
>>> +    @ __boot_cpu_mode should be HYP_MODE
>>> +    adr    r0, .L__boot_cpu_mode_offset
>>> +    ldr    r3, [r0]
>>> +    ldr    r0, [r0, r3]
>>> +    cmp    r0, #HYP_MODE
>>> +    beq    reset_to_hyp
>>> +
>>> +    @ Call SVC mode cpu_reset
>>> +    mov    r0, r1
>>> +THUMB(    orr    r2, r2, 1    )
>>> +    bx    r2
>>> +
>>> +reset_to_hyp:
>>> +    mov    r0, #-2
>>> +    b    __hyp_set_vectors
>>> +ENDPROC(kvm_cpu_reset)
>>> +#endif
>>> +
>>>    /*
>>>     * __hyp_set_vectors: Call this after boot to set the initial hypervisor
>>>     * vectors as part of hypervisor installation.  On an SMP system, this
>>> should
>>> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
>>> index fdfa3a7..c018719 100644
>>> --- a/arch/arm/kernel/process.c
>>> +++ b/arch/arm/kernel/process.c
>>> @@ -41,6 +41,7 @@
>>>    #include <asm/system_misc.h>
>>>    #include <asm/mach/time.h>
>>>    #include <asm/tls.h>
>>> +#include <asm/kvm_asm.h>
>>>
>>>    #ifdef CONFIG_CC_STACKPROTECTOR
>>>    #include <linux/stackprotector.h>
>>> @@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
>>>    };
>>>
>>>    extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
>>> -typedef void (*phys_reset_t)(unsigned long);
>>> +typedef void (*phys_reset_t)(void *);
>>>
>>>    /*
>>>     * A temporary stack to use for CPU reset. This is static so that we
>>> @@ -89,7 +90,8 @@ static void __soft_restart(void *addr)
>>>
>>>        /* Switch to the identity mapping. */
>>>        phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
>>> -    phys_reset((unsigned long)addr);
>>> +
>>> +    kvm_cpu_reset(phys_reset, addr);
>>
>>
>> How about this?
>> #ifdef CONFIG_KVM
>>          kvm_cpu_reset(...);
>> #endif
>>          phys_reset(addr);
>>
>> It will clearify the purpose of kvm_cpu_reset().
>> The name of kvm_cpu_reset() might better be cpu_*de*init_hyp_mode or similar
>> given that the function would be called in hyp_init_cpu_notify()
>> once kvm supports cpu hotplug in the future.
>>
>
> the purpose of kvm_cpu_reset is to reset state and jump to newer
> kernel initialization code. Is a no return function like cpu_reset. So
> perhaps code can be
>
> #if defined(CONFIG_KVM) || defined(CONFIG_ARM_VIRT_EXT)
>          kvm_cpu_reset(...);
> #else
>          phys_reset(addr);
> #endif

I meant that kvm_cpu_reset() should be responsible for restoring only kvm-related state.

>>>        /* Should never get here. */
>>>        BUG();
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index 0b0d58a..e4d4a2b 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -1009,7 +1009,7 @@ static int init_hyp_mode(void)
>>>        if (err)
>>>            goto out_free_mappings;
>>>
>>> -#ifndef CONFIG_HOTPLUG_CPU
>>> +#if !defined(CONFIG_HOTPLUG_CPU) && !defined(CONFIG_KEXEC)
>>>        free_boot_hyp_pgd();
>>>    #endif
>>>
>>> @@ -1079,6 +1079,53 @@ out_err:
>>>        return err;
>>>    }
>>>
>>> +void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>>
>>
>> I'm not sure, but it might be better to put this code into arm/kernel/init.S
>> as it is a counterpart of _do_hyp_init from cpu_init_hyp_mode().
>>
>
> init.S is written in assembly, kvm_cpu_reset for kvm is written in C.
> There is another version if kvm is disabled.

For arm64, it seems easy to implement such a function in asm.

> The kernel code is supposed (at least in ARMv7) to run in SVC mode,
> not on HYP mode. One reason is that code that change SCTLR executed at
> HYP instead of SVC would not change current mode system control
> register. The function does not revert to HYP for this reason.

I need to understand what you mentioned here, but for arm64, Geoff's
kexec leaves the boot cpu in hyp mode at soft_restart().

-Takahiro AKASHI

>>
>>> +{
>>> +    unsigned long vector;
>>> +
>>> +    if (!is_hyp_mode_available()) {
>>> +        phys_reset(addr);
>>> +        return;
>>> +    }
>>> +
>>> +    vector = (unsigned long) kvm_get_idmap_vector();
>>> +
>>> +    /*
>>> +     * Set vectors so we call initialization routines.
>>> +     * trampoline is mapped at TRAMPOLINE_VA but not at its physical
>>> +     * address so we don't have an identity map to be able to
>>> +     * disable MMU. We need to set exception vector at trampoline
>>> +     * at TRAMPOLINE_VA address which is mapped.
>>> +     */
>>> +    kvm_call_hyp(__kvm_set_vectors,(unsigned long) (TRAMPOLINE_VA +
>>> (vector & (PAGE_SIZE-1))));
>>> +
>>> +    /*
>>> +     * Set HVBAR to physical address, page table to identity to do the
>>> switch
>>> +     */
>>> +    kvm_call_hyp((void*) 4, (unsigned long) vector,
>>> kvm_mmu_get_boot_httbr());
>>> +
>>> +    /*
>>> +     * Flush to make sure code after the cache are disabled see updated
>>> values
>>> +     */
>>> +    flush_cache_all();
>>> +
>>> +    /*
>>> +     * Turn off caching on Hypervisor mode
>>> +     */
>>> +    kvm_call_hyp((void*) 1);
>>> +
>>> +    /*
>>> +     * Flush to make sude code don't see old cached values after cache is
>>> +     * enabled
>>> +     */
>>> +    flush_cache_all();
>>> +
>>> +    /*
>>> +     * Restore execution
>>> +     */
>>> +    kvm_call_hyp((void*) 3, addr);
>>> +}
>>> +
>>
>>
>> If you like kvm_call_hyp-like sequences, please specify what each
>> kvm_call_hyp() should do.
>> (we can do all in one kvm_call_hyp() though.)
>>
>
> kvm_call_hyp support a restricted number of arguments as they must be
> on registers and I actually need more registers. I found quite
> confusing that every vector table and HVC call have it's interface.
> One for stub (-1 get HVBAR otherwise set), one for kvm (-1 get HVBAR
> otherwise function pointer) and another for kvm initialization (0
> first initialization, otherwise stack pointer, HVBAR and HTTBR).
> Wouldn't be much nicer to pass a index for a function, something like
> (in C pseudocode)
>
> typedef unsigned (*hvc_func_t)(unsigned a1, unsigned a2, unsigned a3);
>
> hvc_func_t hvc_funcs[MAX_HVC_FUNC];
>
> #define HVC_FUNC_GET_HVBAR 0
> #define HVC_FUNC_SET_HVBAR 1
> #define HVC_FUNC_FLUSH_xxx 2
> ...
>
> and in assembly we could just use these indexes ?? Of course some HVC
> implementation should return not implemented.
>
> I could disable cache on first call (the one I set just HVBAR) and
> doing the flush than call second one with:
> - vector in physical address (to jump to trampoline before disabling
> MMU) with bit 0 set to distinguish from stack/0;
> - address of newer kernel
> - boot HTTBR (it takes 2 register as it's 64 bit)
>
> I'll have a look at arm64 code.
>
> Thank you,
>     Frediano
>
>> Thanks,
>> -Takahiro AKASHI
>>
>>
>>>    /* NOP: Compiling as a module not supported */
>>>    void kvm_arch_exit(void)
>>>    {
>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>> index 3988e72..c2f1d4d 100644
>>> --- a/arch/arm/kvm/init.S
>>> +++ b/arch/arm/kvm/init.S
>>> @@ -68,6 +68,12 @@ __kvm_hyp_init:
>>>        W(b)    .
>>>
>>>    __do_hyp_init:
>>> +    @ check for special values, odd numbers can't be a stack pointer
>>> +    cmp    r0, #1
>>> +    beq    cpu_proc_fin
>>> +    cmp    r0, #3
>>> +    beq    restore
>>> +
>>>        cmp    r0, #0            @ We have a SP?
>>>        bne    phase2            @ Yes, second stage init
>>>
>>> @@ -151,6 +157,33 @@ target:    @ We're now in the trampoline code,
>>> switch page tables
>>>
>>>        eret
>>>
>>> +cpu_proc_fin:
>>> +    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
>>> +    bic    r0, r0, #0x1000            @ ...i............
>>> +    bic    r0, r0, #0x0006            @ .............ca.
>>> +    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
>>> +    eret
>>> +
>>> +restore:
>>> +    @ Disable MMU, caches and Thumb in HSCTLR
>>> +    mrc    p15, 4, r0, c1, c0, 0    @ HSCR
>>> +    ldr    r2, =0x40003807
>>> +    bic    r0, r0, r2
>>> +    mcr    p15, 4, r0, c1, c0, 0    @ HSCR
>>> +    isb
>>> +
>>> +    @ Invalidate old TLBs
>>> +    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
>>> +    isb
>>> +    dsb    ish
>>> +
>>> +    @ Disable MMU, caches and Thumb in SCTLR
>>> +    mrc    p15, 0, r0, c1, c0, 0   @ SCTLR
>>> +    bic    r0, r0, r2
>>> +    mcr    p15, 0, r0, c1, c0, 0
>>> +
>>> +    bx    r1
>>> +
>>>        .ltorg
>>>
>>>        .globl __kvm_hyp_init_end
>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>> index 01dcb0e..449e7dd 100644
>>> --- a/arch/arm/kvm/interrupts.S
>>> +++ b/arch/arm/kvm/interrupts.S
>>> @@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)
>>>
>>>
>>>    /********************************************************************
>>> + * Set HVBAR
>>> + *
>>> + * void __kvm_set_vectors(unsigned long phys_vector_base);
>>> + */
>>> +ENTRY(__kvm_set_vectors)
>>> +    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
>>> +    dsb    ish
>>> +    isb
>>> +    bx    lr
>>> +ENDPROC(__kvm_set_vectors)
>>> +
>>> +/********************************************************************
>>>     *  Hypervisor world-switch code
>>>     *
>>>     *
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 1366625..f853858 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -369,6 +369,11 @@ void free_boot_hyp_pgd(void)
>>>        free_page((unsigned long)init_bounce_page);
>>>        init_bounce_page = NULL;
>>>
>>> +    /* avoid to reuse possibly invalid values if bounce page is freed */
>>> +    hyp_idmap_start = 0;
>>> +    hyp_idmap_end = 0;
>>> +    hyp_idmap_vector = 0;
>>> +
>>>        mutex_unlock(&kvm_hyp_pgd_mutex);
>>>    }
>>>
>>>
>>> Frediano
>>>
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Kexec and KVM not working gracefully together
  2015-02-06 10:56                     ` AKASHI Takahiro
@ 2015-02-06 12:14                       ` Frediano Ziglio
  2015-02-06 14:09                         ` Marc Zyngier
  2015-02-09  3:54                         ` Geoff Levand
  0 siblings, 2 replies; 16+ messages in thread
From: Frediano Ziglio @ 2015-02-06 12:14 UTC (permalink / raw)
  To: linux-arm-kernel

2015-02-06 10:56 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
> On 02/06/2015 06:50 PM, Frediano Ziglio wrote:
>>
>> 2015-02-06 4:13 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
>>>
>>> Hi Frediano
>>> [add Geoff to cc]
>>>
>>> I still have to compare my code with yours, but please note that
>>> some files in arch/arm/kvm are shared with arm64, especially
>>> arm.c, mmu.c and related headers.
>>>
>>
>> Didn't know.
>>
>>> So can you please
>>> - submit your new patch without including old e-mail threads.
>>>    (don't forget RFC or PATCH.)
>>> - break it up into two pieces, one for common, the other for arm
>>>
>>
>> I'll do for next.
>
>
> Thanks
>
>
>>> and a few more comments below:
>>>
>>> On 02/06/2015 01:56 AM, Frediano Ziglio wrote:
>>>
>>> [snip]
>>>
>>>> New version. Changes:
>>>> - removed saved value and test again
>>>> - remove vm flush, useless
>>>> - correct compile check on header
>>>> - try KVM enabled and not, compile link and tests
>>>> - rewrite comments, add more where necessary
>>>> - remove code to free and allocate again boot PGD and related, keep in
>>>> memory if KEXEC is enabled
>>>>
>>>> Still not trying SMP. Still to be considered RFC. I tried different
>>>> compile options (KEXEC and KVM turned on/off). I realized that I
>>>> should also test that the code still works correctly if the kernel is
>>>> started in SVC mode. ARM_VIRT_EXT is always turned on for V7 CPUs.
>>>>
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_asm.h
>>>> b/arch/arm/include/asm/kvm_asm.h
>>>> index 3a67bec..985f0a3 100644
>>>> --- a/arch/arm/include/asm/kvm_asm.h
>>>> +++ b/arch/arm/include/asm/kvm_asm.h
>>>> @@ -97,7 +97,19 @@ extern char __kvm_hyp_code_end[];
>>>>    extern void __kvm_flush_vm_context(void);
>>>>    extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t
>>>> ipa);
>>>>
>>>> +extern void __kvm_set_vectors(unsigned long phys_vector_base);
>>>> +
>>>>    extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>>> +
>>>> +#ifndef CONFIG_ARM_VIRT_EXT
>>>
>>>
>>>
>>> Arm64 doesn't have CONFIG_ARM_VIRT_EXT. Why not use CONFIG_KVM?
>>> I'd rather put this conditional in the place where the function is
>>> actually called, for a more flexible implementation. (See below)
>>>
>>
>> On ARMv7 you can have ARM_VIRT_EXT but not KVM. ARM_VIRT_EXT means the
>> processor supports virtualization, while KVM means you want to compile
>> KVM into the kernel. So the code should be defined if either is
>> defined.
>
>
> CONFIG_KVM exists in arch/arm/kvm/Kconfig.
>

I know. On ARMv7, CONFIG_ARM_VIRT_EXT can be defined while CONFIG_KVM
is not. In that case the firmware can launch the kernel in HYP mode.
The kernel will switch to SVC mode ASAP and work entirely in SVC mode,
just ignoring HYP mode (it installs the HYP stub code). On ARMv8
virtualization is presumably always supported, so CONFIG_ARM_VIRT_EXT
does not exist there. However, in head.S (arm64) the mode is likewise
switched to SVC ASAP. A minimal sketch of the guard I mean is below.
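Just the shape of it, not an exact hunk; the no-op fallback applies
only when neither option is set, because then the CPU can never have
entered HYP mode:

#if defined(CONFIG_ARM_VIRT_EXT) || defined(CONFIG_KVM)
/* real implementation, e.g. in hyp-stub.S or arch/arm/kvm/arm.c */
extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
#else
/* no HYP mode to tear down: just do the plain reset */
static inline void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
{
        phys_reset(addr);
}
#endif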

>
>>>
>>>> +static void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>>>> +{
>>>> +    phys_reset(addr);
>>>> +}
>>>> +#else
>>>> +extern void kvm_cpu_reset(void (*phys_reset)(void *), void *addr);
>>>> +#endif
>>>> +
>>>>    #endif
>>>>
>>>>    #endif /* __ARM_KVM_ASM_H__ */
>>>> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
>>>> index 2a55373..30339ad 100644
>>>> --- a/arch/arm/kernel/hyp-stub.S
>>>> +++ b/arch/arm/kernel/hyp-stub.S
>>>> @@ -164,13 +164,63 @@ ARM_BE8(orr    r7, r7, #(1 << 25))     @ HSCTLR.EE
>>>>        bx    lr            @ The boot CPU mode is left in r4.
>>>>    ENDPROC(__hyp_stub_install_secondary)
>>>>
>>>> +#undef CPU_RESET
>>>> +#if defined(CONFIG_ARM_VIRT_EXT) && !defined(CONFIG_KVM) &&
>>>> !defined(ZIMAGE)
>>>> +#  define CPU_RESET 1
>>>> +#endif
>>>> +
>>>>    __hyp_stub_do_trap:
>>>> +#ifdef CPU_RESET
>>>> +    cmp    r0, #-2
>>>> +    bne    1f
>>>> +
>>>> +    mrc    p15, 4, r0, c1, c0, 0    @ HSCTLR
>>>> +    ldr    r2, =0x40003807
>>>> +    bic    r0, r0, r2
>>>> +    mcr    p15, 4, r0, c1, c0, 0    @ HSCTLR
>>>> +    isb
>>>> +
>>>> +    @ Disable MMU, caches and Thumb in SCTLR
>>>> +    mrc    p15, 0, r0, c1, c0, 0    @ SCTLR
>>>> +    bic    r0, r0, r2
>>>> +    mcr    p15, 0, r0, c1, c0, 0
>>>> +
>>>> +    bx    r1
>>>> +    .ltorg
>>>> +1:
>>>> +#endif
>>>>        cmp    r0, #-1
>>>>        mrceq    p15, 4, r0, c12, c0, 0    @ get HVBAR
>>>>        mcrne    p15, 4, r0, c12, c0, 0    @ set HVBAR
>>>>        __ERET
>>>>    ENDPROC(__hyp_stub_do_trap)
>>>>
>>>> +#ifdef CPU_RESET
>>>> +/*
>>>> + * r0 cpu_reset function
>>>> + * r1 address to jump to
>>>> + */
>>>> +ENTRY(kvm_cpu_reset)
>>>> +    mov    r2, r0
>>>> +
>>>> +    @ __boot_cpu_mode should be HYP_MODE
>>>> +    adr    r0, .L__boot_cpu_mode_offset
>>>> +    ldr    r3, [r0]
>>>> +    ldr    r0, [r0, r3]
>>>> +    cmp    r0, #HYP_MODE
>>>> +    beq    reset_to_hyp
>>>> +
>>>> +    @ Call SVC mode cpu_reset
>>>> +    mov    r0, r1
>>>> +THUMB(    orr    r2, r2, 1    )
>>>> +    bx    r2
>>>> +
>>>> +reset_to_hyp:
>>>> +    mov    r0, #-2
>>>> +    b    __hyp_set_vectors
>>>> +ENDPROC(kvm_cpu_reset)
>>>> +#endif
>>>> +
>>>>    /*
>>>>     * __hyp_set_vectors: Call this after boot to set the initial
>>>> hypervisor
>>>>     * vectors as part of hypervisor installation.  On an SMP system,
>>>> this
>>>> should
>>>> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
>>>> index fdfa3a7..c018719 100644
>>>> --- a/arch/arm/kernel/process.c
>>>> +++ b/arch/arm/kernel/process.c
>>>> @@ -41,6 +41,7 @@
>>>>    #include <asm/system_misc.h>
>>>>    #include <asm/mach/time.h>
>>>>    #include <asm/tls.h>
>>>> +#include <asm/kvm_asm.h>
>>>>
>>>>    #ifdef CONFIG_CC_STACKPROTECTOR
>>>>    #include <linux/stackprotector.h>
>>>> @@ -60,7 +61,7 @@ static const char *isa_modes[] __maybe_unused = {
>>>>    };
>>>>
>>>>    extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
>>>> -typedef void (*phys_reset_t)(unsigned long);
>>>> +typedef void (*phys_reset_t)(void *);
>>>>
>>>>    /*
>>>>     * A temporary stack to use for CPU reset. This is static so that we
>>>> @@ -89,7 +90,8 @@ static void __soft_restart(void *addr)
>>>>
>>>>        /* Switch to the identity mapping. */
>>>>        phys_reset = (phys_reset_t)(unsigned
>>>> long)virt_to_phys(cpu_reset);
>>>> -    phys_reset((unsigned long)addr);
>>>> +
>>>> +    kvm_cpu_reset(phys_reset, addr);
>>>
>>>
>>>
>>> How about this?
>>> #ifdef CONFIG_KVM
>>>          kvm_cpu_reset(...);
>>> #endif
>>>          phys_reset(addr);
>>>
>>> It will clarify the purpose of kvm_cpu_reset().
>>> The name of kvm_cpu_reset() might better be cpu_*de*init_hyp_mode or
>>> similar
>>> given that the function would be called in hyp_init_cpu_notify()
>>> once kvm supports cpu hotplug in the future.
>>>
>>
>> The purpose of kvm_cpu_reset is to reset the state and jump to the new
>> kernel's initialization code. It is a no-return function, like
>> cpu_reset. So perhaps the code could be
>>
>> #if defined(CONFIG_KVM) || defined(CONFIG_ARM_VIRT_EXT)
>>          kvm_cpu_reset(...);
>> #else
>>          phys_reset(addr);
>> #endif
>
>
> I meant that kvm_cpu_reset() should be responsible for restoring only
> kvm-related status.
>

No, kernel code is meant to run in SVC mode, so that function can't
safely return. For instance, on ARMv7 even setting HTTBR to TTBR0 and
returning would end up executing cpu_v7_reset, which tries to change
the SCTLR register (but it cannot, as at HYP it would have to change
HSCTLR instead).

The code that switches from HYP to SVC is meant to run with the MMU
off. It would work if a cpu_deinit_hyp_mode were to just:
- disable the cache;
- disable the MMU;
- restore the HYP stub code;
- allow a sort of cpu_reset, but in HYP mode instead of SVC mode.

In that case cpu_deinit_hyp_mode could be called freely from
soft_restart (not __soft_restart) and from CPU unplugging, and
kvm_cpu_reset could be implemented like:

void kvm_cpu_reset(void (*cpu_reset)(void *, void *), void *addr)
{
    if (we_have_hyp_mode())
        cpu_reset(addr, hyp_physical_reset);
    else
        cpu_reset(addr, NULL);
}

Yes, cpu_reset gains an extra parameter; hyp_physical_reset would be an
identity-mapped function which just issues an HVC to do the reset from
HYP mode. A fuller sketch with the teardown factored out follows.
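Putting the pieces together, here is what I mean as a sketch; every
symbol below except is_hyp_mode_available() and BUG() is hypothetical
and exists in no tree today:

/* Cache off, MMU off, HYP stub vectors restored. Callable from both
 * soft_restart() and the CPU hot-unplug path. */
extern void cpu_deinit_hyp_mode(void);

/* Identity-mapped helper that jumps to addr from HYP, MMU off. */
extern void hyp_physical_reset(void *addr);

void kvm_cpu_reset(void (*cpu_reset)(void *addr, void *hyp_reset),
                   void *addr)
{
        if (is_hyp_mode_available()) {
                cpu_deinit_hyp_mode();
                cpu_reset(addr, hyp_physical_reset); /* reset via HYP */
        } else {
                cpu_reset(addr, NULL);               /* plain SVC reset */
        }
        BUG(); /* neither branch returns */
}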

Does it make sense?

>>>>        /* Should never get here. */
>>>>        BUG();
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index 0b0d58a..e4d4a2b 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -1009,7 +1009,7 @@ static int init_hyp_mode(void)
>>>>        if (err)
>>>>            goto out_free_mappings;
>>>>
>>>> -#ifndef CONFIG_HOTPLUG_CPU
>>>> +#if !defined(CONFIG_HOTPLUG_CPU) && !defined(CONFIG_KEXEC)
>>>>        free_boot_hyp_pgd();
>>>>    #endif
>>>>
>>>> @@ -1079,6 +1079,53 @@ out_err:
>>>>        return err;
>>>>    }
>>>>
>>>> +void kvm_cpu_reset(void (*phys_reset)(void *), void *addr)
>>>
>>>
>>>
>>> I'm not sure, but it might be better to put this code into
>>> arm/kernel/init.S
>>> as it is a counterpart of _do_hyp_init from cpu_init_hyp_mode().
>>>
>>
>> init.S is written in assembly, kvm_cpu_reset for kvm is written in C.
>> There is another version if kvm is disabled.
>
>
> For arm64, it seems easy to implement such a function in asm.
>
>> The kernel code is supposed (at least on ARMv7) to run in SVC mode,
>> not in HYP mode. One reason is that code that changes SCTLR, if
>> executed in HYP instead of SVC, would not change the current mode's
>> system control register. The function does not revert to HYP for this
>> reason.
>
>
> I need to understand what you mentioned here, but for arm64, Geoff's
> kexec leaves the boot cpu in hyp mode at soft_restart().
>
> -Takahiro AKASHI
>

Where's Geoff's code? Note however that this is not maintainable, as
people would have to consider that future code could execute in either
HYP or SVC mode. Did you test that all code called from soft_restart
can cope with that?

It's similar to the kernel returning to user space with a jump instead
of a proper system return. The user-space code would execute in kernel
mode; now imagine this code doing another syscall. You can't be 100%
sure that current and future code works correctly, or at least you
should check and state that these functions have these restrictions.

I was going to add a kvm_call_hyp_ext function which, on either arm32
or arm64, could pass up to 8 register parameters. Does it make sense?
A minimal sketch of the arm32 side is below.
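Roughly like this (a sketch only: kvm_call_hyp_ext exists nowhere yet,
and it assumes the HYP vector dispatches on r0 and returns its result
in r0):

static unsigned long kvm_call_hyp_ext(unsigned long fn,
                                      unsigned long a1, unsigned long a2,
                                      unsigned long a3, unsigned long a4,
                                      unsigned long a5, unsigned long a6,
                                      unsigned long a7)
{
        /* Pin all eight arguments to r0-r7 so the HYP side sees them
         * directly in registers. */
        register unsigned long r0 asm("r0") = fn;
        register unsigned long r1 asm("r1") = a1;
        register unsigned long r2 asm("r2") = a2;
        register unsigned long r3 asm("r3") = a3;
        register unsigned long r4 asm("r4") = a4;
        register unsigned long r5 asm("r5") = a5;
        register unsigned long r6 asm("r6") = a6;
        register unsigned long r7 asm("r7") = a7;

        asm volatile("hvc #0"
                     : "+r" (r0)
                     : "r" (r1), "r" (r2), "r" (r3), "r" (r4),
                       "r" (r5), "r" (r6), "r" (r7)
                     : "memory");

        return r0;
}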

Frediano

>
>>>
>>>> +{
>>>> +    unsigned long vector;
>>>> +
>>>> +    if (!is_hyp_mode_available()) {
>>>> +        phys_reset(addr);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    vector = (unsigned long) kvm_get_idmap_vector();
>>>> +
>>>> +    /*
>>>> +     * Set the vectors so we can call the initialization routines.
>>>> +     * The trampoline is mapped at TRAMPOLINE_VA but not at its
>>>> +     * physical address, so we have no identity map to use for
>>>> +     * disabling the MMU. We need to set the exception vector to the
>>>> +     * trampoline at TRAMPOLINE_VA, which is mapped.
>>>> +     */
>>>> +    kvm_call_hyp(__kvm_set_vectors,(unsigned long) (TRAMPOLINE_VA +
>>>> (vector & (PAGE_SIZE-1))));
>>>> +
>>>> +    /*
>>>> +     * Set HVBAR to the physical address and the page table to the
>>>> +     * identity map to do the switch
>>>> +     */
>>>> +    kvm_call_hyp((void*) 4, (unsigned long) vector,
>>>> kvm_mmu_get_boot_httbr());
>>>> +
>>>> +    /*
>>>> +     * Flush to make sure code running after the caches are disabled
>>>> +     * sees updated values
>>>> +     */
>>>> +    flush_cache_all();
>>>> +
>>>> +    /*
>>>> +     * Turn off caching in Hypervisor mode
>>>> +     */
>>>> +    kvm_call_hyp((void*) 1);
>>>> +
>>>> +    /*
>>>> +     * Flush to make sure code doesn't see stale cached values after
>>>> +     * the cache has been disabled
>>>> +     */
>>>> +    flush_cache_all();
>>>> +
>>>> +    /*
>>>> +     * Restore execution
>>>> +     */
>>>> +    kvm_call_hyp((void*) 3, addr);
>>>> +}
>>>> +
>>>
>>>
>>>
>>> If you like kvm_call_hyp-like sequences, please specify what each
>>> kvm_call_hyp() should do.
>>> (we can do all in one kvm_call_hyp() though.)
>>>
>>
>> kvm_call_hyp supports a restricted number of arguments as they must be
>> in registers, and I actually need more registers. I find it quite
>> confusing that every vector table and HVC call has its own interface:
>> one for the stub (-1 gets HVBAR, otherwise set it), one for kvm (-1
>> gets HVBAR, otherwise a function pointer) and another for kvm
>> initialization (0 for first initialization, otherwise stack pointer,
>> HVBAR and HTTBR).
>> Wouldn't it be much nicer to pass an index for a function, something like
>> (in C pseudocode)
>>
>> typedef unsigned (*hvc_func_t)(unsigned a1, unsigned a2, unsigned a3);
>>
>> hvc_func_t hvc_funcs[MAX_HVC_FUNC];
>>
>> #define HVC_FUNC_GET_HVBAR 0
>> #define HVC_FUNC_SET_HVBAR 1
>> #define HVC_FUNC_FLUSH_xxx 2
>> ...
>>
>> and in assembly we could just use these indexes? Of course some HVC
>> implementations should return "not implemented".
>>
>> I could disable the cache on the first call (the one where I just set
>> HVBAR), do the flush, then call the second one with:
>> - the vector as a physical address (to jump to the trampoline before
>> disabling the MMU), with bit 0 set to distinguish it from a stack/0;
>> - the address of the new kernel;
>> - the boot HTTBR (it takes 2 registers as it's 64 bit).
>>
>> I'll have a look at arm64 code.
>>
>> Thank you,
>>     Frediano
>>
>>> Thanks,
>>> -Takahiro AKASHI
>>>
>>>
>>>>    /* NOP: Compiling as a module not supported */
>>>>    void kvm_arch_exit(void)
>>>>    {
>>>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
>>>> index 3988e72..c2f1d4d 100644
>>>> --- a/arch/arm/kvm/init.S
>>>> +++ b/arch/arm/kvm/init.S
>>>> @@ -68,6 +68,12 @@ __kvm_hyp_init:
>>>>        W(b)    .
>>>>
>>>>    __do_hyp_init:
>>>> +    @ check for special values; an odd value can't be a stack pointer
>>>> +    cmp    r0, #1
>>>> +    beq    cpu_proc_fin
>>>> +    cmp    r0, #3
>>>> +    beq    restore
>>>> +
>>>>        cmp    r0, #0            @ We have a SP?
>>>>        bne    phase2            @ Yes, second stage init
>>>>
>>>> @@ -151,6 +157,33 @@ target:    @ We're now in the trampoline code,
>>>> switch page tables
>>>>
>>>>        eret
>>>>
>>>> +cpu_proc_fin:
>>>> +    mrc    p15, 4, r0, c1, c0, 0        @ ctrl register
>>>> +    bic    r0, r0, #0x1000            @ ...i............
>>>> +    bic    r0, r0, #0x0006            @ .............ca.
>>>> +    mcr    p15, 4, r0, c1, c0, 0        @ disable caches
>>>> +    eret
>>>> +
>>>> +restore:
>>>> +    @ Disable MMU, caches and Thumb in HSCTLR
>>>> +    mrc    p15, 4, r0, c1, c0, 0    @ HSCTLR
>>>> +    ldr    r2, =0x40003807
>>>> +    bic    r0, r0, r2
>>>> +    mcr    p15, 4, r0, c1, c0, 0    @ HSCTLR
>>>> +    isb
>>>> +
>>>> +    @ Invalidate old TLBs
>>>> +    mcr    p15, 4, r0, c8, c7, 0    @ TLBIALLH
>>>> +    isb
>>>> +    dsb    ish
>>>> +
>>>> +    @ Disable MMU, caches and Thumb in SCTLR
>>>> +    mrc    p15, 0, r0, c1, c0, 0   @ SCTLR
>>>> +    bic    r0, r0, r2
>>>> +    mcr    p15, 0, r0, c1, c0, 0
>>>> +
>>>> +    bx    r1
>>>> +
>>>>        .ltorg
>>>>
>>>>        .globl __kvm_hyp_init_end
>>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>>> index 01dcb0e..449e7dd 100644
>>>> --- a/arch/arm/kvm/interrupts.S
>>>> +++ b/arch/arm/kvm/interrupts.S
>>>> @@ -87,6 +87,18 @@ ENDPROC(__kvm_flush_vm_context)
>>>>
>>>>
>>>>    /********************************************************************
>>>> + * Set HVBAR
>>>> + *
>>>> + * void __kvm_set_vectors(unsigned long phys_vector_base);
>>>> + */
>>>> +ENTRY(__kvm_set_vectors)
>>>> +    mcr    p15, 4, r0, c12, c0, 0    @ set HVBAR
>>>> +    dsb    ish
>>>> +    isb
>>>> +    bx    lr
>>>> +ENDPROC(__kvm_set_vectors)
>>>> +
>>>> +/********************************************************************
>>>>     *  Hypervisor world-switch code
>>>>     *
>>>>     *
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 1366625..f853858 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -369,6 +369,11 @@ void free_boot_hyp_pgd(void)
>>>>        free_page((unsigned long)init_bounce_page);
>>>>        init_bounce_page = NULL;
>>>>
>>>> +    /* avoid reusing possibly stale values once the bounce page is freed */
>>>> +    hyp_idmap_start = 0;
>>>> +    hyp_idmap_end = 0;
>>>> +    hyp_idmap_vector = 0;
>>>> +
>>>>        mutex_unlock(&kvm_hyp_pgd_mutex);
>>>>    }
>>>>
>>>>
>>>> Frediano
>>>>
>>>
>


* Kexec and KVM not working gracefully together
  2015-02-06 12:14                       ` Frediano Ziglio
@ 2015-02-06 14:09                         ` Marc Zyngier
  2015-02-09  3:54                         ` Geoff Levand
  1 sibling, 0 replies; 16+ messages in thread
From: Marc Zyngier @ 2015-02-06 14:09 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/02/15 12:14, Frediano Ziglio wrote:

> I was going to add an kvm_call_hyp_ext function which in either arm32
> or arm64 could pass up to 8 register parameters. Does it make sense?

Not unless you can prove that you don't have any other way of doing what
you want to achieve.

We deal perfectly well with having only 4 registers in the init
sequence, and I doubt that you cannot do it the same way, possibly
using a per-architecture abstraction (see __cpu_init_hyp_mode).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Kexec and KVM not working gracefully together
  2015-02-06 12:14                       ` Frediano Ziglio
  2015-02-06 14:09                         ` Marc Zyngier
@ 2015-02-09  3:54                         ` Geoff Levand
  2015-02-09 15:51                           ` Frediano Ziglio
  1 sibling, 1 reply; 16+ messages in thread
From: Geoff Levand @ 2015-02-09  3:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Frediano,

On Fri, 2015-02-06 at 12:14 +0000, Frediano Ziglio wrote:
> 2015-02-06 10:56 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
> >> Frediano Ziglio wrote:
> >> The kernel code is supposed (at least on ARMv7) to run in SVC mode,
> >> not in HYP mode. One reason is that code that changes SCTLR, if
> >> executed in HYP instead of SVC, would not change the current mode's
> >> system control register. The function does not revert to HYP for this
> >> reason.
> >
> > I need to understand what you mentioned here, but for arm64, Geoff's
> > kexec leaves the boot cpu in hyp mode at soft_restart().
> 
> Where's Geoff's code? Note however that this is not maintainable, as
> people would have to consider that future code could execute in either
> HYP or SVC mode. Did you test that all code called from soft_restart
> can cope with that?

Basically, when a CPU is reset it needs to be put into the exception
level it had when it first entered the kernel.

I have my arm64 kernel patches here:

  https://git.kernel.org/cgit/linux/kernel/git/geoff/linux-kexec.git

The patches related to CPU reset are:

  arm64: Convert hcalls to use ISS field
  arm64: Add new hcall HVC_CALL_FUNC
  arm64: Add EL2 switch to soft_restart

-Geoff


* Kexec and KVM not working gracefully together
  2015-02-09  3:54                         ` Geoff Levand
@ 2015-02-09 15:51                           ` Frediano Ziglio
  0 siblings, 0 replies; 16+ messages in thread
From: Frediano Ziglio @ 2015-02-09 15:51 UTC (permalink / raw)
  To: linux-arm-kernel

2015-02-09 3:54 GMT+00:00 Geoff Levand <geoff@infradead.org>:
> Hi Frediano,
>
> On Fri, 2015-02-06 at 12:14 +0000, Frediano Ziglio wrote:
>> 2015-02-06 10:56 GMT+00:00 AKASHI Takahiro <takahiro.akashi@linaro.org>:
>> >> Frediano Ziglio wrote:
>> >> The kernel code is supposed (at least on ARMv7) to run in SVC mode,
>> >> not in HYP mode. One reason is that code that changes SCTLR, if
>> >> executed in HYP instead of SVC, would not change the current mode's
>> >> system control register. The function does not revert to HYP for this
>> >> reason.
>> >
>> > I need to understand what you mentioned here, but for arm64, Geoff's
>> > kexec leaves the boot cpu in hyp mode at soft_restart().
>>
>> Where's Geoff's code? Note however that this is not maintainable, as
>> people would have to consider that future code could execute in either
>> HYP or SVC mode. Did you test that all code called from soft_restart
>> can cope with that?
>
> Basically, when a CPU is reset it needs to be put into the exception
> level it had when it first entered the kernel.
>
> I have my arm64 kernel patches here:
>
>   https://git.kernel.org/cgit/linux/kernel/git/geoff/linux-kexec.git
>
> The patches related to CPU reset are:
>
>   arm64: Convert hcalls to use ISS field
>   arm64: Add new hcall HVC_CALL_FUNC
>   arm64: Add EL2 switch to soft_restart
>
> -Geoff
>
>

Thank you! This helps a lot.

There are some differences in our patches:
- your patches work only with KVM disabled (from
https://git.kernel.org/cgit/linux/kernel/git/geoff/linux-kexec.git/commit/?id=cdd634d6018d00cefc595f02c4341d3f7c1f0c47
you avoid the EL2 switch if KVM is enabled);
- you started using the immediate from the ESR (the ISS field)
(https://git.kernel.org/cgit/linux/kernel/git/geoff/linux-kexec.git/commit/?id=637faeaf6301208315c62be7a13713d50aa3a294).
I proposed (without knowing your patches) a similar solution.
Personally I think the function pointers can be removed and you could
use some indexing. The functions you can call are very limited, and
using indexes would be possible in all the vector cases (stub,
initialization, kvm); a sketch follows this list;
- you changed cpu_reset to accept an additional parameter and to check
whether to call the EL2 switch code, while I added a kvm_cpu_reset
which either does the switch or calls the old cpu_reset. Well... here I
think it's just a question of style. I like both. On arm32 there is no
cpu_soft_restart, so our code looks quite different;
- since you are missing the KVM part, you don't take into consideration
flushing the cache from EL2 mode. My tests confirm that not flushing
the cache causes memory corruption. My solution is to have one HYP call
that turns off caching and then another that jumps to the physical code
after switching off the MMU. Perhaps a bit convoluted, but it works.
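To illustrate the indexing idea above, a purely invented sketch; none
of these names or numbers come from either patch set:

/* One command table shared by the stub, the KVM init code and KVM
 * proper, instead of a different ad-hoc ABI per vector table. */
enum hvc_cmd {
        HVC_GET_VECTORS = 0,    /* return HVBAR */
        HVC_SET_VECTORS,        /* a1 = new HVBAR */
        HVC_CACHES_OFF,         /* disable caching at HYP level */
        HVC_SOFT_RESTART,       /* a1 = physical entry point */
        HVC_NR_CMDS
};

typedef unsigned long (*hvc_func_t)(unsigned long a1, unsigned long a2,
                                    unsigned long a3);
static hvc_func_t hvc_funcs[HVC_NR_CMDS];

/* C rendering of what the HYP vector dispatch would do. */
static unsigned long hvc_dispatch(unsigned long cmd, unsigned long a1,
                                  unsigned long a2, unsigned long a3)
{
        if (cmd >= HVC_NR_CMDS || !hvc_funcs[cmd])
                return (unsigned long)-1;       /* not implemented */
        return hvc_funcs[cmd](a1, a2, a3);
}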

For paranoia, the check for function calls (HVC_CALL_HYP,
https://git.kernel.org/cgit/linux/kernel/git/geoff/linux-kexec.git/commit/?id=a0a397cf6736b1ccfed7c8e9735a64dac1fb6304)
should come first, as it is the most likely case on the fast path.

I think HVC_CALL_FUNC can be a misleading name. In the end it does call
a function from EL2, but also with the cache and MMU disabled.

Frediano


Thread overview: 16+ messages
2015-01-27 15:07 Kexec and KVM not working gracefully together Frediano Ziglio
2015-01-27 15:25 ` Marc Zyngier
2015-02-05  6:17   ` AKASHI Takahiro
2015-02-05  9:43     ` Marc Zyngier
2015-02-05 10:27       ` AKASHI Takahiro
2015-02-05 10:51         ` Frediano Ziglio
2015-02-05 13:32           ` Marc Zyngier
2015-02-05 14:27             ` Frediano Ziglio
2015-02-05 16:56               ` Frediano Ziglio
2015-02-06  4:13                 ` AKASHI Takahiro
2015-02-06  9:50                   ` Frediano Ziglio
2015-02-06 10:56                     ` AKASHI Takahiro
2015-02-06 12:14                       ` Frediano Ziglio
2015-02-06 14:09                         ` Marc Zyngier
2015-02-09  3:54                         ` Geoff Levand
2015-02-09 15:51                           ` Frediano Ziglio
