* [PATCH v8 0/4] arm: dirty page logging support for ARMv7
From: Mario Smarduch @ 2014-06-19  1:31 UTC
  To: kvmarm, marc.zyngier, christoffer.dall
  Cc: steve.capper, kvm, linux-arm-kernel, gavin.guo, peter.maydell,
	jays.lee, Mario Smarduch

This patch series adds support for dirty page logging, so far tested only on
ARMv7 and verified to compile on ARMv8. Together with the GICv2 vGIC and arch
timer save/restore support, dirty page logging enables live migration.

Dirty page logging support -
- initially write protects VM RAM memory regions in the 2nd stage page tables
- adds support to read the dirty page log and write protect the dirty pages
  again in the second stage page tables for the next pass (a userspace view of
  this read/write-protect cycle is sketched below)
- second stage huge pages are dissolved into page tables to keep track of
  dirty pages at page granularity. Tracking at huge page granularity would
  limit migration to an almost idle system.
- In the event migration is canceled, normal behavior is resumed and huge
  pages are rebuilt over time.
- At this time reverse mappings are not used for write protecting the 2nd
  stage tables.
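
For context on how the log is consumed during this cycle, below is a minimal
userspace sketch using the standard KVM_GET_DIRTY_LOG ioctl; the helper name,
slot id, page count and error handling are illustrative assumptions and not
code from this series.

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdio.h>
#include <stdlib.h>

/* Poll the dirty log of one memslot (illustrative sketch only). */
static int read_dirty_log(int vm_fd, unsigned int slot, unsigned long npages)
{
	unsigned long bits_per_long = sizeof(unsigned long) * 8;
	unsigned long nlongs = (npages + bits_per_long - 1) / bits_per_long;
	unsigned long *bitmap = calloc(nlongs, sizeof(unsigned long));
	struct kvm_dirty_log log = { .slot = slot, .dirty_bitmap = bitmap };
	unsigned long i;

	/*
	 * KVM copies out the pages dirtied since the previous call and, with
	 * this series, write protects them again for the next pass.
	 */
	if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
		perror("KVM_GET_DIRTY_LOG");
		free(bitmap);
		return -1;
	}

	for (i = 0; i < npages; i++)
		if (bitmap[i / bits_per_long] & (1UL << (i % bits_per_long)))
			printf("page %lu in slot %u is dirty\n", i, slot);

	free(bitmap);
	return 0;
}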

- Future work
  - Enable dirty memory logging to work on ARMv8 Fast Models.

Test Environment:
---------------------------------------------------------------------------
NOTE: Running on Fast Models will hardly ever fail and can mask bugs; in fact,
      initially light loads were succeeding without dirty page logging support.
---------------------------------------------------------------------------
- Will put all components on github, including test setup diagram
- In short summary
  o Two ARM Exynos 5440 development platforms - 4-way 1.7 GHz, with 8 GB,
    256 GB storage, 1 Gb/s Ethernet, with swap enabled
  o NFS server running Ubuntu 13.04
    - both ARM boards mount the shared file system
    - Shared file system includes - QEMU, guest kernel, DTB, multiple ext3 root
      file systems.
  o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1
  o Use the QEMU monitor (Ctrl+A, C) and the 'migrate -d tcp:<IP>:<port>'
    command (e.g. 'migrate -d tcp:192.168.10.100:4321' for the setup below)
    - Destination command syntax below; -smp can be raised to 4, and the
      machine model is outdated but has been tested on 'virt' by others
      (needs upgrading)

        /mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
        /mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
        -M vexpress-a15 -cpu cortex-a15 -nographic \
        -append "root=/dev/vda rw console=ttyAMA0 rootwait" \
        -drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
        -device virtio-blk-device,drive=vm1 \
        -netdev type=tap,id=net0,ifname=tap0 \
        -device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
        -incoming tcp:0:4321

    - Source command syntax is the same, minus the '-incoming' option

  o Migration of multiple VMs (using tap0, tap1, ..., and guest0.root, ...)
    has been tested as well.
  o On the source run multiple copies of 'dirtyram.arm' - a simple program
    that dirties pages periodically (a rough sketch of such a program appears
    after this list).
    ./dirtyram.arm <total mmap size> <dirty page size> <sleep time>
    Example:
    ./dirtyram.arm 102580 812 30
    - dirty 102580 pages
    - 812 pages every 30ms with an incrementing counter
    - run anywhere from one to as many copies as VM resources can support. If
      the dirty rate is too high, migration will run indefinitely.
    - run a date output loop and check that the date is picked up smoothly
    - place the guest/host into page reclaim/swap mode - by whatever means; in
      this case run multiple copies of 'dirtyram.arm' on the host
    - issue the migrate command(s) on the source
    - Top result is 409600, 8192, 5
  o QEMU is instrumented to save RAM memory regions on the source and
    destination after memory is migrated but before the guest is started. The
    files are then checksummed on both ends for correctness; given that the
    VMs are small, this works.
  o The guest kernel is instrumented to capture the current cycle counter -
    last cycle - and compare it to QEMU down time to test arch timer accuracy.
  o Network failover is at L3 due to interface limitations; ping continues
    working transparently.
  o Also tested 'migrate_cancel' to verify reassembly of huge pages (inserted
    low level instrumentation code).
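
The dirtyram.arm source is not included in this posting, so the following is
only a rough sketch of what such a page-dirtying load could look like; the
argument interpretation (pages to mmap, pages dirtied per pass, sleep time in
milliseconds) is an assumption based on the description above, not the
original program.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned long total, per_pass, sleep_ms, counter = 0, next = 0;
	char *mem;

	if (argc != 4) {
		fprintf(stderr, "usage: %s <total pages> <pages per pass> <sleep ms>\n",
			argv[0]);
		return 1;
	}
	total = strtoul(argv[1], NULL, 0);
	per_pass = strtoul(argv[2], NULL, 0);
	sleep_ms = strtoul(argv[3], NULL, 0);

	mem = mmap(NULL, total * page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	for (;;) {
		unsigned long i;

		/* Touch one word per page with an incrementing counter,
		 * cycling through the region, so each pass dirties pages. */
		for (i = 0; i < per_pass; i++, next = (next + 1) % total)
			*(unsigned long *)(mem + next * page_size) = counter++;
		usleep(sleep_ms * 1000);
	}
}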

- Basic Network Test - Assuming one Ethernet interface is available

Source host IP 192.168.10.101/24, VM tap0 192.168.2.1/24 and
VM eth0 192.168.2.100/24 with default route 192.168.2.1

Destination host IP 192.168.10.100/24, VM same settings as above.
Both VMs have identical MAC addresses.

Initially the NFS server's route to 192.168.2.100 is via 192.168.10.101

- ssh 192.168.2.100
- start migration from source to destination
- after migration ends, switch routes on the NFS server:
   route add -host 192.168.2.100 gw 192.168.10.100

ssh should resume after the route switch, and ping should likewise continue to
work seamlessly.

Changes since v7:
- Reworked write protection of the dirty page mask
- Moved generic code back to the architecture layer and keep it there for the
  time being, until a KVM framework for architecture functions to override
  generic ones is defined.
- Fixed condition bug for marking pages dirty
  

Mario Smarduch (4):
  add ARMv7 HYP API to flush VM TLBs without address param
  dirty page logging initial mem region write protect (w/no huge PUD
    support)
  dirty log write protect management support
  dirty page logging 2nd stage page fault handling support

 arch/arm/include/asm/kvm_asm.h        |    1 +
 arch/arm/include/asm/kvm_host.h       |    6 +
 arch/arm/include/asm/kvm_mmu.h        |   20 ++++
 arch/arm/include/asm/pgtable-3level.h |    1 +
 arch/arm/kvm/arm.c                    |   92 +++++++++++++++
 arch/arm/kvm/interrupts.S             |   11 ++
 arch/arm/kvm/mmu.c                    |  197 ++++++++++++++++++++++++++++++++-
 7 files changed, 322 insertions(+), 6 deletions(-)

-- 
1.7.9.5



* [PATCH v8 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param
From: Mario Smarduch @ 2014-06-19  1:31 UTC
  To: kvmarm, marc.zyngier, christoffer.dall
  Cc: steve.capper, kvm, linux-arm-kernel, gavin.guo, peter.maydell,
	jays.lee, Mario Smarduch

This patch adds a HYP interface for global VM TLB invalidation without an
address parameter, and moves VM TLB flushing back to the architecture layer.
It depends on the unmap_range() patch, which needs to be applied first.
No changes to ARMv8.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_asm.h  |    1 +
 arch/arm/include/asm/kvm_host.h |    2 ++
 arch/arm/kvm/interrupts.S       |   11 +++++++++++
 arch/arm/kvm/mmu.c              |   16 ++++++++++++++++
 4 files changed, 30 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 53b3c4a..21bc519 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 193ceaf..ac3bb65 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -231,4 +231,6 @@ int kvm_perf_teardown(void);
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
+void kvm_tlb_flush_vmid(struct kvm *kvm);
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 0d68d40..a3717b7 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -66,6 +66,17 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
 	bx	lr
 ENDPROC(__kvm_tlb_flush_vmid_ipa)
 
+/**
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
+ * @kvm:	pointer to kvm structure
+ *
+ * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
+ * parameter
+ */
+ENTRY(__kvm_tlb_flush_vmid)
+	b	__kvm_tlb_flush_vmid_ipa
+ENDPROC(__kvm_tlb_flush_vmid)
+
 /********************************************************************
  * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
  * domain, for all VMIDs
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2ac9588..e90b9e4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -56,6 +56,22 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
 }
 
+#ifdef CONFIG_ARM
+/**
+ * kvm_tlb_flush_vmid() - flush all VM TLB entries
+ * @kvm:	pointer to kvm structure.
+ *
+ * Interface to HYP function to flush all VM TLB entries without address
+ * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
+ * kvm_tlb_flush_vmid_ipa().
+ */
+void kvm_tlb_flush_vmid(struct kvm *kvm)
+{
+	if (kvm)
+		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+}
+#endif
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 				  int min, int max)
 {
-- 
1.7.9.5



* [PATCH v8 2/4] arm: dirty page logging initial mem region write protect (w/no huge PUD support)
From: Mario Smarduch @ 2014-06-19  1:31 UTC
  To: kvmarm, marc.zyngier, christoffer.dall
  Cc: steve.capper, kvm, linux-arm-kernel, gavin.guo, peter.maydell,
	jays.lee, Mario Smarduch

This patch adds support for initial write protection of a VM memslot. The
series assumes that huge PUDs will not be used in 2nd stage tables. For ARMv8
nothing happens here.
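
For reference on when this path is exercised: write protection is applied when
userspace registers a memory region with dirty logging enabled. A minimal,
hypothetical userspace sketch follows; the slot number, guest physical base
and backing mapping are assumptions for illustration only.

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdint.h>

/* Register guest RAM with dirty page logging enabled (illustrative values). */
static int enable_dirty_logging(int vm_fd, void *ram, uint64_t ram_size)
{
	struct kvm_userspace_memory_region region = {
		.slot            = 0,			/* assumed slot id */
		.flags           = KVM_MEM_LOG_DIRTY_PAGES,
		.guest_phys_addr = 0x80000000,		/* assumed RAM base */
		.memory_size     = ram_size,
		.userspace_addr  = (uintptr_t)ram,
	};

	/*
	 * On commit, kvm_arch_commit_memory_region() sees
	 * KVM_MEM_LOG_DIRTY_PAGES and calls kvm_mmu_wp_memory_region().
	 */
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}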


Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h       |    1 +
 arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
 arch/arm/include/asm/pgtable-3level.h |    1 +
 arch/arm/kvm/arm.c                    |    9 +++
 arch/arm/kvm/mmu.c                    |  128 +++++++++++++++++++++++++++++++++
 5 files changed, 159 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ac3bb65..586c467 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -232,5 +232,6 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 void kvm_tlb_flush_vmid(struct kvm *kvm);
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5cc0b0f..08ab5e8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
 	pmd_val(*pmd) |= L_PMD_S2_RDWR;
 }
 
+static inline void kvm_set_s2pte_readonly(pte_t *pte)
+{
+	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
+}
+
+static inline bool kvm_s2pte_readonly(pte_t *pte)
+{
+	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
+}
+
+static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
+{
+	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
+}
+
+static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
+{
+	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
+}
+
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
 #define kvm_pgd_addr_end(addr, end)					\
 ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 85c60ad..d8bb40b 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -129,6 +129,7 @@
 #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
 #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
 
+#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
 #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 
 /*
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3c82b37..e11c2dd 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   const struct kvm_memory_slot *old,
 				   enum kvm_mr_change change)
 {
+#ifdef CONFIG_ARM
+	/*
+	 * At this point memslot has been committed and there is an
+	 * allocated dirty_bitmap[], dirty pages will be tracked while the
+	 * memory slot is write protected.
+	 */
+	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
+		kvm_mmu_wp_memory_region(kvm, mem->slot);
+#endif
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index e90b9e4..37edcbe 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -762,6 +762,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
 	return false;
 }
 
+#ifdef CONFIG_ARM
+/**
+ * stage2_wp_pte_range - write protect PTE range
+ * @pmd:	pointer to pmd entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
+{
+	pte_t *pte;
+
+	pte = pte_offset_kernel(pmd, addr);
+	do {
+		if (!pte_none(*pte)) {
+			if (!kvm_s2pte_readonly(pte))
+				kvm_set_s2pte_readonly(pte);
+		}
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+/**
+ * stage2_wp_pmd_range - write protect PMD range
+ * @pud:	pointer to pud entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
+{
+	pmd_t *pmd;
+	phys_addr_t next;
+
+	pmd = pmd_offset(pud, addr);
+
+	do {
+		next = kvm_pmd_addr_end(addr, end);
+		if (!pmd_none(*pmd)) {
+			if (kvm_pmd_huge(*pmd)) {
+				if (!kvm_s2pmd_readonly(pmd))
+					kvm_set_s2pmd_readonly(pmd);
+			} else
+				stage2_wp_pte_range(pmd, addr, next);
+
+		}
+	} while (pmd++, addr = next, addr != end);
+}
+
+/**
+ * stage2_wp_pud_range - write protect PUD range
+ * @kvm:	pointer to kvm structure
+ * @pgd:	pointer to pgd entry
+ * @addr:	range start address
+ * @end:	range end address
+ *
+ * While walking the PUD range huge PUD pages are ignored, in the future this
+ * may need to be revisited. Determine how to handle huge PUDs when logging
+ * of dirty pages is enabled.
+ */
+static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
+				phys_addr_t addr, phys_addr_t end)
+{
+	pud_t *pud;
+	phys_addr_t next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+
+		next = kvm_pud_addr_end(addr, end);
+		/* TODO: huge PUD not supported, revisit later */
+		BUG_ON(pud_huge(*pud));
+		if (!pud_none(*pud))
+			stage2_wp_pmd_range(pud, addr, next);
+	} while (pud++, addr = next, addr != end);
+}
+
+/**
+ * stage2_wp_range() - write protect stage2 memory region range
+ * @kvm:	The KVM pointer
+ * @start:	Start address of range
+ * @end:	End address of range
+ */
+static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
+{
+	pgd_t *pgd;
+	phys_addr_t next;
+
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	do {
+		/*
+		 * Release kvm_mmu_lock periodically if the memory region is
+		 * large; otherwise hung task detection, the lock detector or
+		 * lockdep may panic. In addition, holding the lock this long
+		 * will starve other vCPUs. Applies to huge VM memory regions.
+		 */
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
+		next = kvm_pgd_addr_end(addr, end);
+		if (pgd_present(*pgd))
+			stage2_wp_pud_range(kvm, pgd, addr, next);
+	} while (pgd++, addr = next, addr != end);
+
+}
+/**
+ * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
+ * @kvm:	The KVM pointer
+ * @slot:	The memory slot to write protect
+ *
+ * Called to start logging dirty pages after memory region
+ * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
+ * all present PMD and PTEs are write protected in the memory region.
+ * Afterwards read of dirty page log can be called.
+ *
+ * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
+ * serializing operations for VM memory regions.
+ */
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
+{
+	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+
+	spin_lock(&kvm->mmu_lock);
+	stage2_wp_range(kvm, start, end);
+	kvm_tlb_flush_vmid(kvm);
+	spin_unlock(&kvm->mmu_lock);
+}
+#endif
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot,
 			  unsigned long fault_status)
-- 
1.7.9.5



* [PATCH v8 3/4] arm: dirty log write protect management support
From: Mario Smarduch @ 2014-06-19  1:31 UTC
  To: kvmarm, marc.zyngier, christoffer.dall
  Cc: steve.capper, kvm, linux-arm-kernel, gavin.guo, peter.maydell,
	jays.lee, Mario Smarduch

This patch adds support for keeping track of VM dirty pages. As the dirty page
log is retrieved, the pages that have been written are write protected again
for the next write and log read. For ARMv8, reading the dirty log returns
-EINVAL.
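
As a concrete, made-up example of the mask handling in the
kvm_mmu_write_protect_pt_masked() helper added below (assuming ARMv7 with
BITS_PER_LONG == 32):

    dirty_bitmap word i = 2 holds mask 0x18 (bits 3 and 4 set)
    gfn_offset = i * BITS_PER_LONG = 64
    base_gfn   = slot->base_gfn + gfn_offset
    start      = (base_gfn + __ffs(0x18)) << PAGE_SHIFT      /* bit 3 */
    end        = (base_gfn + __fls(0x18) + 1) << PAGE_SHIFT  /* bit 4 + 1 */

so stage2_wp_range() write protects exactly the two dirty pages at gfns
base_gfn + 3 and base_gfn + 4.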

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h |    3 ++
 arch/arm/kvm/arm.c              |   83 +++++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/mmu.c              |   22 +++++++++++
 3 files changed, 108 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 586c467..dbf3d45 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -233,5 +233,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 void kvm_tlb_flush_vmid(struct kvm *kvm);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+		struct kvm_memory_slot *slot,
+		gfn_t gfn_offset, unsigned long mask);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e11c2dd..cb3c090 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -783,10 +783,93 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	}
 }
 
+#ifdef CONFIG_ARM
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently.  So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Flush TLB's if needed.
+ *   4. Copy the snapshot to the userspace.
+ *
+ * Between 2 and 3, the guest may write to the page using the remaining TLB
+ * entry.  This is not a problem because the page will be reported dirty at
+ * step 4 using the snapshot taken before and step 3 ensures that successive
+ * writes will be logged for the next call.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+						struct kvm_dirty_log *log)
+{
+	int r;
+	struct kvm_memory_slot *memslot;
+	unsigned long n, i;
+	unsigned long *dirty_bitmap;
+	unsigned long *dirty_bitmap_buffer;
+	bool is_dirty = false;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = -EINVAL;
+	if (log->slot >= KVM_USER_MEM_SLOTS)
+		goto out;
+
+	memslot = id_to_memslot(kvm->memslots, log->slot);
+
+	dirty_bitmap = memslot->dirty_bitmap;
+	r = -ENOENT;
+	if (!dirty_bitmap)
+		goto out;
+
+	n = kvm_dirty_bitmap_bytes(memslot);
+
+	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+	memset(dirty_bitmap_buffer, 0, n);
+
+	spin_lock(&kvm->mmu_lock);
+
+	for (i = 0; i < n / sizeof(long); i++) {
+		unsigned long mask;
+		gfn_t offset;
+
+		if (!dirty_bitmap[i])
+			continue;
+
+		is_dirty = true;
+
+		mask = xchg(&dirty_bitmap[i], 0);
+		dirty_bitmap_buffer[i] = mask;
+
+		offset = i * BITS_PER_LONG;
+		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+
+	lockdep_assert_held(&kvm->slots_lock);
+	if (is_dirty)
+		kvm_tlb_flush_vmid(kvm);
+
+	r = -EFAULT;
+	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+		goto out;
+
+	r = 0;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+#else
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	return -EINVAL;
 }
+#endif
 
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 					struct kvm_arm_device_addr *dev_addr)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 37edcbe..1caf511 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -888,6 +888,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	kvm_tlb_flush_vmid(kvm);
 	spin_unlock(&kvm->mmu_lock);
 }
+
+/**
+ * kvm_mmu_write_protect_pt_masked() - write protect dirty pages set in mask
+ * @kvm:	The KVM pointer
+ * @slot:	The memory slot associated with mask
+ * @gfn_offset:	The gfn offset in memory slot
+ * @mask:	The mask of dirty pages at offset 'gfn_offset' in this memory
+ *		slot to be write protected
+ *
+ * Walks the bits set in mask and write protects the associated pte's. The
+ * caller must hold kvm_mmu_lock.
+ */
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+		struct kvm_memory_slot *slot,
+		gfn_t gfn_offset, unsigned long mask)
+{
+	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
+	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
+	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+
+	stage2_wp_range(kvm, start, end);
+}
 #endif
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-- 
1.7.9.5



* [PATCH v8 4/4] arm: dirty page logging 2nd stage page fault handling support
From: Mario Smarduch @ 2014-06-19  1:31 UTC
  To: kvmarm, marc.zyngier, christoffer.dall
  Cc: steve.capper, kvm, linux-arm-kernel, gavin.guo, peter.maydell,
	jays.lee, Mario Smarduch

This patch adds support for handling 2nd stage page faults during migration:
it disables faulting in huge pages and dissolves huge pages into page tables.
In case migration is canceled, huge pages will be used again. For ARMv8,
logging is hardcoded to false.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/kvm/mmu.c |   31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1caf511..d49df28 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -641,7 +641,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 }
 
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
-			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
+			  phys_addr_t addr, const pte_t *new_pte, bool iomap,
+			  bool logging_active)
 {
 	pmd_t *pmd;
 	pte_t *pte, old_pte;
@@ -656,6 +657,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 		return 0;
 	}
 
+	/*
+	 * While dirty memory logging is active, clear the PMD entry for the
+	 * huge page and split it, to track dirty memory at page granularity.
+	 */
+	if (logging_active && kvm_pmd_huge(*pmd)) {
+		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
+		clear_pmd_entry(kvm, pmd, ipa);
+	}
+
 	/* Create stage-2 page mappings - Level 2 */
 	if (pmd_none(*pmd)) {
 		if (!cache)
@@ -708,7 +718,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 		if (ret)
 			goto out;
 		spin_lock(&kvm->mmu_lock);
-		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
+		ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
 		spin_unlock(&kvm->mmu_lock);
 		if (ret)
 			goto out;
@@ -925,6 +935,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
 	pfn_t pfn;
+	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
+#ifdef CONFIG_ARM
+	bool logging_active = !!memslot->dirty_bitmap;
+#else
+	bool logging_active = false;
+#endif
 
 	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -935,7 +951,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	/* Let's check if we will get back a huge page backed by hugetlbfs */
 	down_read(&current->mm->mmap_sem);
 	vma = find_vma_intersection(current->mm, hva, hva + 1);
-	if (is_vm_hugetlb_page(vma)) {
+	if (is_vm_hugetlb_page(vma) && !logging_active) {
 		hugetlb = true;
 		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
 	} else {
@@ -978,7 +994,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	spin_lock(&kvm->mmu_lock);
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
-	if (!hugetlb && !force_pte)
+	if (!hugetlb && !force_pte && !logging_active)
 		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
 	if (hugetlb) {
@@ -997,9 +1013,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 		}
 		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
-		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
+		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
+					logging_active);
 	}
 
+	if (write_fault)
+		mark_page_dirty(kvm, gfn);
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
@@ -1150,7 +1169,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	pte_t *pte = (pte_t *)data;
 
-	stage2_set_pte(kvm, NULL, gpa, pte, false);
+	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
 }
 
 
-- 
1.7.9.5


