linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v7 0/4] arm: dirty page logging support for ARMv7
@ 2014-06-03 23:19 Mario Smarduch
  2014-06-03 23:19 ` [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-06-03 23:19 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for dirty page logging so far tested only on ARMv7.
With dirty page logging, GICv2 vGIC and arch timer save/restore support, live 
migration is supported. 

Dirty page logging support -
- initially write protects VM RAM memory regions - 2nd stage page tables
- add support to read the dirty page log and again write protect the dirty pages
  - second stage page tables - for the next pass (a userspace usage sketch
    follows this list).
- second stage huge pages are dissolved into page tables to keep track of
  dirty pages at page granularity. Tracking at huge page granularity limits
  migration to an almost idle system. There are a couple of approaches to
  handling huge pages:
  1 - break up the huge page into a page table and write protect all pte's
  2 - clear the PMD entry, create a page table, install the faulted page entry
      and write protect it.

  This patch implements #2; in the future #1 may be implemented depending on
  further benchmark results.

  Option 1: may over commit and do unnecessary work, but on heavy loads appears
            to converge faster during live migration
  Option 2: only write protects pages that are accessed; migration time varies
            and takes longer than Option 1, but eventually catches up.

- In the event migration is canceled, normal behavior is resumed and huge pages
  are rebuilt over time.
- Another alternative is the use of reverse mappings, where for each 2nd stage
  table level (PTE, PMD, PUD) pointers to sptes are maintained (x86 impl.).
  The primary reverse mapping benefits are for mmu notifiers on large memory
  range invalidations. Reverse mappings also improve dirty page logging: instead
  of walking page tables, spte pointers are accessed directly via the reverse
  map array.
- Reverse mappings will be considered for future support once the current
  implementation is hardened.
  o validate current dirty page logging support
  o VMID TLB Flushing, migrating multiple guests
  o GIC/arch-timer migration
  o migration under various loads, primarily page reclaim and validate current
    mmu-notifiers
  o Run benchmarks (lmbench for now) and test impact on performance, and
    optimize
  o Test virtio - since it writes into guest memory. Wait until pci is supported
    on ARM.
  o Currently on ARM, KVM doesn't appear to write into Guest address space,
    need to mark those pages dirty too (???).
- Move on to ARMv8, since the 2nd stage mmu is shared between both architectures.
  But in addition to the dirty page log, additional support for GIC, arch timers,
  and emulated devices is required. Also, working on an emulated platform masks
  a lot of potential bugs, but it does help to get the majority of the code working.
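
For context, the following is a minimal, hedged sketch of how userspace drives
the dirty page log read path described above. The sketch is not part of this
patch series; memslot setup, sizes and error handling are simplified, and it
only assumes the standard KVM_MEM_LOG_DIRTY_PAGES / KVM_GET_DIRTY_LOG ABI:

    /*
     * Hedged sketch: enable dirty logging on one memslot and poll the log.
     * Assumes vm_fd is an open KVM VM fd and *mr describes a memslot that was
     * registered earlier with KVM_SET_USER_MEMORY_REGION; error handling is
     * omitted.
     */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdlib.h>
    #include <string.h>

    static void poll_dirty_log(int vm_fd, struct kvm_userspace_memory_region *mr)
    {
            unsigned long npages = mr->memory_size >> 12;   /* 4K pages */
            size_t bitmap_bytes = ((npages + 63) / 64) * 8; /* long-aligned */
            unsigned long *bitmap = calloc(1, bitmap_bytes);
            struct kvm_dirty_log log;

            /* Turn logging on: the kernel write protects the region (patch 2/4). */
            mr->flags |= KVM_MEM_LOG_DIRTY_PAGES;
            ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, mr);

            /*
             * Each call returns the pages dirtied since the previous call and
             * write protects them again (patch 3/4).
             */
            memset(&log, 0, sizeof(log));
            log.slot = mr->slot;
            log.dirty_bitmap = bitmap;
            ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);

            /* bit i set => guest page (mr->guest_phys_addr >> 12) + i is dirty */
            free(bitmap);
    }

QEMU's RAM migration is, roughly, repeated rounds of this log-read step over all
RAM memslots until the remaining dirty set is small enough to stop the guest and
send the rest.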

Test Environment:
---------------------------------------------------------------------------
NOTE: RUNNING on Fast Models will hardly ever fail and will mask bugs; in fact,
      initially light loads were succeeding without dirty page logging support.
---------------------------------------------------------------------------
- Will put all components on github, including test setup diagram
- In short summary
  o Two ARM Exynos 5440 development platforms - 4-way 1.7 GHz, with 8GB RAM,
    256GB storage, 1Gb Ethernet, with swap enabled
  o NFS server running Ubuntu 13.04
    - both ARM boards mount shared file system 
    - Shared file system includes - QEMU, Guest Kernel, DTB, multiple Ext3 root
      file systems.
  o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1,
  o Use the QEMU monitor (Ctrl+A, C) and the 'migrate -d tcp:IP:port' command
    - Destination command syntax: smp can be changed to 4; the machine model is
      outdated but has been tested on virt by others (need to upgrade)
	
	/mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
	/mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
	-M vexpress-a15 -cpu cortex-a15 -nographic \
	-append "root=/dev/vda rw console=ttyAMA0 rootwait" \
	-drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
	-device virtio-blk-device,drive=vm1 \
	-netdev type=tap,id=net0,ifname=tap0 \
	-device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
	-incoming tcp:0:4321

    - Source command syntax same except '-incoming'

  o Migration of multiple VMs, using tap0, tap1, ..., and guest0.root, ...,
    has been tested as well.
  o On the source run multiple copies of 'dirtyram.arm' - a simple program that
    dirties pages periodically (a minimal sketch of such a program follows this
    summary).
    ./dirtyram.arm <total mmap size> <dirty page size> <sleep time>
    Example:
    ./dirtyram.arm 102580 812 30
    - dirty 102580 pages
    - 812 pages every 30ms with an incrementing counter
    - run anywhere from one to as many copies as VM resources can support. If
      the dirty rate is too high, migration will run indefinitely
    - run a 'date' output loop and check that the date is picked up smoothly
    - place the guest/host into page reclaim/swap mode - by whatever means; in
      this case run multiple copies of 'dirtyram.arm' on the host
    - issue migrate command(s) on the source
    - Top result is 409600, 8192, 5
  o QEMU is instrumented to save RAM memory regions on the source and destination
    after memory is migrated, but before the guest is started. Later the files are
    checksummed on both ends for correctness; given the VMs are small, this works.
  o Guest kernel is instrumented to capture current cycle counter - last cycle
    and compare to qemu down time to test arch timer accuracy. 
  o Network failover is at L3 due to interface limitations, ping continues
    working transparently
  o Also tested 'migrate_cancel' to test reassembly of huge pages (inserted low
    level instrumentation code).
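
A minimal sketch of what such a page-dirtying program could look like is below.
It is a hypothetical reconstruction, not the actual dirtyram.arm source, and it
assumes the three arguments are a total page count, a pages-per-pass count and a
sleep interval in milliseconds:

    /*
     * Hedged sketch of a dirtyram.arm-like tool: mmap a region and keep
     * writing an incrementing counter into a batch of pages, then sleep.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(int argc, char **argv)
    {
            unsigned long total, batch, sleep_ms, counter = 0, next = 0;
            long psize = sysconf(_SC_PAGESIZE);
            unsigned char *buf;

            if (argc != 4) {
                    fprintf(stderr, "usage: %s <total pages> <pages per pass> <sleep ms>\n",
                            argv[0]);
                    return 1;
            }
            total    = strtoul(argv[1], NULL, 0);
            batch    = strtoul(argv[2], NULL, 0);
            sleep_ms = strtoul(argv[3], NULL, 0);

            buf = mmap(NULL, total * psize, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED)
                    return 1;

            for (;;) {
                    /* One store per page dirties it in the 2nd stage tables. */
                    for (unsigned long i = 0; i < batch; i++, next++)
                            buf[(next % total) * psize] = (unsigned char)counter++;
                    usleep(sleep_ms * 1000);
            }
    }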

Changes since v6:
- primarily reworked initial write protect, and write protect of dirty pages on
  logging request
- Only code logic change: dissolve huge pages into page tables in the page fault
  handler
- Made many changes based on Christoffer's comments.

Mario Smarduch (4):
  add ARMv7 HYP API to flush VM TLBs without address param
  dirty page logging initial mem region write protect (w/no huge PUD
    support)
  dirty log write protect management support
  dirty page logging 2nd stage page fault handling support

 arch/arm/include/asm/kvm_asm.h        |    1 +
 arch/arm/include/asm/kvm_host.h       |    5 +
 arch/arm/include/asm/kvm_mmu.h        |   20 +++
 arch/arm/include/asm/pgtable-3level.h |    1 +
 arch/arm/kvm/arm.c                    |   11 +-
 arch/arm/kvm/interrupts.S             |   11 ++
 arch/arm/kvm/mmu.c                    |  243 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                    |   86 ------------
 virt/kvm/kvm_main.c                   |   83 ++++++++++-
 9 files changed, 367 insertions(+), 94 deletions(-)

-- 
1.7.9.5


* [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param
  2014-06-03 23:19 [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
@ 2014-06-03 23:19 ` Mario Smarduch
  2014-06-08 12:05   ` Christoffer Dall
  2014-06-03 23:19 ` [PATCH v7 2/4] arm: dirty page logging initial mem region write protect (w/no huge PUD support) Mario Smarduch
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-03 23:19 UTC (permalink / raw)
  To: linux-arm-kernel

Patch adds HYP interface for global VM TLB invalidation without address
parameter. Added ARM version of kvm_flush_remote_tlbs(), made the generic
implementation a weak symbol.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_asm.h |    1 +
 arch/arm/kvm/interrupts.S      |   11 +++++++++++
 arch/arm/kvm/mmu.c             |   14 ++++++++++++++
 virt/kvm/kvm_main.c            |    2 +-
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 53b3c4a..21bc519 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 0d68d40..bddc66b 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -66,6 +66,17 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
 	bx	lr
 ENDPROC(__kvm_tlb_flush_vmid_ipa)
 
+/**
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
+ *
+ * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
+ * parameter
+ */
+
+ENTRY(__kvm_tlb_flush_vmid)
+	b	__kvm_tlb_flush_vmid_ipa
+ENDPROC(__kvm_tlb_flush_vmid)
+
 /********************************************************************
  * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
  * domain, for all VMIDs
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2ac9588..ef29540 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -56,6 +56,20 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
 }
 
+/**
+ * kvm_flush_remote_tlbs() - flush all VM TLB entries
+ * @kvm:       pointer to kvm structure.
+ *
+ * Interface to HYP function to flush all VM TLB entries without address
+ * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
+ * kvm_tlb_flush_vmid_ipa().
+ */
+void kvm_flush_remote_tlbs(struct kvm *kvm)
+{
+	if (kvm)
+		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+}
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 				  int min, int max)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fa70c6e..ba25765 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -184,7 +184,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 	return called;
 }
 
-void kvm_flush_remote_tlbs(struct kvm *kvm)
+void __weak kvm_flush_remote_tlbs(struct kvm *kvm)
 {
 	long dirty_count = kvm->tlbs_dirty;
 
-- 
1.7.9.5


* [PATCH v7 2/4] arm: dirty page logging initial mem region write protect (w/no huge PUD support)
  2014-06-03 23:19 [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
  2014-06-03 23:19 ` [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
@ 2014-06-03 23:19 ` Mario Smarduch
  2014-06-08 12:05   ` Christoffer Dall
  2014-06-03 23:19 ` [PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-03 23:19 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds memslot support for initial write protection and splitting up of
huge pages. This patch series assumes that huge PUDs will not be used to map VM
memory. This patch depends on the unmap_range() patch, which needs to be applied
first.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h       |    2 +
 arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
 arch/arm/include/asm/pgtable-3level.h |    1 +
 arch/arm/kvm/arm.c                    |    6 ++
 arch/arm/kvm/mmu.c                    |  114 +++++++++++++++++++++++++++++++++
 5 files changed, 143 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 193ceaf..59565f5 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -231,4 +231,6 @@ int kvm_perf_teardown(void);
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5cc0b0f..08ab5e8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
 	pmd_val(*pmd) |= L_PMD_S2_RDWR;
 }
 
+static inline void kvm_set_s2pte_readonly(pte_t *pte)
+{
+	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
+}
+
+static inline bool kvm_s2pte_readonly(pte_t *pte)
+{
+	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
+}
+
+static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
+{
+	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
+}
+
+static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
+{
+	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
+}
+
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
 #define kvm_pgd_addr_end(addr, end)					\
 ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 85c60ad..d8bb40b 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -129,6 +129,7 @@
 #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
 #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
 
+#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
 #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 
 /*
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3c82b37..dfd63ac 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -242,6 +242,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   const struct kvm_memory_slot *old,
 				   enum kvm_mr_change change)
 {
+	/*
+	 * At this point memslot has been committed and the there is an
+	 * allocated dirty_bitmap[] so marking of diryt pages works now on.
+	 */
+	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
+		kvm_mmu_wp_memory_region(kvm, mem->slot);
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index ef29540..e5dff85 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -760,6 +760,120 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
 	return false;
 }
 
+
+/**
+ * stage2_wp_pte_range - write protect PTE range
+ * @pmd:	pointer to pmd entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
+{
+	pte_t *pte;
+
+	pte = pte_offset_kernel(pmd, addr);
+	do {
+		if (!pte_none(*pte)) {
+			if (!kvm_s2pte_readonly(pte))
+				kvm_set_s2pte_readonly(pte);
+		}
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+/**
+ * stage2_wp_pmd_range - write protect PMD range
+ * @pud:	pointer to pud entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
+{
+	pmd_t *pmd;
+	phys_addr_t next;
+
+	pmd = pmd_offset(pud, addr);
+
+	do {
+		next = kvm_pmd_addr_end(addr, end);
+		if (!pmd_none(*pmd)) {
+			if (kvm_pmd_huge(*pmd)) {
+				/*
+				 * Write Protect the PMD, give user_mem_abort()
+				 * a choice to clear and fault on demand or
+				 * break up the huge page.
+				 */
+				if (!kvm_s2pmd_readonly(pmd))
+					kvm_set_s2pmd_readonly(pmd);
+			} else
+				stage2_wp_pte_range(pmd, addr, next);
+
+		}
+	} while (pmd++, addr = next, addr != end);
+}
+
+/**
+ * stage2_wp_pud_range - write protect PUD range
+ * @kvm:	pointer to kvm structure
+ * @pud:	pointer to pgd entry
+ * @addr:	range start address
+ * @end:	range end address
+ *
+ * While walking the PUD range huge PUD pages are ignored, in the future this
+ * may need to be revisited. Determine how to handle huge PUDs when logging
+ * of dirty pages is enabled.
+ */
+static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
+				phys_addr_t addr, phys_addr_t end)
+{
+	pud_t *pud;
+	phys_addr_t next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		/* Check for contention every PUD range and release CPU */
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
+		next = kvm_pud_addr_end(addr, end);
+		/* TODO: huge PUD not supported, revisit later */
+		if (!pud_none(*pud))
+			stage2_wp_pmd_range(pud, addr, next);
+	} while (pud++, addr = next, addr != end);
+}
+
+/**
+ * kvm_mmu_wp_memory_region() - initial write protected of memory region slot
+ * @kvm:	The KVM pointer
+ * @slot:	The memory slot to write protect
+ *
+ * Called to start logging dirty pages after memory region
+ * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
+ * all present PMD and PTEs are write protected in the memory region.
+ * Afterwards read of dirty page log can be called. Pages not present are
+ * write protected on future access in user_mem_abort().
+ *
+ * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
+ * serializing operations for VM memory regions.
+ */
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
+{
+	pgd_t *pgd;
+	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
+	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+	phys_addr_t next;
+
+	spin_lock(&kvm->mmu_lock);
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	do {
+		next = kvm_pgd_addr_end(addr, end);
+		if (pgd_present(*pgd))
+			stage2_wp_pud_range(kvm, pgd, addr, next);
+	} while (pgd++, addr = next, addr != end);
+	kvm_flush_remote_tlbs(kvm);
+	spin_unlock(&kvm->mmu_lock);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot,
 			  unsigned long fault_status)
-- 
1.7.9.5


* [PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-03 23:19 [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
  2014-06-03 23:19 ` [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
  2014-06-03 23:19 ` [PATCH v7 2/4] arm: dirty page logging initial mem region write protect (w/no huge PUD support) Mario Smarduch
@ 2014-06-03 23:19 ` Mario Smarduch
  2014-06-03 23:19 ` [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support Mario Smarduch
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-06-03 23:19 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for keeping track of VM dirty pages. As the dirty page
log is retrieved, the pages that have been written are write protected again for
the next write and log read.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h |    3 ++
 arch/arm/kvm/arm.c              |    5 ---
 arch/arm/kvm/mmu.c              |   79 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |   86 ---------------------------------------
 virt/kvm/kvm_main.c             |   81 ++++++++++++++++++++++++++++++++++++
 5 files changed, 163 insertions(+), 91 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 59565f5..b760f9c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+	struct kvm_memory_slot *slot,
+	gfn_t gfn_offset, unsigned long mask);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dfd63ac..f06fb21 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	}
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	return -EINVAL;
-}
-
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 					struct kvm_arm_device_addr *dev_addr)
 {
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index e5dff85..1c546c9 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -874,6 +874,85 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	spin_unlock(&kvm->mmu_lock);
 }
 
+/**
+ * stage2_wp_mask_range() - write protect memslot pages set in mask
+ * @pmd - pointer to page table
+ * @start_ipa - the start range of mask
+ * @addr - start_ipa or start range of adjusted mask if crossing PMD range
+ * @mask - mask of dirty pages
+ *
+ * Walk mask and write protect the associated dirty pages in the memory region.
+ * If mask crosses a PMD range adjust it to next page table and return.
+ */
+static void stage2_wp_mask_range(pmd_t *pmd, phys_addr_t start_ipa,
+		phys_addr_t *addr, unsigned long *mask)
+{
+	pte_t *pte;
+	bool crosses_pmd;
+	int i;
+
+	for (i = __ffs(*mask), *addr = start_ipa + i * PAGE_SIZE;
+		*mask;
+		i = __ffs(*mask), *addr = start_ipa + i * PAGE_SIZE) {
+		crosses_pmd = !!((start_ipa & PMD_MASK) ^ (*addr & PMD_MASK));
+		if (unlikely(crosses_pmd)) {
+			/* Adjust mask dirty bits relative to next page table */
+			*mask >>= (PTRS_PER_PTE - pte_index(start_ipa));
+			return;
+		}
+
+		pte = pte_offset_kernel(pmd, *addr);
+		if (!pte_none(*pte))
+			kvm_set_s2pte_readonly(pte);
+		*mask &= ~(1 << i);
+	}
+}
+
+/**
+ * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in mask
+ * @kvm:        The KVM pointer
+ * @slot:       The memory slot associated with mask
+ * @gfn_offset: The gfn offset in memory slot
+ * @mask:       The mask of dirty pages at offset 'gnf_offset' in this memory
+ *              slot to be write protected
+ *
+ * Called from dirty page logging read function to write protect bits set in
+ * mask to record future writes to these pages in dirty page log. This function
+ * uses simplified page table walk knowing that mask spawns range of two PMDs.
+ *
+ * 'kvm->mmu_lock' must be  held to protect against concurrent modification
+ * of page tables (2nd stage fault, mmu modifiers, ...)
+ *
+ */
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+		struct kvm_memory_slot *slot,
+		gfn_t gfn_offset, unsigned long mask)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+	phys_addr_t start_ipa = (slot->base_gfn + gfn_offset) << PAGE_SHIFT;
+	phys_addr_t end_ipa = start_ipa + BITS_PER_LONG * PAGE_SIZE;
+	phys_addr_t addr = start_ipa;
+	pgd_t *pgdp = kvm->arch.pgd, *pgd;
+
+	do {
+		pgd = pgdp + pgd_index(addr);
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, addr);
+			if (!pud_none(*pud) && !pud_huge(*pud)) {
+				pmd = pmd_offset(pud, addr);
+				if (!pmd_none(*pmd) && !kvm_pmd_huge(*pmd))
+					stage2_wp_mask_range(pmd, start_ipa,
+								&addr, &mask);
+				else
+					addr += PMD_SIZE;
+			} else
+				addr += PUD_SIZE;
+		} else
+			addr += PGDIR_SIZE;
+	} while (mask && addr < end_ipa);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot,
 			  unsigned long fault_status)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5582c3..a603ca3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3569,92 +3569,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * We need to keep it in mind that VCPU threads can write to the bitmap
- * concurrently.  So, to avoid losing data, we keep the following order for
- * each bit:
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Flush TLB's if needed.
- *   4. Copy the snapshot to the userspace.
- *
- * Between 2 and 3, the guest may write to the page using the remaining TLB
- * entry.  This is not a problem because the page will be reported dirty at
- * step 4 using the snapshot taken before and step 3 ensures that successive
- * writes will be logged for the next call.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	int r;
-	struct kvm_memory_slot *memslot;
-	unsigned long n, i;
-	unsigned long *dirty_bitmap;
-	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
-
-	mutex_lock(&kvm->slots_lock);
-
-	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
-		goto out;
-
-	memslot = id_to_memslot(kvm->memslots, log->slot);
-
-	dirty_bitmap = memslot->dirty_bitmap;
-	r = -ENOENT;
-	if (!dirty_bitmap)
-		goto out;
-
-	n = kvm_dirty_bitmap_bytes(memslot);
-
-	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
-	memset(dirty_bitmap_buffer, 0, n);
-
-	spin_lock(&kvm->mmu_lock);
-
-	for (i = 0; i < n / sizeof(long); i++) {
-		unsigned long mask;
-		gfn_t offset;
-
-		if (!dirty_bitmap[i])
-			continue;
-
-		is_dirty = true;
-
-		mask = xchg(&dirty_bitmap[i], 0);
-		dirty_bitmap_buffer[i] = mask;
-
-		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
-	}
-
-	spin_unlock(&kvm->mmu_lock);
-
-	/* See the comments in kvm_mmu_slot_remove_write_access(). */
-	lockdep_assert_held(&kvm->slots_lock);
-
-	/*
-	 * All the TLBs can be flushed out of mmu lock, see the comments in
-	 * kvm_mmu_slot_remove_write_access().
-	 */
-	if (is_dirty)
-		kvm_flush_remote_tlbs(kvm);
-
-	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
-		goto out;
-
-	r = 0;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return r;
-}
-
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
 			bool line_status)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba25765..d8d5091 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -429,6 +429,87 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 	return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
 }
 
+
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently.  So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Flush TLB's if needed.
+ *   4. Copy the snapshot to the userspace.
+ *
+ * Between 2 and 3, the guest may write to the page using the remaining TLB
+ * entry.  This is not a problem because the page will be reported dirty at
+ * step 4 using the snapshot taken before and step 3 ensures that successive
+ * writes will be logged for the next call.
+ */
+
+int __weak kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+						struct kvm_dirty_log *log)
+{
+	int r;
+	struct kvm_memory_slot *memslot;
+	unsigned long n, i;
+	unsigned long *dirty_bitmap;
+	unsigned long *dirty_bitmap_buffer;
+	bool is_dirty = false;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = -EINVAL;
+	if (log->slot >= KVM_USER_MEM_SLOTS)
+		goto out;
+
+	memslot = id_to_memslot(kvm->memslots, log->slot);
+
+	dirty_bitmap = memslot->dirty_bitmap;
+	r = -ENOENT;
+	if (!dirty_bitmap)
+		goto out;
+
+	n = kvm_dirty_bitmap_bytes(memslot);
+
+	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+	memset(dirty_bitmap_buffer, 0, n);
+
+	spin_lock(&kvm->mmu_lock);
+
+	for (i = 0; i < n / sizeof(long); i++) {
+		unsigned long mask;
+		gfn_t offset;
+
+		if (!dirty_bitmap[i])
+			continue;
+
+		is_dirty = true;
+
+		mask = xchg(&dirty_bitmap[i], 0);
+		dirty_bitmap_buffer[i] = mask;
+
+		offset = i * BITS_PER_LONG;
+		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+	}
+	if (is_dirty)
+		kvm_flush_remote_tlbs(kvm);
+
+	spin_unlock(&kvm->mmu_lock);
+
+	r = -EFAULT;
+	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+		goto out;
+
+	r = 0;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+
 #else  /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */
 
 static int kvm_init_mmu_notifier(struct kvm *kvm)
-- 
1.7.9.5


* [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support
  2014-06-03 23:19 [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
                   ` (2 preceding siblings ...)
  2014-06-03 23:19 ` [PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
@ 2014-06-03 23:19 ` Mario Smarduch
  2014-06-08 12:05   ` Christoffer Dall
  2014-06-06 17:33 ` [RESEND PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
  2014-06-08 10:45 ` [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Christoffer Dall
  5 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-03 23:19 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for handling 2nd stage page faults during migration; it
disables faulting in huge pages and dissolves huge pages into page tables.
In case migration is canceled, huge pages will be used again.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/kvm/mmu.c |   36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1c546c9..aca4fbf 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -966,6 +966,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
 	pfn_t pfn;
+	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
+	bool logging_active = !!memslot->dirty_bitmap;
 
 	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -1019,10 +1021,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	spin_lock(&kvm->mmu_lock);
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
-	if (!hugetlb && !force_pte)
+
+	/* When logging don't spend cycles to check for huge pages */
+	if (!hugetlb && !force_pte && !logging_active)
 		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
-	if (hugetlb) {
+	/*
+	 * Force all not present/perm faults to PTE handling, address both
+	 * PMD and PTE faults
+	 */
+	if (hugetlb && !logging_active) {
 		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
 		new_pmd = pmd_mkhuge(new_pmd);
 		if (writable) {
@@ -1034,6 +1042,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	} else {
 		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
 		if (writable) {
+			/*
+			 * If pmd is  mapping a huge page then clear it and let
+			 * stage2_set_pte() create a pte table. At the sametime
+			 * you write protect the pte (PAGE_S2 pgprot_t).
+			 */
+			if (logging_active) {
+				pmd_t *pmd;
+				if (hugetlb) {
+					pfn += pte_index(fault_ipa);
+					gfn = fault_ipa >> PAGE_SHIFT;
+					new_pte = pfn_pte(pfn, PAGE_S2);
+				}
+				pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
+				if (pmd && kvm_pmd_huge(*pmd))
+					clear_pmd_entry(kvm, pmd, fault_ipa);
+			}
 			kvm_set_s2pte_writable(&new_pte);
 			kvm_set_pfn_dirty(pfn);
 		}
@@ -1041,6 +1065,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
 	}
 
+	/*
+	 * Log the dirty page in dirty_bitmap[], call regardless if logging is
+	 * disabled or enabled both cases handled safely.
+	 * TODO: for larger page size mark mulitple dirty page bits for each
+	 *       4k page.
+	 */
+	if (writable)
+		mark_page_dirty(kvm, gfn);
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
-- 
1.7.9.5


* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
@ 2014-06-06 17:33 ` Mario Smarduch
  2014-06-08 12:05   ` Christoffer Dall
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-06 17:33 UTC (permalink / raw)
  To: linux-arm-kernel

kvm_vm_ioctl_get_dirty_log() is generic and used by both x86 and ARM. A recent
x86 patch changed this function; this patch picks up those changes, and
everything has been re-tested and works. It applies cleanly with the other
patches.

This patch adds support for keeping track of VM dirty pages. As the dirty page
log is retrieved, the pages that have been written are write protected again for
the next write and log read.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h |    3 ++
 arch/arm/kvm/arm.c              |    5 ---
 arch/arm/kvm/mmu.c              |   79 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |   86 ---------------------------------------
 virt/kvm/kvm_main.c             |   86 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 168 insertions(+), 91 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 59565f5..b760f9c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+	struct kvm_memory_slot *slot,
+	gfn_t gfn_offset, unsigned long mask);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dfd63ac..f06fb21 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	}
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	return -EINVAL;
-}
-
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 					struct kvm_arm_device_addr *dev_addr)
 {
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index e5dff85..907344c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -874,6 +874,85 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	spin_unlock(&kvm->mmu_lock);
 }
 
+/**
+ * stage2_wp_mask_range() - write protect memslot pages set in mask
+ * @pmd - pointer to page table
+ * @start_ipa - the start range of mask
+ * @addr - start_ipa or start range of adjusted mask if crossing PMD range
+ * @mask - mask of dirty pages
+ *
+ * Walk mask and write protect the associated dirty pages in the memory region.
+ * If mask crosses a PMD range adjust it to next page table and return.
+ */
+static void stage2_wp_mask_range(pmd_t *pmd, phys_addr_t start_ipa,
+		phys_addr_t *addr, unsigned long *mask)
+{
+	pte_t *pte;
+	bool crosses_pmd;
+	int i = __ffs(*mask);
+
+	if (unlikely(*addr > start_ipa))
+		start_ipa = *addr - i * PAGE_SIZE;
+	pte = pte_offset_kernel(pmd, start_ipa);
+	for (*addr = start_ipa + i * PAGE_SIZE; *mask;
+		i = __ffs(*mask), *addr = start_ipa + i * PAGE_SIZE) {
+		crosses_pmd = !!((start_ipa & PMD_MASK) ^ (*addr & PMD_MASK));
+		if (unlikely(crosses_pmd)) {
+			/* Adjust mask dirty bits relative to next page table */
+			*mask >>= (PTRS_PER_PTE - pte_index(start_ipa));
+			return;
+		}
+		if (!pte_none(pte[i]))
+			kvm_set_s2pte_readonly(&pte[i]);
+		*mask &= ~(1 << i);
+	}
+}
+
+/**
+ * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in mask
+ * @kvm:        The KVM pointer
+ * @slot:       The memory slot associated with mask
+ * @gfn_offset: The gfn offset in memory slot
+ * @mask:       The mask of dirty pages at offset 'gnf_offset' in this memory
+ *              slot to be write protected
+ *
+ * Called from dirty page logging read function to write protect bits set in
+ * mask to record future writes to these pages in dirty page log. This function
+ * uses simplified page table walk given  mask can spawn no more then 2 PMD
+ * table range.
+ * 'kvm->mmu_lock' must be  held to protect against concurrent modification
+ * of page tables (2nd stage fault, mmu modifiers, ...)
+ *
+ */
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+		struct kvm_memory_slot *slot,
+		gfn_t gfn_offset, unsigned long mask)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+	phys_addr_t start_ipa = (slot->base_gfn + gfn_offset) << PAGE_SHIFT;
+	phys_addr_t end_ipa = start_ipa + BITS_PER_LONG * PAGE_SIZE;
+	phys_addr_t addr = start_ipa;
+	pgd_t *pgdp = kvm->arch.pgd, *pgd;
+
+	do {
+		pgd = pgdp + pgd_index(addr);
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, addr);
+			if (!pud_none(*pud) && !pud_huge(*pud)) {
+				pmd = pmd_offset(pud, addr);
+				if (!pmd_none(*pmd) && !kvm_pmd_huge(*pmd))
+					stage2_wp_mask_range(pmd, start_ipa,
+								&addr, &mask);
+				else
+					addr += PMD_SIZE;
+			} else
+				addr += PUD_SIZE;
+		} else
+			addr += PGDIR_SIZE;
+	} while (mask && addr < end_ipa);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot,
 			  unsigned long fault_status)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5582c3..a603ca3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3569,92 +3569,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * We need to keep it in mind that VCPU threads can write to the bitmap
- * concurrently.  So, to avoid losing data, we keep the following order for
- * each bit:
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Flush TLB's if needed.
- *   4. Copy the snapshot to the userspace.
- *
- * Between 2 and 3, the guest may write to the page using the remaining TLB
- * entry.  This is not a problem because the page will be reported dirty at
- * step 4 using the snapshot taken before and step 3 ensures that successive
- * writes will be logged for the next call.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	int r;
-	struct kvm_memory_slot *memslot;
-	unsigned long n, i;
-	unsigned long *dirty_bitmap;
-	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
-
-	mutex_lock(&kvm->slots_lock);
-
-	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
-		goto out;
-
-	memslot = id_to_memslot(kvm->memslots, log->slot);
-
-	dirty_bitmap = memslot->dirty_bitmap;
-	r = -ENOENT;
-	if (!dirty_bitmap)
-		goto out;
-
-	n = kvm_dirty_bitmap_bytes(memslot);
-
-	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
-	memset(dirty_bitmap_buffer, 0, n);
-
-	spin_lock(&kvm->mmu_lock);
-
-	for (i = 0; i < n / sizeof(long); i++) {
-		unsigned long mask;
-		gfn_t offset;
-
-		if (!dirty_bitmap[i])
-			continue;
-
-		is_dirty = true;
-
-		mask = xchg(&dirty_bitmap[i], 0);
-		dirty_bitmap_buffer[i] = mask;
-
-		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
-	}
-
-	spin_unlock(&kvm->mmu_lock);
-
-	/* See the comments in kvm_mmu_slot_remove_write_access(). */
-	lockdep_assert_held(&kvm->slots_lock);
-
-	/*
-	 * All the TLBs can be flushed out of mmu lock, see the comments in
-	 * kvm_mmu_slot_remove_write_access().
-	 */
-	if (is_dirty)
-		kvm_flush_remote_tlbs(kvm);
-
-	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
-		goto out;
-
-	r = 0;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return r;
-}
-
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
 			bool line_status)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba25765..c87c612 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -429,6 +429,92 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 	return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
 }
 
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently.  So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Flush TLB's if needed.
+ *   4. Copy the snapshot to the userspace.
+ *
+ * Between 2 and 3, the guest may write to the page using the remaining TLB
+ * entry.  This is not a problem because the page will be reported dirty at
+ * step 4 using the snapshot taken before and step 3 ensures that successive
+ * writes will be logged for the next call.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	int r;
+	struct kvm_memory_slot *memslot;
+	unsigned long n, i;
+	unsigned long *dirty_bitmap;
+	unsigned long *dirty_bitmap_buffer;
+	bool is_dirty = false;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = -EINVAL;
+	if (log->slot >= KVM_USER_MEM_SLOTS)
+		goto out;
+
+	memslot = id_to_memslot(kvm->memslots, log->slot);
+
+	dirty_bitmap = memslot->dirty_bitmap;
+	r = -ENOENT;
+	if (!dirty_bitmap)
+		goto out;
+
+	n = kvm_dirty_bitmap_bytes(memslot);
+
+	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+	memset(dirty_bitmap_buffer, 0, n);
+
+	spin_lock(&kvm->mmu_lock);
+
+	for (i = 0; i < n / sizeof(long); i++) {
+		unsigned long mask;
+		gfn_t offset;
+
+		if (!dirty_bitmap[i])
+			continue;
+
+		is_dirty = true;
+
+		mask = xchg(&dirty_bitmap[i], 0);
+		dirty_bitmap_buffer[i] = mask;
+
+		offset = i * BITS_PER_LONG;
+		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+
+	/* See the comments in kvm_mmu_slot_remove_write_access(). */
+	lockdep_assert_held(&kvm->slots_lock);
+
+	/*
+	 * All the TLBs can be flushed out of mmu lock, see the comments in
+	 * kvm_mmu_slot_remove_write_access().
+	 */
+	if (is_dirty)
+		kvm_flush_remote_tlbs(kvm);
+
+	r = -EFAULT;
+	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+		goto out;
+
+	r = 0;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+
 #else  /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */
 
 static int kvm_init_mmu_notifier(struct kvm *kvm)
-- 
1.7.9.5


* [PATCH v7 0/4] arm: dirty page logging support for ARMv7
  2014-06-03 23:19 [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
                   ` (4 preceding siblings ...)
  2014-06-06 17:33 ` [RESEND PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
@ 2014-06-08 10:45 ` Christoffer Dall
  2014-06-09 17:02   ` Mario Smarduch
  5 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-08 10:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 03, 2014 at 04:19:23PM -0700, Mario Smarduch wrote:
> This patch adds support for dirty page logging so far tested only on ARMv7.
> With dirty page logging, GICv2 vGIC and arch timer save/restore support, live 
> migration is supported. 
> 
> Dirty page logging support -
> - initially write protects VM RAM memory regions - 2nd stage page tables
> - add support to read the dirty page log and again write protect the dirty pages
>   - second stage page tables - for the next pass.
> - second stage huge pages are dissolved into page tables to keep track of
>   dirty pages at page granularity. Tracking at huge page granularity limits
>   migration to an almost idle system. There are a couple of approaches to
>   handling huge pages:
>   1 - break up the huge page into a page table and write protect all pte's
>   2 - clear the PMD entry, create a page table, install the faulted page entry
>       and write protect it.

not sure I fully understand.  Is option 2 simply write-protecting all
PMDs and splitting it at fault time?

> 
>   This patch implements #2; in the future #1 may be implemented depending on
>   further benchmark results.
> 
>   Option 1: may over commit and do unnecessary work, but on heavy loads appears
>             to converge faster during live migration
>   Option 2: only write protects pages that are accessed; migration time varies
>             and takes longer than Option 1, but eventually catches up.
> 
> - In the event migration is canceled, normal behavior is resumed and huge pages
>   are rebuilt over time.
> - Another alternative is the use of reverse mappings, where for each 2nd stage
>   table level (PTE, PMD, PUD) pointers to sptes are maintained (x86 impl.).
>   The primary reverse mapping benefits are for mmu notifiers on large memory
>   range invalidations. Reverse mappings also improve dirty page logging: instead
>   of walking page tables, spte pointers are accessed directly via the reverse
>   map array.
> - Reverse mappings will be considered for future support once the current
>   implementation is hardened.

Is the following a list of your future work?

>   o validate current dirty page logging support
>   o VMID TLB Flushing, migrating multiple guests
>   o GIC/arch-timer migration
>   o migration under various loads, primarily page reclaim and validate current
>     mmu-notifiers
>   o Run benchmarks (lmbench for now) and test impact on performance, and
>     optimize
>   o Test virtio - since it writes into guest memory. Wait until pci is supported
>     on ARM.

So you're not testing with virtio now?  Your command line below seems to
suggest that in fact you are.  /me confused.

>   o Currently on ARM, KVM doesn't appear to write into Guest address space,
>     need to mark those pages dirty too (???).

not sure what you mean here, can you expand?

> - Move on to ARMv8, since the 2nd stage mmu is shared between both architectures.
>   But in addition to the dirty page log, additional support for GIC, arch timers,
>   and emulated devices is required. Also, working on an emulated platform masks
>   a lot of potential bugs, but it does help to get the majority of the code working.
> 
> Test Environment:
> ---------------------------------------------------------------------------
> NOTE: RUNNING on Fast Models will hardly ever fail and will mask bugs; in fact,
>       initially light loads were succeeding without dirty page logging support.
> ---------------------------------------------------------------------------
> - Will put all components on github, including test setup diagram
> - In short summary
>   o Two ARM Exynos 5440 development platforms - 4-way 1.7 GHz, with 8GB RAM,
>     256GB storage, 1Gb Ethernet, with swap enabled
>   o NFS server running Ubuntu 13.04
>     - both ARM boards mount shared file system 
>     - Shared file system includes - QEMU, Guest Kernel, DTB, multiple Ext3 root
>       file systems.
>   o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1,
>   o Use the QEMU monitor (Ctrl+A, C) and the 'migrate -d tcp:IP:port' command
>     - Destination command syntax: smp can be changed to 4; the machine model is
>       outdated but has been tested on virt by others (need to upgrade)
> 	
> 	/mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
> 	/mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
> 	-M vexpress-a15 -cpu cortex-a15 -nographic \
> 	-append "root=/dev/vda rw console=ttyAMA0 rootwait" \
> 	-drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
> 	-device virtio-blk-device,drive=vm1 \
> 	-netdev type=tap,id=net0,ifname=tap0 \
> 	-device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
> 	-incoming tcp:0:4321
> 
>     - Source command syntax same except '-incoming'
> 
>   o Migration of multiple VMs, using tap0, tap1, ..., and guest0.root, ...,
>     has been tested as well.
>   o On the source run multiple copies of 'dirtyram.arm' - a simple program that
>     dirties pages periodically.
>     ./dirtyram.arm <total mmap size> <dirty page size> <sleep time>
>     Example:
>     ./dirtyram.arm 102580 812 30
>     - dirty 102580 pages
>     - 812 pages every 30ms with an incrementing counter
>     - run anywhere from one to as many copies as VM resources can support. If
>       the dirty rate is too high, migration will run indefinitely
>     - run a 'date' output loop and check that the date is picked up smoothly
>     - place the guest/host into page reclaim/swap mode - by whatever means; in
>       this case run multiple copies of 'dirtyram.arm' on the host
>     - issue migrate command(s) on the source
>     - Top result is 409600, 8192, 5
>   o QEMU is instrumented to save RAM memory regions on the source and destination
>     after memory is migrated, but before the guest is started. Later the files are
>     checksummed on both ends for correctness; given the VMs are small, this works.
>   o Guest kernel is instrumented to capture current cycle counter - last cycle
>     and compare to qemu down time to test arch timer accuracy. 
>   o Network failover is at L3 due to interface limitations, ping continues
>     working transparently
>   o Also tested 'migrate_cancel' to test reassembly of huge pages (inserted low
>     level instrumentation code).
> 

Thanks for the info, this makes it much clearer to me how you're testing
this and I will try to reproduce.

-Christoffer


* [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param
  2014-06-03 23:19 ` [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
@ 2014-06-08 12:05   ` Christoffer Dall
  2014-06-09 17:06     ` Mario Smarduch
  0 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-08 12:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 03, 2014 at 04:19:24PM -0700, Mario Smarduch wrote:
> Patch adds HYP interface for global VM TLB invalidation without address
> parameter. Added ARM version of kvm_flush_remote_tlbs(), made the generic
> implementation a weak symbol.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_asm.h |    1 +
>  arch/arm/kvm/interrupts.S      |   11 +++++++++++
>  arch/arm/kvm/mmu.c             |   14 ++++++++++++++
>  virt/kvm/kvm_main.c            |    2 +-
>  4 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 53b3c4a..21bc519 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>  
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  #endif
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 0d68d40..bddc66b 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -66,6 +66,17 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>  	bx	lr
>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
>  
> +/**
> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
> + *
> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
> + * parameter
> + */
> +
> +ENTRY(__kvm_tlb_flush_vmid)
> +	b	__kvm_tlb_flush_vmid_ipa
> +ENDPROC(__kvm_tlb_flush_vmid)
> +
>  /********************************************************************
>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>   * domain, for all VMIDs
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 2ac9588..ef29540 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -56,6 +56,20 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>  }
>  
> +/**
> + * kvm_flush_remote_tlbs() - flush all VM TLB entries
> + * @kvm:       pointer to kvm structure.
> + *
> + * Interface to HYP function to flush all VM TLB entries without address
> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
> + * kvm_tlb_flush_vmid_ipa().
> + */
> +void kvm_flush_remote_tlbs(struct kvm *kvm)
> +{
> +	if (kvm)
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
> +}
> +
>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>  				  int min, int max)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fa70c6e..ba25765 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -184,7 +184,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
>  	return called;
>  }
>  
> -void kvm_flush_remote_tlbs(struct kvm *kvm)
> +void __weak kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
>  	long dirty_count = kvm->tlbs_dirty;
>  
> -- 
> 1.7.9.5
> 

This doesn't build or link on aarch64 :(

-Christoffer


* [PATCH v7 2/4] arm: dirty page logging initial mem region write protect (w/no huge PUD support)
  2014-06-03 23:19 ` [PATCH v7 2/4] arm: dirty page logging initial mem region write protect (w/no huge PUD support) Mario Smarduch
@ 2014-06-08 12:05   ` Christoffer Dall
  2014-06-09 17:58     ` Mario Smarduch
  0 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-08 12:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 03, 2014 at 04:19:25PM -0700, Mario Smarduch wrote:
> This patch adds memslot support for initial write protection and splitting up of
> huge pages. This patch series assumes that huge PUDs will not be used to map VM
> memory. This patch depends on the unmap_range() patch, which needs to be applied
> first.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_host.h       |    2 +
>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>  arch/arm/include/asm/pgtable-3level.h |    1 +
>  arch/arm/kvm/arm.c                    |    6 ++
>  arch/arm/kvm/mmu.c                    |  114 +++++++++++++++++++++++++++++++++
>  5 files changed, 143 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 193ceaf..59565f5 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -231,4 +231,6 @@ int kvm_perf_teardown(void);
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 5cc0b0f..08ab5e8 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>  }
>  
> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
> +{
> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pte_readonly(pte_t *pte)
> +{
> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> +}
> +
> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> +{
> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> +{
> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> +}
> +

not crazy about the names, how about kvm_set_s2_pte_readonly etc.?

the fact that these don't exist for arm64 makes me think it may break
the build for arm64 as well...

>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>  #define kvm_pgd_addr_end(addr, end)					\
>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 85c60ad..d8bb40b 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -129,6 +129,7 @@
>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>  
> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>  
>  /*
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 3c82b37..dfd63ac 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -242,6 +242,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>  				   const struct kvm_memory_slot *old,
>  				   enum kvm_mr_change change)
>  {
> +	/*
> +	 * At this point memslot has been committed and the there is an
> +	 * allocated dirty_bitmap[] so marking of diryt pages works now on.

s/diryt/dirty/

"works now on" ?

> +	 */
> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>  }
>  
>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index ef29540..e5dff85 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -760,6 +760,120 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>  	return false;
>  }
>  
> +
> +/**
> + * stage2_wp_pte_range - write protect PTE range
> + * @pmd:	pointer to pmd entry
> + * @addr:	range start address
> + * @end:	range end address
> + */
> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
> +{
> +	pte_t *pte;
> +
> +	pte = pte_offset_kernel(pmd, addr);
> +	do {
> +		if (!pte_none(*pte)) {
> +			if (!kvm_s2pte_readonly(pte))
> +				kvm_set_s2pte_readonly(pte);

do you need the test before setting readonly?
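
i.e. (just a sketch) the loop body could simply be:

	if (!pte_none(*pte))
		kvm_set_s2pte_readonly(pte);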

> +		}
> +	} while (pte++, addr += PAGE_SIZE, addr != end);
> +}
> +
> +/**
> + * stage2_wp_pmd_range - write protect PMD range
> + * @pud:	pointer to pud entry
> + * @addr:	range start address
> + * @end:	range end address
> + */
> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
> +{
> +	pmd_t *pmd;
> +	phys_addr_t next;
> +
> +	pmd = pmd_offset(pud, addr);
> +
> +	do {
> +		next = kvm_pmd_addr_end(addr, end);
> +		if (!pmd_none(*pmd)) {
> +			if (kvm_pmd_huge(*pmd)) {
> +				/*
> +				 * Write Protect the PMD, give user_mem_abort()
> +				 * a choice to clear and fault on demand or
> +				 * break up the huge page.
> +				 */

I think this comment here is unnecessary.  If this function is used for
other purposes, it will be misleading.

> +				if (!kvm_s2pmd_readonly(pmd))
> +					kvm_set_s2pmd_readonly(pmd);

same as above

> +			} else
> +				stage2_wp_pte_range(pmd, addr, next);
> +
> +		}
> +	} while (pmd++, addr = next, addr != end);
> +}
> +
> +/**
> + * stage2_wp_pud_range - write protect PUD range
> + * @kvm:	pointer to kvm structure
> + * @pud:	pointer to pgd entry
> + * @addr:	range start address
> + * @end:	range end address
> + *
> + * While walking the PUD range huge PUD pages are ignored, in the future this
> + * may need to be revisited. Determine how to handle huge PUDs when logging
> + * of dirty pages is enabled.
> + */
> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
> +				phys_addr_t addr, phys_addr_t end)
> +{
> +	pud_t *pud;
> +	phys_addr_t next;
> +
> +	pud = pud_offset(pgd, addr);
> +	do {
> +		/* Check for contention every PUD range and release CPU */
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))

Why do we need this here?

> +			cond_resched_lock(&kvm->mmu_lock);

is this really safe?  This will unlock the mmu_lock and all sorts of
stuff could happen, in between.  For example, couldn't the compiler have
cached the pud value here?  It feels extremely dicy.


> +
> +		next = kvm_pud_addr_end(addr, end);
> +		/* TODO: huge PUD not supported, revisit later */

BUG_ON(kvm_pud_huge(*pud))  ?

> +		if (!pud_none(*pud))
> +			stage2_wp_pmd_range(pud, addr, next);
> +	} while (pud++, addr = next, addr != end);
> +}
> +
> +/**
> + * kvm_mmu_wp_memory_region() - initial write protected of memory region slot

I think this should be:

... - write protect stage 2 entries for memory slot

> + * @kvm:	The KVM pointer
> + * @slot:	The memory slot to write protect
> + *
> + * Called to start logging dirty pages after memory region
> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
> + * all present PMD and PTEs are write protected in the memory region.
> + * Afterwards read of dirty page log can be called. Pages not present are
> + * write protected on future access in user_mem_abort().
> + *
> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
> + * serializing operations for VM memory regions.
> + */
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
> +{
> +	pgd_t *pgd;
> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
> +	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
> +	phys_addr_t next;
> +
> +	spin_lock(&kvm->mmu_lock);
> +	pgd = kvm->arch.pgd + pgd_index(addr);
> +	do {
> +		next = kvm_pgd_addr_end(addr, end);
> +		if (pgd_present(*pgd))
> +			stage2_wp_pud_range(kvm, pgd, addr, next);
> +	} while (pgd++, addr = next, addr != end);

you probably want to factor out everything beginning with pgd = (and the
variable declarations) into stage2_wp_range(kvm, start, end).
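
Something like the following is what I have in mind (untested sketch,
reusing the walkers introduced in this patch):

	static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr,
				    phys_addr_t end)
	{
		pgd_t *pgd;
		phys_addr_t next;

		pgd = kvm->arch.pgd + pgd_index(addr);
		do {
			next = kvm_pgd_addr_end(addr, end);
			if (pgd_present(*pgd))
				stage2_wp_pud_range(kvm, pgd, addr, next);
		} while (pgd++, addr = next, addr != end);
	}

	void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
	{
		struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
		phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
		phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;

		spin_lock(&kvm->mmu_lock);
		stage2_wp_range(kvm, start, end);
		kvm_flush_remote_tlbs(kvm);
		spin_unlock(&kvm->mmu_lock);
	}

That way the walk itself can be reused from other call sites (the dirty log
read path, for example).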

> +	kvm_flush_remote_tlbs(kvm);
> +	spin_unlock(&kvm->mmu_lock);
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot,
>  			  unsigned long fault_status)
> -- 
> 1.7.9.5
> 

This is moving in the right direction.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-06 17:33 ` [RESEND PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
@ 2014-06-08 12:05   ` Christoffer Dall
  2014-06-10  1:47     ` Mario Smarduch
  0 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-08 12:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 06, 2014 at 10:33:41AM -0700, Mario Smarduch wrote:
> kvm_vm_ioctl_get_dirty_log() is generic used by x86, ARM. x86 recent patch 
> changed this function, this patch picks up those changes, re-tested everything
> works. Applies cleanly with other patches.
> 
> This patch adds support for keeping track of VM dirty pages. As dirty page log
> is retrieved, the pages that have been written are write protected again for
> next write and log read.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_host.h |    3 ++
>  arch/arm/kvm/arm.c              |    5 ---
>  arch/arm/kvm/mmu.c              |   79 +++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c              |   86 ---------------------------------------
>  virt/kvm/kvm_main.c             |   86 +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 168 insertions(+), 91 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 59565f5..b760f9c 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +	struct kvm_memory_slot *slot,
> +	gfn_t gfn_offset, unsigned long mask);

Do all other architectures implement this function?  arm64?

>  
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dfd63ac..f06fb21 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  	}
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> -{
> -	return -EINVAL;
> -}
> -

What about the other architectures implementing this function?

>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>  					struct kvm_arm_device_addr *dev_addr)
>  {
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index e5dff85..907344c 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -874,6 +874,85 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> +/**
> + * stage2_wp_mask_range() - write protect memslot pages set in mask
> + * @pmd - pointer to page table
> + * @start_ipa - the start range of mask
> + * @addr - start_ipa or start range of adjusted mask if crossing PMD range
> + * @mask - mask of dirty pages
> + *
> + * Walk mask and write protect the associated dirty pages in the memory region.
> + * If mask crosses a PMD range adjust it to next page table and return.
> + */
> +static void stage2_wp_mask_range(pmd_t *pmd, phys_addr_t start_ipa,
> +		phys_addr_t *addr, unsigned long *mask)
> +{
> +	pte_t *pte;
> +	bool crosses_pmd;
> +	int i = __ffs(*mask);
> +
> +	if (unlikely(*addr > start_ipa))
> +		start_ipa = *addr - i * PAGE_SIZE;

huh?

> +	pte = pte_offset_kernel(pmd, start_ipa);
> +	for (*addr = start_ipa + i * PAGE_SIZE; *mask;
> +		i = __ffs(*mask), *addr = start_ipa + i * PAGE_SIZE) {
> +		crosses_pmd = !!((start_ipa & PMD_MASK) ^ (*addr & PMD_MASK));
> +		if (unlikely(crosses_pmd)) {
> +			/* Adjust mask dirty bits relative to next page table */
> +			*mask >>= (PTRS_PER_PTE - pte_index(start_ipa));
> +			return;
> +		}
> +		if (!pte_none(pte[i]))
> +			kvm_set_s2pte_readonly(&pte[i]);
> +		*mask &= ~(1 << i);

This is *really* complicated, and *really* unintuitive and *really* hard
to read!

I feel this may very likely break, and is optimizing prematurely for
some very special case.  Can't you follow the usual scheme of traversing
the levels one-by-one and just calculate the 'end' address based on the
number of bits in your long, and just adjust the mask in the calling
function each time you are about to call a lower-level function?

In fact, I think this could be trivially implemented as an extension to
your existing wp_range functions.  On ARM you are mostly going to
consider 32 pages, on arm64 you are mostly going to consider 64 pages,
just calculate that range in terms of IPAs and set that as the limit for
calling stage2_wp_pgd_range (which should be factored out into its own
function and called from kvm_mmu_wp_memory_region).
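
Concretely, I would expect something as simple as this to work (sketch only,
assuming the stage2_wp_range() helper suggested for patch 2/4, and that the
caller never passes an empty mask):

	void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
			struct kvm_memory_slot *slot,
			gfn_t gfn_offset, unsigned long mask)
	{
		phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
		phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
		phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;

		stage2_wp_range(kvm, start, end);
	}

It may touch a few entries that are already read-only, but it keeps the walk
in one place.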



> +	}
> +}
> +
> +/**
> + * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in mask
> + * @kvm:        The KVM pointer
> + * @slot:       The memory slot associated with mask
> + * @gfn_offset: The gfn offset in memory slot
> + * @mask:       The mask of dirty pages at offset 'gnf_offset' in this memory

s/gnf_offset/gfn_offset/

> + *              slot to be write protected
> + *
> + * Called from dirty page logging read function to write protect bits set in
> + * mask to record future writes to these pages in dirty page log. This function
> + * uses simplified page table walk given  mask can spawn no more then 2 PMD

random double white space before mask

> + * table range.
> + * 'kvm->mmu_lock' must be  held to protect against concurrent modification
> + * of page tables (2nd stage fault, mmu modifiers, ...)
> + *
> + */
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +		struct kvm_memory_slot *slot,
> +		gfn_t gfn_offset, unsigned long mask)
> +{
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	phys_addr_t start_ipa = (slot->base_gfn + gfn_offset) << PAGE_SHIFT;
> +	phys_addr_t end_ipa = start_ipa + BITS_PER_LONG * PAGE_SIZE;
> +	phys_addr_t addr = start_ipa;
> +	pgd_t *pgdp = kvm->arch.pgd, *pgd;
> +
> +	do {
> +		pgd = pgdp + pgd_index(addr);
> +		if (pgd_present(*pgd)) {
> +			pud = pud_offset(pgd, addr);
> +			if (!pud_none(*pud) && !pud_huge(*pud)) {
> +				pmd = pmd_offset(pud, addr);
> +				if (!pmd_none(*pmd) && !kvm_pmd_huge(*pmd))
> +					stage2_wp_mask_range(pmd, start_ipa,
> +								&addr, &mask);
> +				else
> +					addr += PMD_SIZE;
> +			} else
> +				addr += PUD_SIZE;

is this correct? what if your gfn_offset puts you at the last page of a
PUD, don't you need pud_addr_end() instead?

> +		} else
> +			addr += PGDIR_SIZE;

please use braces for both of the single-line else-clauses above when
the main if-clause is multi-line (see Documentation/CodingStyle chapter
3, just before 3.1).

> +	} while (mask && addr < end_ipa);

this seems like a complicated loop condition.  It seems to me that
either you clear all the bits in the mask you want to check or you check
for an address range, why is there a need for both?

> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot,
>  			  unsigned long fault_status)

[...]

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support
  2014-06-03 23:19 ` [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support Mario Smarduch
@ 2014-06-08 12:05   ` Christoffer Dall
  2014-06-10 18:23     ` Mario Smarduch
  0 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-08 12:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 03, 2014 at 04:19:27PM -0700, Mario Smarduch wrote:
> This patch adds support for handling 2nd stage page faults during migration,
> it disables faulting in huge pages, and disolves huge pages to page tables.

s/disolves/dissolves/g

> In case migration is canceled huge pages will be used again.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/kvm/mmu.c |   36 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 1c546c9..aca4fbf 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -966,6 +966,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
>  	pfn_t pfn;
> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
> +	bool logging_active = !!memslot->dirty_bitmap;

>  
>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>  	if (fault_status == FSC_PERM && !write_fault) {
> @@ -1019,10 +1021,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	spin_lock(&kvm->mmu_lock);
>  	if (mmu_notifier_retry(kvm, mmu_seq))
>  		goto out_unlock;
> -	if (!hugetlb && !force_pte)
> +
> +	/* When logging don't spend cycles to check for huge pages */

drop the comment: either explain the entire clause (which would be too
long) or don't explain anything.

> +	if (!hugetlb && !force_pte && !logging_active)

instead of having all this, can't you just change 

if (is_vm_hugetlb_page(vma)) to
if (is_vm_hugetlb_page(vma) && !logging_active)

then you're also not mucking around with the gfn etc.

>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>  
> -	if (hugetlb) {
> +	/*
> +	 * Force all not present/perm faults to PTE handling, address both
> +	 * PMD and PTE faults
> +	 */

I don't understand this comment?  In which case does this apply?

> +	if (hugetlb && !logging_active) {
>  		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
>  		new_pmd = pmd_mkhuge(new_pmd);
>  		if (writable) {
> @@ -1034,6 +1042,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	} else {
>  		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
>  		if (writable) {
> +			/*
> +			 * If pmd is  mapping a huge page then clear it and let
> +			 * stage2_set_pte() create a pte table. At the sametime
> +			 * you write protect the pte (PAGE_S2 pgprot_t).
> +			 */
> +			if (logging_active) {
> +				pmd_t *pmd;
> +				if (hugetlb) {
> +					pfn += pte_index(fault_ipa);
> +					gfn = fault_ipa >> PAGE_SHIFT;
> +					new_pte = pfn_pte(pfn, PAGE_S2);
> +				}
> +				pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
> +				if (pmd && kvm_pmd_huge(*pmd))
> +					clear_pmd_entry(kvm, pmd, fault_ipa);
> +			}

now instead of all this, you just need to check for kvm_pmd_huge() in
stage2_set_pte() and if that's true, you clear it, and then install
your new pte.
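
Roughly (sketch only, not tested; reusing stage2_get_pmd() and
clear_pmd_entry() from this file), near the top of stage2_set_pte():

	pmd = stage2_get_pmd(kvm, cache, addr);
	...
	/*
	 * If the pmd maps a huge page, clear it so the existing
	 * "pmd_none() -> allocate a pte table" path below takes over
	 * and installs the new pte as usual.
	 */
	if (kvm_pmd_huge(*pmd))
		clear_pmd_entry(kvm, pmd, addr);

Then user_mem_abort() doesn't need to know about any of this.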

>  			kvm_set_s2pte_writable(&new_pte);
>  			kvm_set_pfn_dirty(pfn);
>  		}
> @@ -1041,6 +1065,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>  	}
>  
> +	/*
> +	 * Log the dirty page in dirty_bitmap[], call regardless if logging is
> +	 * disabled or enabled both cases handled safely.
> +	 * TODO: for larger page size mark mulitple dirty page bits for each
> +	 *       4k page.
> +	 */
> +	if (writable)
> +		mark_page_dirty(kvm, gfn);

what if you just faulted in a page on a read which wasn't present
before but it happens to belong to a writeable memslot, is that page
then dirty? hmmm.


>  
>  out_unlock:
>  	spin_unlock(&kvm->mmu_lock);
> -- 
> 1.7.9.5
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7 0/4] arm: dirty page logging support for ARMv7
  2014-06-08 10:45 ` [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Christoffer Dall
@ 2014-06-09 17:02   ` Mario Smarduch
  0 siblings, 0 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-06-09 17:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/08/2014 03:45 AM, Christoffer Dall wrote:
> On Tue, Jun 03, 2014 at 04:19:23PM -0700, Mario Smarduch wrote:
>> This patch adds support for dirty page logging so far tested only on ARMv7.
>> With dirty page logging, GICv2 vGIC and arch timer save/restore support, live 
>> migration is supported. 
>>
>> Dirty page logging support -
>> - initially write protects VM RAM memory regions - 2nd stage page tables
>> - add support to read dirty page log and again write protect the dirty pages 
>>   - second stage page table for next pass.
>> - second stage huge page are disolved into page tables to keep track of
>>   dirty pages at page granularity. Tracking at huge page granularity limits 
>>   migration to an almost idle system. There are couple approaches to handling
>>   huge pages:
>>   1 - break up huge page into page table and write protect all pte's
>>   2 - clear the PMD entry, create a page table install the faulted page entry
>>       and write protect it.
> 
> not sure I fully understand.  Is option 2 simply write-protecting all
> PMDs and splitting it at fault time?

No, that's option 1 above. Option 2 is the optimized solution you describe in the
patch 4 review - clear the PMD and let stage2_set_pte() allocate a page table and
install the pte; then it's demand faulting on future access to that PMD range.
> 
>>
>>   This patch implements #2, in the future #1 may be implemented depending on
>>   more bench mark results.
>>
>>   Option 1: may over commit and do unnecessary work, but on heavy loads appears
>>             to converge faster during live migration
>>   Option 2: Only write protects pages that are accessed, migration
>> 	    varies, takes longer then Option 1 but eventually catches up.
>>
>> - In the event migration is canceled, normal behavior is resumed huge pages
>>   are rebuilt over time.
>> - Another alternative is use of reverse mappings where for each level 2nd
>>   stage tables (PTE, PMD, PUD) pointers to spte's are maintained (x86 impl.).
>>   Primary reverse mapping benefits are for mmu notifiers for large memory range
>>   invalidations. Reverse mappings also improve dirty page logging, instead of
>>   walking page tables, spete pointers are accessed directly via reverse map
>>   array.
>> - Reverse mappings will be considered for future support once the current
>>   implementation is hardened.
> 
> Is the following a list of your future work?

I guess yes and no. With the exception of lmbench I've run these tests, and
a couple of other folks have tested prior revisions. I'll run
more (overnight, burn-in tests), adding lmbench, but I'm hoping
others will run tests to give this more run time, different loads
and so on.
> 
>>   o validate current dirty page logging support
>>   o VMID TLB Flushing, migrating multiple guests
>>   o GIC/arch-timer migration
>>   o migration under various loads, primarily page reclaim and validate current
>>     mmu-notifiers
>>   o Run benchmarks (lmbench for now) and test impact on performance, and
>>     optimize
>>   o Test virtio - since it writes into guest memory. Wait until pci is supported
>>     on ARM.
> 
> So you're not testing with virtio now?  Your command line below seems to
> suggest that in fact you are.  /me confused.

Yes, so I've seen no errors with the virtio-mmio transport and the
virtio-net-device and virtio-blk-device backends under moderate loads. But
virtio inbound is purely user space, in this case QEMU, so I can't say with
certainty that virtio is 100%. Some time back I found problems with
virtio-mmio: when transport and backend are not fused together, none of the
performance options (UFO, TSO, Partial Checksum...) got applied, like they
did for virtio-net-pci. So to summarize, I need to see how virtio tracks
dirty pages for virtio-mmio and virtio-pci in QEMU. I have a fair idea where
to look but have not done so yet.


> 
>>   o Currently on ARM, KVM doesn't appear to write into Guest address space,
>>     need to mark those pages dirty too (???).
> 
> not sure what you mean here, can you expand?

On a few architectures KVM writes into guest memory; one example is PV-EOI,
which will write into guest memory to disable/enable PV-EOI while injecting an
interrupt, based on the number of in-flight interrupts. There is other code
that does it too, but I'm not familiar with all the use cases. So if we do
that on ARM the page(s) must be marked dirty too.

> 
>> - Move onto ARMv8 since 2nd stage mmu is shared between both architectures. 
>>   But in addition to dirty page log additional support for GIC, arch timers, 
>>   and emulated devices is required. Also working on emulated platform masks
>>   a lot of potential bugs, but does help to get majority of code working.
>>
>> Test Environment:
>> ---------------------------------------------------------------------------
>> NOTE: RUNNING on FAST Models will hardly ever fail and mask bugs, infact 
>>       initially light loads were succeeding without dirty page logging support.
>> ---------------------------------------------------------------------------
>> - Will put all components on github, including test setup diagram
>> - In short summary
>>   o Two ARM Exyonys 5440 development platforms - 4-way 1.7 GHz, with 8GB, 256GB
>>     storage, 1GBs Ethernet, with swap enabled
>>   o NFS Server runing Ubuntu 13.04 
>>     - both ARM boards mount shared file system 
>>     - Shared file system includes - QEMU, Guest Kernel, DTB, multiple Ext3 root
>>       file systems.
>>   o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1,
>>   o Use QEMU Ctr+A+C and migrate -d tcp:IP:port command
>>     - Destination command syntax: can change smp to 4, machine model outdated,
>>       but has been tested on virt by others (need to upgrade)
>> 	
>> 	/mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
>> 	/mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
>> 	-M vexpress-a15 -cpu cortex-a15 -nographic \
>> 	-append "root=/dev/vda rw console=ttyAMA0 rootwait" \
>> 	-drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
>> 	-device virtio-blk-device,drive=vm1 \
>> 	-netdev type=tap,id=net0,ifname=tap0 \
>> 	-device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
>> 	-incoming tcp:0:4321
>>
>>     - Source command syntax same except '-incoming'
>>
>>   o Test migration of multiple VMs use tap0, tap1, ..., and guest0.root, .....
>>     has been tested as well.
>>   o On source run multiple copies of 'dirtyram.arm' - simple program to dirty
>>     pages periodically.
>>     ./dirtyarm.ram <total mmap size> <dirty page size> <sleep time>
>>     Example:
>>     ./dirtyram.arm 102580 812 30
>>     - dirty 102580 pages
>>     - 812 pages every 30ms with an incrementing counter 
>>     - run anywhere from one to as many copies as VM resources can support. If 
>>       the dirty rate is too high migration will run indefintely
>>     - run date output loop, check date is picked up smoothly
>>     - place guest/host into page reclaim/swap mode - by whatever means in this
>>       case run multiple copies of 'dirtyram.ram' on host
>>     - issue migrate command(s) on source
>>     - Top result is 409600, 8192, 5
>>   o QEMU is instrumented to save RAM memory regions on source and destination
>>     after memory is migrated, but before guest started. Later files are 
>>     checksummed on both ends for correctness, given VMs are small this works. 
>>   o Guest kernel is instrumented to capture current cycle counter - last cycle
>>     and compare to qemu down time to test arch timer accuracy. 
>>   o Network failover is at L3 due to interface limitations, ping continues
>>     working transparently
>>   o Also tested 'migrate_cancel' to test reassemble of huge pages (inserted low
>>     level instrumentation code).
>>
> 
> Thanks for the info, this makes it much clearer to me how you're testing
> this and I will try to reprocuce.
> 
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param
  2014-06-08 12:05   ` Christoffer Dall
@ 2014-06-09 17:06     ` Mario Smarduch
  2014-06-09 17:49       ` Christoffer Dall
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-09 17:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> On Tue, Jun 03, 2014 at 04:19:24PM -0700, Mario Smarduch wrote:
>> Patch adds HYP interface for global VM TLB invalidation without address
>> parameter. Added ARM version of kvm_flush_remote_tlbs(), made the generic
>> implementation a weak symbol.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_asm.h |    1 +
>>  arch/arm/kvm/interrupts.S      |   11 +++++++++++
>>  arch/arm/kvm/mmu.c             |   14 ++++++++++++++
>>  virt/kvm/kvm_main.c            |    2 +-
>>  4 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> index 53b3c4a..21bc519 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>>  
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>>  
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>  #endif
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 0d68d40..bddc66b 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -66,6 +66,17 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>>  	bx	lr
>>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
>>  
>> +/**
>> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
>> + *
>> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
>> + * parameter
>> + */
>> +
>> +ENTRY(__kvm_tlb_flush_vmid)
>> +	b	__kvm_tlb_flush_vmid_ipa
>> +ENDPROC(__kvm_tlb_flush_vmid)
>> +
>>  /********************************************************************
>>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>>   * domain, for all VMIDs
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 2ac9588..ef29540 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -56,6 +56,20 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>>  }
>>  
>> +/**
>> + * kvm_flush_remote_tlbs() - flush all VM TLB entries
>> + * @kvm:       pointer to kvm structure.
>> + *
>> + * Interface to HYP function to flush all VM TLB entries without address
>> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
>> + * kvm_tlb_flush_vmid_ipa().
>> + */
>> +void kvm_flush_remote_tlbs(struct kvm *kvm)
>> +{
>> +	if (kvm)
>> +		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
>> +}
>> +
>>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>>  				  int min, int max)
>>  {
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index fa70c6e..ba25765 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -184,7 +184,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
>>  	return called;
>>  }
>>  
>> -void kvm_flush_remote_tlbs(struct kvm *kvm)
>> +void __weak kvm_flush_remote_tlbs(struct kvm *kvm)
>>  {
>>  	long dirty_count = kvm->tlbs_dirty;
>>  
>> -- 
>> 1.7.9.5
>>
> 
> This doesn't build or link on aarch64 :(
> 
> -Christoffer
> 

I'll recompile and retest the dirty page logging portion on ARMv8 and
resolve these issues, early next week.

In the meantime, if it's ok with you, I'll move forward with the
rest of the patches on ARMv7 to get through critical issues.

Would that work?

- Mario

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param
  2014-06-09 17:06     ` Mario Smarduch
@ 2014-06-09 17:49       ` Christoffer Dall
  2014-06-09 18:36         ` Mario Smarduch
  0 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-09 17:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 09, 2014 at 10:06:29AM -0700, Mario Smarduch wrote:
> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> > On Tue, Jun 03, 2014 at 04:19:24PM -0700, Mario Smarduch wrote:
> >> Patch adds HYP interface for global VM TLB invalidation without address
> >> parameter. Added ARM version of kvm_flush_remote_tlbs(), made the generic
> >> implementation a weak symbol.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/include/asm/kvm_asm.h |    1 +
> >>  arch/arm/kvm/interrupts.S      |   11 +++++++++++
> >>  arch/arm/kvm/mmu.c             |   14 ++++++++++++++
> >>  virt/kvm/kvm_main.c            |    2 +-
> >>  4 files changed, 27 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> >> index 53b3c4a..21bc519 100644
> >> --- a/arch/arm/include/asm/kvm_asm.h
> >> +++ b/arch/arm/include/asm/kvm_asm.h
> >> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
> >>  
> >>  extern void __kvm_flush_vm_context(void);
> >>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> >> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >>  
> >>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >>  #endif
> >> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> >> index 0d68d40..bddc66b 100644
> >> --- a/arch/arm/kvm/interrupts.S
> >> +++ b/arch/arm/kvm/interrupts.S
> >> @@ -66,6 +66,17 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
> >>  	bx	lr
> >>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
> >>  
> >> +/**
> >> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
> >> + *
> >> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
> >> + * parameter
> >> + */
> >> +
> >> +ENTRY(__kvm_tlb_flush_vmid)
> >> +	b	__kvm_tlb_flush_vmid_ipa
> >> +ENDPROC(__kvm_tlb_flush_vmid)
> >> +
> >>  /********************************************************************
> >>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
> >>   * domain, for all VMIDs
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 2ac9588..ef29540 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -56,6 +56,20 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> >>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
> >>  }
> >>  
> >> +/**
> >> + * kvm_flush_remote_tlbs() - flush all VM TLB entries
> >> + * @kvm:       pointer to kvm structure.
> >> + *
> >> + * Interface to HYP function to flush all VM TLB entries without address
> >> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
> >> + * kvm_tlb_flush_vmid_ipa().
> >> + */
> >> +void kvm_flush_remote_tlbs(struct kvm *kvm)
> >> +{
> >> +	if (kvm)
> >> +		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
> >> +}
> >> +
> >>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
> >>  				  int min, int max)
> >>  {
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index fa70c6e..ba25765 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -184,7 +184,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
> >>  	return called;
> >>  }
> >>  
> >> -void kvm_flush_remote_tlbs(struct kvm *kvm)
> >> +void __weak kvm_flush_remote_tlbs(struct kvm *kvm)
> >>  {
> >>  	long dirty_count = kvm->tlbs_dirty;
> >>  
> >> -- 
> >> 1.7.9.5
> >>
> > 
> > This doesn't build or link on aarch64 :(
> > 
> > -Christoffer
> > 
> 
> I'll recompile and retest the dirty page logging portion on ARMv8 and
> resolve these issues, early next week.
> 
> In the meantime, if it's ok with you, I'll move forward with the
> rest of the patches on ARMv7 to get through critical issues.
> 
> Would that work?

Your patches need to at least compile with other architectures, I
suggest fixing that up front.  You also eventually need to somehow test
(or ask maintainers to test) a branch with your patches on the
architectures that your code messes with.  This latter point can be
addressed once we're close to a consensus on ARM.

-Christoffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7 2/4] arm: dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-06-08 12:05   ` Christoffer Dall
@ 2014-06-09 17:58     ` Mario Smarduch
  2014-06-09 18:09       ` Christoffer Dall
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-09 17:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> On Tue, Jun 03, 2014 at 04:19:25PM -0700, Mario Smarduch wrote:
>> Patch adds memslot support for initial write protection and split up of huge
>> pages. This patch series assumes that huge PUDs will not be used to map VM
>> memory. This patch depends on the unmap_range() patch, it needs to be applied
>> first.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h       |    2 +
>>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>  arch/arm/include/asm/pgtable-3level.h |    1 +
>>  arch/arm/kvm/arm.c                    |    6 ++
>>  arch/arm/kvm/mmu.c                    |  114 +++++++++++++++++++++++++++++++++
>>  5 files changed, 143 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 193ceaf..59565f5 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -231,4 +231,6 @@ int kvm_perf_teardown(void);
>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>  
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>> +
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f..08ab5e8 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>>  }
>>  
>> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>> +{
>> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>> +{
>> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>> +}
>> +
> 
> not crazy about the names, how about kvm_set_s2_pte_readonly etc.?
> 
So kvm_set_s2pte_writable(pte_t *pte) was there already just following
that convention.

> the fact that these don't exist for arm64 makes me think it may break
> the build for arm64 as well...

Yes will address it.
> 
>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>  #define kvm_pgd_addr_end(addr, end)					\
>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> index 85c60ad..d8bb40b 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -129,6 +129,7 @@
>>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>  
>> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>  
>>  /*
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 3c82b37..dfd63ac 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -242,6 +242,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>  				   const struct kvm_memory_slot *old,
>>  				   enum kvm_mr_change change)
>>  {
>> +	/*
>> +	 * At this point memslot has been committed and the there is an
>> +	 * allocated dirty_bitmap[] so marking of diryt pages works now on.
> 
> s/diryt/dirty/
> 
> "works now on" ?
Ok
> 
>> +	 */
>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>>  }
>>  
>>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index ef29540..e5dff85 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -760,6 +760,120 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>>  	return false;
>>  }
>>  
>> +
>> +/**
>> + * stage2_wp_pte_range - write protect PTE range
>> + * @pmd:	pointer to pmd entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + */
>> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pte_t *pte;
>> +
>> +	pte = pte_offset_kernel(pmd, addr);
>> +	do {
>> +		if (!pte_none(*pte)) {
>> +			if (!kvm_s2pte_readonly(pte))
>> +				kvm_set_s2pte_readonly(pte);
> 
> do you need the test before setting readonly?
Probably not.

Some memory regions have hardly any pages present and sometimes
not dirty. Was thinking of couple enhancements not to flush if
there are no dirty pages or few dirty pages then just flush by IPA.
But currently not doing anything with this info, leave it for
future.

> 
>> +		}
>> +	} while (pte++, addr += PAGE_SIZE, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_pmd_range - write protect PMD range
>> + * @pud:	pointer to pud entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + */
>> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pmd_t *pmd;
>> +	phys_addr_t next;
>> +
>> +	pmd = pmd_offset(pud, addr);
>> +
>> +	do {
>> +		next = kvm_pmd_addr_end(addr, end);
>> +		if (!pmd_none(*pmd)) {
>> +			if (kvm_pmd_huge(*pmd)) {
>> +				/*
>> +				 * Write Protect the PMD, give user_mem_abort()
>> +				 * a choice to clear and fault on demand or
>> +				 * break up the huge page.
>> +				 */
> 
> I think this comment here is unnecessary.  If this function is used for
> other purposes, it will be misleading.

Ok.
> 
>> +				if (!kvm_s2pmd_readonly(pmd))
>> +					kvm_set_s2pmd_readonly(pmd);
> 
> same as above
> 
>> +			} else
>> +				stage2_wp_pte_range(pmd, addr, next);
>> +
>> +		}
>> +	} while (pmd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_pud_range - write protect PUD range
>> + * @kvm:	pointer to kvm structure
>> + * @pud:	pointer to pgd entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + *
>> + * While walking the PUD range huge PUD pages are ignored, in the future this
>> + * may need to be revisited. Determine how to handle huge PUDs when logging
>> + * of dirty pages is enabled.
>> + */
>> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>> +				phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pud_t *pud;
>> +	phys_addr_t next;
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	do {
>> +		/* Check for contention every PUD range and release CPU */
>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock)
> 
> Why do we need this here?
> 
>> +			cond_resched_lock(&kvm->mmu_lock);
> 
> is this really safe?  This will unlock the mmu_lock and all sorts of
> stuff could happen, in between.  For example, couldn't the compiler have
> cached the pud value here?  It feels extremely dicy.

During testing, one of DETECT_HUNG_TASK, LOCK_DETECTOR or LOCK_DEP (I don't
recall which) panicked the system with a lockup detected; I think the
thread was running for longer than 5s, and that was for a 2GB memory region. Back
then I was splitting pages in the initial write protect. In addition
you also starve the other vCPUs. But if you have a huge VM with a huge
memory region it can cause problems.

But the code does have a bug: it relies on a stale value of the pud entry.
The check needs to move up to where the PGDs are walked, and the pgd value
rechecked after emerging from cond_resched_lock(); I will reassess.
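
Roughly along these lines (sketch only, with the check moved up into the
PGD loop so that *pgd is only read after the lock may have been dropped
and retaken):

	pgd = kvm->arch.pgd + pgd_index(addr);
	do {
		/* may drop and retake kvm->mmu_lock */
		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
			cond_resched_lock(&kvm->mmu_lock);

		next = kvm_pgd_addr_end(addr, end);
		if (pgd_present(*pgd))
			stage2_wp_pud_range(kvm, pgd, addr, next);
	} while (pgd++, addr = next, addr != end);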

> 
> 
>> +
>> +		next = kvm_pud_addr_end(addr, end);
>> +		/* TODO: huge PUD not supported, revisit later */
> 
> BUG_ON(kvm_pud_huge(*pud))  ?
> 
>> +		if (!pud_none(*pud))
>> +			stage2_wp_pmd_range(pud, addr, next);
>> +	} while (pud++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * kvm_mmu_wp_memory_region() - initial write protected of memory region slot
> 
> I think this should be:
> 
> ... - write protect stage 2 entries for memory slot

sure.
> 
>> + * @kvm:	The KVM pointer
>> + * @slot:	The memory slot to write protect
>> + *
>> + * Called to start logging dirty pages after memory region
>> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
>> + * all present PMD and PTEs are write protected in the memory region.
>> + * Afterwards read of dirty page log can be called. Pages not present are
>> + * write protected on future access in user_mem_abort().
>> + *
>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>> + * serializing operations for VM memory regions.
>> + */
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>> +{
>> +	pgd_t *pgd;
>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>> +	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>> +	phys_addr_t next;
>> +
>> +	spin_lock(&kvm->mmu_lock);
>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>> +	do {
>> +		next = kvm_pgd_addr_end(addr, end);
>> +		if (pgd_present(*pgd))
>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>> +	} while (pgd++, addr = next, addr != end);
> 
> you probably want to factor out everything beginning with pgd = (and the
> variable declarations) into stage2_wp_range(kvm, start, end).

So I understand define another function stage2_wp_range()?
If that's it that's fine.
> 
>> +	kvm_flush_remote_tlbs(kvm);
>> +	spin_unlock(&kvm->mmu_lock);
>> +}
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  struct kvm_memory_slot *memslot,
>>  			  unsigned long fault_status)
>> -- 
>> 1.7.9.5
>>
> 
> This is moving in the right direction.
> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7 2/4] arm: dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-06-09 17:58     ` Mario Smarduch
@ 2014-06-09 18:09       ` Christoffer Dall
  2014-06-09 18:33         ` Mario Smarduch
  0 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-09 18:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 09, 2014 at 10:58:18AM -0700, Mario Smarduch wrote:
> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> > On Tue, Jun 03, 2014 at 04:19:25PM -0700, Mario Smarduch wrote:
> >> Patch adds memslot support for initial write protection and split up of huge
> >> pages. This patch series assumes that huge PUDs will not be used to map VM
> >> memory. This patch depends on the unmap_range() patch, it needs to be applied
> >> first.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/include/asm/kvm_host.h       |    2 +
> >>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
> >>  arch/arm/include/asm/pgtable-3level.h |    1 +
> >>  arch/arm/kvm/arm.c                    |    6 ++
> >>  arch/arm/kvm/mmu.c                    |  114 +++++++++++++++++++++++++++++++++
> >>  5 files changed, 143 insertions(+)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 193ceaf..59565f5 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -231,4 +231,6 @@ int kvm_perf_teardown(void);
> >>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
> >>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> >>  
> >> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> >> +
> >>  #endif /* __ARM_KVM_HOST_H__ */
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >> index 5cc0b0f..08ab5e8 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
> >>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
> >>  }
> >>  
> >> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
> >> +{
> >> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
> >> +}
> >> +
> >> +static inline bool kvm_s2pte_readonly(pte_t *pte)
> >> +{
> >> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> >> +}
> >> +
> >> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> >> +{
> >> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> >> +}
> >> +
> >> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> >> +{
> >> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> >> +}
> >> +
> > 
> > not crazy about the names, how about kvm_set_s2_pte_readonly etc.?
> > 
> So kvm_set_s2pte_writable(pte_t *pte) was there already just following
> that convention.
> 

ah, ok, no problem then.

> > the fact that these don't exist for arm64 makes me think it may break
> > the build for arm64 as well...
> 
> Yes will address it.
> > 
> >>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
> >>  #define kvm_pgd_addr_end(addr, end)					\
> >>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
> >> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> >> index 85c60ad..d8bb40b 100644
> >> --- a/arch/arm/include/asm/pgtable-3level.h
> >> +++ b/arch/arm/include/asm/pgtable-3level.h
> >> @@ -129,6 +129,7 @@
> >>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
> >>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
> >>  
> >> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
> >>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
> >>  
> >>  /*
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index 3c82b37..dfd63ac 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -242,6 +242,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> >>  				   const struct kvm_memory_slot *old,
> >>  				   enum kvm_mr_change change)
> >>  {
> >> +	/*
> >> +	 * At this point memslot has been committed and the there is an
> >> +	 * allocated dirty_bitmap[] so marking of diryt pages works now on.
> > 
> > s/diryt/dirty/
> > 
> > "works now on" ?
> Ok

I don't understand what "works now on" means, so you need to clarify.

> > 
> >> +	 */
> >> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
> >>  }
> >>  
> >>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index ef29540..e5dff85 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -760,6 +760,120 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
> >>  	return false;
> >>  }
> >>  
> >> +
> >> +/**
> >> + * stage2_wp_pte_range - write protect PTE range
> >> + * @pmd:	pointer to pmd entry
> >> + * @addr:	range start address
> >> + * @end:	range end address
> >> + */
> >> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
> >> +{
> >> +	pte_t *pte;
> >> +
> >> +	pte = pte_offset_kernel(pmd, addr);
> >> +	do {
> >> +		if (!pte_none(*pte)) {
> >> +			if (!kvm_s2pte_readonly(pte))
> >> +				kvm_set_s2pte_readonly(pte);
> > 
> > do you need the test before setting readonly?
> Probably not.
> 
> Some memory regions have hardly any pages present and sometimes
> not dirty. Was thinking of couple enhancements not to flush if
> there are no dirty pages or few dirty pages then just flush by IPA.
> But currently not doing anything with this info, leave it for
> future.
> 

hmmh, yeah, maybe it's better to keep it the way you have it now for
cache purposes, not sure.

> > 
> >> +		}
> >> +	} while (pte++, addr += PAGE_SIZE, addr != end);
> >> +}
> >> +
> >> +/**
> >> + * stage2_wp_pmd_range - write protect PMD range
> >> + * @pud:	pointer to pud entry
> >> + * @addr:	range start address
> >> + * @end:	range end address
> >> + */
> >> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
> >> +{
> >> +	pmd_t *pmd;
> >> +	phys_addr_t next;
> >> +
> >> +	pmd = pmd_offset(pud, addr);
> >> +
> >> +	do {
> >> +		next = kvm_pmd_addr_end(addr, end);
> >> +		if (!pmd_none(*pmd)) {
> >> +			if (kvm_pmd_huge(*pmd)) {
> >> +				/*
> >> +				 * Write Protect the PMD, give user_mem_abort()
> >> +				 * a choice to clear and fault on demand or
> >> +				 * break up the huge page.
> >> +				 */
> > 
> > I think this comment here is unnecessary.  If this function is used for
> > other purposes, it will be misleading.
> 
> Ok.
> > 
> >> +				if (!kvm_s2pmd_readonly(pmd))
> >> +					kvm_set_s2pmd_readonly(pmd);
> > 
> > same as above
> > 
> >> +			} else
> >> +				stage2_wp_pte_range(pmd, addr, next);
> >> +
> >> +		}
> >> +	} while (pmd++, addr = next, addr != end);
> >> +}
> >> +
> >> +/**
> >> + * stage2_wp_pud_range - write protect PUD range
> >> + * @kvm:	pointer to kvm structure
> >> + * @pud:	pointer to pgd entry
> >> + * @addr:	range start address
> >> + * @end:	range end address
> >> + *
> >> + * While walking the PUD range huge PUD pages are ignored, in the future this
> >> + * may need to be revisited. Determine how to handle huge PUDs when logging
> >> + * of dirty pages is enabled.
> >> + */
> >> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
> >> +				phys_addr_t addr, phys_addr_t end)
> >> +{
> >> +	pud_t *pud;
> >> +	phys_addr_t next;
> >> +
> >> +	pud = pud_offset(pgd, addr);
> >> +	do {
> >> +		/* Check for contention every PUD range and release CPU */
> >> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock)
> > 
> > Why do we need this here?
> > 
> >> +			cond_resched_lock(&kvm->mmu_lock);
> > 
> > is this really safe?  This will unlock the mmu_lock and all sorts of
> > stuff could happen, in between.  For example, couldn't the compiler have
> > cached the pud value here?  It feels extremely dicy.
> 
> During testing, one of DETECT_HUNG_TASK, LOCK_DETECTOR or LOCK_DEP (I don't
> recall which) panicked the system with a lockup detected; I think the
> thread was running for longer than 5s, and that was for a 2GB memory region. Back
> then I was splitting pages in the initial write protect. In addition
> you also starve the other vCPUs. But if you have a huge VM with a huge
> memory region it can cause problems.

hmm, ok, fair enough.

> 
> But the code does have a bug: it relies on a stale value of the pud entry.
> The check needs to move up to where the PGDs are walked, and the pgd value
> rechecked after emerging from cond_resched_lock(); I will reassess.
> 

yeah, you need to check it after you've come back and taken the lock.
With 4-level page tables doing this at the pud level will probably
break, I think.

> > 
> > 
> >> +
> >> +		next = kvm_pud_addr_end(addr, end);
> >> +		/* TODO: huge PUD not supported, revisit later */
> > 
> > BUG_ON(kvm_pud_huge(*pud))  ?
> > 
> >> +		if (!pud_none(*pud))
> >> +			stage2_wp_pmd_range(pud, addr, next);
> >> +	} while (pud++, addr = next, addr != end);
> >> +}
> >> +
> >> +/**
> >> + * kvm_mmu_wp_memory_region() - initial write protected of memory region slot
> > 
> > I think this should be:
> > 
> > ... - write protect stage 2 entries for memory slot
> 
> sure.
> > 
> >> + * @kvm:	The KVM pointer
> >> + * @slot:	The memory slot to write protect
> >> + *
> >> + * Called to start logging dirty pages after memory region
> >> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
> >> + * all present PMD and PTEs are write protected in the memory region.
> >> + * Afterwards read of dirty page log can be called. Pages not present are
> >> + * write protected on future access in user_mem_abort().
> >> + *
> >> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
> >> + * serializing operations for VM memory regions.
> >> + */
> >> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
> >> +{
> >> +	pgd_t *pgd;
> >> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
> >> +	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
> >> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
> >> +	phys_addr_t next;
> >> +
> >> +	spin_lock(&kvm->mmu_lock);
> >> +	pgd = kvm->arch.pgd + pgd_index(addr);
> >> +	do {
> >> +		next = kvm_pgd_addr_end(addr, end);
> >> +		if (pgd_present(*pgd))
> >> +			stage2_wp_pud_range(kvm, pgd, addr, next);
> >> +	} while (pgd++, addr = next, addr != end);
> > 
> > you probably want to factor out everything beginning with pgd = (and the
> > variable declarations) into stage2_wp_range(kvm, start, end).
> 
> So I understand define another function stage2_wp_range()?
> If that's it that's fine.

yes, take the spinlock etc. in this function, but when it comes to
actually walking the tables etc. do this in a separate function - this
can then be reused for other purposes.

> > 
> >> +	kvm_flush_remote_tlbs(kvm);
> >> +	spin_unlock(&kvm->mmu_lock);
> >> +}
> >> +
> >>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  			  struct kvm_memory_slot *memslot,
> >>  			  unsigned long fault_status)
> >> -- 
> >> 1.7.9.5
> >>
> > 
> > This is moving in the right direction.
> > 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v7 2/4] arm: dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-06-09 18:09       ` Christoffer Dall
@ 2014-06-09 18:33         ` Mario Smarduch
  0 siblings, 0 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-06-09 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/09/2014 11:09 AM, Christoffer Dall wrote:
> On Mon, Jun 09, 2014 at 10:58:18AM -0700, Mario Smarduch wrote:
>> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
>>> On Tue, Jun 03, 2014 at 04:19:25PM -0700, Mario Smarduch wrote:
>>>> Patch adds memslot support for initial write protection and split up of huge
>>>> pages. This patch series assumes that huge PUDs will not be used to map VM
>>>> memory. This patch depends on the unmap_range() patch, it needs to be applied
>>>> first.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_host.h       |    2 +
>>>>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>>>  arch/arm/include/asm/pgtable-3level.h |    1 +
>>>>  arch/arm/kvm/arm.c                    |    6 ++
>>>>  arch/arm/kvm/mmu.c                    |  114 +++++++++++++++++++++++++++++++++
>>>>  5 files changed, 143 insertions(+)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>> index 193ceaf..59565f5 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -231,4 +231,6 @@ int kvm_perf_teardown(void);
>>>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>>>  
>>>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>>> +
>>>>  #endif /* __ARM_KVM_HOST_H__ */
>>>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>>>> index 5cc0b0f..08ab5e8 100644
>>>> --- a/arch/arm/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm/include/asm/kvm_mmu.h
>>>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>>>>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>>>>  }
>>>>  
>>>> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>>>> +{
>>>> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>>>> +}
>>>> +
>>>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>>>> +{
>>>> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>>>> +}
>>>> +
>>>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>>> +{
>>>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>>>> +}
>>>> +
>>>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>>> +{
>>>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>>>> +}
>>>> +
>>>
>>> not crazy about the names, how about kvm_set_s2_pte_readonly etc.?
>>>
>> So kvm_set_s2pte_writable(pte_t *pte) was there already just following
>> that convention.
>>
> 
> ah, ok, no problem then.
> 
>>> the fact that these don't exist for arm64 makes me think it may break
>>> the build for arm64 as well...
>>
>> Yes will address it.
>>>
>>>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>>>  #define kvm_pgd_addr_end(addr, end)					\
>>>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
>>>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>>>> index 85c60ad..d8bb40b 100644
>>>> --- a/arch/arm/include/asm/pgtable-3level.h
>>>> +++ b/arch/arm/include/asm/pgtable-3level.h
>>>> @@ -129,6 +129,7 @@
>>>>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>>>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>>>  
>>>> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>>>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>>>  
>>>>  /*
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index 3c82b37..dfd63ac 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -242,6 +242,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>>>  				   const struct kvm_memory_slot *old,
>>>>  				   enum kvm_mr_change change)
>>>>  {
>>>> +	/*
>>>> +	 * At this point memslot has been committed and the there is an
>>>> +	 * allocated dirty_bitmap[] so marking of diryt pages works now on.
>>>
>>> s/diryt/dirty/
>>>
>>> "works now on" ?
>> Ok
Sorry, I thought that was only about the comment wording. This function is
called after the memslots have been committed, so we know the dirty bitmap
has been allocated; marking the dirty bitmap works from that point on, as
the pages are being write protected and we're getting faults.
> 
> I don't understand what "works now on" means, so you need to clarify.
> 
>>>
>>>> +	 */
>>>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>>>>  }
>>>>  
>>>>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index ef29540..e5dff85 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -760,6 +760,120 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>>>>  	return false;
>>>>  }
>>>>  
>>>> +
>>>> +/**
>>>> + * stage2_wp_pte_range - write protect PTE range
>>>> + * @pmd:	pointer to pmd entry
>>>> + * @addr:	range start address
>>>> + * @end:	range end address
>>>> + */
>>>> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pte_t *pte;
>>>> +
>>>> +	pte = pte_offset_kernel(pmd, addr);
>>>> +	do {
>>>> +		if (!pte_none(*pte)) {
>>>> +			if (!kvm_s2pte_readonly(pte))
>>>> +				kvm_set_s2pte_readonly(pte);
>>>
>>> do you need the test before setting readonly?
>> Probably not.
>>
>> Some memory regions have hardly any pages present and sometimes
>> not dirty. Was thinking of couple enhancements not to flush if
>> there are no dirty pages or few dirty pages then just flush by IPA.
>> But currently not doing anything with this info, leave it for
>> future.
>>
> 
> hmmh, yeah, maybe it's better to keep it the way you have it now for
> cache purposes, not sure.
> 
>>>
>>>> +		}
>>>> +	} while (pte++, addr += PAGE_SIZE, addr != end);
>>>> +}
>>>> +
>>>> +/**
>>>> + * stage2_wp_pmd_range - write protect PMD range
>>>> + * @pud:	pointer to pud entry
>>>> + * @addr:	range start address
>>>> + * @end:	range end address
>>>> + */
>>>> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pmd_t *pmd;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	pmd = pmd_offset(pud, addr);
>>>> +
>>>> +	do {
>>>> +		next = kvm_pmd_addr_end(addr, end);
>>>> +		if (!pmd_none(*pmd)) {
>>>> +			if (kvm_pmd_huge(*pmd)) {
>>>> +				/*
>>>> +				 * Write Protect the PMD, give user_mem_abort()
>>>> +				 * a choice to clear and fault on demand or
>>>> +				 * break up the huge page.
>>>> +				 */
>>>
>>> I think this comment here is unnecessary.  If this function is used for
>>> other purposes, it will be misleading.
>>
>> Ok.
>>>
>>>> +				if (!kvm_s2pmd_readonly(pmd))
>>>> +					kvm_set_s2pmd_readonly(pmd);
>>>
>>> same as above
>>>
>>>> +			} else
>>>> +				stage2_wp_pte_range(pmd, addr, next);
>>>> +
>>>> +		}
>>>> +	} while (pmd++, addr = next, addr != end);
>>>> +}
>>>> +
>>>> +/**
>>>> + * stage2_wp_pud_range - write protect PUD range
>>>> + * @kvm:	pointer to kvm structure
>>>> + * @pud:	pointer to pgd entry
>>>> + * @addr:	range start address
>>>> + * @end:	range end address
>>>> + *
>>>> + * While walking the PUD range huge PUD pages are ignored, in the future this
>>>> + * may need to be revisited. Determine how to handle huge PUDs when logging
>>>> + * of dirty pages is enabled.
>>>> + */
>>>> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>>>> +				phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pud_t *pud;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	pud = pud_offset(pgd, addr);
>>>> +	do {
>>>> +		/* Check for contention every PUD range and release CPU */
>>>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock)
>>>
>>> Why do we need this here?
>>>
>>>> +			cond_resched_lock(&kvm->mmu_lock);
>>>
>>> is this really safe?  This will unlock the mmu_lock and all sorts of
>>> stuff could happen, in between.  For example, couldn't the compiler have
>>> cached the pud value here?  It feels extremely dicy.
>>
>> During testing either DETECT_HUNG_TASK, LOCK_DETECTOR, LOCK_DEP I don't
>> recall paniced the system with lockup detected I think the
>> thread was running longer then 5s this was for a 2GB memory region. Back
>> then I was splitting pages in the initial write protect. In addition
>> you also starve the other vCPUs. But if you have huge VM with a huge
>> memory region it can cause problems.
> 
> hmm, ok, fair enough.

Now that I think of it, unmap_range() may run into this too, but given where
it's called from I'm not sure how you could release the lock.
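
(On the cond_resched_lock() point above: one way the final code could both
yield under contention and avoid acting on a stale entry is to do the check
in the PGD loop and re-read the entry only after the lock is re-taken. A
rough, illustrative sketch, not the actual fix:)

	pgd = kvm->arch.pgd + pgd_index(addr);
	do {
		/*
		 * Release the lock if there is contention; anything read
		 * before this point may be stale once we get the lock back.
		 */
		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
			cond_resched_lock(&kvm->mmu_lock);

		next = kvm_pgd_addr_end(addr, end);
		if (pgd_present(*pgd))	/* read after possible resched */
			stage2_wp_pud_range(kvm, pgd, addr, next);
	} while (pgd++, addr = next, addr != end);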

> 
>>
>> But code does have a bug it relies on a stale value of the pud entry.
>> need to move up where PGDs are walked and recheck the value
>> of the pgd after you emerge cond_resched_lock(), will reassess.
>>
> 
> yeah, you need to check it after you've come back and taken the lock.
> With 4-level page tables doing this at the pud level will probably
> break, I think.
> 
>>>
>>>
>>>> +
>>>> +		next = kvm_pud_addr_end(addr, end);
>>>> +		/* TODO: huge PUD not supported, revisit later */
>>>
>>> BUG_ON(kvm_pud_huge(*pud))  ?
>>>
>>>> +		if (!pud_none(*pud))
>>>> +			stage2_wp_pmd_range(pud, addr, next);
>>>> +	} while (pud++, addr = next, addr != end);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_mmu_wp_memory_region() - initial write protected of memory region slot
>>>
>>> I think this should be:
>>>
>>> ... - write protect stage 2 entries for memory slot
>>
>> sure.
>>>
>>>> + * @kvm:	The KVM pointer
>>>> + * @slot:	The memory slot to write protect
>>>> + *
>>>> + * Called to start logging dirty pages after memory region
>>>> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
>>>> + * all present PMD and PTEs are write protected in the memory region.
>>>> + * Afterwards read of dirty page log can be called. Pages not present are
>>>> + * write protected on future access in user_mem_abort().
>>>> + *
>>>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>>>> + * serializing operations for VM memory regions.
>>>> + */
>>>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>>>> +{
>>>> +	pgd_t *pgd;
>>>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>>>> +	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>>>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	spin_lock(&kvm->mmu_lock);
>>>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>>>> +	do {
>>>> +		next = kvm_pgd_addr_end(addr, end);
>>>> +		if (pgd_present(*pgd))
>>>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>>>> +	} while (pgd++, addr = next, addr != end);
>>>
>>> you probably want to factor out everything beginnign with pgd = (and the
>>> variable declarations) into stage2_wp_range(kvm, start, end).
>>
>> So I understand define another function stage2_wp_range()?
>> If that's it that's fine.
> 
> yes, take the spinlock etc. in this function, but when it comes to
> actually walking the tables etc. do this in a separate function - this
> can then be reused for other purposes.

Ok got it.

I'll come back on patch 3/4 later. The comments on patch 4 simplify the code
nicely, and the dirty-page marking is a bug; I need to add the write-fault
condition as well.

Thanks,
  Mario
> 
>>>
>>>> +	kvm_flush_remote_tlbs(kvm);
>>>> +	spin_unlock(&kvm->mmu_lock);
>>>> +}
>>>> +
>>>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  			  struct kvm_memory_slot *memslot,
>>>>  			  unsigned long fault_status)
>>>> -- 
>>>> 1.7.9.5
>>>>
>>>
>>> This is moving in the right direction.
>>>


* [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param
  2014-06-09 17:49       ` Christoffer Dall
@ 2014-06-09 18:36         ` Mario Smarduch
  0 siblings, 0 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-06-09 18:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/09/2014 10:49 AM, Christoffer Dall wrote:
> On Mon, Jun 09, 2014 at 10:06:29AM -0700, Mario Smarduch wrote:
>> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
>>> On Tue, Jun 03, 2014 at 04:19:24PM -0700, Mario Smarduch wrote:
>>>> Patch adds HYP interface for global VM TLB invalidation without address
>>>> parameter. Added ARM version of kvm_flush_remote_tlbs(), made the generic
>>>> implementation a weak symbol.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_asm.h |    1 +
>>>>  arch/arm/kvm/interrupts.S      |   11 +++++++++++
>>>>  arch/arm/kvm/mmu.c             |   14 ++++++++++++++
>>>>  virt/kvm/kvm_main.c            |    2 +-
>>>>  4 files changed, 27 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>>>> index 53b3c4a..21bc519 100644
>>>> --- a/arch/arm/include/asm/kvm_asm.h
>>>> +++ b/arch/arm/include/asm/kvm_asm.h
>>>> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>>>>  
>>>>  extern void __kvm_flush_vm_context(void);
>>>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>>> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>>>>  
>>>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>>>  #endif
>>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>>> index 0d68d40..bddc66b 100644
>>>> --- a/arch/arm/kvm/interrupts.S
>>>> +++ b/arch/arm/kvm/interrupts.S
>>>> @@ -66,6 +66,17 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>>>>  	bx	lr
>>>>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
>>>>  
>>>> +/**
>>>> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
>>>> + *
>>>> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
>>>> + * parameter
>>>> + */
>>>> +
>>>> +ENTRY(__kvm_tlb_flush_vmid)
>>>> +	b	__kvm_tlb_flush_vmid_ipa
>>>> +ENDPROC(__kvm_tlb_flush_vmid)
>>>> +
>>>>  /********************************************************************
>>>>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>>>>   * domain, for all VMIDs
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 2ac9588..ef29540 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -56,6 +56,20 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>>>>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>>>>  }
>>>>  
>>>> +/**
>>>> + * kvm_flush_remote_tlbs() - flush all VM TLB entries
>>>> + * @kvm:       pointer to kvm structure.
>>>> + *
>>>> + * Interface to HYP function to flush all VM TLB entries without address
>>>> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
>>>> + * kvm_tlb_flush_vmid_ipa().
>>>> + */
>>>> +void kvm_flush_remote_tlbs(struct kvm *kvm)
>>>> +{
>>>> +	if (kvm)
>>>> +		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
>>>> +}
>>>> +
>>>>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>>>>  				  int min, int max)
>>>>  {
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index fa70c6e..ba25765 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -184,7 +184,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
>>>>  	return called;
>>>>  }
>>>>  
>>>> -void kvm_flush_remote_tlbs(struct kvm *kvm)
>>>> +void __weak kvm_flush_remote_tlbs(struct kvm *kvm)
>>>>  {
>>>>  	long dirty_count = kvm->tlbs_dirty;
>>>>  
>>>> -- 
>>>> 1.7.9.5
>>>>
>>>
>>> This doesn't build or link on aarch64 :(
>>>
>>> -Christoffer
>>>
>>
>> I'll recompile and retest the dirty page logging portion on ARMv8 and
>> resolve these issues, early next week.
>>
>> In the meantime if it's ok with you, I'' move forward with the
>> rest of the patches on ARMv7 to get through critical issues.
>>
>> Would that work?
> 
> Your patches need to at least compile with other architectures, I
> suggest fixing that up front.  You also eventually need to somehow test
> (or ask maintainers to test) a branch with your patches on the
> architectures that your code messes with.  This later point can be
> addressed once we're close to a consensus on ARM.

Ok, got it. So far I have only checked that x86 compiles, since one function
is generic and declared __weak; both x86 and ARM share it now (per Xiao's
comment).
> 
> -Christoffer
> 


* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-08 12:05   ` Christoffer Dall
@ 2014-06-10  1:47     ` Mario Smarduch
  2014-06-10  9:22       ` Christoffer Dall
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-10  1:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> On Fri, Jun 06, 2014 at 10:33:41AM -0700, Mario Smarduch wrote:
>> kvm_vm_ioctl_get_dirty_log() is generic used by x86, ARM. x86 recent patch 
>> changed this function, this patch picks up those changes, re-tested everything
>> works. Applies cleanly with other patches.
>>
>> This patch adds support for keeping track of VM dirty pages. As dirty page log
>> is retrieved, the pages that have been written are write protected again for
>> next write and log read.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h |    3 ++
>>  arch/arm/kvm/arm.c              |    5 ---
>>  arch/arm/kvm/mmu.c              |   79 +++++++++++++++++++++++++++++++++++
>>  arch/x86/kvm/x86.c              |   86 ---------------------------------------
>>  virt/kvm/kvm_main.c             |   86 +++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 168 insertions(+), 91 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 59565f5..b760f9c 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>  
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> +	struct kvm_memory_slot *slot,
>> +	gfn_t gfn_offset, unsigned long mask);
> 
> Do all other architectures implement this function?  arm64?

Besides ARM, x86 implements it, but the function is not generic.
> 
>>  
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dfd63ac..f06fb21 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>  	}
>>  }
>>  
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> -{
>> -	return -EINVAL;
>> -}
>> -
> 
> What about the other architectures implementing this function?

Six architectures define this function. With this patch it becomes generic
in kvm_main.c and is used by x86.
> 
>>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>>  					struct kvm_arm_device_addr *dev_addr)
>>  {
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index e5dff85..907344c 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -874,6 +874,85 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>>  	spin_unlock(&kvm->mmu_lock);
>>  }
>>  
>> +/**
>> + * stage2_wp_mask_range() - write protect memslot pages set in mask
>> + * @pmd - pointer to page table
>> + * @start_ipa - the start range of mask
>> + * @addr - start_ipa or start range of adjusted mask if crossing PMD range
>> + * @mask - mask of dirty pages
>> + *
>> + * Walk mask and write protect the associated dirty pages in the memory region.
>> + * If mask crosses a PMD range adjust it to next page table and return.
>> + */
>> +static void stage2_wp_mask_range(pmd_t *pmd, phys_addr_t start_ipa,
>> +		phys_addr_t *addr, unsigned long *mask)
>> +{
>> +	pte_t *pte;
>> +	bool crosses_pmd;
>> +	int i = __ffs(*mask);
>> +
>> +	if (unlikely(*addr > start_ipa))
>> +		start_ipa = *addr - i * PAGE_SIZE;
> 
> huh?
> 
>> +	pte = pte_offset_kernel(pmd, start_ipa);
>> +	for (*addr = start_ipa + i * PAGE_SIZE; *mask;
>> +		i = __ffs(*mask), *addr = start_ipa + i * PAGE_SIZE) {
>> +		crosses_pmd = !!((start_ipa & PMD_MASK) ^ (*addr & PMD_MASK));
>> +		if (unlikely(crosses_pmd)) {
>> +			/* Adjust mask dirty bits relative to next page table */
>> +			*mask >>= (PTRS_PER_PTE - pte_index(start_ipa));
>> +			return;
>> +		}
>> +		if (!pte_none(pte[i]))
>> +			kvm_set_s2pte_readonly(&pte[i]);
>> +		*mask &= ~(1 << i);
> 
> This is *really* complicated, and *really* unintuitive and *really* hard
> to read!
> 
> I feel this may very likely break, and is optimizing prematurely for
> some very special case.  Can't you follow the usual scheme of traversing
> the levels one-by-one and just calculate the 'end' address based on the
> number of bits in your long, and just adjust the mask in the calling
> function each time you are about to call a lower-level function?

Agreed, I'll extend the wp_range functions; it probably makes no sense to be
optimizing at this phase.
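
(For illustration only: assuming the stage2_wp_range() factoring discussed
for patch 2/4, the masked variant could then reduce to computing an IPA
range from the first and last set bits and reusing the same walker. A
sketch, not the final code:)

void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
		struct kvm_memory_slot *slot,
		gfn_t gfn_offset, unsigned long mask)
{
	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
	phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;

	/* Caller (the dirty log read path) already holds kvm->mmu_lock */
	stage2_wp_range(kvm, start, end);
}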

> 
> In fact, I think this could be trivially implemented as an extension to
> your existing wp_range functions.  On ARM you are mostly going to
> consider 32 pages, on arm64 you are mostly going to consider 64 pages,
> just calculate that range in terms of IPAs and set that as the limit for
> calling stage2_wp_pgd_range (which should be factor'ed out into its
> function and called from kvm_mmu_wp_memory_region).
> 
> 
> 
>> +	}
>> +}
>> +
>> +/**
>> + * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in mask
>> + * @kvm:        The KVM pointer
>> + * @slot:       The memory slot associated with mask
>> + * @gfn_offset: The gfn offset in memory slot
>> + * @mask:       The mask of dirty pages at offset 'gnf_offset' in this memory
> 
> s/gnf_offset/gfn_offset/
ok
> 
>> + *              slot to be write protected
>> + *
>> + * Called from dirty page logging read function to write protect bits set in
>> + * mask to record future writes to these pages in dirty page log. This function
>> + * uses simplified page table walk given  mask can spawn no more then 2 PMD
> 
> random double white space before mask
ok
> 
>> + * table range.
>> + * 'kvm->mmu_lock' must be  held to protect against concurrent modification
>> + * of page tables (2nd stage fault, mmu modifiers, ...)
>> + *
>> + */
>> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> +		struct kvm_memory_slot *slot,
>> +		gfn_t gfn_offset, unsigned long mask)
>> +{
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	phys_addr_t start_ipa = (slot->base_gfn + gfn_offset) << PAGE_SHIFT;
>> +	phys_addr_t end_ipa = start_ipa + BITS_PER_LONG * PAGE_SIZE;
>> +	phys_addr_t addr = start_ipa;
>> +	pgd_t *pgdp = kvm->arch.pgd, *pgd;
>> +
>> +	do {
>> +		pgd = pgdp + pgd_index(addr);
>> +		if (pgd_present(*pgd)) {
>> +			pud = pud_offset(pgd, addr);
>> +			if (!pud_none(*pud) && !pud_huge(*pud)) {
>> +				pmd = pmd_offset(pud, addr);
>> +				if (!pmd_none(*pmd) && !kvm_pmd_huge(*pmd))
>> +					stage2_wp_mask_range(pmd, start_ipa,
>> +								&addr, &mask);
>> +				else
>> +					addr += PMD_SIZE;
>> +			} else
>> +				addr += PUD_SIZE;
> 
> is this correct? what if your gfn_offset puts you at the last page of a
> PUD, don't you need pud_addr_end() instead?
> 
>> +		} else
>> +			addr += PGDIR_SIZE;
> 
> please use braces for both of the single-line else-clauses above when
> the main if-clause is multi-line (see Documentation/CodingStyle chapter
> 3, just before 3.1).
> 
>> +	} while (mask && addr < end_ipa);
> 
> this seems like a complicated loop condition.  It seems to me that
> either you clear all the bits in the mask you want to check or you check
> for an address range, why is there a need for both?
> 
>> +}
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  struct kvm_memory_slot *memslot,
>>  			  unsigned long fault_status)
> 
> [...]
> 
> Thanks,
> -Christoffer
> 


* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-10  1:47     ` Mario Smarduch
@ 2014-06-10  9:22       ` Christoffer Dall
  2014-06-10 18:08         ` Mario Smarduch
  0 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-10  9:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 09, 2014 at 06:47:12PM -0700, Mario Smarduch wrote:
> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> > On Fri, Jun 06, 2014 at 10:33:41AM -0700, Mario Smarduch wrote:
> >> kvm_vm_ioctl_get_dirty_log() is generic used by x86, ARM. x86 recent patch 
> >> changed this function, this patch picks up those changes, re-tested everything
> >> works. Applies cleanly with other patches.
> >>
> >> This patch adds support for keeping track of VM dirty pages. As dirty page log
> >> is retrieved, the pages that have been written are write protected again for
> >> next write and log read.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/include/asm/kvm_host.h |    3 ++
> >>  arch/arm/kvm/arm.c              |    5 ---
> >>  arch/arm/kvm/mmu.c              |   79 +++++++++++++++++++++++++++++++++++
> >>  arch/x86/kvm/x86.c              |   86 ---------------------------------------
> >>  virt/kvm/kvm_main.c             |   86 +++++++++++++++++++++++++++++++++++++++
> >>  5 files changed, 168 insertions(+), 91 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 59565f5..b760f9c 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
> >>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> >>  
> >>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> >> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> >> +	struct kvm_memory_slot *slot,
> >> +	gfn_t gfn_offset, unsigned long mask);
> > 
> > Do all other architectures implement this function?  arm64?
> 
> Besides arm, x86 but the function is not generic.
> > 

you're now calling this from generic code, so all architectures must
implement it, and the prototype should probably be in
include/linux/kvm_host.h, not in the arch-specific headers.

> >>  
> >>  #endif /* __ARM_KVM_HOST_H__ */
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index dfd63ac..f06fb21 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> >>  	}
> >>  }
> >>  
> >> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> >> -{
> >> -	return -EINVAL;
> >> -}
> >> -
> > 
> > What about the other architectures implementing this function?
> 
> Six architectures define this function. With this patch this
> function is generic in kvm_main.c used by x86.

But you're not defining it as a weak symbol (and I don't suspect that
you should unless other archs do this in a *very* different way), so you
need to either remove it from the other archs, make it a weak symbol (I
hope this is not the case) or do something else.

-Christoffer


* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-10  9:22       ` Christoffer Dall
@ 2014-06-10 18:08         ` Mario Smarduch
  2014-06-11  7:03           ` Christoffer Dall
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-10 18:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/10/2014 02:22 AM, Christoffer Dall wrote:
> On Mon, Jun 09, 2014 at 06:47:12PM -0700, Mario Smarduch wrote:
>> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
>>> On Fri, Jun 06, 2014 at 10:33:41AM -0700, Mario Smarduch wrote:
>>>> kvm_vm_ioctl_get_dirty_log() is generic used by x86, ARM. x86 recent patch 
>>>> changed this function, this patch picks up those changes, re-tested everything
>>>> works. Applies cleanly with other patches.
>>>>
>>>> This patch adds support for keeping track of VM dirty pages. As dirty page log
>>>> is retrieved, the pages that have been written are write protected again for
>>>> next write and log read.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_host.h |    3 ++
>>>>  arch/arm/kvm/arm.c              |    5 ---
>>>>  arch/arm/kvm/mmu.c              |   79 +++++++++++++++++++++++++++++++++++
>>>>  arch/x86/kvm/x86.c              |   86 ---------------------------------------
>>>>  virt/kvm/kvm_main.c             |   86 +++++++++++++++++++++++++++++++++++++++
>>>>  5 files changed, 168 insertions(+), 91 deletions(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>> index 59565f5..b760f9c 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>>>  
>>>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>>> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>>>> +	struct kvm_memory_slot *slot,
>>>> +	gfn_t gfn_offset, unsigned long mask);
>>>
>>> Do all other architectures implement this function?  arm64?
>>
>> Besides arm, x86 but the function is not generic.
>>>
> 
> you're now calling this from generic code, so all architecture must
> implement it, and the prototype should proably be in
> include/linux/kvm_host.h, not in the arch-specific headers.
Ah ok.
> 
>>>>  
>>>>  #endif /* __ARM_KVM_HOST_H__ */
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index dfd63ac..f06fb21 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>>>  	}
>>>>  }
>>>>  
>>>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>>> -{
>>>> -	return -EINVAL;
>>>> -}
>>>> -
>>>
>>> What about the other architectures implementing this function?
>>
>> Six architectures define this function. With this patch this
>> function is generic in kvm_main.c used by x86.
> 
> But you're not defining it as a weak symbol (and I don't suspect that
> you should unless other archs do this in a *very* different way), so you
> need to either remove it from the other archs, make it a weak symbol (I
> hope this is not the case) or do something else.
My mistake, I just cut and pasted Xiao's recent x86 upstream patch and
didn't add the weak definition.

I looked at IA64, MIPS (two variants) and S390: somewhat similar but quite
different implementations. They use a sync version, where the dirty bitmaps
are maintained at the arch level and then copied to memslot->dirty_bitmap.
There is only commonality between x86 and ARM right now; x86 uses
memslot->dirty_bitmap directly.

Maybe this function should go back to the architecture layer; it's
unlikely it can become generic across all architectures.

There is also the issue of kvm_flush_remote_tlbs(): it's also weak, and
the generic one uses IPIs. Since it's only used in mmu.c, maybe make
this one static.


> 
> -Christoffer
> 


* [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support
  2014-06-08 12:05   ` Christoffer Dall
@ 2014-06-10 18:23     ` Mario Smarduch
  2014-06-11  6:58       ` Christoffer Dall
  0 siblings, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-10 18:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> On Tue, Jun 03, 2014 at 04:19:27PM -0700, Mario Smarduch wrote:
>> This patch adds support for handling 2nd stage page faults during migration,
>> it disables faulting in huge pages, and disolves huge pages to page tables.
> 
> s/disolves/dissolves/g
Will do.
> 
>> In case migration is canceled huge pages will be used again.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/kvm/mmu.c |   36 ++++++++++++++++++++++++++++++++++--
>>  1 file changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 1c546c9..aca4fbf 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -966,6 +966,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>  	struct vm_area_struct *vma;
>>  	pfn_t pfn;
>> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
>> +	bool logging_active = !!memslot->dirty_bitmap;
> 
>>  
>>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>  	if (fault_status == FSC_PERM && !write_fault) {
>> @@ -1019,10 +1021,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	spin_lock(&kvm->mmu_lock);
>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>  		goto out_unlock;
>> -	if (!hugetlb && !force_pte)
>> +
>> +	/* When logging don't spend cycles to check for huge pages */
> 
> drop the comment: either explain the entire clause (which would be too
> long) or don't explain anything.
> 
Ok.
>> +	if (!hugetlb && !force_pte && !logging_active)
> 
> instead of having all this, can't you just change 
> 
> if (is_vm_hugetlb_page(vma)) to
> if (is_vm_hugetlb_page(vma) && !logging_active)
> 
> then you're also not mucking around with the gfn etc.

I didn't want to modify this function too much, but if that's ok that 
simplifies things a lot.

> 
>>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>  
>> -	if (hugetlb) {
>> +	/*
>> +	 * Force all not present/perm faults to PTE handling, address both
>> +	 * PMD and PTE faults
>> +	 */
> 
> I don't understand this comment?  In which case does this apply?
> 
The cases I see here:
- a huge page permission fault is forced into the page table code while
  logging is enabled
- pte permission/not-present faults are handled by the page table code as
  before
>> +	if (hugetlb && !logging_active) {
>>  		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
>>  		new_pmd = pmd_mkhuge(new_pmd);
>>  		if (writable) {
>> @@ -1034,6 +1042,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	} else {
>>  		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
>>  		if (writable) {
>> +			/*
>> +			 * If pmd is  mapping a huge page then clear it and let
>> +			 * stage2_set_pte() create a pte table. At the sametime
>> +			 * you write protect the pte (PAGE_S2 pgprot_t).
>> +			 */
>> +			if (logging_active) {
>> +				pmd_t *pmd;
>> +				if (hugetlb) {
>> +					pfn += pte_index(fault_ipa);
>> +					gfn = fault_ipa >> PAGE_SHIFT;
>> +					new_pte = pfn_pte(pfn, PAGE_S2);
>> +				}
>> +				pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
>> +				if (pmd && kvm_pmd_huge(*pmd))
>> +					clear_pmd_entry(kvm, pmd, fault_ipa);
>> +			}
> 
> now instead of all this, you just need to check for kvm_pmd_huge() in
> stage2_set_pte() and if that's true, you clear it, and then then install
> your new pte.

Yes this really simplifies things!
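
(Roughly, the check inside stage2_set_pte() might look like the fragment
below; this is a partial sketch around the existing pmd lookup, not the
actual patch:)

	pmd = stage2_get_pmd(kvm, cache, addr);
	if (!pmd)
		return 0;

	/*
	 * While dirty page logging is enabled a huge PMD may still map this
	 * IPA; clear it so a pte table is created below and the page can be
	 * write protected and tracked at 4K granularity.
	 */
	if (kvm_pmd_huge(*pmd))
		clear_pmd_entry(kvm, pmd, addr);

	/* ...the existing pte table allocation and pte install follow... */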

> 
>>  			kvm_set_s2pte_writable(&new_pte);
>>  			kvm_set_pfn_dirty(pfn);
>>  		}
>> @@ -1041,6 +1065,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>>  	}
>>  
>> +	/*
>> +	 * Log the dirty page in dirty_bitmap[], call regardless if logging is
>> +	 * disabled or enabled both cases handled safely.
>> +	 * TODO: for larger page size mark mulitple dirty page bits for each
>> +	 *       4k page.
>> +	 */
>> +	if (writable)
>> +		mark_page_dirty(kvm, gfn);
> 
> what if you just faulted in a page on a read which wasn't present
> before but it happens to belong to a writeable memslot, is that page
> then dirty? hmmm.
> 
A bug: we must also check that it was a write fault, not just that we're
dealing with a writable region. This one could be pretty bad for
performance, not to mention inaccurate. It will be interesting to see the
new test results; glad you caught that.
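
(The fix would roughly gate the logging on the fault type as well, e.g.:)

	/*
	 * Only log the page as dirty when the guest actually wrote to it,
	 * not merely because the memslot happens to be writable.
	 */
	if (write_fault)
		mark_page_dirty(kvm, gfn);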

Thanks,
  Mario.
> 
>>  
>>  out_unlock:
>>  	spin_unlock(&kvm->mmu_lock);
>> -- 
>> 1.7.9.5
>>
> 
> Thanks,
> -Christoffer
> 


* [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support
  2014-06-10 18:23     ` Mario Smarduch
@ 2014-06-11  6:58       ` Christoffer Dall
  2014-06-12  2:53         ` Mario Smarduch
  0 siblings, 1 reply; 31+ messages in thread
From: Christoffer Dall @ 2014-06-11  6:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 10, 2014 at 11:23:17AM -0700, Mario Smarduch wrote:
> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> > On Tue, Jun 03, 2014 at 04:19:27PM -0700, Mario Smarduch wrote:
> >> This patch adds support for handling 2nd stage page faults during migration,
> >> it disables faulting in huge pages, and disolves huge pages to page tables.
> > 
> > s/disolves/dissolves/g
> Will do.
> > 
> >> In case migration is canceled huge pages will be used again.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/kvm/mmu.c |   36 ++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 34 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 1c546c9..aca4fbf 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -966,6 +966,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> >>  	struct vm_area_struct *vma;
> >>  	pfn_t pfn;
> >> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
> >> +	bool logging_active = !!memslot->dirty_bitmap;
> > 
> >>  
> >>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
> >>  	if (fault_status == FSC_PERM && !write_fault) {
> >> @@ -1019,10 +1021,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	spin_lock(&kvm->mmu_lock);
> >>  	if (mmu_notifier_retry(kvm, mmu_seq))
> >>  		goto out_unlock;
> >> -	if (!hugetlb && !force_pte)
> >> +
> >> +	/* When logging don't spend cycles to check for huge pages */
> > 
> > drop the comment: either explain the entire clause (which would be too
> > long) or don't explain anything.
> > 
> Ok.
> >> +	if (!hugetlb && !force_pte && !logging_active)
> > 
> > instead of having all this, can't you just change 
> > 
> > if (is_vm_hugetlb_page(vma)) to
> > if (is_vm_hugetlb_page(vma) && !logging_active)
> > 
> > then you're also not mucking around with the gfn etc.
> 
> I didn't want to modify this function too much, but if that's ok that 
> simplifies things a lot.
> 

Don't worry about the changes as much as the resulting code.  If
something requires a lot of refactoring, usually that can be handled by
splitting up renames, factoring out functions, etc. into multiple
smaller patches.

> > 
> >>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> >>  
> >> -	if (hugetlb) {
> >> +	/*
> >> +	 * Force all not present/perm faults to PTE handling, address both
> >> +	 * PMD and PTE faults
> >> +	 */
> > 
> > I don't understand this comment?  In which case does this apply?
> > 
> The cases I see here -
> - huge page permission fault is forced into page table code while logging
> - pte permission/not present handled by page table code as before.

Hmm, the wording doesn't really work for me.  I don't think this comment
adds anything or is required; by the time you're this deep into the fault
handler, one had better understand what's going on.

The most suitable place for a comment in this work is probably in
stage2_set_pte(), where you can now detect a kvm_pmd_huge(); when you add
that, you may want to add a small comment that this only happens when
logging dirty pages.

> >> +	if (hugetlb && !logging_active) {
> >>  		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
> >>  		new_pmd = pmd_mkhuge(new_pmd);
> >>  		if (writable) {
> >> @@ -1034,6 +1042,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	} else {
> >>  		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
> >>  		if (writable) {
> >> +			/*
> >> +			 * If pmd is  mapping a huge page then clear it and let
> >> +			 * stage2_set_pte() create a pte table. At the sametime
> >> +			 * you write protect the pte (PAGE_S2 pgprot_t).
> >> +			 */
> >> +			if (logging_active) {
> >> +				pmd_t *pmd;
> >> +				if (hugetlb) {
> >> +					pfn += pte_index(fault_ipa);
> >> +					gfn = fault_ipa >> PAGE_SHIFT;
> >> +					new_pte = pfn_pte(pfn, PAGE_S2);
> >> +				}
> >> +				pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
> >> +				if (pmd && kvm_pmd_huge(*pmd))
> >> +					clear_pmd_entry(kvm, pmd, fault_ipa);
> >> +			}
> > 
> > now instead of all this, you just need to check for kvm_pmd_huge() in
> > stage2_set_pte() and if that's true, you clear it, and then then install
> > your new pte.
> 
> Yes this really simplifies things!
> 
> > 
> >>  			kvm_set_s2pte_writable(&new_pte);
> >>  			kvm_set_pfn_dirty(pfn);
> >>  		}
> >> @@ -1041,6 +1065,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
> >>  	}
> >>  
> >> +	/*
> >> +	 * Log the dirty page in dirty_bitmap[], call regardless if logging is
> >> +	 * disabled or enabled both cases handled safely.
> >> +	 * TODO: for larger page size mark mulitple dirty page bits for each
> >> +	 *       4k page.
> >> +	 */
> >> +	if (writable)
> >> +		mark_page_dirty(kvm, gfn);
> > 
> > what if you just faulted in a page on a read which wasn't present
> > before but it happens to belong to a writeable memslot, is that page
> > then dirty? hmmm.
> > 
> A bug, must also check if it was a write fault not just that we're dealing with
> a writable region. This one could be pretty bad on performance, not to mention
> in accurate. It will be interesting to see new test results, glad you caught
> that.
> 
ok, please fix.

Thanks,
-Christoffer


* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-10 18:08         ` Mario Smarduch
@ 2014-06-11  7:03           ` Christoffer Dall
  2014-06-12  3:02             ` Mario Smarduch
  2014-06-18  1:41             ` Mario Smarduch
  0 siblings, 2 replies; 31+ messages in thread
From: Christoffer Dall @ 2014-06-11  7:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 10, 2014 at 11:08:24AM -0700, Mario Smarduch wrote:
> On 06/10/2014 02:22 AM, Christoffer Dall wrote:
> > On Mon, Jun 09, 2014 at 06:47:12PM -0700, Mario Smarduch wrote:
> >> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> >>> On Fri, Jun 06, 2014 at 10:33:41AM -0700, Mario Smarduch wrote:
> >>>> kvm_vm_ioctl_get_dirty_log() is generic used by x86, ARM. x86 recent patch 
> >>>> changed this function, this patch picks up those changes, re-tested everything
> >>>> works. Applies cleanly with other patches.
> >>>>
> >>>> This patch adds support for keeping track of VM dirty pages. As dirty page log
> >>>> is retrieved, the pages that have been written are write protected again for
> >>>> next write and log read.
> >>>>
> >>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >>>> ---
> >>>>  arch/arm/include/asm/kvm_host.h |    3 ++
> >>>>  arch/arm/kvm/arm.c              |    5 ---
> >>>>  arch/arm/kvm/mmu.c              |   79 +++++++++++++++++++++++++++++++++++
> >>>>  arch/x86/kvm/x86.c              |   86 ---------------------------------------
> >>>>  virt/kvm/kvm_main.c             |   86 +++++++++++++++++++++++++++++++++++++++
> >>>>  5 files changed, 168 insertions(+), 91 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >>>> index 59565f5..b760f9c 100644
> >>>> --- a/arch/arm/include/asm/kvm_host.h
> >>>> +++ b/arch/arm/include/asm/kvm_host.h
> >>>> @@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
> >>>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> >>>>  
> >>>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> >>>> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> >>>> +	struct kvm_memory_slot *slot,
> >>>> +	gfn_t gfn_offset, unsigned long mask);
> >>>
> >>> Do all other architectures implement this function?  arm64?
> >>
> >> Besides arm, x86 but the function is not generic.
> >>>
> > 
> > you're now calling this from generic code, so all architecture must
> > implement it, and the prototype should proably be in
> > include/linux/kvm_host.h, not in the arch-specific headers.
> Ah ok.
> > 
> >>>>  
> >>>>  #endif /* __ARM_KVM_HOST_H__ */
> >>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>>> index dfd63ac..f06fb21 100644
> >>>> --- a/arch/arm/kvm/arm.c
> >>>> +++ b/arch/arm/kvm/arm.c
> >>>> @@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> >>>>  	}
> >>>>  }
> >>>>  
> >>>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> >>>> -{
> >>>> -	return -EINVAL;
> >>>> -}
> >>>> -
> >>>
> >>> What about the other architectures implementing this function?
> >>
> >> Six architectures define this function. With this patch this
> >> function is generic in kvm_main.c used by x86.
> > 
> > But you're not defining it as a weak symbol (and I don't suspect that
> > you should unless other archs do this in a *very* different way), so you
> > need to either remove it from the other archs, make it a weak symbol (I
> > hope this is not the case) or do something else.
> Mistake on my part I just cut and paste Xiaos x86's recent upstream patch and 
> didn't add weak definition.
> 
> I looked at IA64, MIPS (two of them ), S390 somewhat similar but quite 
> different implementations. They use a sync version, where the dirty bitmaps 
> are maintained at arch level and then copied to memslot->dirty_bitmap. There 
> is only commonality between x86 and ARM right now, x86 uses
> memslot->dirty_bitmap directly.
> 
> Maybe this function should go back to architecture layer, it's
> unlikely it can become generic across all architectures.
> 
> There is also the issue of kvm_flush_remote_tlbs(), that's also weak,
> the generic one is using IPIs. Since it's only used in mmu.c maybe make 
> this one static.
> 
So I don't see a lot of use of weak symbols in kvm_main.c (actually on
kvmarm/next I don't see any), but we do want to share code when more
than one architecture implements something in the exact same way, like
it seems x86 and ARM is doing here for this particular function.

I think the KVM scheme is usually to check for some define, like:

#ifdef KVM_ARCH_HAVE_GET_DIRTY_LOG
	ret = kvm_arch_get_dirty_log(...);
#else
	ret = kvm_get_dirty_log(...);
#endif

but Paolo may have a more informed opinion of how to deal with these.

Thanks,
-Christoffer


* [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support
  2014-06-11  6:58       ` Christoffer Dall
@ 2014-06-12  2:53         ` Mario Smarduch
  0 siblings, 0 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-06-12  2:53 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/10/2014 11:58 PM, Christoffer Dall wrote:
> On Tue, Jun 10, 2014 at 11:23:17AM -0700, Mario Smarduch wrote:
>> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
>>> On Tue, Jun 03, 2014 at 04:19:27PM -0700, Mario Smarduch wrote:
>>>> This patch adds support for handling 2nd stage page faults during migration,
>>>> it disables faulting in huge pages, and disolves huge pages to page tables.
>>>
>>> s/disolves/dissolves/g
>> Will do.
>>>
>>>> In case migration is canceled huge pages will be used again.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/arm/kvm/mmu.c |   36 ++++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 34 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 1c546c9..aca4fbf 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -966,6 +966,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>>>  	struct vm_area_struct *vma;
>>>>  	pfn_t pfn;
>>>> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
>>>> +	bool logging_active = !!memslot->dirty_bitmap;
>>>
>>>>  
>>>>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>>>  	if (fault_status == FSC_PERM && !write_fault) {
>>>> @@ -1019,10 +1021,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	spin_lock(&kvm->mmu_lock);
>>>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>>>  		goto out_unlock;
>>>> -	if (!hugetlb && !force_pte)
>>>> +
>>>> +	/* When logging don't spend cycles to check for huge pages */
>>>
>>> drop the comment: either explain the entire clause (which would be too
>>> long) or don't explain anything.
>>>
>> Ok.
>>>> +	if (!hugetlb && !force_pte && !logging_active)
>>>
>>> instead of having all this, can't you just change 
>>>
>>> if (is_vm_hugetlb_page(vma)) to
>>> if (is_vm_hugetlb_page(vma) && !logging_active)
>>>
>>> then you're also not mucking around with the gfn etc.
>>
>> I didn't want to modify this function too much, but if that's ok that 
>> simplifies things a lot.
>>
> 
> Don't worry about the changes as much as the resulting code.  If
> something requires a lot of refactoring, usually that can be handled by
> splitting up renames, factoring out functions, etc. into multiple
> smaller patches.
> 
>>>
>>>>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>>>  
>>>> -	if (hugetlb) {
>>>> +	/*
>>>> +	 * Force all not present/perm faults to PTE handling, address both
>>>> +	 * PMD and PTE faults
>>>> +	 */
>>>
>>> I don't understand this comment?  In which case does this apply?
>>>
>> The cases I see here -
>> - huge page permission fault is forced into page table code while logging
>> - pte permission/not present handled by page table code as before.
> 
> Hmm, the wording doesn't really work for me.  I don't think this comment
> adds anything or is required, when getting this deep into the fault
> handler etc., one better understand what's going on.
> 
> The most suitable place for a comment in this work is probably in
> stage2_set_pte() where you can now detect a kvm_pmd_huge(), when you add
> that, you may want to add a small comment that this only happens when
> logging dirty pages.
> 
>>>> +	if (hugetlb && !logging_active) {
>>>>  		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
>>>>  		new_pmd = pmd_mkhuge(new_pmd);
>>>>  		if (writable) {
>>>> @@ -1034,6 +1042,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	} else {
>>>>  		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
>>>>  		if (writable) {
>>>> +			/*
>>>> +			 * If pmd is  mapping a huge page then clear it and let
>>>> +			 * stage2_set_pte() create a pte table. At the sametime
>>>> +			 * you write protect the pte (PAGE_S2 pgprot_t).
>>>> +			 */
>>>> +			if (logging_active) {
>>>> +				pmd_t *pmd;
>>>> +				if (hugetlb) {
>>>> +					pfn += pte_index(fault_ipa);
>>>> +					gfn = fault_ipa >> PAGE_SHIFT;
>>>> +					new_pte = pfn_pte(pfn, PAGE_S2);
>>>> +				}
>>>> +				pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
>>>> +				if (pmd && kvm_pmd_huge(*pmd))
>>>> +					clear_pmd_entry(kvm, pmd, fault_ipa);
>>>> +			}
>>>
>>> now instead of all this, you just need to check for kvm_pmd_huge() in
>>> stage2_set_pte() and if that's true, you clear it, and then then install
>>> your new pte.
>>
>> Yes this really simplifies things!
>>
>>>
>>>>  			kvm_set_s2pte_writable(&new_pte);
>>>>  			kvm_set_pfn_dirty(pfn);
>>>>  		}
>>>> @@ -1041,6 +1065,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>>>>  	}
>>>>  
>>>> +	/*
>>>> +	 * Log the dirty page in dirty_bitmap[], call regardless if logging is
>>>> +	 * disabled or enabled both cases handled safely.
>>>> +	 * TODO: for larger page size mark mulitple dirty page bits for each
>>>> +	 *       4k page.
>>>> +	 */
>>>> +	if (writable)
>>>> +		mark_page_dirty(kvm, gfn);
>>>
>>> what if you just faulted in a page on a read which wasn't present
>>> before but it happens to belong to a writeable memslot, is that page
>>> then dirty? hmmm.
>>>
>> A bug, must also check if it was a write fault not just that we're dealing with
>> a writable region. This one could be pretty bad on performance, not to mention
>> in accurate. It will be interesting to see new test results, glad you caught
>> that.
>>
> ok, please fix.
> 
> Thanks,
> -Christoffer
> 

So I'll start on the next iteration and make sure it builds cleanly on all
the architectures, then follow up with Paolo on how to handle the generic
functions. I think arm64 will need some extra work.

- Mario


* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-11  7:03           ` Christoffer Dall
@ 2014-06-12  3:02             ` Mario Smarduch
  2014-06-18  1:41             ` Mario Smarduch
  1 sibling, 0 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-06-12  3:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Paolo,
   for ARM dirty page logging we have a couple of functions that overlap
with the generic code.

- kvm_vm_ioctl_get_dirty_log - identical to the x86 version
- kvm_flush_remote_tlbs - the ARM version does a hardware broadcast,
  so it's different from the generic one in kvm_main.c

How should we proceed to make these generic? Please see the discussion
below from Christoffer.

The current patch moves kvm_vm_ioctl_get_dirty_log() into kvm_main.c and
labels both it and kvm_flush_remote_tlbs weak.

Please advise.

Thanks,
- Mario


> So I don't see a lot of use of weak symbols in kvm_main.c (actually on
> kvmarm/next I don't see any), but we do want to share code when more
> than one architecture implements something in the exact same way, like
> it seems x86 and ARM is doing here for this particular function.
> 
> I think the KVM scheme is usually to check for some define, like:
> 
> #ifdef KVM_ARCH_HAVE_GET_DIRTY_LOG
> 	ret = kvm_arch_get_dirty_log(...);
> #else
> 	ret = kvm_get_dirty_log(...);
> #endif
> 
> but Paolo may have a more informed oppinion of how to deal with these.
> 
> Thanks,
> -Christoffer
> 


* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-11  7:03           ` Christoffer Dall
  2014-06-12  3:02             ` Mario Smarduch
@ 2014-06-18  1:41             ` Mario Smarduch
  2014-07-03 15:04               ` Christoffer Dall
  1 sibling, 1 reply; 31+ messages in thread
From: Mario Smarduch @ 2014-06-18  1:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/11/2014 12:03 AM, Christoffer Dall wrote:

>>
>> There is also the issue of kvm_flush_remote_tlbs(), that's also weak,
>> the generic one is using IPIs. Since it's only used in mmu.c maybe make 
>> this one static.
>>
> So I don't see a lot of use of weak symbols in kvm_main.c (actually on
> kvmarm/next I don't see any), but we do want to share code when more
> than one architecture implements something in the exact same way, like
> it seems x86 and ARM is doing here for this particular function.
> 
> I think the KVM scheme is usually to check for some define, like:
> 
> #ifdef KVM_ARCH_HAVE_GET_DIRTY_LOG
> 	ret = kvm_arch_get_dirty_log(...);
> #else
> 	ret = kvm_get_dirty_log(...);
> #endif
> 
> but Paolo may have a more informed oppinion of how to deal with these.
> 
> Thanks,
> -Christoffer
>

 
One approach I'm trying, based on looking at the code in kvm_main.c; it
applies more to selecting features than to selecting generic vs.
architecture-specific functions.

1.-------------------------------------------------
 - add to 'virt/kvm/Kconfig'
config HAVE_KVM_ARCH_TLB_FLUSH_ALL
       bool

config HAVE_KVM_ARCH_DIRTY_LOG
       bool
2.--------------------------------------------------
For ARM and later ARM64 add to 'arch/arm[64]/kvm/Kconfig'
config KVM
        bool "Kernel-based Virtual Machine (KVM) support"
...
select HAVE_KVM_ARCH_TLB_FLUSH_ALL
..

Not for HAVE_KVM_ARCH_DIRTY_LOG, given the implementation is shared with
x86; but we would need to do it for every other architecture that does not
share it (except initially for arm64, since it will use the variant that
returns -EINVAL until the feature is supported).

3.-----------------------------------------------------
In kvm_main.c would have something like

void kvm_flush_remote_tlbs(struct kvm *kvm)
{
#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
        kvm_arch_flush_remote_tlbs(kvm);
#else
        long dirty_count = kvm->tlbs_dirty;

        smp_mb();
        if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
                ++kvm->stat.remote_tlb_flush;
        cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
#endif
}

Then add a void kvm_flush_remote_tlbs(struct kvm *kvm) declaration to the
ARM kvm_host.h and define the function, in this case in mmu.c.

For the dirty log function
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
                                                struct kvm_dirty_log *log)
{
#ifdef CONFIG_HAVE_KVM_ARCH_DIRTY_LOG
        kvm_arch_vm_ioctl_get_dirty_log(kvm, log);
#else
        int r;
        struct kvm_memory_slot *memslot;
        unsigned long n, i;
        unsigned long *dirty_bitmap;
        unsigned long *dirty_bitmap_buffer;
        bool is_dirty = false;
	...

But then you have to go into every architecture and define the
kvm_arch_vm_...() variant.

Is this the right way to go? Or is there a simpler way?

Thanks,
- Mario


* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-06-18  1:41             ` Mario Smarduch
@ 2014-07-03 15:04               ` Christoffer Dall
  2014-07-04 16:29                 ` Paolo Bonzini
  2014-07-17 16:17                 ` Mario Smarduch
  0 siblings, 2 replies; 31+ messages in thread
From: Christoffer Dall @ 2014-07-03 15:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 17, 2014 at 06:41:52PM -0700, Mario Smarduch wrote:
> On 06/11/2014 12:03 AM, Christoffer Dall wrote:
> 
> >>
> >> There is also the issue of kvm_flush_remote_tlbs(), that's also weak,
> >> the generic one is using IPIs. Since it's only used in mmu.c maybe make 
> >> this one static.
> >>
> > So I don't see a lot of use of weak symbols in kvm_main.c (actually on
> > kvmarm/next I don't see any), but we do want to share code when more
> > than one architecture implements something in the exact same way, like
> > it seems x86 and ARM is doing here for this particular function.
> > 
> > I think the KVM scheme is usually to check for some define, like:
> > 
> > #ifdef KVM_ARCH_HAVE_GET_DIRTY_LOG
> > 	ret = kvm_arch_get_dirty_log(...);
> > #else
> > 	ret = kvm_get_dirty_log(...);
> > #endif
> > 
> > but Paolo may have a more informed opinion on how to deal with these.
> > 
> > Thanks,
> > -Christoffer
> >
> 
>  
> One approach I'm trying is based on the code in kvm_main.c.
> This approach applies more to selecting features as opposed to
> selecting generic vs. architecture-specific functions.
> 
> 1.-------------------------------------------------
>  - add to 'virt/kvm/Kconfig'
> config HAVE_KVM_ARCH_TLB_FLUSH_ALL
>        bool
> 
> config HAVE_KVM_ARCH_DIRTY_LOG
>        bool
> 2.--------------------------------------------------
> For ARM and later ARM64 add to 'arch/arm[64]/kvm/Kconfig'
> config KVM
>         bool "Kernel-based Virtual Machine (KVM) support"
> ...
> select HAVE_KVM_ARCH_TLB_FLUSH_ALL
> ..
> 
> Not for HAVE_KVM_ARCH_DIRTY_LOG, given it's shared with x86,
> but it would need to be done for every other architecture that
> does not share it (except initially for arm64, since it
> will use the variant that returns -EINVAL until the feature
> is supported).
> 
> 3------------------------------------------------------
> In kvm_main.c you would have something like
> 
> void kvm_flush_remote_tlbs(struct kvm *kvm)
> {
> #ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
>         kvm_arch_flush_remote_tlbs(kvm);
> #else
>         long dirty_count = kvm->tlbs_dirty;
> 
>         smp_mb();
>         if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>                 ++kvm->stat.remote_tlb_flush;
>         cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> #endif
> }
> 
> Then add a void kvm_flush_remote_tlbs(struct kvm *kvm) declaration to
> the arm kvm_host.h and define the function, in this case in mmu.c.
> 
> For the dirty log function
> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>                                                 struct kvm_dirty_log *log)
> {
> #ifdef CONFIG_HAVE_KVM_ARCH_DIRTY_LOG
>         return kvm_arch_vm_ioctl_get_dirty_log(kvm, log);
> #else
>         int r;
>         struct kvm_memory_slot *memslot;
>         unsigned long n, i;
>         unsigned long *dirty_bitmap;
>         unsigned long *dirty_bitmap_buffer;
>         bool is_dirty = false;
> 	...
> 
> But then you have to go into every architecture and define the
> kvm_arch_vm_...() variant.
> 
> Is this the right way to go? Or is there a simpler way?
> 
Hmmm, I'm really not an expert in the 'established procedures' for what
to put in config files etc., but here's my basic take:

a) you wouldn't put a config option in Kconfig unless it's something
that's actually configurable or some generic feature/subsystem that
should only be enabled if hardware has certain capabilities or other
config options enabled.

b) this seems entirely an implementation issue and not depending on
anything users should select.

c) therefore, I think it's either a question of always having an
arch-specific implementation whose return value you probe, or of having
some sort of define in arch/X/include/asm/kvm_host.h to control what
you need.
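
For illustration only, a sketch of how the kvm_host.h define variant
could look; the symbol name below is made up, and the body is just
Mario's earlier snippet with a different guard:

/* arch/arm/include/asm/kvm_host.h */
#define KVM_ARCH_HAVE_TLB_FLUSH_ALL

/* virt/kvm/kvm_main.c */
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
#ifdef KVM_ARCH_HAVE_TLB_FLUSH_ALL
	kvm_arch_flush_remote_tlbs(kvm);
#else
	long dirty_count = kvm->tlbs_dirty;

	smp_mb();
	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
		++kvm->stat.remote_tlb_flush;
	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
#endif
}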

-Christoffer

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-07-03 15:04               ` Christoffer Dall
@ 2014-07-04 16:29                 ` Paolo Bonzini
  2014-07-17 16:00                   ` Mario Smarduch
  2014-07-17 16:17                 ` Mario Smarduch
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2014-07-04 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/07/2014 17:04, Christoffer Dall wrote:
> Hmmm, I'm really not an expert in the 'established procedures' for what
> to put in config files etc., but here's my basic take:
>
> a) you wouldn't put a config option in Kconfig unless it's something
> that's actually configurable or some generic feature/subsystem that
> should only be enabled if hardware has certain capabilities or other
> config options enabled.
>
> b) this seems entirely an implementation issue and not depending on
> anything users should select.

Actually I think Mario's idea is just fine.  Non-user-accessible Kconfig
symbols are used a lot to invoke an #ifdef elsewhere in the code;
his proposal is a bit different from that, but not by much.

Sometimes #defines are used, sometimes Kconfig symbols, but the idea is 
the same.
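
For what it's worth, virt/kvm/Kconfig already carries prompt-less
symbols like this (HAVE_KVM_IRQCHIP is one example) that architectures
select and that generic code then tests with #ifdef, so the pattern
would not be new:

# virt/kvm/Kconfig (no prompt string, so never user-visible)
config HAVE_KVM_IRQCHIP
       bool

/* generic code keys off the generated CONFIG_ symbol */
#ifdef CONFIG_HAVE_KVM_IRQCHIP
	/* irqchip-only code, compiled in only when an arch selects it */
#endif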

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-07-04 16:29                 ` Paolo Bonzini
@ 2014-07-17 16:00                   ` Mario Smarduch
  0 siblings, 0 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-07-17 16:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/04/2014 09:29 AM, Paolo Bonzini wrote:
> On 03/07/2014 17:04, Christoffer Dall wrote:
>> Hmmm, I'm really not an expert in the 'established procedures' for what
>> to put in config files etc., but here's my basic take:
>>
>> a) you wouldn't put a config option in Kconfig unless it's something
>> that's actually configurable or some generic feature/subsystem that
>> should only be enabled if hardware has certain capabilities or other
>> config options enabled.
>>
>> b) this seems entirely an implementation issue and not depending on
>> anything users should select.
> 
> Actually I think Mario's idea is just fine.  Non-user-accessible Kconfig
> symbols are used a lot to invoke an #ifdef elsewhere in the code;
> his proposal is a bit different from that, but not by much.
> 
> Sometimes #defines are used, sometimes Kconfig symbols, but the idea is
> the same.
> 
> Paolo

Hi Paolo,
  thanks for your feedback. I forgot to add that I tried the #define
ARCH_HAVE_... approach, but checkpatch rejected it and insisted
on Kconfig.

Thanks,
- Mario

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
  2014-07-03 15:04               ` Christoffer Dall
  2014-07-04 16:29                 ` Paolo Bonzini
@ 2014-07-17 16:17                 ` Mario Smarduch
  1 sibling, 0 replies; 31+ messages in thread
From: Mario Smarduch @ 2014-07-17 16:17 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Christoffer,
   Just back from holiday - here's a short plan to resume work:

- move the VM TLB flush and kvm dirty log functions to generic code;
  per Paolo's comments, use the Kconfig approach
- update the other architectures and make sure they compile
- keep it ARMv7 for now

Get maintainers to test the branch.

In parallel, add dirty log support to ARMv8; to test it, I would
add a QEMU monitor function to validate general operation.

Your thoughts?

Thanks,
  Mario

On 07/03/2014 08:04 AM, Christoffer Dall wrote:
> On Tue, Jun 17, 2014 at 06:41:52PM -0700, Mario Smarduch wrote:
>> On 06/11/2014 12:03 AM, Christoffer Dall wrote:
>>
>>>>
>>>> There is also the issue of kvm_flush_remote_tlbs(), that's also weak,
>>>> the generic one is using IPIs. Since it's only used in mmu.c maybe make 
>>>> this one static.
>>>>
>>> So I don't see a lot of use of weak symbols in kvm_main.c (actually on
>>> kvmarm/next I don't see any), but we do want to share code when more
>>> than one architecture implements something in the exact same way, like
>>> it seems x86 and ARM are doing here for this particular function.
>>>
>>> I think the KVM scheme is usually to check for some define, like:
>>>
>>> #ifdef KVM_ARCH_HAVE_GET_DIRTY_LOG
>>> 	ret = kvm_arch_get_dirty_log(...);
>>> #else
>>> 	ret = kvm_get_dirty_log(...);
>>> #endif
>>>
>>> but Paolo may have a more informed opinion on how to deal with these.
>>>
>>> Thanks,
>>> -Christoffer
>>>
>>
>>  
>> One approach I'm trying is based on the code in kvm_main.c.
>> This approach applies more to selecting features as opposed to
>> selecting generic vs. architecture-specific functions.
>>
>> 1.-------------------------------------------------
>>  - add to 'virt/kvm/Kconfig'
>> config HAVE_KVM_ARCH_TLB_FLUSH_ALL
>>        bool
>>
>> config HAVE_KVM_ARCH_DIRTY_LOG
>>        bool
>> 2.--------------------------------------------------
>> For ARM and later ARM64 add to 'arch/arm[64]/kvm/Kconfig'
>> config KVM
>>         bool "Kernel-based Virtual Machine (KVM) support"
>> ...
>> select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>> ..
>>
>> Not for HAVE_KVM_ARCH_DIRTY_LOG, given it's shared with x86,
>> but it would need to be done for every other architecture that
>> does not share it (except initially for arm64, since it
>> will use the variant that returns -EINVAL until the feature
>> is supported).
>>
>> 3------------------------------------------------------
>> In kvm_main.c you would have something like
>>
>> void kvm_flush_remote_tlbs(struct kvm *kvm)
>> {
>> #ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
>>         kvm_arch_flush_remote_tlbs(kvm);
>> #else
>>         long dirty_count = kvm->tlbs_dirty;
>>
>>         smp_mb();
>>         if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>>                 ++kvm->stat.remote_tlb_flush;
>>         cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>> #endif
>> }
>>
>> Then add a void kvm_flush_remote_tlbs(struct kvm *kvm) declaration to
>> the arm kvm_host.h and define the function, in this case in mmu.c.
>>
>> For the dirty log function
>> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>>                                                 struct kvm_dirty_log *log)
>> {
>> #ifdef CONFIG_HAVE_KVM_ARCH_DIRTY_LOG
>>         return kvm_arch_vm_ioctl_get_dirty_log(kvm, log);
>> #else
>>         int r;
>>         struct kvm_memory_slot *memslot;
>>         unsigned long n, i;
>>         unsigned long *dirty_bitmap;
>>         unsigned long *dirty_bitmap_buffer;
>>         bool is_dirty = false;
>> 	...
>>
>> But then you have to go into every architecture and define the
>> kvm_arch_vm_...() variant.
>>
>> Is this the right way to go? Or is there a simpler way?
>>
> Hmmm, I'm really not an expert in the 'established procedures' for what
> to put in config files etc., but here's my basic take:
> 
> a) you wouldn't put a config option in Kconfig unless it's something
> that's actually configurable or some generic feature/subsystem that
> should only be enabled if hardware has certain capabilities or other
> config options enabled.
> 
> b) this seems entirely an implementation issue and not depending on
> anything users should select.
> 
> c) therefore, I think it's either a question of always having an
> arch-specific implementation whose return value you probe, or of having
> some sort of define in arch/X/include/asm/kvm_host.h to control what
> you need.
> 
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2014-07-17 16:17 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-03 23:19 [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
2014-06-03 23:19 ` [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
2014-06-08 12:05   ` Christoffer Dall
2014-06-09 17:06     ` Mario Smarduch
2014-06-09 17:49       ` Christoffer Dall
2014-06-09 18:36         ` Mario Smarduch
2014-06-03 23:19 ` [PATCH v7 2/4] arm: dirty page logging inital mem region write protect (w/no huge PUD support) Mario Smarduch
2014-06-08 12:05   ` Christoffer Dall
2014-06-09 17:58     ` Mario Smarduch
2014-06-09 18:09       ` Christoffer Dall
2014-06-09 18:33         ` Mario Smarduch
2014-06-03 23:19 ` [PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
2014-06-03 23:19 ` [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support Mario Smarduch
2014-06-08 12:05   ` Christoffer Dall
2014-06-10 18:23     ` Mario Smarduch
2014-06-11  6:58       ` Christoffer Dall
2014-06-12  2:53         ` Mario Smarduch
2014-06-06 17:33 ` [RESEND PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
2014-06-08 12:05   ` Christoffer Dall
2014-06-10  1:47     ` Mario Smarduch
2014-06-10  9:22       ` Christoffer Dall
2014-06-10 18:08         ` Mario Smarduch
2014-06-11  7:03           ` Christoffer Dall
2014-06-12  3:02             ` Mario Smarduch
2014-06-18  1:41             ` Mario Smarduch
2014-07-03 15:04               ` Christoffer Dall
2014-07-04 16:29                 ` Paolo Bonzini
2014-07-17 16:00                   ` Mario Smarduch
2014-07-17 16:17                 ` Mario Smarduch
2014-06-08 10:45 ` [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Christoffer Dall
2014-06-09 17:02   ` Mario Smarduch
