* [PATCH v7 0/4] arm: dirty page logging support for ARMv7
@ 2014-06-03 23:19 Mario Smarduch
  2014-06-03 23:19 ` [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Mario Smarduch @ 2014-06-03 23:19 UTC (permalink / raw)
  To: linux-arm-kernel

This patch series adds support for dirty page logging, so far tested only on ARMv7.
Together with GICv2 vGIC and arch timer save/restore support, dirty page logging
enables live migration.

Dirty page logging support -
- initially write protects VM RAM memory regions in the 2nd stage page tables
- adds support to read the dirty page log and write protect the dirty pages
  again in the second stage page tables for the next pass
- second stage huge pages are dissolved into page tables to keep track of
  dirty pages at page granularity. Tracking at huge page granularity limits
  migration to an almost idle system. There are a couple of approaches to
  handling huge pages:
  1 - break up the huge page into a page table and write protect all ptes
  2 - clear the PMD entry, create a page table, install the faulted page entry
      and write protect it.

  This patch series implements #2; #1 may be implemented in the future,
  depending on further benchmark results. A rough sketch of the #2 flow
  appears after this list.

  Option 1: may over commit and do unnecessary work, but on heavy loads it
            appears to converge faster during live migration
  Option 2: only write protects pages that are actually accessed; migration
            time varies and takes longer than Option 1, but eventually
            catches up.

- In the event migration is cancelled, normal behavior resumes and huge pages
  are rebuilt over time.
- Another alternative is the use of reverse mappings, where pointers to sptes
  are maintained for each level of the 2nd stage tables (PTE, PMD, PUD), as in
  the x86 implementation. The primary benefit of reverse mappings is for mmu
  notifiers on large memory range invalidations. Reverse mappings also improve
  dirty page logging: instead of walking page tables, spte pointers are
  accessed directly via the reverse map array.
- Reverse mappings will be considered for future support once the current
  implementation is hardened:
  o validate current dirty page logging support
  o VMID TLB Flushing, migrating multiple guests
  o GIC/arch-timer migration
  o migration under various loads, primarily page reclaim, and validation of
    the current mmu notifiers
  o Run benchmarks (lmbench for now), test impact on performance, and
    optimize
  o Test virtio, since it writes into guest memory; wait until PCI is supported
    on ARM.
  o Currently on ARM, KVM doesn't appear to write into guest address space; if
    it does, those pages need to be marked dirty too (???).
- Move onto ARMv8, since the 2nd stage mmu is shared between both
  architectures. In addition to the dirty page log, support for GIC, arch
  timers, and emulated devices is required. Working on an emulated platform
  masks a lot of potential bugs, but does help to get the majority of the code
  working.
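
The sketch below illustrates the Option 2 (dissolve on write fault) flow in
kernel-style pseudocode. It is not the code from patch 4/4;
stage2_install_readonly_pte() is a hypothetical helper standing in for the
normal 2nd stage PTE mapping path plus kvm_set_s2pte_readonly():

	/*
	 * Illustrative sketch only (not patch 4/4): on a write fault inside a
	 * huge 2nd stage mapping while logging is enabled, drop the huge PMD
	 * and map only the faulted page at PTE granularity, write protected,
	 * so its next write faults again and is recorded in the dirty log.
	 */
	static void dissolve_huge_pmd_on_fault(struct kvm *kvm, pmd_t *pmd,
					       phys_addr_t fault_ipa, pfn_t pfn)
	{
		if (kvm_pmd_huge(*pmd)) {
			/* Clear the block mapping, flush the stale TLB entry */
			pmd_clear(pmd);
			kvm_tlb_flush_vmid_ipa(kvm, fault_ipa);
		}

		/*
		 * Hypothetical helper: allocate a page table for the PMD if
		 * needed, point the PTE for fault_ipa at pfn, then mark it
		 * read-only with kvm_set_s2pte_readonly().
		 */
		stage2_install_readonly_pte(kvm, fault_ipa, pfn);
	}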

Test Environment:
---------------------------------------------------------------------------
NOTE: Running on Fast Models will hardly ever fail and masks bugs; in fact,
      initially, light loads were succeeding without dirty page logging support.
---------------------------------------------------------------------------
- Will put all components on github, including test setup diagram
- In short summary
  o Two ARM Exynos 5440 development platforms - 4-way 1.7 GHz, with 8GB RAM,
    256GB storage, 1 Gbps Ethernet, with swap enabled
  o NFS server running Ubuntu 13.04
    - both ARM boards mount the shared file system
    - the shared file system includes QEMU, the guest kernel, DTB, and multiple
      ext3 root file systems
  o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1
  o Use the QEMU monitor (Ctrl-A C) and the 'migrate -d tcp:IP:port' command
    - Destination command syntax (smp can be changed to 4; the machine model is
      outdated but virt has been tested by others - needs upgrading):
	
	/mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
	/mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
	-M vexpress-a15 -cpu cortex-a15 -nographic \
	-append "root=/dev/vda rw console=ttyAMA0 rootwait" \
	-drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
	-device virtio-blk-device,drive=vm1 \
	-netdev type=tap,id=net0,ifname=tap0 \
	-device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
	-incoming tcp:0:4321

    - Source command syntax same except '-incoming'

  o Migration of multiple VMs, using tap0, tap1, ..., and guest0.root, .....,
    has been tested as well.
  o On the source run multiple copies of 'dirtyram.arm' - a simple program that
    dirties pages periodically (a userspace sketch of such a program follows
    this list).
    ./dirtyram.arm <total mmap size> <dirty page size> <sleep time>
    Example:
    ./dirtyram.arm 102580 812 30
    - dirty 102580 pages
    - 812 pages every 30ms with an incrementing counter
    - run anywhere from one copy to as many copies as VM resources can support.
      If the dirty rate is too high, migration will run indefinitely
    - run a date output loop and check that the date is picked up smoothly
    - place guest/host into page reclaim/swap mode - by whatever means; in this
      case run multiple copies of 'dirtyram.arm' on the host
    - issue migrate command(s) on the source
    - Top result is 409600, 8192, 5
  o QEMU is instrumented to save RAM memory regions on source and destination
    after memory is migrated, but before the guest is started. The files are
    then checksummed on both ends for correctness; given the VMs are small this
    works.
  o The guest kernel is instrumented to capture the current cycle counter - the
    last cycle is compared to QEMU down time to test arch timer accuracy.
  o Network failover is at L3 due to interface limitations; ping continues
    working transparently
  o Also tested 'migrate_cancel' to test reassembly of huge pages (inserted low
    level instrumentation code).
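
  For reference, a minimal userspace dirtier in the spirit of dirtyram.arm
  (an assumption of its behavior based on the description above, not the
  actual test program; the interpretation of the arguments is guessed):

	/* Hypothetical dirtyram.arm-like dirtier: mmap a region, then
	 * periodically write an incrementing counter into a batch of its
	 * pages so the dirty log keeps filling during migration. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>

	int main(int argc, char **argv)
	{
		if (argc != 4) {
			fprintf(stderr, "usage: %s <mmap size> <pages per pass> <sleep ms>\n",
				argv[0]);
			return 1;
		}
		size_t total = strtoul(argv[1], NULL, 0);
		size_t per_pass = strtoul(argv[2], NULL, 0);
		unsigned int sleep_ms = strtoul(argv[3], NULL, 0);
		long page = sysconf(_SC_PAGESIZE);
		size_t npages = total / page;
		unsigned long counter = 0;
		size_t next = 0;

		char *mem = mmap(NULL, total, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (mem == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		for (;;) {
			/* Dirty 'per_pass' pages, one word per page, then sleep */
			for (size_t i = 0; i < per_pass && npages; i++) {
				*(unsigned long *)(mem + (next % npages) * page) = counter++;
				next++;
			}
			usleep(sleep_ms * 1000);
		}
	}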

Changes since v6:
- primarily reworked initial write protect, and write protect of dirty pages on
  logging request
- Only code logic change: dissolve huge pages into page tables in the page
  fault handler
- Made many changes based on Christoffer's comments.

Mario Smarduch (4):
  add ARMv7 HYP API to flush VM TLBs without address param
  dirty page logging initial mem region write protect (w/no huge PUD
    support)
  dirty log write protect management support
  dirty page logging 2nd stage page fault handling support

 arch/arm/include/asm/kvm_asm.h        |    1 +
 arch/arm/include/asm/kvm_host.h       |    5 +
 arch/arm/include/asm/kvm_mmu.h        |   20 +++
 arch/arm/include/asm/pgtable-3level.h |    1 +
 arch/arm/kvm/arm.c                    |   11 +-
 arch/arm/kvm/interrupts.S             |   11 ++
 arch/arm/kvm/mmu.c                    |  243 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                    |   86 ------------
 virt/kvm/kvm_main.c                   |   83 ++++++++++-
 9 files changed, 367 insertions(+), 94 deletions(-)

-- 
1.7.9.5

* [RESEND PATCH v7 3/4] arm: dirty log write protect management support
@ 2014-06-04 21:11 Mario Smarduch
  2014-06-05  6:55 ` Xiao Guangrong
  0 siblings, 1 reply; 36+ messages in thread
From: Mario Smarduch @ 2014-06-04 21:11 UTC (permalink / raw)
  To: linux-arm-kernel

Resending the patch; I noticed I forgot to adjust start_ipa properly in
stage2_wp_mask_range() and then noticed that ptes can be indexed directly.
The patch applies cleanly after 2/4, and 4/4 applies cleanly after this patch.

This patch adds support for keeping track of VM dirty pages. As the dirty page
log is retrieved, the pages that have been written are write protected again
for the next write and log read.
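
Before the diff, a small self-contained userspace sketch of the per-word
dirty-mask walk that the hunks below rely on: snapshot a bitmap word
atomically, then write protect the page behind each set bit. The kernel code
does the same thing with xchg(), __ffs() and kvm_mmu_write_protect_pt_masked()
under mmu_lock; the names below are stand-ins for illustration only.

	/* Userspace illustration of the dirty bitmap walk: each word covers
	 * BITS_PER_LONG consecutive gfns; the word is snapshotted and cleared,
	 * then every set bit names one dirty page to write protect again. */
	#include <stdio.h>

	#define BITS_PER_LONG (8 * sizeof(unsigned long))

	static void write_protect_gfn(unsigned long gfn)
	{
		/* Stand-in for write protecting one page in the 2nd stage tables */
		printf("write protect gfn %lu\n", gfn);
	}

	static void process_dirty_word(unsigned long *word, unsigned long gfn_offset)
	{
		/* Snapshot and clear the word, like xchg(&dirty_bitmap[i], 0) */
		unsigned long mask = __atomic_exchange_n(word, 0UL, __ATOMIC_SEQ_CST);

		while (mask) {
			unsigned long i = __builtin_ctzl(mask);	/* like __ffs() */

			write_protect_gfn(gfn_offset + i);
			mask &= ~(1UL << i);
		}
	}

	int main(void)
	{
		unsigned long bitmap[2] = { 0x5UL, 0x80000001UL };

		for (unsigned long w = 0; w < 2; w++)
			process_dirty_word(&bitmap[w], w * BITS_PER_LONG);
		return 0;
	}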

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h |    3 ++
 arch/arm/kvm/arm.c              |    5 ---
 arch/arm/kvm/mmu.c              |   79 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |   86 ---------------------------------------
 virt/kvm/kvm_main.c             |   81 ++++++++++++++++++++++++++++++++++++
 5 files changed, 163 insertions(+), 91 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 59565f5..b760f9c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+	struct kvm_memory_slot *slot,
+	gfn_t gfn_offset, unsigned long mask);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dfd63ac..f06fb21 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	}
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	return -EINVAL;
-}
-
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 					struct kvm_arm_device_addr *dev_addr)
 {
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index e5dff85..5ede813 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -874,6 +874,85 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	spin_unlock(&kvm->mmu_lock);
 }
 
+/**
+ * stage2_wp_mask_range() - write protect memslot pages set in mask
+ * @pmd - pointer to page table
+ * @start_ipa - the start range of mask
+ * @addr - start_ipa or start range of adjusted mask if crossing PMD range
+ * @mask - mask of dirty pages
+ *
+ * Walk mask and write protect the associated dirty pages in the memory region.
+ * If mask crosses a PMD range adjust it to next page table and return.
+ */
+static void stage2_wp_mask_range(pmd_t *pmd, phys_addr_t start_ipa,
+		phys_addr_t *addr, unsigned long *mask)
+{
+	pte_t *pte;
+	bool crosses_pmd;
+	int i = __ffs(*mask);
+
+	if (unlikely(*addr > start_ipa))
+		start_ipa = *addr - i * PAGE_SIZE;
+	pte = pte_offset_kernel(pmd, start_ipa);
+	for (*addr = start_ipa + i * PAGE_SIZE; *mask;
+		i = __ffs(*mask), *addr = start_ipa + i * PAGE_SIZE) {
+		crosses_pmd = !!((start_ipa & PMD_MASK) ^ (*addr & PMD_MASK));
+		if (unlikely(crosses_pmd)) {
+			/* Adjust mask dirty bits relative to next page table */
+			*mask >>= (PTRS_PER_PTE - pte_index(start_ipa));
+			return;
+		}
+		if (!pte_none(pte[i]))
+			kvm_set_s2pte_readonly(&pte[i]);
+		*mask &= ~(1 << i);
+	}
+}
+
+/**
+ * kvm_mmu_write_protect_pt_masked() - write protect dirty pages set in mask
+ * @kvm:        The KVM pointer
+ * @slot:       The memory slot associated with mask
+ * @gfn_offset: The gfn offset in memory slot
+ * @mask:       The mask of dirty pages at offset 'gfn_offset' in this memory
+ *              slot to be write protected
+ *
+ * Called from the dirty page logging read function to write protect bits set
+ * in mask, to record future writes to these pages in the dirty page log. This
+ * function uses a simplified page table walk, given that the mask can span no
+ * more than 2 PMD table ranges.
+ * 'kvm->mmu_lock' must be held to protect against concurrent modification
+ * of page tables (2nd stage fault, mmu modifiers, ...)
+ *
+ */
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+		struct kvm_memory_slot *slot,
+		gfn_t gfn_offset, unsigned long mask)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+	phys_addr_t start_ipa = (slot->base_gfn + gfn_offset) << PAGE_SHIFT;
+	phys_addr_t end_ipa = start_ipa + BITS_PER_LONG * PAGE_SIZE;
+	phys_addr_t addr = start_ipa;
+	pgd_t *pgdp = kvm->arch.pgd, *pgd;
+
+	do {
+		pgd = pgdp + pgd_index(addr);
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, addr);
+			if (!pud_none(*pud) && !pud_huge(*pud)) {
+				pmd = pmd_offset(pud, addr);
+				if (!pmd_none(*pmd) && !kvm_pmd_huge(*pmd))
+					stage2_wp_mask_range(pmd, start_ipa,
+								&addr, &mask);
+				else
+					addr += PMD_SIZE;
+			} else
+				addr += PUD_SIZE;
+		} else
+			addr += PGDIR_SIZE;
+	} while (mask && addr < end_ipa);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot,
 			  unsigned long fault_status)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5582c3..a603ca3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3569,92 +3569,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * We need to keep it in mind that VCPU threads can write to the bitmap
- * concurrently.  So, to avoid losing data, we keep the following order for
- * each bit:
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Flush TLB's if needed.
- *   4. Copy the snapshot to the userspace.
- *
- * Between 2 and 3, the guest may write to the page using the remaining TLB
- * entry.  This is not a problem because the page will be reported dirty at
- * step 4 using the snapshot taken before and step 3 ensures that successive
- * writes will be logged for the next call.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	int r;
-	struct kvm_memory_slot *memslot;
-	unsigned long n, i;
-	unsigned long *dirty_bitmap;
-	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
-
-	mutex_lock(&kvm->slots_lock);
-
-	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
-		goto out;
-
-	memslot = id_to_memslot(kvm->memslots, log->slot);
-
-	dirty_bitmap = memslot->dirty_bitmap;
-	r = -ENOENT;
-	if (!dirty_bitmap)
-		goto out;
-
-	n = kvm_dirty_bitmap_bytes(memslot);
-
-	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
-	memset(dirty_bitmap_buffer, 0, n);
-
-	spin_lock(&kvm->mmu_lock);
-
-	for (i = 0; i < n / sizeof(long); i++) {
-		unsigned long mask;
-		gfn_t offset;
-
-		if (!dirty_bitmap[i])
-			continue;
-
-		is_dirty = true;
-
-		mask = xchg(&dirty_bitmap[i], 0);
-		dirty_bitmap_buffer[i] = mask;
-
-		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
-	}
-
-	spin_unlock(&kvm->mmu_lock);
-
-	/* See the comments in kvm_mmu_slot_remove_write_access(). */
-	lockdep_assert_held(&kvm->slots_lock);
-
-	/*
-	 * All the TLBs can be flushed out of mmu lock, see the comments in
-	 * kvm_mmu_slot_remove_write_access().
-	 */
-	if (is_dirty)
-		kvm_flush_remote_tlbs(kvm);
-
-	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
-		goto out;
-
-	r = 0;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return r;
-}
-
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
 			bool line_status)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba25765..d8d5091 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -429,6 +429,87 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 	return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
 }
 
+
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently.  So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Flush TLB's if needed.
+ *   4. Copy the snapshot to the userspace.
+ *
+ * Between 2 and 3, the guest may write to the page using the remaining TLB
+ * entry.  This is not a problem because the page will be reported dirty at
+ * step 4 using the snapshot taken before and step 3 ensures that successive
+ * writes will be logged for the next call.
+ */
+
+int __weak kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+						struct kvm_dirty_log *log)
+{
+	int r;
+	struct kvm_memory_slot *memslot;
+	unsigned long n, i;
+	unsigned long *dirty_bitmap;
+	unsigned long *dirty_bitmap_buffer;
+	bool is_dirty = false;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = -EINVAL;
+	if (log->slot >= KVM_USER_MEM_SLOTS)
+		goto out;
+
+	memslot = id_to_memslot(kvm->memslots, log->slot);
+
+	dirty_bitmap = memslot->dirty_bitmap;
+	r = -ENOENT;
+	if (!dirty_bitmap)
+		goto out;
+
+	n = kvm_dirty_bitmap_bytes(memslot);
+
+	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+	memset(dirty_bitmap_buffer, 0, n);
+
+	spin_lock(&kvm->mmu_lock);
+
+	for (i = 0; i < n / sizeof(long); i++) {
+		unsigned long mask;
+		gfn_t offset;
+
+		if (!dirty_bitmap[i])
+			continue;
+
+		is_dirty = true;
+
+		mask = xchg(&dirty_bitmap[i], 0);
+		dirty_bitmap_buffer[i] = mask;
+
+		offset = i * BITS_PER_LONG;
+		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+	}
+	if (is_dirty)
+		kvm_flush_remote_tlbs(kvm);
+
+	spin_unlock(&kvm->mmu_lock);
+
+	r = -EFAULT;
+	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+		goto out;
+
+	r = 0;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+
 #else  /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */
 
 static int kvm_init_mmu_notifier(struct kvm *kvm)
-- 
1.7.9.5


Thread overview: 36+ messages in thread
2014-06-03 23:19 [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
2014-06-03 23:19 ` [PATCH v7 1/4] arm: add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
2014-06-08 12:05   ` Christoffer Dall
2014-06-09 17:06     ` Mario Smarduch
2014-06-09 17:49       ` Christoffer Dall
2014-06-09 18:36         ` Mario Smarduch
2014-06-03 23:19 ` [PATCH v7 2/4] arm: dirty page logging initial mem region write protect (w/no huge PUD support) Mario Smarduch
2014-06-08 12:05   ` Christoffer Dall
2014-06-09 17:58     ` Mario Smarduch
2014-06-09 18:09       ` Christoffer Dall
2014-06-09 18:33         ` Mario Smarduch
2014-06-03 23:19 ` [PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
2014-06-03 23:19 ` [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support Mario Smarduch
2014-06-08 12:05   ` Christoffer Dall
2014-06-10 18:23     ` Mario Smarduch
2014-06-11  6:58       ` Christoffer Dall
2014-06-12  2:53         ` Mario Smarduch
2014-06-06 17:33 ` [RESEND PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
2014-06-08 12:05   ` Christoffer Dall
2014-06-10  1:47     ` Mario Smarduch
2014-06-10  9:22       ` Christoffer Dall
2014-06-10 18:08         ` Mario Smarduch
2014-06-11  7:03           ` Christoffer Dall
2014-06-12  3:02             ` Mario Smarduch
2014-06-18  1:41             ` Mario Smarduch
2014-07-03 15:04               ` Christoffer Dall
2014-07-04 16:29                 ` Paolo Bonzini
2014-07-17 16:00                   ` Mario Smarduch
2014-07-17 16:17                 ` Mario Smarduch
2014-06-08 10:45 ` [PATCH v7 0/4] arm: dirty page logging support for ARMv7 Christoffer Dall
2014-06-09 17:02   ` Mario Smarduch
2014-06-04 21:11 [RESEND PATCH v7 3/4] arm: dirty log write protect management support Mario Smarduch
2014-06-05  6:55 ` Xiao Guangrong
2014-06-05 19:09   ` Mario Smarduch
2014-06-06  5:52     ` Xiao Guangrong
2014-06-06 17:36       ` Mario Smarduch
