* [PATCH v9 0/4] arm: dirty page logging support for ARMv7
@ 2014-07-25  0:56 ` Mario Smarduch
  0 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-07-25  0:56 UTC (permalink / raw)
  To: kvmarm, marc.zyngier, christoffer.dall, pbonzini, gleb, agraf,
	xiantao.zhang, borntraeger, cornelia.huck
  Cc: xiaoguangrong, steve.capper, kvm, linux-arm-kernel, jays.lee,
	sungjinn.chung, Mario Smarduch

This patch series adds support for dirty page logging. So far it has been tested
only on ARMv7 hardware, verified to compile on armv8, ia64, mips, ppc and s390,
and verified to compile and run on x86_64.

Changes from the previous version:
- kvm_flush_remote_tlbs() now has generic and architecture-specific variants.
  armv7 (later armv8) uses the arch variant, all other architectures use the
  generic version; the reason is that arm uses HW broadcast for TLB invalidation.
- kvm_vm_ioctl_get_dirty_log() is generic between armv7 and x86 (later ARMv8),
  other architectures use an arch variant.

The approach is documented in:

https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010329.html
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010338.html

Compile targets
- x86_64 - defconfig; also did validation, simple migration on the same host.
- ia64 - ia64-linux-gcc4.6.3 - defconfig; ia64 Kconfig defines BROKEN, worked
  around that to make sure the new changes don't break the build. Eventually the
  build breaks when compiling ioapic.c, unrelated to this patch.
- mips - mips64-linux-gcc4.6.3 - malta_kvm_defconfig
- ppc - powerpc64-linux-gcc4.6.3 - pseries_defconfig
- s390 - s390x-linux-gcc4.6.3 - defconfig

Dirty page logging support -
- initially write protects VM RAM memory regions - 2nd stage page tables
- adds support to read the dirty page log and again write protect the dirty
  pages in the second stage page table for the next pass (a minimal userspace
  sketch of reading the log follows this list)
- second stage huge pages are dissolved into page tables to keep track of
  dirty pages at page granularity. Tracking at huge page granularity would
  limit migration to an almost idle system.
- In the event migration is canceled, normal behavior resumes and huge pages
  are rebuilt over time.
- At this time reverse mappings are not used for write protecting 2nd stage
  tables.
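
For context, here is a minimal userspace sketch of reading the per-slot dirty
log through the standard KVM_GET_DIRTY_LOG ioctl; the vm_fd, slot number and
bitmap allocation are illustrative assumptions, not part of this series:

        #include <linux/kvm.h>
        #include <string.h>
        #include <sys/ioctl.h>

        /*
         * Fetch the dirty bitmap for one memslot. 'bitmap' must provide one
         * bit per page in the slot; KVM clears the log and write protects the
         * slot again, so the next call only reports pages written since now.
         */
        static int get_dirty_log(int vm_fd, __u32 slot, void *bitmap)
        {
                struct kvm_dirty_log log;

                memset(&log, 0, sizeof(log));
                log.slot = slot;
                log.dirty_bitmap = bitmap;

                return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
        }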

- Future work
  - Enable dirty memory logging to work on ARMv8 Fast Models/Foundation Model

Test Environment:
---------------------------------------------------------------------------
NOTE: Running on Fast Models will hardly ever fail and can mask bugs; initially
      light loads were succeeding without dirty page logging support.
---------------------------------------------------------------------------
- Will put all components, including the test setup, on github
- In short summary
  o Two ARM Exynos 5440 development platforms - 4-way 1.7 GHz, with 8GB RAM, 256GB
    storage, 1Gb Ethernet, with swap enabled
  o NFS server running Ubuntu 13.04
    - both ARM boards mount shared file system
    - Shared file system includes - QEMU, Guest Kernel, DTB, multiple Ext3 root
      file systems.
  o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1
  o Use the QEMU monitor (Ctrl+A, C) and the 'migrate -d tcp:IP:port' command
    - Destination command syntax below: smp can be changed to 4; the machine
      model is outdated but has been tested on virt by others (needs upgrading)

        /mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
        /mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
        -M vexpress-a15 -cpu cortex-a15 -nographic \
        -append "root=/dev/vda rw console=ttyAMA0 rootwait" \
        -drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
        -device virtio-blk-device,drive=vm1 \
        -netdev type=tap,id=net0,ifname=tap0 \
        -device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
        -incoming tcp:0:4321

    - Source command syntax is the same except without '-incoming'

  o Migration of multiple VMs (using tap0, tap1, ... and guest0.root, ...) has
    been tested as well.
  o On the source run multiple copies of 'dirtyram.arm' - a simple program that
    dirties pages periodically (a rough sketch of such a program appears at the
    end of this summary).
    ./dirtyram.arm <total mmap size> <dirty page size> <sleep time>
    Example:
    ./dirtyram.arm 102580 812 30
    - dirties a total of 102580 pages
    - 812 pages every 30ms, with an incrementing counter
    - run anywhere from one to as many copies as VM resources can support; if
      the dirty rate is too high migration will run indefinitely
    - run a date output loop and check that the date is picked up smoothly
    - place the guest/host into page reclaim/swap mode - by whatever means; in
      this case run multiple copies of 'dirtyram.arm' on the host
    - issue migrate command(s) on source
    - Top result is 409600, 8192, 5
  o QEMU is instrumented to save RAM memory regions on source and destination
    after memory is migrated but before the guest is started. The files are then
    checksummed on both ends for correctness; given the VMs are small this works.
  o The guest kernel is instrumented to capture the current cycle counter minus
    the last cycle and compare it to the QEMU downtime, to test arch timer accuracy.
  o Network failover is at L3 due to interface limitations; ping continues
    working transparently
  o Also tested 'migrate_cancel' to verify reassembly of huge pages (inserted
    low-level instrumentation code).
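  o Since the dirtyram.arm source is not included in this posting, below is a
    rough sketch of what such a dirtier could look like; the argument
    interpretation (sizes in pages) and all names are assumptions based on the
    usage line above:

        /* dirtyram.arm sketch: mmap a region and keep re-dirtying part of it */
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
        #include <sys/mman.h>

        int main(int argc, char **argv)
        {
                unsigned long total, dirty, sleep_ms, counter = 0, i;
                long psz = sysconf(_SC_PAGESIZE);
                char *buf;

                if (argc != 4) {
                        fprintf(stderr, "usage: %s <total pages> <dirty pages> <sleep ms>\n",
                                argv[0]);
                        return 1;
                }
                total = strtoul(argv[1], NULL, 0);
                dirty = strtoul(argv[2], NULL, 0);
                sleep_ms = strtoul(argv[3], NULL, 0);

                buf = mmap(NULL, total * psz, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (buf == MAP_FAILED)
                        return 1;

                for (;;) {
                        /* write an incrementing counter into 'dirty' pages each pass */
                        for (i = 0; i < dirty && i < total; i++)
                                *(unsigned long *)(buf + i * psz) = counter;
                        counter++;
                        usleep(sleep_ms * 1000);
                }
        }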
- Basic Network Test - assuming one Ethernet interface is available

Source host IP 192.168.10.101/24, VM tap0 192.168.2.1/24 and
VM eth0 192.168.2.100/24 with default route 192.168.2.1

Destination host IP 192.168.10.100/24, VM same settings as above.
Both VMs have identical MAC addresses.

Initially, the NFS server's route to 192.168.2.100 is via 192.168.10.101.

- ssh 192.168.2.100
- start migration from source to destination
- after migration ends, switch routes on the NFS server:
   route add -host 192.168.2.100 gw 192.168.10.100

ssh should resume after the route switch; ping should likewise continue to work
seamlessly.
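
Putting the steps together, one possible end-to-end sequence looks like this
(IP addresses and port are taken from the setup above; monitor output is
omitted):

    # on the source QEMU monitor (entered with Ctrl+A, C)
    migrate -d tcp:192.168.10.100:4321
    info migrate        # poll until the status reports completed

    # then on the NFS server, redirect the guest's address to the new host
    route add -host 192.168.2.100 gw 192.168.10.100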



Mario Smarduch (4):
  add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to
    support arch flush
  ARMv7 dirty page logging initial mem region write protect (w/no huge
    PUD support)
  dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8
    ia64 mips powerpc s390 arch specific
  ARMv7 dirty page logging 2nd stage page fault handling support

 arch/arm/include/asm/kvm_asm.h        |    1 +
 arch/arm/include/asm/kvm_host.h       |    2 +
 arch/arm/include/asm/kvm_mmu.h        |   20 ++++
 arch/arm/include/asm/pgtable-3level.h |    1 +
 arch/arm/kvm/Kconfig                  |    1 +
 arch/arm/kvm/arm.c                    |   17 ++-
 arch/arm/kvm/interrupts.S             |   12 ++
 arch/arm/kvm/mmu.c                    |  198 ++++++++++++++++++++++++++++++++-
 arch/arm64/include/asm/kvm_host.h     |    2 +
 arch/arm64/kvm/Kconfig                |    1 +
 arch/ia64/include/asm/kvm_host.h      |    1 +
 arch/ia64/kvm/Kconfig                 |    1 +
 arch/ia64/kvm/kvm-ia64.c              |    2 +-
 arch/mips/include/asm/kvm_host.h      |    2 +-
 arch/mips/kvm/Kconfig                 |    1 +
 arch/mips/kvm/kvm_mips.c              |    2 +-
 arch/powerpc/include/asm/kvm_host.h   |    2 +
 arch/powerpc/kvm/Kconfig              |    1 +
 arch/powerpc/kvm/book3s.c             |    2 +-
 arch/powerpc/kvm/booke.c              |    2 +-
 arch/s390/include/asm/kvm_host.h      |    2 +
 arch/s390/kvm/Kconfig                 |    1 +
 arch/s390/kvm/kvm-s390.c              |    2 +-
 arch/x86/kvm/x86.c                    |   86 --------------
 include/linux/kvm_host.h              |    3 +
 virt/kvm/Kconfig                      |    6 +
 virt/kvm/kvm_main.c                   |   94 ++++++++++++++++
 27 files changed, 366 insertions(+), 99 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 60+ messages in thread


* [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush
  2014-07-25  0:56 ` Mario Smarduch
@ 2014-07-25  0:56   ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-07-25  0:56 UTC (permalink / raw)
  To: kvmarm, marc.zyngier, christoffer.dall, pbonzini, gleb, agraf,
	xiantao.zhang, borntraeger, cornelia.huck
  Cc: xiaoguangrong, steve.capper, kvm, linux-arm-kernel, jays.lee,
	sungjinn.chung, Mario Smarduch

This patch adds a HYP interface for global VM TLB invalidation without an address
parameter. The generic VM TLB flush calls the ARMv7 arch-defined TLB flush function.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_asm.h  |    1 +
 arch/arm/include/asm/kvm_host.h |    1 +
 arch/arm/kvm/Kconfig            |    1 +
 arch/arm/kvm/interrupts.S       |   12 ++++++++++++
 arch/arm/kvm/mmu.c              |   17 +++++++++++++++++
 virt/kvm/Kconfig                |    3 +++
 virt/kvm/kvm_main.c             |    4 ++++
 7 files changed, 39 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 53b3c4a..21bc519 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 193ceaf..042206f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -230,5 +230,6 @@ int kvm_perf_teardown(void);
 
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
+void kvm_arch_flush_remote_tlbs(struct kvm *);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 466bd29..44d3b6f 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -22,6 +22,7 @@ config KVM
 	select ANON_INODES
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select KVM_MMIO
+	select HAVE_KVM_ARCH_TLB_FLUSH_ALL
 	select KVM_ARM_HOST
 	depends on ARM_VIRT_EXT && ARM_LPAE
 	---help---
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 0d68d40..1258d46 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -66,6 +66,18 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
 	bx	lr
 ENDPROC(__kvm_tlb_flush_vmid_ipa)
 
+/**
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
+ *
+ * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
+ * parameter
+ */
+
+ENTRY(__kvm_tlb_flush_vmid)
+	b	__kvm_tlb_flush_vmid_ipa
+ENDPROC(__kvm_tlb_flush_vmid)
+
+
 /********************************************************************
  * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
  * domain, for all VMIDs
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2ac9588..35254c6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -56,6 +56,23 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
 }
 
+#ifdef CONFIG_ARM
+/**
+ * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries
+ * @kvm:       pointer to kvm structure.
+ *
+ * Interface to HYP function to flush all VM TLB entries without address
+ * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
+ * kvm_tlb_flush_vmid_ipa().
+ */
+void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
+{
+	if (kvm)
+		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+}
+
+#endif
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 				  int min, int max)
 {
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 13f2d19..f1efaa5 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -34,3 +34,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
 
 config KVM_VFIO
        bool
+
+config HAVE_KVM_ARCH_TLB_FLUSH_ALL
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fa70c6e..258f3d9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -186,12 +186,16 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
+#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
+	kvm_arch_flush_remote_tlbs(kvm);
+#else
 	long dirty_count = kvm->tlbs_dirty;
 
 	smp_mb();
 	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
 		++kvm->stat.remote_tlb_flush;
 	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
+#endif
 }
 EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread


* [PATCH v9 2/4] arm: ARMv7 dirty page logging initial mem region write protect (w/no huge PUD support)
  2014-07-25  0:56 ` Mario Smarduch
@ 2014-07-25  0:56   ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-07-25  0:56 UTC (permalink / raw)
  To: kvmarm, marc.zyngier, christoffer.dall, pbonzini, gleb, agraf,
	xiantao.zhang, borntraeger, cornelia.huck
  Cc: xiaoguangrong, steve.capper, kvm, linux-arm-kernel, jays.lee,
	sungjinn.chung, Mario Smarduch

This patch adds support for initial write protection of VM memslots. This patch
series assumes that huge PUDs will not be used in 2nd stage tables.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/include/asm/kvm_host.h       |    1 +
 arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
 arch/arm/include/asm/pgtable-3level.h |    1 +
 arch/arm/kvm/arm.c                    |    9 +++
 arch/arm/kvm/mmu.c                    |  128 +++++++++++++++++++++++++++++++++
 5 files changed, 159 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 042206f..6521a2d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 void kvm_arch_flush_remote_tlbs(struct kvm *);
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5cc0b0f..08ab5e8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
 	pmd_val(*pmd) |= L_PMD_S2_RDWR;
 }
 
+static inline void kvm_set_s2pte_readonly(pte_t *pte)
+{
+	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
+}
+
+static inline bool kvm_s2pte_readonly(pte_t *pte)
+{
+	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
+}
+
+static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
+{
+	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
+}
+
+static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
+{
+	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
+}
+
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
 #define kvm_pgd_addr_end(addr, end)					\
 ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 85c60ad..d8bb40b 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -129,6 +129,7 @@
 #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
 #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
 
+#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
 #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 
 /*
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3c82b37..e11c2dd 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   const struct kvm_memory_slot *old,
 				   enum kvm_mr_change change)
 {
+#ifdef CONFIG_ARM
+	/*
+	 * At this point memslot has been committed and there is an
+	 * allocated dirty_bitmap[], dirty pages will be be tracked while the
+	 * memory slot is write protected.
+	 */
+	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
+		kvm_mmu_wp_memory_region(kvm, mem->slot);
+#endif
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 35254c6..7bfc792 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
 	return false;
 }
 
+#ifdef CONFIG_ARM
+/**
+ * stage2_wp_pte_range - write protect PTE range
+ * @pmd:	pointer to pmd entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
+{
+	pte_t *pte;
+
+	pte = pte_offset_kernel(pmd, addr);
+	do {
+		if (!pte_none(*pte)) {
+			if (!kvm_s2pte_readonly(pte))
+				kvm_set_s2pte_readonly(pte);
+		}
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+/**
+ * stage2_wp_pmd_range - write protect PMD range
+ * @pud:	pointer to pud entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
+{
+	pmd_t *pmd;
+	phys_addr_t next;
+
+	pmd = pmd_offset(pud, addr);
+
+	do {
+		next = kvm_pmd_addr_end(addr, end);
+		if (!pmd_none(*pmd)) {
+			if (kvm_pmd_huge(*pmd)) {
+				if (!kvm_s2pmd_readonly(pmd))
+					kvm_set_s2pmd_readonly(pmd);
+			} else
+				stage2_wp_pte_range(pmd, addr, next);
+
+		}
+	} while (pmd++, addr = next, addr != end);
+}
+
+/**
+  * stage2_wp_pud_range - write protect PUD range
+  * @kvm:	pointer to kvm structure
+  * @pud:	pointer to pgd entry
+  * @addr:	range start address
+  * @end:	range end address
+  *
+  * While walking the PUD range huge PUD pages are ignored, in the future this
+  * may need to be revisited. Determine how to handle huge PUDs when logging
+  * of dirty pages is enabled.
+  */
+static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
+				phys_addr_t addr, phys_addr_t end)
+{
+	pud_t *pud;
+	phys_addr_t next;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = kvm_pud_addr_end(addr, end);
+		/* TODO: huge PUD not supported, revisit later */
+		BUG_ON(pud_huge(*pud));
+		if (!pud_none(*pud))
+			stage2_wp_pmd_range(pud, addr, next);
+	} while (pud++, addr = next, addr != end);
+}
+
+/**
+ * stage2_wp_range() - write protect stage2 memory region range
+ * @kvm:	The KVM pointer
+ * @start:	Start address of range
+ * &end:	End address of range
+ */
+static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
+{
+	pgd_t *pgd;
+	phys_addr_t next;
+
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	do {
+		/*
+		 * Release kvm_mmu_lock periodically if the memory region is
+		 * large features like detect hung task, lock detector or lock
+		 * dep  may panic. In addition holding the lock this long will
+		 * also starve other vCPUs. Applies to huge VM memory regions.
+		 */
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
+		next = kvm_pgd_addr_end(addr, end);
+		if (pgd_present(*pgd))
+			stage2_wp_pud_range(kvm, pgd, addr, next);
+	} while (pgd++, addr = next, addr != end);
+}
+
+/**
+ * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
+ * @kvm:	The KVM pointer
+ * @slot:	The memory slot to write protect
+ *
+ * Called to start logging dirty pages after memory region
+ * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
+ * all present PMD and PTEs are write protected in the memory region.
+ * Afterwards read of dirty page log can be called.
+ *
+ * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
+ * serializing operations for VM memory regions.
+ */
+
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
+{
+	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+
+	spin_lock(&kvm->mmu_lock);
+	stage2_wp_range(kvm, start, end);
+	kvm_flush_remote_tlbs(kvm);
+	spin_unlock(&kvm->mmu_lock);
+}
+#endif
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot,
 			  unsigned long fault_status)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread


* [PATCH v9 3/4] arm: dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8 ia64 mips powerpc s390 arch specific
  2014-07-25  0:56 ` Mario Smarduch
@ 2014-07-25  0:56   ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-07-25  0:56 UTC (permalink / raw)
  To: kvmarm, marc.zyngier, christoffer.dall, pbonzini, gleb, agraf,
	xiantao.zhang, borntraeger, cornelia.huck
  Cc: xiaoguangrong, steve.capper, kvm, linux-arm-kernel, jays.lee,
	sungjinn.chung, Mario Smarduch

This patch adds support for keeping track of VM dirty pages. As the dirty page
log is retrieved, the pages that have been written are write protected again for
the next write and log read.

The dirty log read function is generic for armv7 and x86, and arch specific
for arm64, ia64, mips, powerpc, s390.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/kvm/arm.c                  |    8 +++-
 arch/arm/kvm/mmu.c                  |   22 +++++++++
 arch/arm64/include/asm/kvm_host.h   |    2 +
 arch/arm64/kvm/Kconfig              |    1 +
 arch/ia64/include/asm/kvm_host.h    |    1 +
 arch/ia64/kvm/Kconfig               |    1 +
 arch/ia64/kvm/kvm-ia64.c            |    2 +-
 arch/mips/include/asm/kvm_host.h    |    2 +-
 arch/mips/kvm/Kconfig               |    1 +
 arch/mips/kvm/kvm_mips.c            |    2 +-
 arch/powerpc/include/asm/kvm_host.h |    2 +
 arch/powerpc/kvm/Kconfig            |    1 +
 arch/powerpc/kvm/book3s.c           |    2 +-
 arch/powerpc/kvm/booke.c            |    2 +-
 arch/s390/include/asm/kvm_host.h    |    2 +
 arch/s390/kvm/Kconfig               |    1 +
 arch/s390/kvm/kvm-s390.c            |    2 +-
 arch/x86/kvm/x86.c                  |   86 ---------------------------------
 include/linux/kvm_host.h            |    3 ++
 virt/kvm/Kconfig                    |    3 ++
 virt/kvm/kvm_main.c                 |   90 +++++++++++++++++++++++++++++++++++
 21 files changed, 143 insertions(+), 93 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e11c2dd..f7739a0 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -783,10 +783,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	}
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+#ifdef CONFIG_ARM64
+/*
+ * For now features not supported on ARM64, the #ifdef is added to make that
+ * clear but not needed since ARM64 Kconfig selects function in generic code.
+ */
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	return -EINVAL;
 }
+#endif
 
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 					struct kvm_arm_device_addr *dev_addr)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7bfc792..ca84331 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -889,6 +889,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	kvm_flush_remote_tlbs(kvm);
 	spin_unlock(&kvm->mmu_lock);
 }
+
+/**
+ * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in mask
+ * @kvm:	The KVM pointer
+ * @slot:	The memory slot associated with mask
+ * @gfn_offset:	The gfn offset in memory slot
+ * @mask:	The mask of dirty pages at offset 'gfn_offset' in this memory
+ *		slot to be write protected
+ *
+ * Walks bits set in mask write protects the associated pte's. Caller must
+ * acquire kvm_mmu_lock.
+ */
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+		struct kvm_memory_slot *slot,
+		gfn_t gfn_offset, unsigned long mask)
+{
+	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
+	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
+	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+
+	stage2_wp_range(kvm, start, end);
+}
 #endif
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 92242ce..b4a280b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -200,4 +200,6 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 		     hyp_stack_ptr, vector_ptr);
 }
 
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8ba85e9..9e21a8a 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -22,6 +22,7 @@ config KVM
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
+	select HAVE_KVM_ARCH_DIRTY_LOG
 	select KVM_MMIO
 	select KVM_ARM_HOST
 	select KVM_ARM_VGIC
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index db95f57..d79f520 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -594,6 +594,7 @@ void kvm_sal_emul(struct kvm_vcpu *vcpu);
 #define __KVM_HAVE_ARCH_VM_ALLOC 1
 struct kvm *kvm_arch_alloc_vm(void);
 void kvm_arch_free_vm(struct kvm *kvm);
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
 
 #endif /* __ASSEMBLY__*/
 
diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index 990b864..32dd6c8 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -24,6 +24,7 @@ config KVM
 	depends on BROKEN
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
+	select HAVE_KVM_ARCH_DIRTY_LOG
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
 	select KVM_APIC_ARCHITECTURE
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 6a4309b..3166df5 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1812,7 +1812,7 @@ static void kvm_ia64_sync_dirty_log(struct kvm *kvm,
 	spin_unlock(&kvm->arch.dirty_log_lock);
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		struct kvm_dirty_log *log)
 {
 	int r;
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 060aaa6..f7e2262 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -649,6 +649,6 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
 extern void mips32_SyncICache(unsigned long addr, unsigned long size);
 extern int kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
-
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index 30e334e..b57f49e 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 	select KVM_MMIO
+	select HAVE_KVM_ARCH_DIRTY_LOG
 	---help---
 	  Support for hosting Guest kernels.
 	  Currently supported on MIPS32 processors.
diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index da5186f..f9a1e62 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -790,7 +790,7 @@ out:
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	struct kvm_memory_slot *memslot;
 	unsigned long ga, ga_end;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1eaea2d..fb31595 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -676,4 +676,6 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
 
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 141b202..c1fa061 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -22,6 +22,7 @@ config KVM
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 	select HAVE_KVM_EVENTFD
+	select HAVE_KVM_ARCH_DIRTY_LOG
 
 config KVM_BOOK3S_HANDLER
 	bool
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 94e597e..3835936 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -781,7 +781,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 	return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
 }
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ab62109..50dd33d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1624,7 +1624,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 	return r;
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	return -ENOTSUPP;
 }
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 0d45f6f..8afbe12 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -422,6 +422,7 @@ static inline bool kvm_is_error_hva(unsigned long addr)
 }
 
 #define ASYNC_PF_PER_VCPU	64
+struct kvm;
 struct kvm_vcpu;
 struct kvm_async_pf;
 struct kvm_arch_async_pf {
@@ -441,4 +442,5 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
 #endif
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 10d529a..3ba07a7 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
 	depends on HAVE_KVM
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
+	select HAVE_KVM_ARCH_DIRTY_LOG
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select HAVE_KVM_EVENTFD
 	select KVM_ASYNC_PF
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b32c42c..95164e7 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -207,7 +207,7 @@ static void kvm_s390_sync_dirty_log(struct kvm *kvm,
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
 			       struct kvm_dirty_log *log)
 {
 	int r;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5582c3..a603ca3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3569,92 +3569,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * We need to keep it in mind that VCPU threads can write to the bitmap
- * concurrently.  So, to avoid losing data, we keep the following order for
- * each bit:
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Flush TLB's if needed.
- *   4. Copy the snapshot to the userspace.
- *
- * Between 2 and 3, the guest may write to the page using the remaining TLB
- * entry.  This is not a problem because the page will be reported dirty at
- * step 4 using the snapshot taken before and step 3 ensures that successive
- * writes will be logged for the next call.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	int r;
-	struct kvm_memory_slot *memslot;
-	unsigned long n, i;
-	unsigned long *dirty_bitmap;
-	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
-
-	mutex_lock(&kvm->slots_lock);
-
-	r = -EINVAL;
-	if (log->slot >= KVM_USER_MEM_SLOTS)
-		goto out;
-
-	memslot = id_to_memslot(kvm->memslots, log->slot);
-
-	dirty_bitmap = memslot->dirty_bitmap;
-	r = -ENOENT;
-	if (!dirty_bitmap)
-		goto out;
-
-	n = kvm_dirty_bitmap_bytes(memslot);
-
-	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
-	memset(dirty_bitmap_buffer, 0, n);
-
-	spin_lock(&kvm->mmu_lock);
-
-	for (i = 0; i < n / sizeof(long); i++) {
-		unsigned long mask;
-		gfn_t offset;
-
-		if (!dirty_bitmap[i])
-			continue;
-
-		is_dirty = true;
-
-		mask = xchg(&dirty_bitmap[i], 0);
-		dirty_bitmap_buffer[i] = mask;
-
-		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
-	}
-
-	spin_unlock(&kvm->mmu_lock);
-
-	/* See the comments in kvm_mmu_slot_remove_write_access(). */
-	lockdep_assert_held(&kvm->slots_lock);
-
-	/*
-	 * All the TLBs can be flushed out of mmu lock, see the comments in
-	 * kvm_mmu_slot_remove_write_access().
-	 */
-	if (is_dirty)
-		kvm_flush_remote_tlbs(kvm);
-
-	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
-		goto out;
-
-	r = 0;
-out:
-	mutex_unlock(&kvm->slots_lock);
-	return r;
-}
-
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
 			bool line_status)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 820fc2e..2f3822b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -606,6 +606,9 @@ int kvm_get_dirty_log(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				struct kvm_dirty_log *log);
 
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+	struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask);
+
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 			bool line_status);
 long kvm_arch_vm_ioctl(struct file *filp,
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index f1efaa5..975733f 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -37,3 +37,6 @@ config KVM_VFIO
 
 config HAVE_KVM_ARCH_TLB_FLUSH_ALL
        bool
+
+config HAVE_KVM_ARCH_DIRTY_LOG
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 258f3d9..51b90ca 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -442,6 +442,96 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently.  So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Flush TLB's if needed.
+ *   4. Copy the snapshot to the userspace.
+ *
+ * Between 2 and 3, the guest may write to the page using the remaining TLB
+ * entry.  This is not a problem because the page will be reported dirty at
+ * step 4 using the snapshot taken before and step 3 ensures that successive
+ * writes will be logged for the next call.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+#ifdef CONFIG_HAVE_KVM_ARCH_DIRTY_LOG
+	return kvm_arch_vm_ioctl_get_dirty_log(kvm, log);
+#else
+	int r;
+	struct kvm_memory_slot *memslot;
+	unsigned long n, i;
+	unsigned long *dirty_bitmap;
+	unsigned long *dirty_bitmap_buffer;
+	bool is_dirty = false;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = -EINVAL;
+	if (log->slot >= KVM_USER_MEM_SLOTS)
+		goto out;
+
+	memslot = id_to_memslot(kvm->memslots, log->slot);
+
+	dirty_bitmap = memslot->dirty_bitmap;
+	r = -ENOENT;
+	if (!dirty_bitmap)
+		goto out;
+
+	n = kvm_dirty_bitmap_bytes(memslot);
+
+	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+	memset(dirty_bitmap_buffer, 0, n);
+
+	spin_lock(&kvm->mmu_lock);
+
+	for (i = 0; i < n / sizeof(long); i++) {
+		unsigned long mask;
+		gfn_t offset;
+
+		if (!dirty_bitmap[i])
+			continue;
+
+		is_dirty = true;
+
+		mask = xchg(&dirty_bitmap[i], 0);
+		dirty_bitmap_buffer[i] = mask;
+
+		offset = i * BITS_PER_LONG;
+		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+
+	/* See the comments in kvm_mmu_slot_remove_write_access(). */
+	lockdep_assert_held(&kvm->slots_lock);
+
+	/*
+	 * All the TLBs can be flushed out of mmu lock, see the comments in
+	 * kvm_mmu_slot_remove_write_access().
+	 */
+	if (is_dirty)
+		kvm_flush_remote_tlbs(kvm);
+
+	r = -EFAULT;
+	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+		goto out;
+
+	r = 0;
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+#endif
+}
+
 static void kvm_init_memslots_id(struct kvm *kvm)
 {
 	int i;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-07-25  0:56 ` Mario Smarduch
@ 2014-07-25  0:56   ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-07-25  0:56 UTC (permalink / raw)
  To: kvmarm, marc.zyngier, christoffer.dall, pbonzini, gleb, agraf,
	xiantao.zhang, borntraeger, cornelia.huck
  Cc: xiaoguangrong, steve.capper, kvm, linux-arm-kernel, jays.lee,
	sungjinn.chung, Mario Smarduch

This patch adds support for handling 2nd stage page faults during migration:
it disables faulting in huge pages and dissolves existing huge pages into page
tables. In case migration is canceled, huge pages will be used again.

Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
---
 arch/arm/kvm/mmu.c |   31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index ca84331..a17812a 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 }
 
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
-			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
+			  phys_addr_t addr, const pte_t *new_pte, bool iomap,
+			  bool logging_active)
 {
 	pmd_t *pmd;
 	pte_t *pte, old_pte;
@@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 		return 0;
 	}
 
+	/*
+	 * While dirty memory logging is active, clear the PMD entry for the huge
+	 * page and split it, to track dirty memory at page granularity.
+	 */
+	if (logging_active && kvm_pmd_huge(*pmd)) {
+		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
+		clear_pmd_entry(kvm, pmd, ipa);
+	}
+
 	/* Create stage-2 page mappings - Level 2 */
 	if (pmd_none(*pmd)) {
 		if (!cache)
@@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 		if (ret)
 			goto out;
 		spin_lock(&kvm->mmu_lock);
-		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
+		ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
 		spin_unlock(&kvm->mmu_lock);
 		if (ret)
 			goto out;
@@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
 	pfn_t pfn;
+	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
+	#ifdef CONFIG_ARM
+		bool logging_active = !!memslot->dirty_bitmap;
+	#else
+		bool logging_active = false;
+	#endif
 
 	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	/* Let's check if we will get back a huge page backed by hugetlbfs */
 	down_read(&current->mm->mmap_sem);
 	vma = find_vma_intersection(current->mm, hva, hva + 1);
-	if (is_vm_hugetlb_page(vma)) {
+	if (is_vm_hugetlb_page(vma) && !logging_active) {
 		hugetlb = true;
 		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
 	} else {
@@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	spin_lock(&kvm->mmu_lock);
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
-	if (!hugetlb && !force_pte)
+	if (!hugetlb && !force_pte && !logging_active)
 		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
 	if (hugetlb) {
@@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 		}
 		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
-		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
+		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
+					logging_active);
 	}
 
+	if (write_fault)
+		mark_page_dirty(kvm, gfn);
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
@@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	pte_t *pte = (pte_t *)data;
 
-	stage2_set_pte(kvm, NULL, gpa, pte, false);
+	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
 }
 
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush
  2014-07-25  0:56   ` Mario Smarduch
@ 2014-07-25  6:12     ` Alexander Graf
  -1 siblings, 0 replies; 60+ messages in thread
From: Alexander Graf @ 2014-07-25  6:12 UTC (permalink / raw)
  To: Mario Smarduch, kvmarm, marc.zyngier, christoffer.dall, pbonzini,
	gleb, xiantao.zhang, borntraeger, cornelia.huck
  Cc: xiaoguangrong, steve.capper, kvm, linux-arm-kernel, jays.lee,
	sungjinn.chung


On 25.07.14 02:56, Mario Smarduch wrote:
> Patch adds HYP interface for global VM TLB invalidation without address
> parameter. Generic VM TLB flush calls ARMv7 arch defined TLB flush function.
>
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>   arch/arm/include/asm/kvm_asm.h  |    1 +
>   arch/arm/include/asm/kvm_host.h |    1 +
>   arch/arm/kvm/Kconfig            |    1 +
>   arch/arm/kvm/interrupts.S       |   12 ++++++++++++
>   arch/arm/kvm/mmu.c              |   17 +++++++++++++++++
>   virt/kvm/Kconfig                |    3 +++
>   virt/kvm/kvm_main.c             |    4 ++++
>   7 files changed, 39 insertions(+)
>
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 53b3c4a..21bc519 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>   
>   extern void __kvm_flush_vm_context(void);
>   extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>   
>   extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>   #endif
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 193ceaf..042206f 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -230,5 +230,6 @@ int kvm_perf_teardown(void);
>   
>   u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>   int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> +void kvm_arch_flush_remote_tlbs(struct kvm *);
>   
>   #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 466bd29..44d3b6f 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>   	select ANON_INODES
>   	select HAVE_KVM_CPU_RELAX_INTERCEPT
>   	select KVM_MMIO
> +	select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>   	select KVM_ARM_HOST
>   	depends on ARM_VIRT_EXT && ARM_LPAE
>   	---help---
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 0d68d40..1258d46 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -66,6 +66,18 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>   	bx	lr
>   ENDPROC(__kvm_tlb_flush_vmid_ipa)
>   
> +/**
> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
> + *
> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
> + * parameter
> + */
> +
> +ENTRY(__kvm_tlb_flush_vmid)
> +	b	__kvm_tlb_flush_vmid_ipa
> +ENDPROC(__kvm_tlb_flush_vmid)
> +
> +
>   /********************************************************************
>    * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>    * domain, for all VMIDs
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 2ac9588..35254c6 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -56,6 +56,23 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>   		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>   }
>   
> +#ifdef CONFIG_ARM

Why the ifdef? We're in ARM code here, no?

> +/**
> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries
> + * @kvm:       pointer to kvm structure.
> + *
> + * Interface to HYP function to flush all VM TLB entries without address
> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
> + * kvm_tlb_flush_vmid_ipa().
> + */
> +void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> +{
> +	if (kvm)
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);

I don't see why we should ever call this function with kvm==NULL.


Alex


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7 dirty page logging initial mem region write protect (w/no huge PUD support)
  2014-07-25  0:56   ` Mario Smarduch
@ 2014-07-25  6:16     ` Alexander Graf
  -1 siblings, 0 replies; 60+ messages in thread
From: Alexander Graf @ 2014-07-25  6:16 UTC (permalink / raw)
  To: Mario Smarduch, kvmarm, marc.zyngier, christoffer.dall, pbonzini,
	gleb, xiantao.zhang, borntraeger, cornelia.huck
  Cc: xiaoguangrong, steve.capper, kvm, linux-arm-kernel, jays.lee,
	sungjinn.chung


On 25.07.14 02:56, Mario Smarduch wrote:
> Patch adds support for initial write protection of VM memslots. This patch series
> assumes that huge PUDs will not be used in 2nd stage tables.

Is this a valid assumption?

>
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>   arch/arm/include/asm/kvm_host.h       |    1 +
>   arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>   arch/arm/include/asm/pgtable-3level.h |    1 +
>   arch/arm/kvm/arm.c                    |    9 +++
>   arch/arm/kvm/mmu.c                    |  128 +++++++++++++++++++++++++++++++++
>   5 files changed, 159 insertions(+)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 042206f..6521a2d 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>   u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>   int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>   void kvm_arch_flush_remote_tlbs(struct kvm *);
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>   
>   #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 5cc0b0f..08ab5e8 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>   	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>   }
>   
> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
> +{
> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pte_readonly(pte_t *pte)
> +{
> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> +}
> +
> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> +{
> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> +{
> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> +}
> +
>   /* Open coded p*d_addr_end that can deal with 64bit addresses */
>   #define kvm_pgd_addr_end(addr, end)					\
>   ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 85c60ad..d8bb40b 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -129,6 +129,7 @@
>   #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>   #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>   
> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>   #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>   
>   /*
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 3c82b37..e11c2dd 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>   				   const struct kvm_memory_slot *old,
>   				   enum kvm_mr_change change)
>   {
> +#ifdef CONFIG_ARM

Same question on CONFIG_ARM here. Is this the define used to distinguish 
between 32bit and 64bit?


Alex


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush
  2014-07-25  6:12     ` Alexander Graf
@ 2014-07-25 17:37       ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-07-25 17:37 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-arm-kernel, kvm, borntraeger, marc.zyngier, steve.capper,
	xiaoguangrong, gleb, christoffer.dall, jays.lee, cornelia.huck,
	pbonzini, sungjinn.chung, kvmarm, xiantao.zhang

On 07/24/2014 11:12 PM, Alexander Graf wrote:
> 
> On 25.07.14 02:56, Mario Smarduch wrote:
>> Patch adds HYP interface for global VM TLB invalidation without address
>> parameter. Generic VM TLB flush calls ARMv7 arch defined TLB flush
>> function.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>   arch/arm/include/asm/kvm_asm.h  |    1 +
>>   arch/arm/include/asm/kvm_host.h |    1 +
>>   arch/arm/kvm/Kconfig            |    1 +
>>   arch/arm/kvm/interrupts.S       |   12 ++++++++++++
>>   arch/arm/kvm/mmu.c              |   17 +++++++++++++++++
>>   virt/kvm/Kconfig                |    3 +++
>>   virt/kvm/kvm_main.c             |    4 ++++
>>   7 files changed, 39 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_asm.h
>> b/arch/arm/include/asm/kvm_asm.h
>> index 53b3c4a..21bc519 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>>     extern void __kvm_flush_vm_context(void);
>>   extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>>     extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>   #endif
>> diff --git a/arch/arm/include/asm/kvm_host.h
>> b/arch/arm/include/asm/kvm_host.h
>> index 193ceaf..042206f 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -230,5 +230,6 @@ int kvm_perf_teardown(void);
>>     u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>   int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>> +void kvm_arch_flush_remote_tlbs(struct kvm *);
>>     #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index 466bd29..44d3b6f 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -22,6 +22,7 @@ config KVM
>>       select ANON_INODES
>>       select HAVE_KVM_CPU_RELAX_INTERCEPT
>>       select KVM_MMIO
>> +    select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>>       select KVM_ARM_HOST
>>       depends on ARM_VIRT_EXT && ARM_LPAE
>>       ---help---
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 0d68d40..1258d46 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -66,6 +66,18 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>>       bx    lr
>>   ENDPROC(__kvm_tlb_flush_vmid_ipa)
>>   +/**
>> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
>> + *
>> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
>> + * parameter
>> + */
>> +
>> +ENTRY(__kvm_tlb_flush_vmid)
>> +    b    __kvm_tlb_flush_vmid_ipa
>> +ENDPROC(__kvm_tlb_flush_vmid)
>> +
>> +
>>   /********************************************************************
>>    * Flush TLBs and instruction caches of all CPUs inside the
>> inner-shareable
>>    * domain, for all VMIDs
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 2ac9588..35254c6 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -56,6 +56,23 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm,
>> phys_addr_t ipa)
>>           kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>>   }
>>   +#ifdef CONFIG_ARM
> 
> Why the ifdef? We're in ARM code here, no?

For the time being, to keep ARM64 compiling.

> 
>> +/**
>> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries
>> + * @kvm:       pointer to kvm structure.
>> + *
>> + * Interface to HYP function to flush all VM TLB entries without address
>> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function
>> used by
>> + * kvm_tlb_flush_vmid_ipa().
>> + */
>> +void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
>> +{
>> +    if (kvm)
>> +        kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
> 
> I don't see why we should ever call this function with kvm==NULL.

Yes, that's true - I copied a generic arm/arm64 mmu function. But its
use here guarantees kvm != NULL.
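
Dropping the check would presumably leave just the sketch below (illustrative
only, not the posted code):

void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
{
	/* __kvm_tlb_flush_vmid broadcasts the per-VMID invalidate in hardware */
	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
}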

> 
> 
> Alex
> 

Thanks,
  Mario

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7 dirty page logging initial mem region write protect (w/no huge PUD support)
  2014-07-25  6:16     ` Alexander Graf
@ 2014-07-25 17:45       ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-07-25 17:45 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvmarm, marc.zyngier, christoffer.dall, pbonzini, gleb,
	xiantao.zhang, borntraeger, cornelia.huck, xiaoguangrong,
	steve.capper, kvm, linux-arm-kernel, jays.lee, sungjinn.chung

On 07/24/2014 11:16 PM, Alexander Graf wrote:
> 
> On 25.07.14 02:56, Mario Smarduch wrote:
>> Patch adds support for initial write protection of VM memslots. This
>> patch series
>> assumes that huge PUDs will not be used in 2nd stage tables.
> 
> Is this a valid assumption?

Right now it's unclear if huge PUDs will be used to back guest
memory, and supporting them would require quite a bit of additional code.
After discussion on the mailing list it was recommended to
treat this as a BUG_ON case for now.
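
To make that concrete, the stage-2 write-protect walk could guard against huge
PUDs roughly as below (a sketch only - helper names such as stage2_wp_pmds()
and kvm_pud_huge() are assumptions and may not match the posted patch):

static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
{
	pud_t *pud = pud_offset(pgd, addr);
	phys_addr_t next;

	do {
		next = kvm_pud_addr_end(addr, end);
		if (!pud_none(*pud)) {
			/* huge PUDs are not expected in 2nd stage tables */
			BUG_ON(kvm_pud_huge(*pud));
			stage2_wp_pmds(pud, addr, next);
		}
	} while (pud++, addr = next, addr != end);
}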

> 
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>   arch/arm/include/asm/kvm_host.h       |    1 +
>>   arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>   arch/arm/include/asm/pgtable-3level.h |    1 +
>>   arch/arm/kvm/arm.c                    |    9 +++
>>   arch/arm/kvm/mmu.c                    |  128
>> +++++++++++++++++++++++++++++++++
>>   5 files changed, 159 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h
>> b/arch/arm/include/asm/kvm_host.h
>> index 042206f..6521a2d 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>>   u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>   int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>   void kvm_arch_flush_remote_tlbs(struct kvm *);
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>     #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h
>> b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f..08ab5e8 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t
>> *pmd)
>>       pmd_val(*pmd) |= L_PMD_S2_RDWR;
>>   }
>>   +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>> +{
>> +    pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>> +{
>> +    return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +    pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +    return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>> +}
>> +
>>   /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>   #define kvm_pgd_addr_end(addr, end)                    \
>>   ({    u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;        \
>> diff --git a/arch/arm/include/asm/pgtable-3level.h
>> b/arch/arm/include/asm/pgtable-3level.h
>> index 85c60ad..d8bb40b 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -129,6 +129,7 @@
>>   #define L_PTE_S2_RDONLY            (_AT(pteval_t, 1) << 6)   /*
>> HAP[1]   */
>>   #define L_PTE_S2_RDWR            (_AT(pteval_t, 3) << 6)   /*
>> HAP[2:1] */
>>   +#define L_PMD_S2_RDONLY            (_AT(pteval_t, 1) << 6)   /*
>> HAP[1]   */
>>   #define L_PMD_S2_RDWR            (_AT(pmdval_t, 3) << 6)   /*
>> HAP[2:1] */
>>     /*
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 3c82b37..e11c2dd 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>                      const struct kvm_memory_slot *old,
>>                      enum kvm_mr_change change)
>>   {
>> +#ifdef CONFIG_ARM
> 
> Same question on CONFIG_ARM here. Is this the define used to distinguish
> between 32bit and 64bit?

Yes, it lets ARM64 compile. We'll come back to ARM64 soon, and
these ifdefs will go.
> 
> 
> Alex
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs ... - looking for comments
  2014-07-25 17:37       ` Mario Smarduch
@ 2014-08-08 17:50         ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-08 17:50 UTC (permalink / raw)
  To: Alexander Graf, marc.zyngier, christoffer.dall, pbonzini, gleb,
	borntraeger, cornelia.huck, xiaoguangrong
  Cc: kvmarm, steve.capper, kvm, linux-arm-kernel

Hello,
  I'm circling back on this patch series looking for review comments,
primarily on patches 1 and 3 (numbered below).

To summarize, this patch series adds dirty page logging for ARMv7. The dirty
page log read function has been moved to a generic layer shared between
x86 and ARMv7; as a result the function has generic and architecture variants.

This also led to a split of KVM TLB flushing: patch 1 adds an architecture
variant for ARM to support its hardware broadcast TLB invalidation.

In both cases the same approach is used - changes to the arch KVM Kconfigs
and the generic KVM Kconfig.
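
For the TLB flush this amounts to the following (a short sketch of the
kvm_main.c change in patch 1; the arch variant is selected via the new
HAVE_KVM_ARCH_TLB_FLUSH_ALL Kconfig symbol):

        void kvm_flush_remote_tlbs(struct kvm *kvm)
        {
        #ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
                kvm_arch_flush_remote_tlbs(kvm);
        #else
                long dirty_count = kvm->tlbs_dirty;

                smp_mb();
                if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
                        ++kvm->stat.remote_tlb_flush;
                cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
        #endif
        }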

I would appreciate some feedback, even a quick yes/no/maybe, since I'm
currently working on arm64 and review of this version will impact the next one.

1- https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010557.html
2- https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010559.html
3- https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010558.html
4- https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010560.html

- Mario

>This patch adds support for dirty page logging so far tested only on ARMv7 HW,
>and verified to compile on armv8, ia64, mips, ppc, s390 and compile and run on
>x86_64. 
>
>Change from previous version:
>- kvm_flush_remote_tlbs() has generic and architecture specific variants.
>  armv7 (later armv8) uses arch variant all other archtectures use generic 
>  version. Reason being arm uses HW broadcast for TLB invalidation.
>- kvm_vm_ioctl_get_dirty_log() - is generic between armv7, x86 (later ARMv8),
>  other architectures use arch variant
>
>The approach is documented 
>
>https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010329.html
>https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010338.html
>
>Compile targets
>- x86_64 - defconfig also did validation, simple migration on same host.
>- ia64 - ia64-linux-gcc4.6.3 - defconfig, ia64 Kconfig defines BROKEN worked 
>  around that to make sure new changes don't break build. Eventually build 
>  breaks when comping ioapic.c, unrelated to this patch.
>- mips - mips64-linux-gcc4.6.3 - malta_kvm_defconfig
>- ppc - powerpc64-linux-gcc4.6.3 - pseries_defconfig
>- s390 - s390x-linux-gcc4.6.3 - defconfig
>
>Dirty page logging support -
>- initially write protects VM RAM memory regions - 2nd stage page tables
>- add support to read dirty page log and again write protect the dirty pages 
>  - second stage page table for next pass.
>- second stage huge page are dissolved into page tables to keep track of
>  dirty pages at page granularity. Tracking at huge page granularity limits
>  migration to an almost idle system.
>- In the event migration is canceled, normal behavior is resumed huge pages
>  are rebuilt over time.
>- At this time reverse mappings are not used to for write protecting of 2nd 
>  stage tables.
>
>- Future work
>  - Enable diry memory logging to work on ARMv8 FastModels/Foundations Model
>
>Test Environment:
>---------------------------------------------------------------------------
>NOTE: RUNNING on FAST Models will hardly ever fail and mask bugs, initially 
>      light loads were succeeding without dirty page logging support.
>---------------------------------------------------------------------------
>- Will put all components on github, including test setup on github
>- In short summary
>  o Two ARM Exyonys 5440 development platforms - 4-way 1.7 GHz, with 8GB, 256GB
>    storage, 1GBs Ethernet, with swap enabled
>  o NFS Server runing Ubuntu 13.04
>    - both ARM boards mount shared file system
>    - Shared file system includes - QEMU, Guest Kernel, DTB, multiple Ext3 root
>      file systems.
>  o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1,
>  o Use QEMU Ctr+A+C and migrate -d tcp:IP:port command
>    - Destination command syntax: can change smp to 4, machine model outdated,
>      but has been tested on virt by others (need to upgrade)
>
>        /mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
>        /mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
>        -M vexpress-a15 -cpu cortex-a15 -nographic \
>        -append "root=/dev/vda rw console=ttyAMA0 rootwait" \
>        -drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
>        -device virtio-blk-device,drive=vm1 \
>        -netdev type=tap,id=net0,ifname=tap0 \
>        -device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
>        -incoming tcp:0:4321
>
>    - Source command syntax same except '-incoming'
>
>  o Test migration of multiple VMs use tap0, tap1, ..., and guest0.root, .....
>    has been tested as well.
>  o On source run multiple copies of 'dirtyram.arm' - simple program to dirty
>    pages periodically.
>    ./dirtyarm.ram <total mmap size> <dirty page size> <sleep time>
>    Example:
>    ./dirtyram.arm 102580 812 30
>    - dirty 102580 pages
>    - 812 pages every 30ms with an incrementing counter
>    - run anywhere from one to as many copies as VM resources can support. If
>      the dirty rate is too high migration will run indefintely
>    - run date output loop, check date is picked up smoothly
>    - place guest/host into page reclaim/swap mode - by whatever means in this
>      case run multiple copies of 'dirtyram.ram' on host
>    - issue migrate command(s) on source
>    - Top result is 409600, 8192, 5
>  o QEMU is instrumented to save RAM memory regions on source and destination
>    after memory is migrated, but before guest started. Later files are
>    checksummed on both ends for correctness, given VMs are small this works.
>  o Guest kernel is instrumented to capture current cycle counter - last cycle
>    and compare to qemu down time to test arch timer accuracy.
>  o Network failover is at L3 due to interface limitations, ping continues
>    working transparently
>  o Also tested 'migrate_cancel' to test reassemble of huge pages (inserted low
>    level instrumentation code).
>- Basic Network Test - Assuming one ethernet interface available
>
>Source host IP 192.168.10.101/24, VM tap0 192.168.2.1/24 and
>VM eth0 192.168.2.100/24 with default route 192.168.2.1
>
>Destination host IP 192.168.10.100/24, VM same settings as above.
>Both VMs have identical MAC addresses.
>
>Initially NFS server route to 192.168.2.100 is via 192.168.10.101
>
>- ssh 192.168.2.100
>- start migration from source to destination
>- after migration ends
>- on NFS server switch routes.
>   route add -host 192.168.2.100 gw 192.168.10.100
>
>ssh should resume after route switch. ping as well should work
>seamlessly.
>
>
>
>Mario Smarduch (4):
>  add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to
>    support arch flush
>  ARMv7  dirty page logging inital mem region write protect (w/no huge
>    PUD support)
>  dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8
>    ia64 mips powerpc s390 arch specific
>  ARMv7 dirty page logging 2nd stage page fault handling support
>
> arch/arm/include/asm/kvm_asm.h        |    1 +
> arch/arm/include/asm/kvm_host.h       |    2 +
> arch/arm/include/asm/kvm_mmu.h        |   20 ++++
> arch/arm/include/asm/pgtable-3level.h |    1 +
> arch/arm/kvm/Kconfig                  |    1 +
> arch/arm/kvm/arm.c                    |   17 ++-
> arch/arm/kvm/interrupts.S             |   12 ++
> arch/arm/kvm/mmu.c                    |  198 ++++++++++++++++++++++++++++++++-
> arch/arm64/include/asm/kvm_host.h     |    2 +
> arch/arm64/kvm/Kconfig                |    1 +
> arch/ia64/include/asm/kvm_host.h      |    1 +
> arch/ia64/kvm/Kconfig                 |    1 +
> arch/ia64/kvm/kvm-ia64.c              |    2 +-
> arch/mips/include/asm/kvm_host.h      |    2 +-
> arch/mips/kvm/Kconfig                 |    1 +
> arch/mips/kvm/kvm_mips.c              |    2 +-
> arch/powerpc/include/asm/kvm_host.h   |    2 +
> arch/powerpc/kvm/Kconfig              |    1 +
> arch/powerpc/kvm/book3s.c             |    2 +-
> arch/powerpc/kvm/booke.c              |    2 +-
> arch/s390/include/asm/kvm_host.h      |    2 +
> arch/s390/kvm/Kconfig                 |    1 +
> arch/s390/kvm/kvm-s390.c              |    2 +-
> arch/x86/kvm/x86.c                    |   86 --------------
> include/linux/kvm_host.h              |    3 +
> virt/kvm/Kconfig                      |    6 +
> virt/kvm/kvm_main.c                   |   94 ++++++++++++++++
> 27 files changed, 366 insertions(+), 99 deletions(-)
>
>-- 
>1.7.9.5

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs ... - looking for comments
@ 2014-08-08 17:50         ` Mario Smarduch
  0 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-08 17:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,
  I'm circling back on this patch series looking for review comments,
primarily on patches 1 and 3 (numbered below).

To summarize, this patch series adds dirty page logging for ARMv7. The dirty
page log read function has been moved to a generic layer shared between
x86 and ARMv7; as a result the function has generic and architecture variants.

This also led to a split of KVM TLB flushing: patch 1 adds an architecture
variant for ARM to support its hardware broadcast TLB invalidation.

In both cases the same approach is used - changes to the arch KVM Kconfigs
and the generic KVM Kconfig.

I would appreciate some feedback, even a quick yes/no/maybe, since I'm
currently working on arm64 and review of this version will impact the next one.

1- https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010557.html
2- https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010559.html
3- https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010558.html
4- https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010560.html

- Mario

>This patch adds support for dirty page logging so far tested only on ARMv7 HW,
>and verified to compile on armv8, ia64, mips, ppc, s390 and compile and run on
>x86_64. 
>
>Change from previous version:
>- kvm_flush_remote_tlbs() has generic and architecture specific variants.
>  armv7 (later armv8) uses arch variant all other archtectures use generic 
>  version. Reason being arm uses HW broadcast for TLB invalidation.
>- kvm_vm_ioctl_get_dirty_log() - is generic between armv7, x86 (later ARMv8),
>  other architectures use arch variant
>
>The approach is documented 
>
>https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010329.html
>https://lists.cs.columbia.edu/pipermail/kvmarm/2014-July/010338.html
>
>Compile targets
>- x86_64 - defconfig also did validation, simple migration on same host.
>- ia64 - ia64-linux-gcc4.6.3 - defconfig, ia64 Kconfig defines BROKEN worked 
>  around that to make sure new changes don't break build. Eventually build 
>  breaks when comping ioapic.c, unrelated to this patch.
>- mips - mips64-linux-gcc4.6.3 - malta_kvm_defconfig
>- ppc - powerpc64-linux-gcc4.6.3 - pseries_defconfig
>- s390 - s390x-linux-gcc4.6.3 - defconfig
>
>Dirty page logging support -
>- initially write protects VM RAM memory regions - 2nd stage page tables
>- add support to read dirty page log and again write protect the dirty pages 
>  - second stage page table for next pass.
>- second stage huge page are dissolved into page tables to keep track of
>  dirty pages at page granularity. Tracking at huge page granularity limits
>  migration to an almost idle system.
>- In the event migration is canceled, normal behavior is resumed huge pages
>  are rebuilt over time.
>- At this time reverse mappings are not used to for write protecting of 2nd 
>  stage tables.
>
>- Future work
>  - Enable diry memory logging to work on ARMv8 FastModels/Foundations Model
>
>Test Environment:
>---------------------------------------------------------------------------
>NOTE: RUNNING on FAST Models will hardly ever fail and mask bugs, initially 
>      light loads were succeeding without dirty page logging support.
>---------------------------------------------------------------------------
>- Will put all components on github, including test setup on github
>- In short summary
>  o Two ARM Exyonys 5440 development platforms - 4-way 1.7 GHz, with 8GB, 256GB
>    storage, 1GBs Ethernet, with swap enabled
>  o NFS Server runing Ubuntu 13.04
>    - both ARM boards mount shared file system
>    - Shared file system includes - QEMU, Guest Kernel, DTB, multiple Ext3 root
>      file systems.
>  o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1,
>  o Use QEMU Ctr+A+C and migrate -d tcp:IP:port command
>    - Destination command syntax: can change smp to 4, machine model outdated,
>      but has been tested on virt by others (need to upgrade)
>
>        /mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
>        /mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
>        -M vexpress-a15 -cpu cortex-a15 -nographic \
>        -append "root=/dev/vda rw console=ttyAMA0 rootwait" \
>        -drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
>        -device virtio-blk-device,drive=vm1 \
>        -netdev type=tap,id=net0,ifname=tap0 \
>        -device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
>        -incoming tcp:0:4321
>
>    - Source command syntax same except '-incoming'
>
>  o Test migration of multiple VMs use tap0, tap1, ..., and guest0.root, .....
>    has been tested as well.
>  o On source run multiple copies of 'dirtyram.arm' - simple program to dirty
>    pages periodically.
>    ./dirtyarm.ram <total mmap size> <dirty page size> <sleep time>
>    Example:
>    ./dirtyram.arm 102580 812 30
>    - dirty 102580 pages
>    - 812 pages every 30ms with an incrementing counter
>    - run anywhere from one to as many copies as VM resources can support. If
>      the dirty rate is too high migration will run indefintely
>    - run date output loop, check date is picked up smoothly
>    - place guest/host into page reclaim/swap mode - by whatever means in this
>      case run multiple copies of 'dirtyram.ram' on host
>    - issue migrate command(s) on source
>    - Top result is 409600, 8192, 5
>  o QEMU is instrumented to save RAM memory regions on source and destination
>    after memory is migrated, but before guest started. Later files are
>    checksummed on both ends for correctness, given VMs are small this works.
>  o Guest kernel is instrumented to capture current cycle counter - last cycle
>    and compare to qemu down time to test arch timer accuracy.
>  o Network failover is at L3 due to interface limitations, ping continues
>    working transparently
>  o Also tested 'migrate_cancel' to test reassemble of huge pages (inserted low
>    level instrumentation code).
>- Basic Network Test - Assuming one ethernet interface available
>
>Source host IP 192.168.10.101/24, VM tap0 192.168.2.1/24 and
>VM eth0 192.168.2.100/24 with default route 192.168.2.1
>
>Destination host IP 192.168.10.100/24, VM same settings as above.
>Both VMs have identical MAC addresses.
>
>Initially NFS server route to 192.168.2.100 is via 192.168.10.101
>
>- ssh 192.168.2.100
>- start migration from source to destination
>- after migration ends
>- on NFS server switch routes.
>   route add -host 192.168.2.100 gw 192.168.10.100
>
>ssh should resume after route switch. ping as well should work
>seamlessly.
>
>
>
>Mario Smarduch (4):
>  add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to
>    support arch flush
>  ARMv7  dirty page logging inital mem region write protect (w/no huge
>    PUD support)
>  dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8
>    ia64 mips powerpc s390 arch specific
>  ARMv7 dirty page logging 2nd stage page fault handling support
>
> arch/arm/include/asm/kvm_asm.h        |    1 +
> arch/arm/include/asm/kvm_host.h       |    2 +
> arch/arm/include/asm/kvm_mmu.h        |   20 ++++
> arch/arm/include/asm/pgtable-3level.h |    1 +
> arch/arm/kvm/Kconfig                  |    1 +
> arch/arm/kvm/arm.c                    |   17 ++-
> arch/arm/kvm/interrupts.S             |   12 ++
> arch/arm/kvm/mmu.c                    |  198 ++++++++++++++++++++++++++++++++-
> arch/arm64/include/asm/kvm_host.h     |    2 +
> arch/arm64/kvm/Kconfig                |    1 +
> arch/ia64/include/asm/kvm_host.h      |    1 +
> arch/ia64/kvm/Kconfig                 |    1 +
> arch/ia64/kvm/kvm-ia64.c              |    2 +-
> arch/mips/include/asm/kvm_host.h      |    2 +-
> arch/mips/kvm/Kconfig                 |    1 +
> arch/mips/kvm/kvm_mips.c              |    2 +-
> arch/powerpc/include/asm/kvm_host.h   |    2 +
> arch/powerpc/kvm/Kconfig              |    1 +
> arch/powerpc/kvm/book3s.c             |    2 +-
> arch/powerpc/kvm/booke.c              |    2 +-
> arch/s390/include/asm/kvm_host.h      |    2 +
> arch/s390/kvm/Kconfig                 |    1 +
> arch/s390/kvm/kvm-s390.c              |    2 +-
> arch/x86/kvm/x86.c                    |   86 --------------
> include/linux/kvm_host.h              |    3 +
> virt/kvm/Kconfig                      |    6 +
> virt/kvm/kvm_main.c                   |   94 ++++++++++++++++
> 27 files changed, 366 insertions(+), 99 deletions(-)
>
>-- 
>1.7.9.5

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush
  2014-07-25  0:56   ` Mario Smarduch
@ 2014-08-11 19:12     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-11 19:12 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On Thu, Jul 24, 2014 at 05:56:05PM -0700, Mario Smarduch wrote:
> Patch adds HYP interface for global VM TLB invalidation without address
> parameter. Generic VM TLB flush calls ARMv7 arch defined TLB flush function.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_asm.h  |    1 +
>  arch/arm/include/asm/kvm_host.h |    1 +
>  arch/arm/kvm/Kconfig            |    1 +
>  arch/arm/kvm/interrupts.S       |   12 ++++++++++++
>  arch/arm/kvm/mmu.c              |   17 +++++++++++++++++
>  virt/kvm/Kconfig                |    3 +++
>  virt/kvm/kvm_main.c             |    4 ++++
>  7 files changed, 39 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 53b3c4a..21bc519 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>  
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  #endif
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 193ceaf..042206f 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -230,5 +230,6 @@ int kvm_perf_teardown(void);
>  
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> +void kvm_arch_flush_remote_tlbs(struct kvm *);
>  
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 466bd29..44d3b6f 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>  	select ANON_INODES
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>  	select KVM_MMIO
> +	select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>  	select KVM_ARM_HOST
>  	depends on ARM_VIRT_EXT && ARM_LPAE
>  	---help---
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 0d68d40..1258d46 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -66,6 +66,18 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>  	bx	lr
>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
>  
> +/**
> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
> + *
> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
> + * parameter
> + */
> +
> +ENTRY(__kvm_tlb_flush_vmid)
> +	b	__kvm_tlb_flush_vmid_ipa
> +ENDPROC(__kvm_tlb_flush_vmid)
> +
> +
>  /********************************************************************
>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>   * domain, for all VMIDs
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 2ac9588..35254c6 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -56,6 +56,23 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>  }
>  
> +#ifdef CONFIG_ARM

I assume this is here because of arm vs. arm64; use static inlines in
the header files to differentiate instead.
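
For illustration only, one reading of that suggestion is a sketch like the
one below, with each architecture providing its own static inline (the
placement and exact form here are assumptions, not part of this patch):

        /* arch/arm/include/asm/kvm_host.h */
        static inline void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
        {
                kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
        }

with arm64 supplying its own variant (or a stub) in its kvm_host.h, so the
common code needs no #ifdef.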

> +/**
> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries
> + * @kvm:       pointer to kvm structure.
> + *
> + * Interface to HYP function to flush all VM TLB entries without address
> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
> + * kvm_tlb_flush_vmid_ipa().

remove the last sentence from here, it's repetitive.

> + */
> +void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> +{
> +	if (kvm)
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
> +}
> +
> +#endif
> +
>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>  				  int min, int max)
>  {
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 13f2d19..f1efaa5 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -34,3 +34,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
>  
>  config KVM_VFIO
>         bool
> +
> +config HAVE_KVM_ARCH_TLB_FLUSH_ALL
> +       bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fa70c6e..258f3d9 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -186,12 +186,16 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
>  
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> +#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
> +	kvm_arch_flush_remote_tlbs(kvm);
> +#else
>  	long dirty_count = kvm->tlbs_dirty;
>  
>  	smp_mb();
>  	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>  		++kvm->stat.remote_tlb_flush;
>  	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> +#endif

I would split this into two patches, one trivial one for the KVM generic
solution, and one to add the arm-specific part.

That will make your commit text and title much nicer to read too.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush
@ 2014-08-11 19:12     ` Christoffer Dall
  0 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-11 19:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 24, 2014 at 05:56:05PM -0700, Mario Smarduch wrote:
> Patch adds HYP interface for global VM TLB invalidation without address
> parameter. Generic VM TLB flush calls ARMv7 arch defined TLB flush function.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_asm.h  |    1 +
>  arch/arm/include/asm/kvm_host.h |    1 +
>  arch/arm/kvm/Kconfig            |    1 +
>  arch/arm/kvm/interrupts.S       |   12 ++++++++++++
>  arch/arm/kvm/mmu.c              |   17 +++++++++++++++++
>  virt/kvm/Kconfig                |    3 +++
>  virt/kvm/kvm_main.c             |    4 ++++
>  7 files changed, 39 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 53b3c4a..21bc519 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>  
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  #endif
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 193ceaf..042206f 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -230,5 +230,6 @@ int kvm_perf_teardown(void);
>  
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> +void kvm_arch_flush_remote_tlbs(struct kvm *);
>  
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 466bd29..44d3b6f 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>  	select ANON_INODES
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>  	select KVM_MMIO
> +	select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>  	select KVM_ARM_HOST
>  	depends on ARM_VIRT_EXT && ARM_LPAE
>  	---help---
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 0d68d40..1258d46 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -66,6 +66,18 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>  	bx	lr
>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
>  
> +/**
> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
> + *
> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
> + * parameter
> + */
> +
> +ENTRY(__kvm_tlb_flush_vmid)
> +	b	__kvm_tlb_flush_vmid_ipa
> +ENDPROC(__kvm_tlb_flush_vmid)
> +
> +
>  /********************************************************************
>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>   * domain, for all VMIDs
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 2ac9588..35254c6 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -56,6 +56,23 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>  }
>  
> +#ifdef CONFIG_ARM

I assume this is here because of arm vs. arm64; use static inlines in
the header files to differentiate instead.

> +/**
> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries
> + * @kvm:       pointer to kvm structure.
> + *
> + * Interface to HYP function to flush all VM TLB entries without address
> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
> + * kvm_tlb_flush_vmid_ipa().

remove the last sentence from here, it's repetitive.

> + */
> +void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> +{
> +	if (kvm)
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
> +}
> +
> +#endif
> +
>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>  				  int min, int max)
>  {
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 13f2d19..f1efaa5 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -34,3 +34,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
>  
>  config KVM_VFIO
>         bool
> +
> +config HAVE_KVM_ARCH_TLB_FLUSH_ALL
> +       bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fa70c6e..258f3d9 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -186,12 +186,16 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
>  
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> +#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
> +	kvm_arch_flush_remote_tlbs(kvm);
> +#else
>  	long dirty_count = kvm->tlbs_dirty;
>  
>  	smp_mb();
>  	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>  		++kvm->stat.remote_tlb_flush;
>  	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> +#endif

I would split this into two patches, one trivial one for the KVM generic
solution, and one to add the arm-specific part.

That will make your commit text and title much nicer to read too.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-07-25  0:56   ` Mario Smarduch
@ 2014-08-11 19:12     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-11 19:12 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

Remove the parenthesis from the subject line.

On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
> Patch adds  support for initial write protection VM memlsot. This patch series
            ^^                                    ^
stray whitespace                                 of


> assumes that huge PUDs will not be used in 2nd stage tables.

may be worth mentioning that this is always valid on ARMv7.

> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_host.h       |    1 +
>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>  arch/arm/include/asm/pgtable-3level.h |    1 +
>  arch/arm/kvm/arm.c                    |    9 +++
>  arch/arm/kvm/mmu.c                    |  128 +++++++++++++++++++++++++++++++++
>  5 files changed, 159 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 042206f..6521a2d 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  void kvm_arch_flush_remote_tlbs(struct kvm *);
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 5cc0b0f..08ab5e8 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>  }
>  
> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
> +{
> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pte_readonly(pte_t *pte)
> +{
> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> +}
> +
> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> +{
> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> +{
> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> +}
> +
>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>  #define kvm_pgd_addr_end(addr, end)					\
>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 85c60ad..d8bb40b 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -129,6 +129,7 @@
>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>  
> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>  
>  /*
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 3c82b37..e11c2dd 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>  				   const struct kvm_memory_slot *old,
>  				   enum kvm_mr_change change)
>  {
> +#ifdef CONFIG_ARM
> +	/*
> +	 * At this point memslot has been committed and there is an
> +	 * allocated dirty_bitmap[], dirty pages will be be tracked while the
> +	 * memory slot is write protected.
> +	 */
> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
> +#endif
>  }
>  
>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 35254c6..7bfc792 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>  	return false;
>  }
>  
> +#ifdef CONFIG_ARM
> +/**
> + * stage2_wp_pte_range - write protect PTE range
> + * @pmd:	pointer to pmd entry
> + * @addr:	range start address
> + * @end:	range end address
> + */
> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
> +{
> +	pte_t *pte;
> +
> +	pte = pte_offset_kernel(pmd, addr);
> +	do {
> +		if (!pte_none(*pte)) {
> +			if (!kvm_s2pte_readonly(pte))
> +				kvm_set_s2pte_readonly(pte);
> +		}
> +	} while (pte++, addr += PAGE_SIZE, addr != end);
> +}
> +
> +/**
> + * stage2_wp_pmd_range - write protect PMD range
> + * @pud:	pointer to pud entry
> + * @addr:	range start address
> + * @end:	range end address
> + */
> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
> +{
> +	pmd_t *pmd;
> +	phys_addr_t next;
> +
> +	pmd = pmd_offset(pud, addr);
> +
> +	do {
> +		next = kvm_pmd_addr_end(addr, end);
> +		if (!pmd_none(*pmd)) {
> +			if (kvm_pmd_huge(*pmd)) {
> +				if (!kvm_s2pmd_readonly(pmd))
> +					kvm_set_s2pmd_readonly(pmd);
> +			} else
> +				stage2_wp_pte_range(pmd, addr, next);
please use a closing brace when the first part of the if-statement is a
multi-line block with braces, as per the CodingStyle.
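
I.e., a quick sketch of the brace style being asked for:

                        if (kvm_pmd_huge(*pmd)) {
                                if (!kvm_s2pmd_readonly(pmd))
                                        kvm_set_s2pmd_readonly(pmd);
                        } else {
                                stage2_wp_pte_range(pmd, addr, next);
                        }
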
> +

stray blank line

> +		}
> +	} while (pmd++, addr = next, addr != end);
> +}
> +
> +/**
> +  * stage2_wp_pud_range - write protect PUD range
> +  * @kvm:	pointer to kvm structure
> +  * @pud:	pointer to pgd entry
        pgd
> +  * @addr:	range start address
> +  * @end:	range end address
> +  *
> +  * While walking the PUD range huge PUD pages are ignored, in the future this
                             range, huge PUDs are ignored.  In the future...
> +  * may need to be revisited. Determine how to handle huge PUDs when logging
> +  * of dirty pages is enabled.

I don't understand the last sentence?

> +  */
> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
> +				phys_addr_t addr, phys_addr_t end)
> +{
> +	pud_t *pud;
> +	phys_addr_t next;
> +
> +	pud = pud_offset(pgd, addr);
> +	do {
> +		next = kvm_pud_addr_end(addr, end);
> +		/* TODO: huge PUD not supported, revisit later */
> +		BUG_ON(pud_huge(*pud));

we should probably define kvm_pud_huge()
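
Perhaps along these lines, mirroring the existing kvm_pmd_huge() helper used
above (just a sketch, not part of the patch):

        #define kvm_pud_huge(_x)        pud_huge(_x)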

> +		if (!pud_none(*pud))
> +			stage2_wp_pmd_range(pud, addr, next);
> +	} while (pud++, addr = next, addr != end);
> +}
> +
> +/**
> + * stage2_wp_range() - write protect stage2 memory region range
> + * @kvm:	The KVM pointer
> + * @start:	Start address of range
> + * &end:	End address of range
> + */
> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> +{
> +	pgd_t *pgd;
> +	phys_addr_t next;
> +
> +	pgd = kvm->arch.pgd + pgd_index(addr);
> +	do {
> +		/*
> +		 * Release kvm_mmu_lock periodically if the memory region is
> +		 * large features like detect hung task, lock detector or lock
                   large.  Otherwise, we may see panics due to..
> +		 * dep  may panic. In addition holding the lock this long will
    extra white space ^^           Additionally, holding the lock for a
    long time will
> +		 * also starve other vCPUs. Applies to huge VM memory regions.
                                            ^^^ I don't understand this
					    last remark.
> +		 */
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);
> +
> +		next = kvm_pgd_addr_end(addr, end);
> +		if (pgd_present(*pgd))
> +			stage2_wp_pud_range(kvm, pgd, addr, next);
> +	} while (pgd++, addr = next, addr != end);
> +}
> +
> +/**
> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
> + * @kvm:	The KVM pointer
> + * @slot:	The memory slot to write protect
> + *
> + * Called to start logging dirty pages after memory region
> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
> + * all present PMD and PTEs are write protected in the memory region.
> + * Afterwards read of dirty page log can be called.
> + *
> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
> + * serializing operations for VM memory regions.
> + */
> +
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
> +{
> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
> +
> +	spin_lock(&kvm->mmu_lock);
> +	stage2_wp_range(kvm, start, end);
> +	kvm_flush_remote_tlbs(kvm);
> +	spin_unlock(&kvm->mmu_lock);
> +}
> +#endif
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot,
>  			  unsigned long fault_status)
> -- 
> 1.7.9.5
> 

Besides the commenting and whitespace stuff, this is beginning to look
good.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
@ 2014-08-11 19:12     ` Christoffer Dall
  0 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-11 19:12 UTC (permalink / raw)
  To: linux-arm-kernel

Remove the parenthesis from the subject line.

On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
> Patch adds  support for initial write protection VM memlsot. This patch series
            ^^                                    ^
stray whitespace                                 of


> assumes that huge PUDs will not be used in 2nd stage tables.

may be worth mentioning that this is always valid on ARMv7.

> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_host.h       |    1 +
>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>  arch/arm/include/asm/pgtable-3level.h |    1 +
>  arch/arm/kvm/arm.c                    |    9 +++
>  arch/arm/kvm/mmu.c                    |  128 +++++++++++++++++++++++++++++++++
>  5 files changed, 159 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 042206f..6521a2d 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  void kvm_arch_flush_remote_tlbs(struct kvm *);
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 5cc0b0f..08ab5e8 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>  }
>  
> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
> +{
> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pte_readonly(pte_t *pte)
> +{
> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> +}
> +
> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> +{
> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> +{
> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> +}
> +
>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>  #define kvm_pgd_addr_end(addr, end)					\
>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> index 85c60ad..d8bb40b 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -129,6 +129,7 @@
>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>  
> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>  
>  /*
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 3c82b37..e11c2dd 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>  				   const struct kvm_memory_slot *old,
>  				   enum kvm_mr_change change)
>  {
> +#ifdef CONFIG_ARM
> +	/*
> +	 * At this point memslot has been committed and there is an
> +	 * allocated dirty_bitmap[], dirty pages will be be tracked while the
> +	 * memory slot is write protected.
> +	 */
> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
> +#endif
>  }
>  
>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 35254c6..7bfc792 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>  	return false;
>  }
>  
> +#ifdef CONFIG_ARM
> +/**
> + * stage2_wp_pte_range - write protect PTE range
> + * @pmd:	pointer to pmd entry
> + * @addr:	range start address
> + * @end:	range end address
> + */
> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
> +{
> +	pte_t *pte;
> +
> +	pte = pte_offset_kernel(pmd, addr);
> +	do {
> +		if (!pte_none(*pte)) {
> +			if (!kvm_s2pte_readonly(pte))
> +				kvm_set_s2pte_readonly(pte);
> +		}
> +	} while (pte++, addr += PAGE_SIZE, addr != end);
> +}
> +
> +/**
> + * stage2_wp_pmd_range - write protect PMD range
> + * @pud:	pointer to pud entry
> + * @addr:	range start address
> + * @end:	range end address
> + */
> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
> +{
> +	pmd_t *pmd;
> +	phys_addr_t next;
> +
> +	pmd = pmd_offset(pud, addr);
> +
> +	do {
> +		next = kvm_pmd_addr_end(addr, end);
> +		if (!pmd_none(*pmd)) {
> +			if (kvm_pmd_huge(*pmd)) {
> +				if (!kvm_s2pmd_readonly(pmd))
> +					kvm_set_s2pmd_readonly(pmd);
> +			} else
> +				stage2_wp_pte_range(pmd, addr, next);
please use a closing brace when the first part of the if-statement is a
multi-line block with braces, as per the CodingStyle.
> +

stray blank line

> +		}
> +	} while (pmd++, addr = next, addr != end);
> +}
> +
> +/**
> +  * stage2_wp_pud_range - write protect PUD range
> +  * @kvm:	pointer to kvm structure
> +  * @pud:	pointer to pgd entry
        pgd
> +  * @addr:	range start address
> +  * @end:	range end address
> +  *
> +  * While walking the PUD range huge PUD pages are ignored, in the future this
                             range, huge PUDs are ignored.  In the future...
> +  * may need to be revisited. Determine how to handle huge PUDs when logging
> +  * of dirty pages is enabled.

I don't understand the last sentence?

> +  */
> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
> +				phys_addr_t addr, phys_addr_t end)
> +{
> +	pud_t *pud;
> +	phys_addr_t next;
> +
> +	pud = pud_offset(pgd, addr);
> +	do {
> +		next = kvm_pud_addr_end(addr, end);
> +		/* TODO: huge PUD not supported, revisit later */
> +		BUG_ON(pud_huge(*pud));

we should probably define kvm_pud_huge()

> +		if (!pud_none(*pud))
> +			stage2_wp_pmd_range(pud, addr, next);
> +	} while (pud++, addr = next, addr != end);
> +}
> +
> +/**
> + * stage2_wp_range() - write protect stage2 memory region range
> + * @kvm:	The KVM pointer
> + * @start:	Start address of range
> + * &end:	End address of range
> + */
> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> +{
> +	pgd_t *pgd;
> +	phys_addr_t next;
> +
> +	pgd = kvm->arch.pgd + pgd_index(addr);
> +	do {
> +		/*
> +		 * Release kvm_mmu_lock periodically if the memory region is
> +		 * large features like detect hung task, lock detector or lock
                   large.  Otherwise, we may see panics due to..
> +		 * dep  may panic. In addition holding the lock this long will
    extra white space ^^           Additionally, holding the lock for a
    long time will
> +		 * also starve other vCPUs. Applies to huge VM memory regions.
                                            ^^^ I don't understand this
					    last remark.
> +		 */
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);
> +
> +		next = kvm_pgd_addr_end(addr, end);
> +		if (pgd_present(*pgd))
> +			stage2_wp_pud_range(kvm, pgd, addr, next);
> +	} while (pgd++, addr = next, addr != end);
> +}
> +
> +/**
> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
> + * @kvm:	The KVM pointer
> + * @slot:	The memory slot to write protect
> + *
> + * Called to start logging dirty pages after memory region
> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
> + * all present PMD and PTEs are write protected in the memory region.
> + * Afterwards read of dirty page log can be called.
> + *
> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
> + * serializing operations for VM memory regions.
> + */
> +
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
> +{
> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
> +
> +	spin_lock(&kvm->mmu_lock);
> +	stage2_wp_range(kvm, start, end);
> +	kvm_flush_remote_tlbs(kvm);
> +	spin_unlock(&kvm->mmu_lock);
> +}
> +#endif
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot,
>  			  unsigned long fault_status)
> -- 
> 1.7.9.5
> 

Besides the commenting and whitespace stuff, this is beginning to look
good.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 3/4] arm: dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8 ia64 mips powerpc s390 arch specific
  2014-07-25  0:56   ` Mario Smarduch
@ 2014-08-11 19:13     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-11 19:13 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On Thu, Jul 24, 2014 at 05:56:07PM -0700, Mario Smarduch wrote:
> This patch adds support for keeping track of VM dirty pages. As dirty page log
> is retrieved, the pages that have been written are write protected again for
> next write and log read.
> 
> The dirty log read function is generic for armv7 and x86, and arch specific
> for arm64, ia64, mips, powerpc, s390.

So I would also split up this patch: one patch that only modifies the
existing functionality, but does not introduce any new functionality for
ARM.  Put that patch at the beginning of the patch series with the
other preparatory patch, so that you get something like this:

[PATCH 1/X] KVM: Add architecture-specific TLB flush implementations
[PATCH 2/X] KVM: Add generic implementation of kvm_vm_ioctl_get_dirty_log
[PATCH 3/X] arm: KVM: Add ARMv7 API to flush TLBs
[PATCH 4/X] arm: KVM: Add initial dirty page locking infrastructure
...

That will make it easier to get the patches accepted and for us to
review...


> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/kvm/arm.c                  |    8 +++-
>  arch/arm/kvm/mmu.c                  |   22 +++++++++
>  arch/arm64/include/asm/kvm_host.h   |    2 +
>  arch/arm64/kvm/Kconfig              |    1 +
>  arch/ia64/include/asm/kvm_host.h    |    1 +
>  arch/ia64/kvm/Kconfig               |    1 +
>  arch/ia64/kvm/kvm-ia64.c            |    2 +-
>  arch/mips/include/asm/kvm_host.h    |    2 +-
>  arch/mips/kvm/Kconfig               |    1 +
>  arch/mips/kvm/kvm_mips.c            |    2 +-
>  arch/powerpc/include/asm/kvm_host.h |    2 +
>  arch/powerpc/kvm/Kconfig            |    1 +
>  arch/powerpc/kvm/book3s.c           |    2 +-
>  arch/powerpc/kvm/booke.c            |    2 +-
>  arch/s390/include/asm/kvm_host.h    |    2 +
>  arch/s390/kvm/Kconfig               |    1 +
>  arch/s390/kvm/kvm-s390.c            |    2 +-
>  arch/x86/kvm/x86.c                  |   86 ---------------------------------
>  include/linux/kvm_host.h            |    3 ++
>  virt/kvm/Kconfig                    |    3 ++
>  virt/kvm/kvm_main.c                 |   90 +++++++++++++++++++++++++++++++++++
>  21 files changed, 143 insertions(+), 93 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index e11c2dd..f7739a0 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -783,10 +783,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  	}
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +#ifdef CONFIG_ARM64
> +/*
> + * For now features not supported on ARM64, the #ifdef is added to make that
> + * clear but not needed since ARM64 Kconfig selects function in generic code.
> + */

I don't think this comment is needed, but if you really want it, it
should be something like:

/*
 * ARM64 does not support dirty logging and therefore selects
 * CONFIG_HAVE_KVM_ARCH_DIRTY_LOG.  Provide a -EINVAL stub.
 */

> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	return -EINVAL;
>  }
> +#endif
>  
>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>  					struct kvm_arm_device_addr *dev_addr)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7bfc792..ca84331 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -889,6 +889,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  	kvm_flush_remote_tlbs(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  }
> +
> +/**
> + * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in mask
> + * @kvm:	The KVM pointer
> + * @slot:	The memory slot associated with mask
> + * @gfn_offset:	The gfn offset in memory slot
> + * @mask:	The mask of dirty pages at offset 'gfn_offset' in this memory
> + *		slot to be write protected
> + *
> + * Walks bits set in mask write protects the associated pte's. Caller must
> + * acquire kvm_mmu_lock.
> + */
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +		struct kvm_memory_slot *slot,
> +		gfn_t gfn_offset, unsigned long mask)
> +{
> +	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
> +	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
> +	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;

__fls(x) + 1 is the same as fls(x); for example, with mask = 0x6,
__fls() returns 2 and fls() returns 3.
> +
> +	stage2_wp_range(kvm, start, end);
> +}
>  #endif
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 92242ce..b4a280b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -200,4 +200,6 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>  		     hyp_stack_ptr, vector_ptr);
>  }
>  
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 8ba85e9..9e21a8a 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  	select KVM_MMIO
>  	select KVM_ARM_HOST
>  	select KVM_ARM_VGIC
> diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
> index db95f57..d79f520 100644
> --- a/arch/ia64/include/asm/kvm_host.h
> +++ b/arch/ia64/include/asm/kvm_host.h
> @@ -594,6 +594,7 @@ void kvm_sal_emul(struct kvm_vcpu *vcpu);
>  #define __KVM_HAVE_ARCH_VM_ALLOC 1
>  struct kvm *kvm_arch_alloc_vm(void);
>  void kvm_arch_free_vm(struct kvm *kvm);
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>  
>  #endif /* __ASSEMBLY__*/
>  
> diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
> index 990b864..32dd6c8 100644
> --- a/arch/ia64/kvm/Kconfig
> +++ b/arch/ia64/kvm/Kconfig
> @@ -24,6 +24,7 @@ config KVM
>  	depends on BROKEN
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  	select HAVE_KVM_IRQCHIP
>  	select HAVE_KVM_IRQ_ROUTING
>  	select KVM_APIC_ARCHITECTURE
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 6a4309b..3166df5 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1812,7 +1812,7 @@ static void kvm_ia64_sync_dirty_log(struct kvm *kvm,
>  	spin_unlock(&kvm->arch.dirty_log_lock);
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  		struct kvm_dirty_log *log)
>  {
>  	int r;
> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> index 060aaa6..f7e2262 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -649,6 +649,6 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
>  extern void mips32_SyncICache(unsigned long addr, unsigned long size);
>  extern int kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
>  extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
> -
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>  
>  #endif /* __MIPS_KVM_HOST_H__ */
> diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
> index 30e334e..b57f49e 100644
> --- a/arch/mips/kvm/Kconfig
> +++ b/arch/mips/kvm/Kconfig
> @@ -20,6 +20,7 @@ config KVM
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
>  	select KVM_MMIO
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  	---help---
>  	  Support for hosting Guest kernels.
>  	  Currently supported on MIPS32 processors.
> diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
> index da5186f..f9a1e62 100644
> --- a/arch/mips/kvm/kvm_mips.c
> +++ b/arch/mips/kvm/kvm_mips.c
> @@ -790,7 +790,7 @@ out:
>  /*
>   * Get (and clear) the dirty memory log for a memory slot.
>   */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	struct kvm_memory_slot *memslot;
>  	unsigned long ga, ga_end;
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 1eaea2d..fb31595 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -676,4 +676,6 @@ struct kvm_vcpu_arch {
>  #define __KVM_HAVE_ARCH_WQP
>  #define __KVM_HAVE_CREATE_DEVICE
>  
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
> +
>  #endif /* __POWERPC_KVM_HOST_H__ */
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 141b202..c1fa061 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
>  	select HAVE_KVM_EVENTFD
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  
>  config KVM_BOOK3S_HANDLER
>  	bool
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 94e597e..3835936 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -781,7 +781,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
>  	return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
>  }
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index ab62109..50dd33d 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -1624,7 +1624,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
>  	return r;
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	return -ENOTSUPP;
>  }
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 0d45f6f..8afbe12 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -422,6 +422,7 @@ static inline bool kvm_is_error_hva(unsigned long addr)
>  }
>  
>  #define ASYNC_PF_PER_VCPU	64
> +struct kvm;
>  struct kvm_vcpu;
>  struct kvm_async_pf;
>  struct kvm_arch_async_pf {
> @@ -441,4 +442,5 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  
>  extern int sie64a(struct kvm_s390_sie_block *, u64 *);
>  extern char sie_exit;
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>  #endif
> diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
> index 10d529a..3ba07a7 100644
> --- a/arch/s390/kvm/Kconfig
> +++ b/arch/s390/kvm/Kconfig
> @@ -21,6 +21,7 @@ config KVM
>  	depends on HAVE_KVM
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>  	select HAVE_KVM_EVENTFD
>  	select KVM_ASYNC_PF
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index b32c42c..95164e7 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -207,7 +207,7 @@ static void kvm_s390_sync_dirty_log(struct kvm *kvm,
>  /*
>   * Get (and clear) the dirty memory log for a memory slot.
>   */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  			       struct kvm_dirty_log *log)
>  {
>  	int r;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c5582c3..a603ca3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3569,92 +3569,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>  	return 0;
>  }
>  
> -/**
> - * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
> - * @kvm: kvm instance
> - * @log: slot id and address to which we copy the log
> - *
> - * We need to keep it in mind that VCPU threads can write to the bitmap
> - * concurrently.  So, to avoid losing data, we keep the following order for
> - * each bit:
> - *
> - *   1. Take a snapshot of the bit and clear it if needed.
> - *   2. Write protect the corresponding page.
> - *   3. Flush TLB's if needed.
> - *   4. Copy the snapshot to the userspace.
> - *
> - * Between 2 and 3, the guest may write to the page using the remaining TLB
> - * entry.  This is not a problem because the page will be reported dirty at
> - * step 4 using the snapshot taken before and step 3 ensures that successive
> - * writes will be logged for the next call.
> - */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> -{
> -	int r;
> -	struct kvm_memory_slot *memslot;
> -	unsigned long n, i;
> -	unsigned long *dirty_bitmap;
> -	unsigned long *dirty_bitmap_buffer;
> -	bool is_dirty = false;
> -
> -	mutex_lock(&kvm->slots_lock);
> -
> -	r = -EINVAL;
> -	if (log->slot >= KVM_USER_MEM_SLOTS)
> -		goto out;
> -
> -	memslot = id_to_memslot(kvm->memslots, log->slot);
> -
> -	dirty_bitmap = memslot->dirty_bitmap;
> -	r = -ENOENT;
> -	if (!dirty_bitmap)
> -		goto out;
> -
> -	n = kvm_dirty_bitmap_bytes(memslot);
> -
> -	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> -	memset(dirty_bitmap_buffer, 0, n);
> -
> -	spin_lock(&kvm->mmu_lock);
> -
> -	for (i = 0; i < n / sizeof(long); i++) {
> -		unsigned long mask;
> -		gfn_t offset;
> -
> -		if (!dirty_bitmap[i])
> -			continue;
> -
> -		is_dirty = true;
> -
> -		mask = xchg(&dirty_bitmap[i], 0);
> -		dirty_bitmap_buffer[i] = mask;
> -
> -		offset = i * BITS_PER_LONG;
> -		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> -	}
> -
> -	spin_unlock(&kvm->mmu_lock);
> -
> -	/* See the comments in kvm_mmu_slot_remove_write_access(). */
> -	lockdep_assert_held(&kvm->slots_lock);
> -
> -	/*
> -	 * All the TLBs can be flushed out of mmu lock, see the comments in
> -	 * kvm_mmu_slot_remove_write_access().
> -	 */
> -	if (is_dirty)
> -		kvm_flush_remote_tlbs(kvm);
> -
> -	r = -EFAULT;
> -	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> -		goto out;
> -
> -	r = 0;
> -out:
> -	mutex_unlock(&kvm->slots_lock);
> -	return r;
> -}
> -
>  int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
>  			bool line_status)
>  {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 820fc2e..2f3822b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -606,6 +606,9 @@ int kvm_get_dirty_log(struct kvm *kvm,
>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  				struct kvm_dirty_log *log);
>  
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +	struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask);
> +
>  int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
>  			bool line_status);
>  long kvm_arch_vm_ioctl(struct file *filp,
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index f1efaa5..975733f 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -37,3 +37,6 @@ config KVM_VFIO
>  
>  config HAVE_KVM_ARCH_TLB_FLUSH_ALL
>         bool
> +
> +config HAVE_KVM_ARCH_DIRTY_LOG
> +       bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 258f3d9..51b90ca 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -442,6 +442,96 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
>  
>  #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
>  
> +/**
> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
> + * @kvm: kvm instance
> + * @log: slot id and address to which we copy the log
> + *
> + * We need to keep it in mind that VCPU threads can write to the bitmap
> + * concurrently.  So, to avoid losing data, we keep the following order for
> + * each bit:
> + *
> + *   1. Take a snapshot of the bit and clear it if needed.
> + *   2. Write protect the corresponding page.
> + *   3. Flush TLB's if needed.
> + *   4. Copy the snapshot to the userspace.
> + *
> + * Between 2 and 3, the guest may write to the page using the remaining TLB
> + * entry.  This is not a problem because the page will be reported dirty at
> + * step 4 using the snapshot taken before and step 3 ensures that successive
> + * writes will be logged for the next call.
> + */
> +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +{
> +#ifdef CONFIG_HAVE_KVM_ARCH_DIRTY_LOG
> +	return kvm_arch_vm_ioctl_get_dirty_log(kvm, log);
> +#else
> +	int r;
> +	struct kvm_memory_slot *memslot;
> +	unsigned long n, i;
> +	unsigned long *dirty_bitmap;
> +	unsigned long *dirty_bitmap_buffer;
> +	bool is_dirty = false;
> +
> +	mutex_lock(&kvm->slots_lock);
> +
> +	r = -EINVAL;
> +	if (log->slot >= KVM_USER_MEM_SLOTS)
> +		goto out;
> +
> +	memslot = id_to_memslot(kvm->memslots, log->slot);
> +
> +	dirty_bitmap = memslot->dirty_bitmap;
> +	r = -ENOENT;
> +	if (!dirty_bitmap)
> +		goto out;
> +
> +	n = kvm_dirty_bitmap_bytes(memslot);
> +
> +	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> +	memset(dirty_bitmap_buffer, 0, n);
> +
> +	spin_lock(&kvm->mmu_lock);
> +
> +	for (i = 0; i < n / sizeof(long); i++) {
> +		unsigned long mask;
> +		gfn_t offset;
> +
> +		if (!dirty_bitmap[i])
> +			continue;
> +
> +		is_dirty = true;
> +
> +		mask = xchg(&dirty_bitmap[i], 0);
> +		dirty_bitmap_buffer[i] = mask;
> +
> +		offset = i * BITS_PER_LONG;
> +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> +	}
> +
> +	spin_unlock(&kvm->mmu_lock);
> +
> +	/* See the comments in kvm_mmu_slot_remove_write_access(). */
> +	lockdep_assert_held(&kvm->slots_lock);
> +
> +	/*
> +	 * All the TLBs can be flushed out of mmu lock, see the comments in
> +	 * kvm_mmu_slot_remove_write_access().
> +	 */
> +	if (is_dirty)
> +		kvm_flush_remote_tlbs(kvm);
> +
> +	r = -EFAULT;
> +	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> +		goto out;
> +
> +	r = 0;
> +out:
> +	mutex_unlock(&kvm->slots_lock);
> +	return r;
> +#endif
> +}
> +
>  static void kvm_init_memslots_id(struct kvm *kvm)
>  {
>  	int i;
> -- 
> 1.7.9.5
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 3/4] arm: dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8 ia64 mips powerpc s390 arch specific
@ 2014-08-11 19:13     ` Christoffer Dall
  0 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-11 19:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 24, 2014 at 05:56:07PM -0700, Mario Smarduch wrote:
> This patch adds support for keeping track of VM dirty pages. As dirty page log
> is retrieved, the pages that have been written are write protected again for
> next write and log read.
> 
> The dirty log read function is generic for armv7 and x86, and arch specific
> for arm64, ia64, mips, powerpc, s390.

So I would also split up this patch: one patch that only modifies the
existing functionality without introducing any new functionality for
ARM.  Put that patch at the beginning of the patch series with the
other preparatory patch, so that you get something like this:

[PATCH 1/X] KVM: Add architecture-specific TLB flush implementations
[PATCH 2/X] KVM: Add generic implementation of kvm_vm_ioctl_get_dirty_log
[PATCH 3/X] arm: KVM: Add ARMv7 API to flush TLBs
[PATCH 4/X] arm: KVM: Add initial dirty page locking infrastructure
...

That will make it easier to get the patches accepted and for us to
review...


> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/kvm/arm.c                  |    8 +++-
>  arch/arm/kvm/mmu.c                  |   22 +++++++++
>  arch/arm64/include/asm/kvm_host.h   |    2 +
>  arch/arm64/kvm/Kconfig              |    1 +
>  arch/ia64/include/asm/kvm_host.h    |    1 +
>  arch/ia64/kvm/Kconfig               |    1 +
>  arch/ia64/kvm/kvm-ia64.c            |    2 +-
>  arch/mips/include/asm/kvm_host.h    |    2 +-
>  arch/mips/kvm/Kconfig               |    1 +
>  arch/mips/kvm/kvm_mips.c            |    2 +-
>  arch/powerpc/include/asm/kvm_host.h |    2 +
>  arch/powerpc/kvm/Kconfig            |    1 +
>  arch/powerpc/kvm/book3s.c           |    2 +-
>  arch/powerpc/kvm/booke.c            |    2 +-
>  arch/s390/include/asm/kvm_host.h    |    2 +
>  arch/s390/kvm/Kconfig               |    1 +
>  arch/s390/kvm/kvm-s390.c            |    2 +-
>  arch/x86/kvm/x86.c                  |   86 ---------------------------------
>  include/linux/kvm_host.h            |    3 ++
>  virt/kvm/Kconfig                    |    3 ++
>  virt/kvm/kvm_main.c                 |   90 +++++++++++++++++++++++++++++++++++
>  21 files changed, 143 insertions(+), 93 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index e11c2dd..f7739a0 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -783,10 +783,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  	}
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +#ifdef CONFIG_ARM64
> +/*
> + * For now features not supported on ARM64, the #ifdef is added to make that
> + * clear but not needed since ARM64 Kconfig selects function in generic code.
> + */

I don't think this comment is needed, but if you really want it, it
should be something like:

/*
 * ARM64 does not support dirty logging and therefore selects
 * CONFIG_HAVE_KVM_ARCH_DIRTY_LOG.  Provide a -EINVAL stub.
 */

> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	return -EINVAL;
>  }
> +#endif
>  
>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>  					struct kvm_arm_device_addr *dev_addr)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7bfc792..ca84331 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -889,6 +889,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  	kvm_flush_remote_tlbs(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  }
> +
> +/**
> + * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in mask
> + * @kvm:	The KVM pointer
> + * @slot:	The memory slot associated with mask
> + * @gfn_offset:	The gfn offset in memory slot
> + * @mask:	The mask of dirty pages at offset 'gfn_offset' in this memory
> + *		slot to be write protected
> + *
> + * Walks bits set in mask write protects the associated pte's. Caller must
> + * acquire kvm_mmu_lock.
> + */
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +		struct kvm_memory_slot *slot,
> +		gfn_t gfn_offset, unsigned long mask)
> +{
> +	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
> +	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
> +	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;

__fls(x) + 1 is the same as fls(x)
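
With that simplification the range computation would be (just a sketch;
mask is known to be non-zero here since zero bitmap words are skipped by
the caller):

	phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
	phys_addr_t end = (base_gfn + fls(mask)) << PAGE_SHIFT;
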
> +
> +	stage2_wp_range(kvm, start, end);
> +}
>  #endif
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 92242ce..b4a280b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -200,4 +200,6 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>  		     hyp_stack_ptr, vector_ptr);
>  }
>  
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 8ba85e9..9e21a8a 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  	select KVM_MMIO
>  	select KVM_ARM_HOST
>  	select KVM_ARM_VGIC
> diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
> index db95f57..d79f520 100644
> --- a/arch/ia64/include/asm/kvm_host.h
> +++ b/arch/ia64/include/asm/kvm_host.h
> @@ -594,6 +594,7 @@ void kvm_sal_emul(struct kvm_vcpu *vcpu);
>  #define __KVM_HAVE_ARCH_VM_ALLOC 1
>  struct kvm *kvm_arch_alloc_vm(void);
>  void kvm_arch_free_vm(struct kvm *kvm);
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>  
>  #endif /* __ASSEMBLY__*/
>  
> diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
> index 990b864..32dd6c8 100644
> --- a/arch/ia64/kvm/Kconfig
> +++ b/arch/ia64/kvm/Kconfig
> @@ -24,6 +24,7 @@ config KVM
>  	depends on BROKEN
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  	select HAVE_KVM_IRQCHIP
>  	select HAVE_KVM_IRQ_ROUTING
>  	select KVM_APIC_ARCHITECTURE
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 6a4309b..3166df5 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1812,7 +1812,7 @@ static void kvm_ia64_sync_dirty_log(struct kvm *kvm,
>  	spin_unlock(&kvm->arch.dirty_log_lock);
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  		struct kvm_dirty_log *log)
>  {
>  	int r;
> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> index 060aaa6..f7e2262 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -649,6 +649,6 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
>  extern void mips32_SyncICache(unsigned long addr, unsigned long size);
>  extern int kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
>  extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
> -
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>  
>  #endif /* __MIPS_KVM_HOST_H__ */
> diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
> index 30e334e..b57f49e 100644
> --- a/arch/mips/kvm/Kconfig
> +++ b/arch/mips/kvm/Kconfig
> @@ -20,6 +20,7 @@ config KVM
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
>  	select KVM_MMIO
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  	---help---
>  	  Support for hosting Guest kernels.
>  	  Currently supported on MIPS32 processors.
> diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
> index da5186f..f9a1e62 100644
> --- a/arch/mips/kvm/kvm_mips.c
> +++ b/arch/mips/kvm/kvm_mips.c
> @@ -790,7 +790,7 @@ out:
>  /*
>   * Get (and clear) the dirty memory log for a memory slot.
>   */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	struct kvm_memory_slot *memslot;
>  	unsigned long ga, ga_end;
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 1eaea2d..fb31595 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -676,4 +676,6 @@ struct kvm_vcpu_arch {
>  #define __KVM_HAVE_ARCH_WQP
>  #define __KVM_HAVE_CREATE_DEVICE
>  
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
> +
>  #endif /* __POWERPC_KVM_HOST_H__ */
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 141b202..c1fa061 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
>  	select HAVE_KVM_EVENTFD
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  
>  config KVM_BOOK3S_HANDLER
>  	bool
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 94e597e..3835936 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -781,7 +781,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
>  	return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
>  }
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index ab62109..50dd33d 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -1624,7 +1624,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
>  	return r;
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	return -ENOTSUPP;
>  }
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 0d45f6f..8afbe12 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -422,6 +422,7 @@ static inline bool kvm_is_error_hva(unsigned long addr)
>  }
>  
>  #define ASYNC_PF_PER_VCPU	64
> +struct kvm;
>  struct kvm_vcpu;
>  struct kvm_async_pf;
>  struct kvm_arch_async_pf {
> @@ -441,4 +442,5 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  
>  extern int sie64a(struct kvm_s390_sie_block *, u64 *);
>  extern char sie_exit;
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>  #endif
> diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
> index 10d529a..3ba07a7 100644
> --- a/arch/s390/kvm/Kconfig
> +++ b/arch/s390/kvm/Kconfig
> @@ -21,6 +21,7 @@ config KVM
>  	depends on HAVE_KVM
>  	select PREEMPT_NOTIFIERS
>  	select ANON_INODES
> +	select HAVE_KVM_ARCH_DIRTY_LOG
>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>  	select HAVE_KVM_EVENTFD
>  	select KVM_ASYNC_PF
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index b32c42c..95164e7 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -207,7 +207,7 @@ static void kvm_s390_sync_dirty_log(struct kvm *kvm,
>  /*
>   * Get (and clear) the dirty memory log for a memory slot.
>   */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  			       struct kvm_dirty_log *log)
>  {
>  	int r;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c5582c3..a603ca3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3569,92 +3569,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>  	return 0;
>  }
>  
> -/**
> - * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
> - * @kvm: kvm instance
> - * @log: slot id and address to which we copy the log
> - *
> - * We need to keep it in mind that VCPU threads can write to the bitmap
> - * concurrently.  So, to avoid losing data, we keep the following order for
> - * each bit:
> - *
> - *   1. Take a snapshot of the bit and clear it if needed.
> - *   2. Write protect the corresponding page.
> - *   3. Flush TLB's if needed.
> - *   4. Copy the snapshot to the userspace.
> - *
> - * Between 2 and 3, the guest may write to the page using the remaining TLB
> - * entry.  This is not a problem because the page will be reported dirty at
> - * step 4 using the snapshot taken before and step 3 ensures that successive
> - * writes will be logged for the next call.
> - */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> -{
> -	int r;
> -	struct kvm_memory_slot *memslot;
> -	unsigned long n, i;
> -	unsigned long *dirty_bitmap;
> -	unsigned long *dirty_bitmap_buffer;
> -	bool is_dirty = false;
> -
> -	mutex_lock(&kvm->slots_lock);
> -
> -	r = -EINVAL;
> -	if (log->slot >= KVM_USER_MEM_SLOTS)
> -		goto out;
> -
> -	memslot = id_to_memslot(kvm->memslots, log->slot);
> -
> -	dirty_bitmap = memslot->dirty_bitmap;
> -	r = -ENOENT;
> -	if (!dirty_bitmap)
> -		goto out;
> -
> -	n = kvm_dirty_bitmap_bytes(memslot);
> -
> -	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> -	memset(dirty_bitmap_buffer, 0, n);
> -
> -	spin_lock(&kvm->mmu_lock);
> -
> -	for (i = 0; i < n / sizeof(long); i++) {
> -		unsigned long mask;
> -		gfn_t offset;
> -
> -		if (!dirty_bitmap[i])
> -			continue;
> -
> -		is_dirty = true;
> -
> -		mask = xchg(&dirty_bitmap[i], 0);
> -		dirty_bitmap_buffer[i] = mask;
> -
> -		offset = i * BITS_PER_LONG;
> -		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> -	}
> -
> -	spin_unlock(&kvm->mmu_lock);
> -
> -	/* See the comments in kvm_mmu_slot_remove_write_access(). */
> -	lockdep_assert_held(&kvm->slots_lock);
> -
> -	/*
> -	 * All the TLBs can be flushed out of mmu lock, see the comments in
> -	 * kvm_mmu_slot_remove_write_access().
> -	 */
> -	if (is_dirty)
> -		kvm_flush_remote_tlbs(kvm);
> -
> -	r = -EFAULT;
> -	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> -		goto out;
> -
> -	r = 0;
> -out:
> -	mutex_unlock(&kvm->slots_lock);
> -	return r;
> -}
> -
>  int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
>  			bool line_status)
>  {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 820fc2e..2f3822b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -606,6 +606,9 @@ int kvm_get_dirty_log(struct kvm *kvm,
>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  				struct kvm_dirty_log *log);
>  
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +	struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask);
> +
>  int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
>  			bool line_status);
>  long kvm_arch_vm_ioctl(struct file *filp,
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index f1efaa5..975733f 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -37,3 +37,6 @@ config KVM_VFIO
>  
>  config HAVE_KVM_ARCH_TLB_FLUSH_ALL
>         bool
> +
> +config HAVE_KVM_ARCH_DIRTY_LOG
> +       bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 258f3d9..51b90ca 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -442,6 +442,96 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
>  
>  #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
>  
> +/**
> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
> + * @kvm: kvm instance
> + * @log: slot id and address to which we copy the log
> + *
> + * We need to keep it in mind that VCPU threads can write to the bitmap
> + * concurrently.  So, to avoid losing data, we keep the following order for
> + * each bit:
> + *
> + *   1. Take a snapshot of the bit and clear it if needed.
> + *   2. Write protect the corresponding page.
> + *   3. Flush TLB's if needed.
> + *   4. Copy the snapshot to the userspace.
> + *
> + * Between 2 and 3, the guest may write to the page using the remaining TLB
> + * entry.  This is not a problem because the page will be reported dirty at
> + * step 4 using the snapshot taken before and step 3 ensures that successive
> + * writes will be logged for the next call.
> + */
> +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +{
> +#ifdef CONFIG_HAVE_KVM_ARCH_DIRTY_LOG
> +	return kvm_arch_vm_ioctl_get_dirty_log(kvm, log);
> +#else
> +	int r;
> +	struct kvm_memory_slot *memslot;
> +	unsigned long n, i;
> +	unsigned long *dirty_bitmap;
> +	unsigned long *dirty_bitmap_buffer;
> +	bool is_dirty = false;
> +
> +	mutex_lock(&kvm->slots_lock);
> +
> +	r = -EINVAL;
> +	if (log->slot >= KVM_USER_MEM_SLOTS)
> +		goto out;
> +
> +	memslot = id_to_memslot(kvm->memslots, log->slot);
> +
> +	dirty_bitmap = memslot->dirty_bitmap;
> +	r = -ENOENT;
> +	if (!dirty_bitmap)
> +		goto out;
> +
> +	n = kvm_dirty_bitmap_bytes(memslot);
> +
> +	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> +	memset(dirty_bitmap_buffer, 0, n);
> +
> +	spin_lock(&kvm->mmu_lock);
> +
> +	for (i = 0; i < n / sizeof(long); i++) {
> +		unsigned long mask;
> +		gfn_t offset;
> +
> +		if (!dirty_bitmap[i])
> +			continue;
> +
> +		is_dirty = true;
> +
> +		mask = xchg(&dirty_bitmap[i], 0);
> +		dirty_bitmap_buffer[i] = mask;
> +
> +		offset = i * BITS_PER_LONG;
> +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> +	}
> +
> +	spin_unlock(&kvm->mmu_lock);
> +
> +	/* See the comments in kvm_mmu_slot_remove_write_access(). */
> +	lockdep_assert_held(&kvm->slots_lock);
> +
> +	/*
> +	 * All the TLBs can be flushed out of mmu lock, see the comments in
> +	 * kvm_mmu_slot_remove_write_access().
> +	 */
> +	if (is_dirty)
> +		kvm_flush_remote_tlbs(kvm);
> +
> +	r = -EFAULT;
> +	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> +		goto out;
> +
> +	r = 0;
> +out:
> +	mutex_unlock(&kvm->slots_lock);
> +	return r;
> +#endif
> +}
> +
>  static void kvm_init_memslots_id(struct kvm *kvm)
>  {
>  	int i;
> -- 
> 1.7.9.5
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-07-25  0:56   ` Mario Smarduch
@ 2014-08-11 19:13     ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-11 19:13 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
> This patch adds support for handling 2nd stage page faults during migration;
> it disables faulting in huge pages and dissolves huge pages into page tables.
> In case migration is canceled, huge pages will be used again.
> 
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/kvm/mmu.c |   31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index ca84331..a17812a 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  }
>  
>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> -			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
> +			  phys_addr_t addr, const pte_t *new_pte, bool iomap,
> +			  bool logging_active)
>  {
>  	pmd_t *pmd;
>  	pte_t *pte, old_pte;
> @@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  		return 0;
>  	}
>  
> +	/*
> +	 * While dirty memory logging, clear PMD entry for huge page and split
> +	 * into smaller pages, to track dirty memory at page granularity.
> +	 */
> +	if (logging_active && kvm_pmd_huge(*pmd)) {
> +		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
> +		clear_pmd_entry(kvm, pmd, ipa);

clear_pmd_entry has a VM_BUG_ON(kvm_pmd_huge(*pmd)) so that is
definitely not the right thing to call.
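
Something that dissolves the huge PMD in place and flushes the stage 2
TLB entry for that IPA seems closer to what is needed, e.g. (an untested
sketch, not code from this series; addr is the IPA passed to
stage2_set_pte()):

	/*
	 * Sketch only: clear the huge PMD so the code below repopulates
	 * this range with a pte table at page granularity.
	 */
	if (logging_active && kvm_pmd_huge(*pmd)) {
		pmd_clear(pmd);
		kvm_tlb_flush_vmid_ipa(kvm, addr);
		put_page(virt_to_page(pmd));
	}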

> +	}
> +
>  	/* Create stage-2 page mappings - Level 2 */
>  	if (pmd_none(*pmd)) {
>  		if (!cache)
> @@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  		if (ret)
>  			goto out;
>  		spin_lock(&kvm->mmu_lock);
> -		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
> +		ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
>  		spin_unlock(&kvm->mmu_lock);
>  		if (ret)
>  			goto out;
> @@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
>  	pfn_t pfn;
> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
> +	#ifdef CONFIG_ARM
> +		bool logging_active = !!memslot->dirty_bitmap;
> +	#else
> +		bool logging_active = false;
> +	#endif

can you make this an inline in the header files for now please?
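
For example, something along these lines in the arm and arm64 KVM
headers (a sketch; the helper name is made up here and a forward
declaration of struct kvm_memory_slot may be needed):

/* 32-bit arm side */
static inline bool kvm_slot_logging_active(struct kvm_memory_slot *memslot)
{
	return !!memslot->dirty_bitmap;
}

/* arm64 side, until dirty page logging is supported there */
static inline bool kvm_slot_logging_active(struct kvm_memory_slot *memslot)
{
	return false;
}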

>  
>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>  	if (fault_status == FSC_PERM && !write_fault) {
> @@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	/* Let's check if we will get back a huge page backed by hugetlbfs */
>  	down_read(&current->mm->mmap_sem);
>  	vma = find_vma_intersection(current->mm, hva, hva + 1);
> -	if (is_vm_hugetlb_page(vma)) {
> +	if (is_vm_hugetlb_page(vma) && !logging_active) {
>  		hugetlb = true;
>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>  	} else {
> @@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	spin_lock(&kvm->mmu_lock);
>  	if (mmu_notifier_retry(kvm, mmu_seq))
>  		goto out_unlock;
> -	if (!hugetlb && !force_pte)
> +	if (!hugetlb && !force_pte && !logging_active)
>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>  
>  	if (hugetlb) {
> @@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			kvm_set_pfn_dirty(pfn);
>  		}
>  		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
> +		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
> +					logging_active);
>  	}
>  
> +	if (write_fault)
> +		mark_page_dirty(kvm, gfn);
>  
>  out_unlock:
>  	spin_unlock(&kvm->mmu_lock);
> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
>  {
>  	pte_t *pte = (pte_t *)data;
>  
> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);

why is logging never active if we are called from MMU notifiers?

>  }
>  
>  
> -- 
> 1.7.9.5
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush
  2014-08-11 19:12     ` Christoffer Dall
@ 2014-08-11 23:54       ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-11 23:54 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> On Thu, Jul 24, 2014 at 05:56:05PM -0700, Mario Smarduch wrote:
>> Patch adds HYP interface for global VM TLB invalidation without address
>> parameter. Generic VM TLB flush calls ARMv7 arch defined TLB flush function.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_asm.h  |    1 +
>>  arch/arm/include/asm/kvm_host.h |    1 +
>>  arch/arm/kvm/Kconfig            |    1 +
>>  arch/arm/kvm/interrupts.S       |   12 ++++++++++++
>>  arch/arm/kvm/mmu.c              |   17 +++++++++++++++++
>>  virt/kvm/Kconfig                |    3 +++
>>  virt/kvm/kvm_main.c             |    4 ++++
>>  7 files changed, 39 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> index 53b3c4a..21bc519 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>>  
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>>  
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>  #endif
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 193ceaf..042206f 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -230,5 +230,6 @@ int kvm_perf_teardown(void);
>>  
>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>> +void kvm_arch_flush_remote_tlbs(struct kvm *);
>>  
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index 466bd29..44d3b6f 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -22,6 +22,7 @@ config KVM
>>  	select ANON_INODES
>>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>>  	select KVM_MMIO
>> +	select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>>  	select KVM_ARM_HOST
>>  	depends on ARM_VIRT_EXT && ARM_LPAE
>>  	---help---
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 0d68d40..1258d46 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -66,6 +66,18 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>>  	bx	lr
>>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
>>  
>> +/**
>> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
>> + *
>> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
>> + * parameter
>> + */
>> +
>> +ENTRY(__kvm_tlb_flush_vmid)
>> +	b	__kvm_tlb_flush_vmid_ipa
>> +ENDPROC(__kvm_tlb_flush_vmid)
>> +
>> +
>>  /********************************************************************
>>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>>   * domain, for all VMIDs
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 2ac9588..35254c6 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -56,6 +56,23 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>>  		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>>  }
>>  
>> +#ifdef CONFIG_ARM
> 
> I assume this is here because of arm vs. arm64, use static inlines in
> the header files to differentiate instead.
Yes that's right, will move it.
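
Roughly like this on the 32-bit arm side, assuming kvm_call_hyp and
__kvm_tlb_flush_vmid are visible at that point in the header (sketch
only):

static inline void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
{
	if (kvm)
		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
}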
> 
>> +/**
>> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries
>> + * @kvm:       pointer to kvm structure.
>> + *
>> + * Interface to HYP function to flush all VM TLB entries without address
>> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
>> + * kvm_tlb_flush_vmid_ipa().
> 
> remove the last sentence from here, it's repetitive.
Ok.
> 
>> + */
>> +void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
>> +{
>> +	if (kvm)
>> +		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
>> +}
>> +
>> +#endif
>> +
>>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>>  				  int min, int max)
>>  {
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 13f2d19..f1efaa5 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -34,3 +34,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
>>  
>>  config KVM_VFIO
>>         bool
>> +
>> +config HAVE_KVM_ARCH_TLB_FLUSH_ALL
>> +       bool
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index fa70c6e..258f3d9 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -186,12 +186,16 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
>>  
>>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>>  {
>> +#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
>> +	kvm_arch_flush_remote_tlbs(kvm);
>> +#else
>>  	long dirty_count = kvm->tlbs_dirty;
>>  
>>  	smp_mb();
>>  	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>>  		++kvm->stat.remote_tlb_flush;
>>  	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>> +#endif
> 
> I would split this into two patches, one trivial one for the KVM generic
> solution, and one to add the arm-specific part.
> 
> That will make your commit text and title much nicer to read too.

Yes, that makes sense; it's easier to review the generic and arch
layers separately.

> 
> Thanks,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-08-11 19:12     ` Christoffer Dall
@ 2014-08-12  0:16       ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12  0:16 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> Remove the parenthesis from the subject line.

Hmmm, I have to check this; I don't see it in my patch file.
> 
> On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
>> Patch adds  support for initial write protection VM memlsot. This patch series
>             ^^                                    ^
> stray whitespace                                 of
> 
Need to watch out for these; they add delays to the review cycle.
> 
>> assumes that huge PUDs will not be used in 2nd stage tables.
> 
> may be worth mentioning that this is always valid on ARMv7.
> 

Yep definitely.

>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h       |    1 +
>>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>  arch/arm/include/asm/pgtable-3level.h |    1 +
>>  arch/arm/kvm/arm.c                    |    9 +++
>>  arch/arm/kvm/mmu.c                    |  128 +++++++++++++++++++++++++++++++++
>>  5 files changed, 159 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 042206f..6521a2d 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>  void kvm_arch_flush_remote_tlbs(struct kvm *);
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f..08ab5e8 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>>  }
>>  
>> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>> +{
>> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>> +{
>> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>> +}
>> +
>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>  #define kvm_pgd_addr_end(addr, end)					\
>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> index 85c60ad..d8bb40b 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -129,6 +129,7 @@
>>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>  
>> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>  
>>  /*
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 3c82b37..e11c2dd 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>  				   const struct kvm_memory_slot *old,
>>  				   enum kvm_mr_change change)
>>  {
>> +#ifdef CONFIG_ARM
>> +	/*
>> +	 * At this point memslot has been committed and there is an
>> +	 * allocated dirty_bitmap[], dirty pages will be be tracked while the
>> +	 * memory slot is write protected.
>> +	 */
>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>> +#endif
>>  }
>>  
>>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 35254c6..7bfc792 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>>  	return false;
>>  }
>>  
>> +#ifdef CONFIG_ARM
>> +/**
>> + * stage2_wp_pte_range - write protect PTE range
>> + * @pmd:	pointer to pmd entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + */
>> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pte_t *pte;
>> +
>> +	pte = pte_offset_kernel(pmd, addr);
>> +	do {
>> +		if (!pte_none(*pte)) {
>> +			if (!kvm_s2pte_readonly(pte))
>> +				kvm_set_s2pte_readonly(pte);
>> +		}
>> +	} while (pte++, addr += PAGE_SIZE, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_pmd_range - write protect PMD range
>> + * @pud:	pointer to pud entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + */
>> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pmd_t *pmd;
>> +	phys_addr_t next;
>> +
>> +	pmd = pmd_offset(pud, addr);
>> +
>> +	do {
>> +		next = kvm_pmd_addr_end(addr, end);
>> +		if (!pmd_none(*pmd)) {
>> +			if (kvm_pmd_huge(*pmd)) {
>> +				if (!kvm_s2pmd_readonly(pmd))
>> +					kvm_set_s2pmd_readonly(pmd);
>> +			} else
>> +				stage2_wp_pte_range(pmd, addr, next);
> please use a closing brace when the first part of the if-statement is a
> multi-line block with braces, as per the CodingStyle.
Will fix.
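
I.e. the hunk would then read something like this (sketch):

		if (!pmd_none(*pmd)) {
			if (kvm_pmd_huge(*pmd)) {
				if (!kvm_s2pmd_readonly(pmd))
					kvm_set_s2pmd_readonly(pmd);
			} else {
				stage2_wp_pte_range(pmd, addr, next);
			}
		}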
>> +
> 
> stray blank line

Not sure how it got by checkpatch, will fix.
> 
>> +		}
>> +	} while (pmd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> +  * stage2_wp_pud_range - write protect PUD range
>> +  * @kvm:	pointer to kvm structure
>> +  * @pud:	pointer to pgd entry
>         pgd
>> +  * @addr:	range start address
>> +  * @end:	range end address
>> +  *
>> +  * While walking the PUD range huge PUD pages are ignored, in the future this
>                              range, huge PUDs are ignored.  In the future...
>> +  * may need to be revisited. Determine how to handle huge PUDs when logging
>> +  * of dirty pages is enabled.
> 
> I don't understand the last sentence?

Probably the last two sentences should be combined:
".... to determine how to handle huge PUDs ...". Would that be
clear enough?

The overall theme is what to do about PUDs: mark all pages in the
region dirty, or attempt to break up such huge regions?

> 
>> +  */
>> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>> +				phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pud_t *pud;
>> +	phys_addr_t next;
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	do {
>> +		next = kvm_pud_addr_end(addr, end);
>> +		/* TODO: huge PUD not supported, revisit later */
>> +		BUG_ON(pud_huge(*pud));
> 
> we should probably define kvm_pud_huge()

Yep will add it.
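
For ARMv7 it can probably just wrap pud_huge(), e.g. (sketch, exact
header placement to be decided):

#define kvm_pud_huge(_x)	pud_huge(_x)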

> 
>> +		if (!pud_none(*pud))
>> +			stage2_wp_pmd_range(pud, addr, next);
>> +	} while (pud++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_range() - write protect stage2 memory region range
>> + * @kvm:	The KVM pointer
>> + * @start:	Start address of range
>> + * &end:	End address of range
>> + */
>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pgd_t *pgd;
>> +	phys_addr_t next;
>> +
>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>> +	do {
>> +		/*
>> +		 * Release kvm_mmu_lock periodically if the memory region is
>> +		 * large features like detect hung task, lock detector or lock
>                    large.  Otherwise, we may see panics due to..
>> +		 * dep  may panic. In addition holding the lock this long will
>     extra white space ^^           Additionally, holding the lock for a
>     long timer will
>> +		 * also starve other vCPUs. Applies to huge VM memory regions.
>                                             ^^^ I don't understand this
> 					    last remark.
>> +		 */
>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +			cond_resched_lock(&kvm->mmu_lock);
>> +
>> +		next = kvm_pgd_addr_end(addr, end);
>> +		if (pgd_present(*pgd))
>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>> +	} while (pgd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
>> + * @kvm:	The KVM pointer
>> + * @slot:	The memory slot to write protect
>> + *
>> + * Called to start logging dirty pages after memory region
>> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
>> + * all present PMD and PTEs are write protected in the memory region.
>> + * Afterwards read of dirty page log can be called.
>> + *
>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>> + * serializing operations for VM memory regions.
>> + */
>> +
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>> +{
>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>> +
>> +	spin_lock(&kvm->mmu_lock);
>> +	stage2_wp_range(kvm, start, end);
>> +	kvm_flush_remote_tlbs(kvm);
>> +	spin_unlock(&kvm->mmu_lock);
>> +}
>> +#endif
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  struct kvm_memory_slot *memslot,
>>  			  unsigned long fault_status)
>> -- 
>> 1.7.9.5
>>
> 
> Besides the commenting and whitespace stuff, this is beginning to look
> good.

I'll clean it up for next iteration.
> 
> Thanks,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 3/4] arm: dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8 ia64 mips powerpc s390 arch specific
  2014-08-11 19:13     ` Christoffer Dall
@ 2014-08-12  0:24       ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12  0:24 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> On Thu, Jul 24, 2014 at 05:56:07PM -0700, Mario Smarduch wrote:
>> This patch adds support for keeping track of VM dirty pages. As dirty page log
>> is retrieved, the pages that have been written are write protected again for
>> next write and log read.
>>
>> The dirty log read function is generic for armv7 and x86, and arch specific
>> for arm64, ia64, mips, powerpc, s390.
> 
> So I would also split up this patch.  One that only modifies the
> existing functionality, but does not introduce any new functionality for
> ARM.  Put this first patch in the beginning of the patch series with the
> other prepatory patch, so that you get something like this:
> 
> [PATCH 1/X] KVM: Add architecture-specific TLB flush implementations
> [PATCH 2/X] KVM: Add generic implementation of kvm_vm_ioctl_get_dirty_log
> [PATCH 3/X] arm: KVM: Add ARMv7 API to flush TLBs
> [PATCH 4/X] arm: KVM: Add initial dirty page locking infrastructure
> ...

Yes definitely, thanks for the advice; that makes the patch series easier to
review.

> 
> That will make it easier to get the patches accepted and for us to
> review...
> 
> 
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/kvm/arm.c                  |    8 +++-
>>  arch/arm/kvm/mmu.c                  |   22 +++++++++
>>  arch/arm64/include/asm/kvm_host.h   |    2 +
>>  arch/arm64/kvm/Kconfig              |    1 +
>>  arch/ia64/include/asm/kvm_host.h    |    1 +
>>  arch/ia64/kvm/Kconfig               |    1 +
>>  arch/ia64/kvm/kvm-ia64.c            |    2 +-
>>  arch/mips/include/asm/kvm_host.h    |    2 +-
>>  arch/mips/kvm/Kconfig               |    1 +
>>  arch/mips/kvm/kvm_mips.c            |    2 +-
>>  arch/powerpc/include/asm/kvm_host.h |    2 +
>>  arch/powerpc/kvm/Kconfig            |    1 +
>>  arch/powerpc/kvm/book3s.c           |    2 +-
>>  arch/powerpc/kvm/booke.c            |    2 +-
>>  arch/s390/include/asm/kvm_host.h    |    2 +
>>  arch/s390/kvm/Kconfig               |    1 +
>>  arch/s390/kvm/kvm-s390.c            |    2 +-
>>  arch/x86/kvm/x86.c                  |   86 ---------------------------------
>>  include/linux/kvm_host.h            |    3 ++
>>  virt/kvm/Kconfig                    |    3 ++
>>  virt/kvm/kvm_main.c                 |   90 +++++++++++++++++++++++++++++++++++
>>  21 files changed, 143 insertions(+), 93 deletions(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index e11c2dd..f7739a0 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -783,10 +783,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>  	}
>>  }
>>  
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> +#ifdef CONFIG_ARM64
>> +/*
>> + * For now features not supported on ARM64, the #ifdef is added to make that
>> + * clear but not needed since ARM64 Kconfig selects function in generic code.
>> + */
> 
> I don't think this comment is needed, but if you really want it, it
> should be something like:
> 
> /*
>  * ARM64 does not support dirty logging and therefore selects
>  * CONFIG_HAVE_KVM_ARCH_DIRTY_LOG.  Provide a -EINVAL stub.
>  */

I think the comment can go, since I'm doing arm64 now.

> 
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>  {
>>  	return -EINVAL;
>>  }
>> +#endif
>>  
>>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>>  					struct kvm_arm_device_addr *dev_addr)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 7bfc792..ca84331 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -889,6 +889,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>>  	kvm_flush_remote_tlbs(kvm);
>>  	spin_unlock(&kvm->mmu_lock);
>>  }
>> +
>> +/**
>> + * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in mask
>> + * @kvm:	The KVM pointer
>> + * @slot:	The memory slot associated with mask
>> + * @gfn_offset:	The gfn offset in memory slot
>> + * @mask:	The mask of dirty pages at offset 'gfn_offset' in this memory
>> + *		slot to be write protected
>> + *
>> + * Walks bits set in mask write protects the associated pte's. Caller must
>> + * acquire kvm_mmu_lock.
>> + */
>> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> +		struct kvm_memory_slot *slot,
>> +		gfn_t gfn_offset, unsigned long mask)
>> +{
>> +	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
>> +	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
>> +	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
> 
> __fls(x) + 1 is the same as fls(x)

For me, __fls(x) + 1 makes it easier to see the covered range. Unless
it really breaks convention, I'd prefer to keep the '+1'. Either
way, no problem.
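Just to illustrate the range arithmetic, a throwaway userspace sketch
(not kernel code; assumes a 64-bit unsigned long, with __builtin_ctzl/clzl
standing in for __ffs/__fls):

  #include <stdio.h>

  int main(void)
  {
          unsigned long mask = 0x06;  /* bits 1 and 2 set: pages 1 and 2 dirty */
          unsigned long first = __builtin_ctzl(mask);      /* like __ffs(mask) */
          unsigned long last = 63 - __builtin_clzl(mask);  /* like __fls(mask) */

          /* prints "write protect gfn offsets [1, 3)"; __fls(mask) + 1 == fls(mask) */
          printf("write protect gfn offsets [%lu, %lu)\n", first, last + 1);
          return 0;
  }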

>> +
>> +	stage2_wp_range(kvm, start, end);
>> +}
>>  #endif
>>  
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 92242ce..b4a280b 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -200,4 +200,6 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
>>  		     hyp_stack_ptr, vector_ptr);
>>  }
>>  
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>> +
>>  #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>> index 8ba85e9..9e21a8a 100644
>> --- a/arch/arm64/kvm/Kconfig
>> +++ b/arch/arm64/kvm/Kconfig
>> @@ -22,6 +22,7 @@ config KVM
>>  	select PREEMPT_NOTIFIERS
>>  	select ANON_INODES
>>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>> +	select HAVE_KVM_ARCH_DIRTY_LOG
>>  	select KVM_MMIO
>>  	select KVM_ARM_HOST
>>  	select KVM_ARM_VGIC
>> diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
>> index db95f57..d79f520 100644
>> --- a/arch/ia64/include/asm/kvm_host.h
>> +++ b/arch/ia64/include/asm/kvm_host.h
>> @@ -594,6 +594,7 @@ void kvm_sal_emul(struct kvm_vcpu *vcpu);
>>  #define __KVM_HAVE_ARCH_VM_ALLOC 1
>>  struct kvm *kvm_arch_alloc_vm(void);
>>  void kvm_arch_free_vm(struct kvm *kvm);
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>>  
>>  #endif /* __ASSEMBLY__*/
>>  
>> diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
>> index 990b864..32dd6c8 100644
>> --- a/arch/ia64/kvm/Kconfig
>> +++ b/arch/ia64/kvm/Kconfig
>> @@ -24,6 +24,7 @@ config KVM
>>  	depends on BROKEN
>>  	select PREEMPT_NOTIFIERS
>>  	select ANON_INODES
>> +	select HAVE_KVM_ARCH_DIRTY_LOG
>>  	select HAVE_KVM_IRQCHIP
>>  	select HAVE_KVM_IRQ_ROUTING
>>  	select KVM_APIC_ARCHITECTURE
>> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
>> index 6a4309b..3166df5 100644
>> --- a/arch/ia64/kvm/kvm-ia64.c
>> +++ b/arch/ia64/kvm/kvm-ia64.c
>> @@ -1812,7 +1812,7 @@ static void kvm_ia64_sync_dirty_log(struct kvm *kvm,
>>  	spin_unlock(&kvm->arch.dirty_log_lock);
>>  }
>>  
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
>>  		struct kvm_dirty_log *log)
>>  {
>>  	int r;
>> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
>> index 060aaa6..f7e2262 100644
>> --- a/arch/mips/include/asm/kvm_host.h
>> +++ b/arch/mips/include/asm/kvm_host.h
>> @@ -649,6 +649,6 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
>>  extern void mips32_SyncICache(unsigned long addr, unsigned long size);
>>  extern int kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
>>  extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
>> -
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>>  
>>  #endif /* __MIPS_KVM_HOST_H__ */
>> diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
>> index 30e334e..b57f49e 100644
>> --- a/arch/mips/kvm/Kconfig
>> +++ b/arch/mips/kvm/Kconfig
>> @@ -20,6 +20,7 @@ config KVM
>>  	select PREEMPT_NOTIFIERS
>>  	select ANON_INODES
>>  	select KVM_MMIO
>> +	select HAVE_KVM_ARCH_DIRTY_LOG
>>  	---help---
>>  	  Support for hosting Guest kernels.
>>  	  Currently supported on MIPS32 processors.
>> diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
>> index da5186f..f9a1e62 100644
>> --- a/arch/mips/kvm/kvm_mips.c
>> +++ b/arch/mips/kvm/kvm_mips.c
>> @@ -790,7 +790,7 @@ out:
>>  /*
>>   * Get (and clear) the dirty memory log for a memory slot.
>>   */
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>  {
>>  	struct kvm_memory_slot *memslot;
>>  	unsigned long ga, ga_end;
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index 1eaea2d..fb31595 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -676,4 +676,6 @@ struct kvm_vcpu_arch {
>>  #define __KVM_HAVE_ARCH_WQP
>>  #define __KVM_HAVE_CREATE_DEVICE
>>  
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>> +
>>  #endif /* __POWERPC_KVM_HOST_H__ */
>> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
>> index 141b202..c1fa061 100644
>> --- a/arch/powerpc/kvm/Kconfig
>> +++ b/arch/powerpc/kvm/Kconfig
>> @@ -22,6 +22,7 @@ config KVM
>>  	select PREEMPT_NOTIFIERS
>>  	select ANON_INODES
>>  	select HAVE_KVM_EVENTFD
>> +	select HAVE_KVM_ARCH_DIRTY_LOG
>>  
>>  config KVM_BOOK3S_HANDLER
>>  	bool
>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>> index 94e597e..3835936 100644
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -781,7 +781,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
>>  	return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
>>  }
>>  
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>  {
>>  	return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
>>  }
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index ab62109..50dd33d 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -1624,7 +1624,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
>>  	return r;
>>  }
>>  
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>  {
>>  	return -ENOTSUPP;
>>  }
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 0d45f6f..8afbe12 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -422,6 +422,7 @@ static inline bool kvm_is_error_hva(unsigned long addr)
>>  }
>>  
>>  #define ASYNC_PF_PER_VCPU	64
>> +struct kvm;
>>  struct kvm_vcpu;
>>  struct kvm_async_pf;
>>  struct kvm_arch_async_pf {
>> @@ -441,4 +442,5 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>>  
>>  extern int sie64a(struct kvm_s390_sie_block *, u64 *);
>>  extern char sie_exit;
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>>  #endif
>> diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
>> index 10d529a..3ba07a7 100644
>> --- a/arch/s390/kvm/Kconfig
>> +++ b/arch/s390/kvm/Kconfig
>> @@ -21,6 +21,7 @@ config KVM
>>  	depends on HAVE_KVM
>>  	select PREEMPT_NOTIFIERS
>>  	select ANON_INODES
>> +	select HAVE_KVM_ARCH_DIRTY_LOG
>>  	select HAVE_KVM_CPU_RELAX_INTERCEPT
>>  	select HAVE_KVM_EVENTFD
>>  	select KVM_ASYNC_PF
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index b32c42c..95164e7 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -207,7 +207,7 @@ static void kvm_s390_sync_dirty_log(struct kvm *kvm,
>>  /*
>>   * Get (and clear) the dirty memory log for a memory slot.
>>   */
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
>>  			       struct kvm_dirty_log *log)
>>  {
>>  	int r;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c5582c3..a603ca3 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3569,92 +3569,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>>  	return 0;
>>  }
>>  
>> -/**
>> - * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
>> - * @kvm: kvm instance
>> - * @log: slot id and address to which we copy the log
>> - *
>> - * We need to keep it in mind that VCPU threads can write to the bitmap
>> - * concurrently.  So, to avoid losing data, we keep the following order for
>> - * each bit:
>> - *
>> - *   1. Take a snapshot of the bit and clear it if needed.
>> - *   2. Write protect the corresponding page.
>> - *   3. Flush TLB's if needed.
>> - *   4. Copy the snapshot to the userspace.
>> - *
>> - * Between 2 and 3, the guest may write to the page using the remaining TLB
>> - * entry.  This is not a problem because the page will be reported dirty at
>> - * step 4 using the snapshot taken before and step 3 ensures that successive
>> - * writes will be logged for the next call.
>> - */
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> -{
>> -	int r;
>> -	struct kvm_memory_slot *memslot;
>> -	unsigned long n, i;
>> -	unsigned long *dirty_bitmap;
>> -	unsigned long *dirty_bitmap_buffer;
>> -	bool is_dirty = false;
>> -
>> -	mutex_lock(&kvm->slots_lock);
>> -
>> -	r = -EINVAL;
>> -	if (log->slot >= KVM_USER_MEM_SLOTS)
>> -		goto out;
>> -
>> -	memslot = id_to_memslot(kvm->memslots, log->slot);
>> -
>> -	dirty_bitmap = memslot->dirty_bitmap;
>> -	r = -ENOENT;
>> -	if (!dirty_bitmap)
>> -		goto out;
>> -
>> -	n = kvm_dirty_bitmap_bytes(memslot);
>> -
>> -	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
>> -	memset(dirty_bitmap_buffer, 0, n);
>> -
>> -	spin_lock(&kvm->mmu_lock);
>> -
>> -	for (i = 0; i < n / sizeof(long); i++) {
>> -		unsigned long mask;
>> -		gfn_t offset;
>> -
>> -		if (!dirty_bitmap[i])
>> -			continue;
>> -
>> -		is_dirty = true;
>> -
>> -		mask = xchg(&dirty_bitmap[i], 0);
>> -		dirty_bitmap_buffer[i] = mask;
>> -
>> -		offset = i * BITS_PER_LONG;
>> -		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
>> -	}
>> -
>> -	spin_unlock(&kvm->mmu_lock);
>> -
>> -	/* See the comments in kvm_mmu_slot_remove_write_access(). */
>> -	lockdep_assert_held(&kvm->slots_lock);
>> -
>> -	/*
>> -	 * All the TLBs can be flushed out of mmu lock, see the comments in
>> -	 * kvm_mmu_slot_remove_write_access().
>> -	 */
>> -	if (is_dirty)
>> -		kvm_flush_remote_tlbs(kvm);
>> -
>> -	r = -EFAULT;
>> -	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
>> -		goto out;
>> -
>> -	r = 0;
>> -out:
>> -	mutex_unlock(&kvm->slots_lock);
>> -	return r;
>> -}
>> -
>>  int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
>>  			bool line_status)
>>  {
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 820fc2e..2f3822b 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -606,6 +606,9 @@ int kvm_get_dirty_log(struct kvm *kvm,
>>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>>  				struct kvm_dirty_log *log);
>>  
>> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> +	struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask);
>> +
>>  int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
>>  			bool line_status);
>>  long kvm_arch_vm_ioctl(struct file *filp,
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index f1efaa5..975733f 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -37,3 +37,6 @@ config KVM_VFIO
>>  
>>  config HAVE_KVM_ARCH_TLB_FLUSH_ALL
>>         bool
>> +
>> +config HAVE_KVM_ARCH_DIRTY_LOG
>> +       bool
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 258f3d9..51b90ca 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -442,6 +442,96 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
>>  
>>  #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
>>  
>> +/**
>> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
>> + * @kvm: kvm instance
>> + * @log: slot id and address to which we copy the log
>> + *
>> + * We need to keep it in mind that VCPU threads can write to the bitmap
>> + * concurrently.  So, to avoid losing data, we keep the following order for
>> + * each bit:
>> + *
>> + *   1. Take a snapshot of the bit and clear it if needed.
>> + *   2. Write protect the corresponding page.
>> + *   3. Flush TLB's if needed.
>> + *   4. Copy the snapshot to the userspace.
>> + *
>> + * Between 2 and 3, the guest may write to the page using the remaining TLB
>> + * entry.  This is not a problem because the page will be reported dirty at
>> + * step 4 using the snapshot taken before and step 3 ensures that successive
>> + * writes will be logged for the next call.
>> + */
>> +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> +{
>> +#ifdef CONFIG_HAVE_KVM_ARCH_DIRTY_LOG
>> +	return kvm_arch_vm_ioctl_get_dirty_log(kvm, log);
>> +#else
>> +	int r;
>> +	struct kvm_memory_slot *memslot;
>> +	unsigned long n, i;
>> +	unsigned long *dirty_bitmap;
>> +	unsigned long *dirty_bitmap_buffer;
>> +	bool is_dirty = false;
>> +
>> +	mutex_lock(&kvm->slots_lock);
>> +
>> +	r = -EINVAL;
>> +	if (log->slot >= KVM_USER_MEM_SLOTS)
>> +		goto out;
>> +
>> +	memslot = id_to_memslot(kvm->memslots, log->slot);
>> +
>> +	dirty_bitmap = memslot->dirty_bitmap;
>> +	r = -ENOENT;
>> +	if (!dirty_bitmap)
>> +		goto out;
>> +
>> +	n = kvm_dirty_bitmap_bytes(memslot);
>> +
>> +	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
>> +	memset(dirty_bitmap_buffer, 0, n);
>> +
>> +	spin_lock(&kvm->mmu_lock);
>> +
>> +	for (i = 0; i < n / sizeof(long); i++) {
>> +		unsigned long mask;
>> +		gfn_t offset;
>> +
>> +		if (!dirty_bitmap[i])
>> +			continue;
>> +
>> +		is_dirty = true;
>> +
>> +		mask = xchg(&dirty_bitmap[i], 0);
>> +		dirty_bitmap_buffer[i] = mask;
>> +
>> +		offset = i * BITS_PER_LONG;
>> +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
>> +	}
>> +
>> +	spin_unlock(&kvm->mmu_lock);
>> +
>> +	/* See the comments in kvm_mmu_slot_remove_write_access(). */
>> +	lockdep_assert_held(&kvm->slots_lock);
>> +
>> +	/*
>> +	 * All the TLBs can be flushed out of mmu lock, see the comments in
>> +	 * kvm_mmu_slot_remove_write_access().
>> +	 */
>> +	if (is_dirty)
>> +		kvm_flush_remote_tlbs(kvm);
>> +
>> +	r = -EFAULT;
>> +	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
>> +		goto out;
>> +
>> +	r = 0;
>> +out:
>> +	mutex_unlock(&kvm->slots_lock);
>> +	return r;
>> +#endif
>> +}
>> +
>>  static void kvm_init_memslots_id(struct kvm *kvm)
>>  {
>>  	int i;
>> -- 
>> 1.7.9.5
>>
> 
> Thanks,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-08-11 19:13     ` Christoffer Dall
@ 2014-08-12  1:25       ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12  1:25 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
>> This patch adds support for handling 2nd stage page faults during migration,
>> it disables faulting in huge pages, and dissolves huge pages to page tables.
>> In case migration is canceled huge pages will be used again.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/kvm/mmu.c |   31 +++++++++++++++++++++++++------
>>  1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index ca84331..a17812a 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>  }
>>  
>>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>> -			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
>> +			  phys_addr_t addr, const pte_t *new_pte, bool iomap,
>> +			  bool logging_active)
>>  {
>>  	pmd_t *pmd;
>>  	pte_t *pte, old_pte;
>> @@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>  		return 0;
>>  	}
>>  
>> +	/*
>> +	 * While dirty memory logging, clear PMD entry for huge page and split
>> +	 * into smaller pages, to track dirty memory at page granularity.
>> +	 */
>> +	if (logging_active && kvm_pmd_huge(*pmd)) {
>> +		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
>> +		clear_pmd_entry(kvm, pmd, ipa);
> 
> clear_pmd_entry has a VM_BUG_ON(kvm_pmd_huge(*pmd)) so that is
> definitely not the right thing to call.

I don't see that in 3.15-rc1/-rc4:

static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
{
        if (kvm_pmd_huge(*pmd)) {
                pmd_clear(pmd);
                kvm_tlb_flush_vmid_ipa(kvm, addr);
        } else {
                  [....]
}

I thought the purpose of this function was to clear the PMD entry. I
also ran hundreds of tests with no problems. Hmm, confused.
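Either way, one option that would make the intent explicit is a small
helper that only dissolves huge PMDs (sketch only, the name is made up),
leaving clear_pmd_entry() alone for the table case:

  static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
  {
          if (!kvm_pmd_huge(*pmd))
                  return;

          /* Drop the block mapping and its TLB entry; the fault handler
           * then repopulates the range with PTEs for page-granular tracking. */
          pmd_clear(pmd);
          kvm_tlb_flush_vmid_ipa(kvm, addr);
          put_page(virt_to_page(pmd));
  }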

> 
>> +	}
>> +
>>  	/* Create stage-2 page mappings - Level 2 */
>>  	if (pmd_none(*pmd)) {
>>  		if (!cache)
>> @@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>>  		if (ret)
>>  			goto out;
>>  		spin_lock(&kvm->mmu_lock);
>> -		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
>> +		ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
>>  		spin_unlock(&kvm->mmu_lock);
>>  		if (ret)
>>  			goto out;
>> @@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>  	struct vm_area_struct *vma;
>>  	pfn_t pfn;
>> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
>> +	#ifdef CONFIG_ARM
>> +		bool logging_active = !!memslot->dirty_bitmap;
>> +	#else
>> +		bool logging_active = false;
>> +	#endif
> 
> can you make this an inline in the header files for now please?

Yes definitely.
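For example (sketch only; the helper name and the final header location
are still to be decided), the #ifdef could move behind an inline:

  #ifdef CONFIG_ARM
  static inline bool kvm_dirty_logging_active(struct kvm_memory_slot *memslot)
  {
          /* Logging is considered active once the dirty bitmap is allocated. */
          return !!memslot->dirty_bitmap;
  }
  #else
  static inline bool kvm_dirty_logging_active(struct kvm_memory_slot *memslot)
  {
          return false;
  }
  #endif

so user_mem_abort() would just do:

  bool logging_active = kvm_dirty_logging_active(memslot);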

> 
>>  
>>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>  	if (fault_status == FSC_PERM && !write_fault) {
>> @@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	/* Let's check if we will get back a huge page backed by hugetlbfs */
>>  	down_read(&current->mm->mmap_sem);
>>  	vma = find_vma_intersection(current->mm, hva, hva + 1);
>> -	if (is_vm_hugetlb_page(vma)) {
>> +	if (is_vm_hugetlb_page(vma) && !logging_active) {
>>  		hugetlb = true;
>>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>>  	} else {
>> @@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	spin_lock(&kvm->mmu_lock);
>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>  		goto out_unlock;
>> -	if (!hugetlb && !force_pte)
>> +	if (!hugetlb && !force_pte && !logging_active)
>>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>  
>>  	if (hugetlb) {
>> @@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			kvm_set_pfn_dirty(pfn);
>>  		}
>>  		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
>> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>> +		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
>> +					logging_active);
>>  	}
>>  
>> +	if (write_fault)
>> +		mark_page_dirty(kvm, gfn);
>>  
>>  out_unlock:
>>  	spin_unlock(&kvm->mmu_lock);
>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
>>  {
>>  	pte_t *pte = (pte_t *)data;
>>  
>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
> 
> why is logging never active if we are called from MMU notifiers?

MMU notifiers update sptes, but I don't see how those updates can
result in guest dirty pages. Also, guest pages are marked dirty from
the 2nd stage page fault handlers (from searching through the code).

> 
>>  }
>>  
>>  
>> -- 
>> 1.7.9.5
>>
> 
> Thanks,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
@ 2014-08-12  1:25       ` Mario Smarduch
  0 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12  1:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
>> This patch adds support for handling 2nd stage page faults during migration,
>> it disables faulting in huge pages, and dissolves huge pages to page tables.
>> In case migration is canceled huge pages will be used again.
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/kvm/mmu.c |   31 +++++++++++++++++++++++++------
>>  1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index ca84331..a17812a 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>  }
>>  
>>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>> -			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
>> +			  phys_addr_t addr, const pte_t *new_pte, bool iomap,
>> +			  bool logging_active)
>>  {
>>  	pmd_t *pmd;
>>  	pte_t *pte, old_pte;
>> @@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>  		return 0;
>>  	}
>>  
>> +	/*
>> +	 * While dirty memory logging, clear PMD entry for huge page and split
>> +	 * into smaller pages, to track dirty memory at page granularity.
>> +	 */
>> +	if (logging_active && kvm_pmd_huge(*pmd)) {
>> +		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
>> +		clear_pmd_entry(kvm, pmd, ipa);
> 
> clear_pmd_entry has a VM_BUG_ON(kvm_pmd_huge(*pmd)) so that is
> definitely not the right thing to call.

I don't see that in 3.15rc1/rc4 -

static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
{
        if (kvm_pmd_huge(*pmd)) {
                pmd_clear(pmd);
                kvm_tlb_flush_vmid_ipa(kvm, addr);
        } else {
                  [....]
}

I thought the purpose of this function was to clear PMD entry. Also
ran hundreds of tests no problems. Hmmm confused.

> 
>> +	}
>> +
>>  	/* Create stage-2 page mappings - Level 2 */
>>  	if (pmd_none(*pmd)) {
>>  		if (!cache)
>> @@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>>  		if (ret)
>>  			goto out;
>>  		spin_lock(&kvm->mmu_lock);
>> -		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
>> +		ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
>>  		spin_unlock(&kvm->mmu_lock);
>>  		if (ret)
>>  			goto out;
>> @@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>  	struct vm_area_struct *vma;
>>  	pfn_t pfn;
>> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
>> +	#ifdef CONFIG_ARM
>> +		bool logging_active = !!memslot->dirty_bitmap;
>> +	#else
>> +		bool logging_active = false;
>> +	#endif
> 
> can you make this an inline in the header files for now please?

Yes definitely.
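
Something like this in the arch header, e.g. (the helper name is just a
placeholder, sketching the idea rather than the final code):

/* arch/arm/include/asm/kvm_mmu.h */
static inline bool kvm_get_logging_state(struct kvm_memory_slot *memslot)
{
	return !!memslot->dirty_bitmap;
}

with the other architectures keeping a stub that simply returns false, so
user_mem_abort() can call the helper instead of open coding the #ifdef.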

> 
>>  
>>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>  	if (fault_status == FSC_PERM && !write_fault) {
>> @@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	/* Let's check if we will get back a huge page backed by hugetlbfs */
>>  	down_read(&current->mm->mmap_sem);
>>  	vma = find_vma_intersection(current->mm, hva, hva + 1);
>> -	if (is_vm_hugetlb_page(vma)) {
>> +	if (is_vm_hugetlb_page(vma) && !logging_active) {
>>  		hugetlb = true;
>>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>>  	} else {
>> @@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	spin_lock(&kvm->mmu_lock);
>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>  		goto out_unlock;
>> -	if (!hugetlb && !force_pte)
>> +	if (!hugetlb && !force_pte && !logging_active)
>>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>  
>>  	if (hugetlb) {
>> @@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			kvm_set_pfn_dirty(pfn);
>>  		}
>>  		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
>> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>> +		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
>> +					logging_active);
>>  	}
>>  
>> +	if (write_fault)
>> +		mark_page_dirty(kvm, gfn);
>>  
>>  out_unlock:
>>  	spin_unlock(&kvm->mmu_lock);
>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
>>  {
>>  	pte_t *pte = (pte_t *)data;
>>  
>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
> 
> why is logging never active if we are called from MMU notifiers?

mmu notifiers update sptes, but I don't see how these updates
can result in guest dirty pages. Also guest pages are marked dirty
from 2nd stage page fault handlers (searching through the code).

> 
>>  }
>>  
>>  
>> -- 
>> 1.7.9.5
>>
> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-08-11 19:12     ` Christoffer Dall
@ 2014-08-12  1:36       ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12  1:36 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> Remove the parenthesis from the subject line.
> 
> On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
>> Patch adds  support for initial write protection VM memlsot. This patch series
>             ^^                                    ^
> stray whitespace                                 of
> 
> 
>> assumes that huge PUDs will not be used in 2nd stage tables.
> 
> may be worth mentioning that this is always valid on ARMv7.
> 
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h       |    1 +
>>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>  arch/arm/include/asm/pgtable-3level.h |    1 +
>>  arch/arm/kvm/arm.c                    |    9 +++
>>  arch/arm/kvm/mmu.c                    |  128 +++++++++++++++++++++++++++++++++
>>  5 files changed, 159 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 042206f..6521a2d 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>  void kvm_arch_flush_remote_tlbs(struct kvm *);
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f..08ab5e8 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>>  }
>>  
>> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>> +{
>> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>> +{
>> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>> +}
>> +
>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>  #define kvm_pgd_addr_end(addr, end)					\
>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> index 85c60ad..d8bb40b 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -129,6 +129,7 @@
>>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>  
>> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>  
>>  /*
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 3c82b37..e11c2dd 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>  				   const struct kvm_memory_slot *old,
>>  				   enum kvm_mr_change change)
>>  {
>> +#ifdef CONFIG_ARM
>> +	/*
>> +	 * At this point memslot has been committed and there is an
>> +	 * allocated dirty_bitmap[], dirty pages will be be tracked while the
>> +	 * memory slot is write protected.
>> +	 */
>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>> +#endif
>>  }
>>  
>>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 35254c6..7bfc792 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>>  	return false;
>>  }
>>  
>> +#ifdef CONFIG_ARM
>> +/**
>> + * stage2_wp_pte_range - write protect PTE range
>> + * @pmd:	pointer to pmd entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + */
>> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pte_t *pte;
>> +
>> +	pte = pte_offset_kernel(pmd, addr);
>> +	do {
>> +		if (!pte_none(*pte)) {
>> +			if (!kvm_s2pte_readonly(pte))
>> +				kvm_set_s2pte_readonly(pte);
>> +		}
>> +	} while (pte++, addr += PAGE_SIZE, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_pmd_range - write protect PMD range
>> + * @pud:	pointer to pud entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + */
>> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pmd_t *pmd;
>> +	phys_addr_t next;
>> +
>> +	pmd = pmd_offset(pud, addr);
>> +
>> +	do {
>> +		next = kvm_pmd_addr_end(addr, end);
>> +		if (!pmd_none(*pmd)) {
>> +			if (kvm_pmd_huge(*pmd)) {
>> +				if (!kvm_s2pmd_readonly(pmd))
>> +					kvm_set_s2pmd_readonly(pmd);
>> +			} else
>> +				stage2_wp_pte_range(pmd, addr, next);
> please use a closing brace when the first part of the if-statement is a
> multi-line block with braces, as per the CodingStyle.
>> +
> 
> stray blank line
> 
>> +		}
>> +	} while (pmd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> +  * stage2_wp_pud_range - write protect PUD range
>> +  * @kvm:	pointer to kvm structure
>> +  * @pud:	pointer to pgd entry
>         pgd
>> +  * @addr:	range start address
>> +  * @end:	range end address
>> +  *
>> +  * While walking the PUD range huge PUD pages are ignored, in the future this
>                              range, huge PUDs are ignored.  In the future...
>> +  * may need to be revisited. Determine how to handle huge PUDs when logging
>> +  * of dirty pages is enabled.
> 
> I don't understand the last sentence?
> 
>> +  */
>> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>> +				phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pud_t *pud;
>> +	phys_addr_t next;
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	do {
>> +		next = kvm_pud_addr_end(addr, end);
>> +		/* TODO: huge PUD not supported, revisit later */
>> +		BUG_ON(pud_huge(*pud));
> 
> we should probably define kvm_pud_huge()
> 
>> +		if (!pud_none(*pud))
>> +			stage2_wp_pmd_range(pud, addr, next);
>> +	} while (pud++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_range() - write protect stage2 memory region range
>> + * @kvm:	The KVM pointer
>> + * @start:	Start address of range
>> + * &end:	End address of range
>> + */
>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pgd_t *pgd;
>> +	phys_addr_t next;
>> +
>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>> +	do {
>> +		/*
>> +		 * Release kvm_mmu_lock periodically if the memory region is
>> +		 * large features like detect hung task, lock detector or lock
>                    large.  Otherwise, we may see panics due to..
>> +		 * dep  may panic. In addition holding the lock this long will
>     extra white space ^^           Additionally, holding the lock for a
>     long timer will
>> +		 * also starve other vCPUs. Applies to huge VM memory regions.
>                                             ^^^ I don't understand this
> 					    last remark.
Sorry overlooked this.

While testing, holding the mmu_lock across small VM memory regions (~1GB)
caused no problems, but when I was running with memory regions around 2GB,
some kernel lockup detection/lock contention options (some selected by default)
caused deadlock warnings/panics in the host kernel.

This was in one of my previous review comments some time ago, I can go back
and find the options.

>> +		 */
>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +			cond_resched_lock(&kvm->mmu_lock);
>> +
>> +		next = kvm_pgd_addr_end(addr, end);
>> +		if (pgd_present(*pgd))
>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>> +	} while (pgd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
>> + * @kvm:	The KVM pointer
>> + * @slot:	The memory slot to write protect
>> + *
>> + * Called to start logging dirty pages after memory region
>> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
>> + * all present PMD and PTEs are write protected in the memory region.
>> + * Afterwards read of dirty page log can be called.
>> + *
>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>> + * serializing operations for VM memory regions.
>> + */
>> +
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>> +{
>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>> +
>> +	spin_lock(&kvm->mmu_lock);
>> +	stage2_wp_range(kvm, start, end);
>> +	kvm_flush_remote_tlbs(kvm);
>> +	spin_unlock(&kvm->mmu_lock);
>> +}
>> +#endif
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  struct kvm_memory_slot *memslot,
>>  			  unsigned long fault_status)
>> -- 
>> 1.7.9.5
>>
> 
> Besides the commenting and whitespace stuff, this is beginning to look
> good.
> 
> Thanks,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
@ 2014-08-12  1:36       ` Mario Smarduch
  0 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12  1:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> Remove the parenthesis from the subject line.
> 
> On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
>> Patch adds  support for initial write protection VM memlsot. This patch series
>             ^^                                    ^
> stray whitespace                                 of
> 
> 
>> assumes that huge PUDs will not be used in 2nd stage tables.
> 
> may be worth mentioning that this is always valid on ARMv7.
> 
>>
>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h       |    1 +
>>  arch/arm/include/asm/kvm_mmu.h        |   20 ++++++
>>  arch/arm/include/asm/pgtable-3level.h |    1 +
>>  arch/arm/kvm/arm.c                    |    9 +++
>>  arch/arm/kvm/mmu.c                    |  128 +++++++++++++++++++++++++++++++++
>>  5 files changed, 159 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 042206f..6521a2d 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>  void kvm_arch_flush_remote_tlbs(struct kvm *);
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f..08ab5e8 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>>  	pmd_val(*pmd) |= L_PMD_S2_RDWR;
>>  }
>>  
>> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>> +{
>> +	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>> +{
>> +	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>> +}
>> +
>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>  #define kvm_pgd_addr_end(addr, end)					\
>>  ({	u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;		\
>> diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
>> index 85c60ad..d8bb40b 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -129,6 +129,7 @@
>>  #define L_PTE_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>  #define L_PTE_S2_RDWR			(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>  
>> +#define L_PMD_S2_RDONLY			(_AT(pteval_t, 1) << 6)   /* HAP[1]   */
>>  #define L_PMD_S2_RDWR			(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>  
>>  /*
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 3c82b37..e11c2dd 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>  				   const struct kvm_memory_slot *old,
>>  				   enum kvm_mr_change change)
>>  {
>> +#ifdef CONFIG_ARM
>> +	/*
>> +	 * At this point memslot has been committed and there is an
>> +	 * allocated dirty_bitmap[], dirty pages will be be tracked while the
>> +	 * memory slot is write protected.
>> +	 */
>> +	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> +		kvm_mmu_wp_memory_region(kvm, mem->slot);
>> +#endif
>>  }
>>  
>>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 35254c6..7bfc792 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>>  	return false;
>>  }
>>  
>> +#ifdef CONFIG_ARM
>> +/**
>> + * stage2_wp_pte_range - write protect PTE range
>> + * @pmd:	pointer to pmd entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + */
>> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pte_t *pte;
>> +
>> +	pte = pte_offset_kernel(pmd, addr);
>> +	do {
>> +		if (!pte_none(*pte)) {
>> +			if (!kvm_s2pte_readonly(pte))
>> +				kvm_set_s2pte_readonly(pte);
>> +		}
>> +	} while (pte++, addr += PAGE_SIZE, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_pmd_range - write protect PMD range
>> + * @pud:	pointer to pud entry
>> + * @addr:	range start address
>> + * @end:	range end address
>> + */
>> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pmd_t *pmd;
>> +	phys_addr_t next;
>> +
>> +	pmd = pmd_offset(pud, addr);
>> +
>> +	do {
>> +		next = kvm_pmd_addr_end(addr, end);
>> +		if (!pmd_none(*pmd)) {
>> +			if (kvm_pmd_huge(*pmd)) {
>> +				if (!kvm_s2pmd_readonly(pmd))
>> +					kvm_set_s2pmd_readonly(pmd);
>> +			} else
>> +				stage2_wp_pte_range(pmd, addr, next);
> please use a closing brace when the first part of the if-statement is a
> multi-line block with braces, as per the CodingStyle.
>> +
> 
> stray blank line
> 
>> +		}
>> +	} while (pmd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> +  * stage2_wp_pud_range - write protect PUD range
>> +  * @kvm:	pointer to kvm structure
>> +  * @pud:	pointer to pgd entry
>         pgd
>> +  * @addr:	range start address
>> +  * @end:	range end address
>> +  *
>> +  * While walking the PUD range huge PUD pages are ignored, in the future this
>                              range, huge PUDs are ignored.  In the future...
>> +  * may need to be revisited. Determine how to handle huge PUDs when logging
>> +  * of dirty pages is enabled.
> 
> I don't understand the last sentence?
> 
>> +  */
>> +static void  stage2_wp_pud_range(struct kvm *kvm, pgd_t *pgd,
>> +				phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pud_t *pud;
>> +	phys_addr_t next;
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	do {
>> +		next = kvm_pud_addr_end(addr, end);
>> +		/* TODO: huge PUD not supported, revisit later */
>> +		BUG_ON(pud_huge(*pud));
> 
> we should probably define kvm_pud_huge()
> 
>> +		if (!pud_none(*pud))
>> +			stage2_wp_pmd_range(pud, addr, next);
>> +	} while (pud++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_range() - write protect stage2 memory region range
>> + * @kvm:	The KVM pointer
>> + * @start:	Start address of range
>> + * &end:	End address of range
>> + */
>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>> +{
>> +	pgd_t *pgd;
>> +	phys_addr_t next;
>> +
>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>> +	do {
>> +		/*
>> +		 * Release kvm_mmu_lock periodically if the memory region is
>> +		 * large features like detect hung task, lock detector or lock
>                    large.  Otherwise, we may see panics due to..
>> +		 * dep  may panic. In addition holding the lock this long will
>     extra white space ^^           Additionally, holding the lock for a
>     long timer will
>> +		 * also starve other vCPUs. Applies to huge VM memory regions.
>                                             ^^^ I don't understand this
> 					    last remark.
Sorry overlooked this.

While testing, holding the mmu_lock across small VM memory regions (~1GB)
caused no problems, but when I was running with memory regions around 2GB,
some kernel lockup detection/lock contention options (some selected by default)
caused deadlock warnings/panics in the host kernel.

This was in one of my previous review comments some time ago, I can go back
and find the options.

>> +		 */
>> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +			cond_resched_lock(&kvm->mmu_lock);
>> +
>> +		next = kvm_pgd_addr_end(addr, end);
>> +		if (pgd_present(*pgd))
>> +			stage2_wp_pud_range(kvm, pgd, addr, next);
>> +	} while (pgd++, addr = next, addr != end);
>> +}
>> +
>> +/**
>> + * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
>> + * @kvm:	The KVM pointer
>> + * @slot:	The memory slot to write protect
>> + *
>> + * Called to start logging dirty pages after memory region
>> + * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
>> + * all present PMD and PTEs are write protected in the memory region.
>> + * Afterwards read of dirty page log can be called.
>> + *
>> + * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
>> + * serializing operations for VM memory regions.
>> + */
>> +
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>> +{
>> +	struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
>> +	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
>> +	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>> +
>> +	spin_lock(&kvm->mmu_lock);
>> +	stage2_wp_range(kvm, start, end);
>> +	kvm_flush_remote_tlbs(kvm);
>> +	spin_unlock(&kvm->mmu_lock);
>> +}
>> +#endif
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  struct kvm_memory_slot *memslot,
>>  			  unsigned long fault_status)
>> -- 
>> 1.7.9.5
>>
> 
> Besides the commenting and whitespace stuff, this is beginning to look
> good.
> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-08-12  0:16       ` Mario Smarduch
@ 2014-08-12  9:32         ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-12  9:32 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On Mon, Aug 11, 2014 at 05:16:21PM -0700, Mario Smarduch wrote:
> On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> > Remove the parenthesis from the subject line.
> 

I just prefer not having the "(w/no huge PUD support)" part in the patch
title.

> Hmmm have to check this don't see it my patch file.
> > 
> > On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
> >> Patch adds  support for initial write protection VM memlsot. This patch series
> >             ^^                                    ^
> > stray whitespace                                 of
> > 
> Need to watch out for these adds delays to review cycle.

yes, I care quite a lot about proper English, syntax, grammar and
spelling.  Reading critically through your own patch files before
mailing them out is a good exercise.  You can even consider putting them
through a spell-checker and/or configure your editor to mark double
white space, trailing white space etc.

[...]

> >> +	do {
> >> +		next = kvm_pmd_addr_end(addr, end);
> >> +		if (!pmd_none(*pmd)) {
> >> +			if (kvm_pmd_huge(*pmd)) {
> >> +				if (!kvm_s2pmd_readonly(pmd))
> >> +					kvm_set_s2pmd_readonly(pmd);
> >> +			} else
> >> +				stage2_wp_pte_range(pmd, addr, next);
> > please use a closing brace when the first part of the if-statement is a
> > multi-line block with braces, as per the CodingStyle.
> Will fix.
> >> +
> > 
> > stray blank line
> 
> Not sure how it got by checkpatch, will fix.

Not sure checkpatch will complain, but I do ;)  No big deal anyway.

> > 
> >> +		}
> >> +	} while (pmd++, addr = next, addr != end);
> >> +}
> >> +
> >> +/**
> >> +  * stage2_wp_pud_range - write protect PUD range
> >> +  * @kvm:	pointer to kvm structure
> >> +  * @pud:	pointer to pgd entry
> >         pgd
> >> +  * @addr:	range start address
> >> +  * @end:	range end address
> >> +  *
> >> +  * While walking the PUD range huge PUD pages are ignored, in the future this
> >                              range, huge PUDs are ignored.  In the future...
> >> +  * may need to be revisited. Determine how to handle huge PUDs when logging
> >> +  * of dirty pages is enabled.
> > 
> > I don't understand the last sentence?
> 
> Probably last two sentences should be combined.
> ".... to determine how to handle huge PUT...". Would that be
> clear enough?
> 
> The overall theme is what to do about PUDs - mark all pages dirty
> in the region, attempt to breakup such huge regions?
> 

I think you should just state that this is not supported and worry
about how to deal with it when it's properly supported.  The TODO below
is sufficient, so just drop all mentions of the future in the
function description above - it's likely to be forgotten when PUDs are
in fact supported anyhow.
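
As for the kvm_pud_huge() wrapper I asked for in the earlier review, a
one-line alias would do for now (just a sketch):

#define kvm_pud_huge(_x)	pud_huge(_x)

and the BUG_ON() in stage2_wp_pud_range() can then use it.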

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
@ 2014-08-12  9:32         ` Christoffer Dall
  0 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-12  9:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 11, 2014 at 05:16:21PM -0700, Mario Smarduch wrote:
> On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> > Remove the parenthesis from the subject line.
> 

I just prefer not having the "(w/no huge PUD support)" part in the patch
title.

> Hmmm have to check this don't see it my patch file.
> > 
> > On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
> >> Patch adds  support for initial write protection VM memlsot. This patch series
> >             ^^                                    ^
> > stray whitespace                                 of
> > 
> Need to watch out for these adds delays to review cycle.

yes, I care quite a lot about proper English, syntax, grammar and
spelling.  Reading critically through your own patch files before
mailing them out is a good exercise.  You can even consider putting them
through a spell-checker and/or configure your editor to mark double
white space, trailing white space etc.

[...]

> >> +	do {
> >> +		next = kvm_pmd_addr_end(addr, end);
> >> +		if (!pmd_none(*pmd)) {
> >> +			if (kvm_pmd_huge(*pmd)) {
> >> +				if (!kvm_s2pmd_readonly(pmd))
> >> +					kvm_set_s2pmd_readonly(pmd);
> >> +			} else
> >> +				stage2_wp_pte_range(pmd, addr, next);
> > please use a closing brace when the first part of the if-statement is a
> > multi-line block with braces, as per the CodingStyle.
> Will fix.
> >> +
> > 
> > stray blank line
> 
> Not sure how it got by checkpatch, will fix.

Not sure checkpatch will complain, but I do ;)  No big deal anyway.

> > 
> >> +		}
> >> +	} while (pmd++, addr = next, addr != end);
> >> +}
> >> +
> >> +/**
> >> +  * stage2_wp_pud_range - write protect PUD range
> >> +  * @kvm:	pointer to kvm structure
> >> +  * @pud:	pointer to pgd entry
> >         pgd
> >> +  * @addr:	range start address
> >> +  * @end:	range end address
> >> +  *
> >> +  * While walking the PUD range huge PUD pages are ignored, in the future this
> >                              range, huge PUDs are ignored.  In the future...
> >> +  * may need to be revisited. Determine how to handle huge PUDs when logging
> >> +  * of dirty pages is enabled.
> > 
> > I don't understand the last sentence?
> 
> Probably last two sentences should be combined.
> ".... to determine how to handle huge PUT...". Would that be
> clear enough?
> 
> The overall theme is what to do about PUDs - mark all pages dirty
> in the region, attempt to breakup such huge regions?
> 

I think you should just state that this is not supported and worry
about how to deal with it when it's properly supported.  The TODO below
is sufficient, so just drop all mentionings about the future in the
function description above - it's likely to be forgotten when PUDs are
in fact support anyhow.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-08-12  1:36       ` Mario Smarduch
@ 2014-08-12  9:36         ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-12  9:36 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On Mon, Aug 11, 2014 at 06:36:14PM -0700, Mario Smarduch wrote:
> On 08/11/2014 12:12 PM, Christoffer Dall wrote:

[...]

> >> +/**
> >> + * stage2_wp_range() - write protect stage2 memory region range
> >> + * @kvm:	The KVM pointer
> >> + * @start:	Start address of range
> >> + * &end:	End address of range
> >> + */
> >> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> >> +{
> >> +	pgd_t *pgd;
> >> +	phys_addr_t next;
> >> +
> >> +	pgd = kvm->arch.pgd + pgd_index(addr);
> >> +	do {
> >> +		/*
> >> +		 * Release kvm_mmu_lock periodically if the memory region is
> >> +		 * large features like detect hung task, lock detector or lock
> >                    large.  Otherwise, we may see panics due to..
> >> +		 * dep  may panic. In addition holding the lock this long will
> >     extra white space ^^           Additionally, holding the lock for a
> >     long timer will
> >> +		 * also starve other vCPUs. Applies to huge VM memory regions.
> >                                             ^^^ I don't understand this
> > 					    last remark.
> Sorry overlooked this.
> 
> While testing, holding the mmu_lock across small VM memory regions (~1GB)
> caused no problems, but when I was running with memory regions around 2GB,
> some kernel lockup detection/lock contention options (some selected by default)
> caused deadlock warnings/panics in the host kernel.
>
> This was in one of my previous review comments some time ago, I can go back
> and find the options.
> 

Just drop the last part of the comment, so the whole thing reads:

/*
 * Release kvm_mmu_lock periodically if the memory region is
 * large. Otherwise, we may see kernel panics from debugging features
 * such as "detect hung task", "lock detector" or "lock dep checks".
 * Additionally, holding the lock too long will also starve other vCPUs.
 */

And check the actual names of those debugging features or use the
CONFIG_<WHATEVER> names and say "we may see kernel panics with CONFIG_X,
CONFIG_Y, and CONFIG_Z.
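
For example (the exact Kconfig symbols should be double-checked, but these
are the likely candidates):

/*
 * Release kvm_mmu_lock periodically if the memory region is
 * large. Otherwise, we may see kernel panics with
 * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP.
 * Additionally, holding the lock too long will also starve other vCPUs.
 */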

Makes sense?

-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
@ 2014-08-12  9:36         ` Christoffer Dall
  0 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-12  9:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 11, 2014 at 06:36:14PM -0700, Mario Smarduch wrote:
> On 08/11/2014 12:12 PM, Christoffer Dall wrote:

[...]

> >> +/**
> >> + * stage2_wp_range() - write protect stage2 memory region range
> >> + * @kvm:	The KVM pointer
> >> + * @start:	Start address of range
> >> + * &end:	End address of range
> >> + */
> >> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> >> +{
> >> +	pgd_t *pgd;
> >> +	phys_addr_t next;
> >> +
> >> +	pgd = kvm->arch.pgd + pgd_index(addr);
> >> +	do {
> >> +		/*
> >> +		 * Release kvm_mmu_lock periodically if the memory region is
> >> +		 * large features like detect hung task, lock detector or lock
> >                    large.  Otherwise, we may see panics due to..
> >> +		 * dep  may panic. In addition holding the lock this long will
> >     extra white space ^^           Additionally, holding the lock for a
> >     long timer will
> >> +		 * also starve other vCPUs. Applies to huge VM memory regions.
> >                                             ^^^ I don't understand this
> > 					    last remark.
> Sorry overlooked this.
> 
> While testing, holding the mmu_lock across small VM memory regions (~1GB)
> caused no problems, but when I was running with memory regions around 2GB,
> some kernel lockup detection/lock contention options (some selected by default)
> caused deadlock warnings/panics in the host kernel.
>
> This was in one of my previous review comments some time ago, I can go back
> and find the options.
> 

Just drop the last part of the comment, so the whole thing reads:

/*
 * Release kvm_mmu_lock periodically if the memory region is
 * large. Otherwise, we may see kernel panics from debugging features
 * such as "detect hung task", "lock detector" or "lock dep checks".
 * Additionally, holding the lock too long will also starve other vCPUs.
 */

And check the actual names of those debugging features or use the
CONFIG_<WHATEVER> names and say "we may see kernel panics with CONFIG_X,
CONFIG_Y, and CONFIG_Z.

Makes sense?

-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-08-12  1:25       ` Mario Smarduch
@ 2014-08-12  9:50         ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-12  9:50 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> > On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
> >> This patch adds support for handling 2nd stage page faults during migration,
> >> it disables faulting in huge pages, and dissolves huge pages to page tables.
> >> In case migration is canceled huge pages will be used again.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/kvm/mmu.c |   31 +++++++++++++++++++++++++------
> >>  1 file changed, 25 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index ca84331..a17812a 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> >>  }
> >>  
> >>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> >> -			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
> >> +			  phys_addr_t addr, const pte_t *new_pte, bool iomap,
> >> +			  bool logging_active)
> >>  {
> >>  	pmd_t *pmd;
> >>  	pte_t *pte, old_pte;
> >> @@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> >>  		return 0;
> >>  	}
> >>  
> >> +	/*
> >> +	 * While dirty memory logging, clear PMD entry for huge page and split
> >> +	 * into smaller pages, to track dirty memory at page granularity.
> >> +	 */
> >> +	if (logging_active && kvm_pmd_huge(*pmd)) {
> >> +		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
> >> +		clear_pmd_entry(kvm, pmd, ipa);
> > 
> > clear_pmd_entry has a VM_BUG_ON(kvm_pmd_huge(*pmd)) so that is
> > definitely not the right thing to call.
> 
> I don't see that in 3.15rc1/rc4 -
> 
> static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
> {
>         if (kvm_pmd_huge(*pmd)) {
>                 pmd_clear(pmd);
>                 kvm_tlb_flush_vmid_ipa(kvm, addr);
>         } else {
>                   [....]
> }
> 
> I thought the purpose of this function was to clear PMD entry. Also
> ran hundreds of tests no problems. Hmmm confused.
> 

You need to rebase on kvm/next or linus/master, something that contains:

4f853a7 arm/arm64: KVM: Fix and refactor unmap_range
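
On a tree with that commit clear_pmd_entry() is only meant for table
entries, so the dissolve step should open-code the huge case instead,
roughly like this (a sketch only - pick whatever helper name you like):

static void stage2_dissolve_pmd(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
{
	if (!kvm_pmd_huge(*pmd))
		return;

	pmd_clear(pmd);
	kvm_tlb_flush_vmid_ipa(kvm, addr);
	/*
	 * Drop the reference taken on the PMD table page when the huge
	 * entry was installed, mirroring what the unmap path does.
	 */
	put_page(virt_to_page(pmd));
}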

> > 
> >> +	}
> >> +
> >>  	/* Create stage-2 page mappings - Level 2 */
> >>  	if (pmd_none(*pmd)) {
> >>  		if (!cache)
> >> @@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> >>  		if (ret)
> >>  			goto out;
> >>  		spin_lock(&kvm->mmu_lock);
> >> -		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
> >> +		ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
> >>  		spin_unlock(&kvm->mmu_lock);
> >>  		if (ret)
> >>  			goto out;
> >> @@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> >>  	struct vm_area_struct *vma;
> >>  	pfn_t pfn;
> >> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
> >> +	#ifdef CONFIG_ARM
> >> +		bool logging_active = !!memslot->dirty_bitmap;
> >> +	#else
> >> +		bool logging_active = false;
> >> +	#endif
> > 
> > can you make this an inline in the header files for now please?
> 
> Yes definitely.
> 
> > 
> >>  
> >>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
> >>  	if (fault_status == FSC_PERM && !write_fault) {
> >> @@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	/* Let's check if we will get back a huge page backed by hugetlbfs */
> >>  	down_read(&current->mm->mmap_sem);
> >>  	vma = find_vma_intersection(current->mm, hva, hva + 1);
> >> -	if (is_vm_hugetlb_page(vma)) {
> >> +	if (is_vm_hugetlb_page(vma) && !logging_active) {
> >>  		hugetlb = true;
> >>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> >>  	} else {
> >> @@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	spin_lock(&kvm->mmu_lock);
> >>  	if (mmu_notifier_retry(kvm, mmu_seq))
> >>  		goto out_unlock;
> >> -	if (!hugetlb && !force_pte)
> >> +	if (!hugetlb && !force_pte && !logging_active)
> >>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> >>  
> >>  	if (hugetlb) {
> >> @@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  			kvm_set_pfn_dirty(pfn);
> >>  		}
> >>  		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
> >> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
> >> +		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
> >> +					logging_active);
> >>  	}
> >>  
> >> +	if (write_fault)
> >> +		mark_page_dirty(kvm, gfn);
> >>  
> >>  out_unlock:
> >>  	spin_unlock(&kvm->mmu_lock);
> >> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> >>  {
> >>  	pte_t *pte = (pte_t *)data;
> >>  
> >> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
> >> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
> > 
> > why is logging never active if we are called from MMU notifiers?
> 
> mmu notifiers update sptes, but I don't see how these updates
> can result in guest dirty pages. Also guest pages are marked dirty
> from 2nd stage page fault handlers (searching through the code).
> 
Ok, then add:

/*
 * We can always call stage2_set_pte with logging_active == false,
 * because MMU notifiers will have unmapped a huge PMD before calling
 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
 * stage2_set_pte() never needs to clear out a huge PMD through this
 * calling path.
 */
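
Folded into the handler, that would look roughly like this:

static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
{
	pte_t *pte = (pte_t *)data;

	/*
	 * Logging is never active here: MMU notifiers unmap a huge PMD
	 * before calling ->change_pte()/kvm_set_spte_hva(), so
	 * stage2_set_pte() never has to dissolve a huge PMD on this path.
	 */
	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
}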

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
@ 2014-08-12  9:50         ` Christoffer Dall
  0 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-12  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> > On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
> >> This patch adds support for handling 2nd stage page faults during migration,
> >> it disables faulting in huge pages, and dissolves huge pages to page tables.
> >> In case migration is canceled huge pages will be used again.
> >>
> >> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> >> ---
> >>  arch/arm/kvm/mmu.c |   31 +++++++++++++++++++++++++------
> >>  1 file changed, 25 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index ca84331..a17812a 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> >>  }
> >>  
> >>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> >> -			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
> >> +			  phys_addr_t addr, const pte_t *new_pte, bool iomap,
> >> +			  bool logging_active)
> >>  {
> >>  	pmd_t *pmd;
> >>  	pte_t *pte, old_pte;
> >> @@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> >>  		return 0;
> >>  	}
> >>  
> >> +	/*
> >> +	 * While dirty memory logging, clear PMD entry for huge page and split
> >> +	 * into smaller pages, to track dirty memory at page granularity.
> >> +	 */
> >> +	if (logging_active && kvm_pmd_huge(*pmd)) {
> >> +		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
> >> +		clear_pmd_entry(kvm, pmd, ipa);
> > 
> > clear_pmd_entry has a VM_BUG_ON(kvm_pmd_huge(*pmd)) so that is
> > definitely not the right thing to call.
> 
> I don't see that in 3.15rc1/rc4 -
> 
> static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
> {
>         if (kvm_pmd_huge(*pmd)) {
>                 pmd_clear(pmd);
>                 kvm_tlb_flush_vmid_ipa(kvm, addr);
>         } else {
>                   [....]
> }
> 
> I thought the purpose of this function was to clear PMD entry. Also
> ran hundreds of tests no problems. Hmmm confused.
> 

You need to rebase on kvm/next or linus/master, something that contains:

4f853a7 arm/arm64: KVM: Fix and refactor unmap_range

> > 
> >> +	}
> >> +
> >>  	/* Create stage-2 page mappings - Level 2 */
> >>  	if (pmd_none(*pmd)) {
> >>  		if (!cache)
> >> @@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> >>  		if (ret)
> >>  			goto out;
> >>  		spin_lock(&kvm->mmu_lock);
> >> -		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
> >> +		ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
> >>  		spin_unlock(&kvm->mmu_lock);
> >>  		if (ret)
> >>  			goto out;
> >> @@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> >>  	struct vm_area_struct *vma;
> >>  	pfn_t pfn;
> >> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
> >> +	#ifdef CONFIG_ARM
> >> +		bool logging_active = !!memslot->dirty_bitmap;
> >> +	#else
> >> +		bool logging_active = false;
> >> +	#endif
> > 
> > can you make this an inline in the header files for now please?
> 
> Yes definitely.
> 
> > 
> >>  
> >>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
> >>  	if (fault_status == FSC_PERM && !write_fault) {
> >> @@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	/* Let's check if we will get back a huge page backed by hugetlbfs */
> >>  	down_read(&current->mm->mmap_sem);
> >>  	vma = find_vma_intersection(current->mm, hva, hva + 1);
> >> -	if (is_vm_hugetlb_page(vma)) {
> >> +	if (is_vm_hugetlb_page(vma) && !logging_active) {
> >>  		hugetlb = true;
> >>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> >>  	} else {
> >> @@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	spin_lock(&kvm->mmu_lock);
> >>  	if (mmu_notifier_retry(kvm, mmu_seq))
> >>  		goto out_unlock;
> >> -	if (!hugetlb && !force_pte)
> >> +	if (!hugetlb && !force_pte && !logging_active)
> >>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> >>  
> >>  	if (hugetlb) {
> >> @@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  			kvm_set_pfn_dirty(pfn);
> >>  		}
> >>  		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
> >> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
> >> +		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
> >> +					logging_active);
> >>  	}
> >>  
> >> +	if (write_fault)
> >> +		mark_page_dirty(kvm, gfn);
> >>  
> >>  out_unlock:
> >>  	spin_unlock(&kvm->mmu_lock);
> >> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> >>  {
> >>  	pte_t *pte = (pte_t *)data;
> >>  
> >> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
> >> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
> > 
> > why is logging never active if we are called from MMU notifiers?
> 
> mmu notifiers update sptes, but I don't see how these updates
> can result in guest dirty pages. Also guest pages are marked dirty
> from 2nd stage page fault handlers (searching through the code).
> 
Ok, then add:

/*
 * We can always call stage2_set_pte with logging_active == false,
 * because MMU notifiers will have unmapped a huge PMD before calling
 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
 * stage2_set_pte() never needs to clear out a huge PMD through this
 * calling path.
 */

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-08-12  9:36         ` Christoffer Dall
@ 2014-08-12 21:08           ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12 21:08 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/12/2014 02:36 AM, Christoffer Dall wrote:
> On Mon, Aug 11, 2014 at 06:36:14PM -0700, Mario Smarduch wrote:
>> On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> 
> [...]
> 
>>>> +/**
>>>> + * stage2_wp_range() - write protect stage2 memory region range
>>>> + * @kvm:	The KVM pointer
>>>> + * @start:	Start address of range
>>>> + * &end:	End address of range
>>>> + */
>>>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pgd_t *pgd;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>>>> +	do {
>>>> +		/*
>>>> +		 * Release kvm_mmu_lock periodically if the memory region is
>>>> +		 * large features like detect hung task, lock detector or lock
>>>                    large.  Otherwise, we may see panics due to..
>>>> +		 * dep  may panic. In addition holding the lock this long will
>>>     extra white space ^^           Additionally, holding the lock for a
>>>     long timer will
>>>> +		 * also starve other vCPUs. Applies to huge VM memory regions.
>>>                                             ^^^ I don't understand this
>>> 					    last remark.
>> Sorry overlooked this.
>>
>> While testing, holding the mmu_lock across small VM memory regions (~1GB)
>> caused no problems, but when I was running with memory regions around 2GB,
>> some kernel lockup detection/lock contention options (some selected by default)
>> caused deadlock warnings/panics in the host kernel.
>>
>> This was in one of my previous review comments some time ago, I can go back
>> and find the options.
>>
> 
> Just drop the last part of the comment, so the whole thing reads:
> 
> /*
>  * Release kvm_mmu_lock periodically if the memory region is
>  * large. Otherwise, we may see kernel panics from debugging features
>  * such as "detect hung task", "lock detector" or "lock dep checks".
>  * Additionally, holding the lock too long will also starve other vCPUs.
>  */
> 
> And check the actual names of those debugging features or use the
> CONFIG_<WHATEVER> names and say "we may see kernel panics with CONFIG_X,
> CONFIG_Y, and CONFIG_Z.
> 
> Makes sense?

Yep sure does.

> 
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
@ 2014-08-12 21:08           ` Mario Smarduch
  0 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12 21:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/12/2014 02:36 AM, Christoffer Dall wrote:
> On Mon, Aug 11, 2014 at 06:36:14PM -0700, Mario Smarduch wrote:
>> On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> 
> [...]
> 
>>>> +/**
>>>> + * stage2_wp_range() - write protect stage2 memory region range
>>>> + * @kvm:	The KVM pointer
>>>> + * @start:	Start address of range
>>>> + * &end:	End address of range
>>>> + */
>>>> +static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>>>> +{
>>>> +	pgd_t *pgd;
>>>> +	phys_addr_t next;
>>>> +
>>>> +	pgd = kvm->arch.pgd + pgd_index(addr);
>>>> +	do {
>>>> +		/*
>>>> +		 * Release kvm_mmu_lock periodically if the memory region is
>>>> +		 * large features like detect hung task, lock detector or lock
>>>                    large.  Otherwise, we may see panics due to..
>>>> +		 * dep  may panic. In addition holding the lock this long will
>>>     extra white space ^^           Additionally, holding the lock for a
>>>     long timer will
>>>> +		 * also starve other vCPUs. Applies to huge VM memory regions.
>>>                                             ^^^ I don't understand this
>>> 					    last remark.
>> Sorry overlooked this.
>>
>> While testing, holding the mmu_lock across small VM memory regions (~1GB)
>> caused no problems, but when I was running with memory regions around 2GB,
>> some kernel lockup detection/lock contention options (some selected by default)
>> caused deadlock warnings/panics in the host kernel.
>>
>> This was in one of my previous review comments some time ago, I can go back
>> and find the options.
>>
> 
> Just drop the last part of the comment, so the whole thing reads:
> 
> /*
>  * Release kvm_mmu_lock periodically if the memory region is
>  * large. Otherwise, we may see kernel panics from debugging features
>  * such as "detect hung task", "lock detector" or "lock dep checks".
>  * Additionally, holding the lock too long will also starve other vCPUs.
>  */
> 
> And check the actual names of those debugging features or use the
> CONFIG_<WHATEVER> names and say "we may see kernel panics with CONFIG_X,
> CONFIG_Y, and CONFIG_Z.
> 
> Makes sense?

Yep sure does.

> 
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 2/4] arm: ARMv7  dirty page logging inital mem region write protect (w/no huge PUD support)
  2014-08-12  9:32         ` Christoffer Dall
@ 2014-08-12 23:17           ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-12 23:17 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/12/2014 02:32 AM, Christoffer Dall wrote:
> On Mon, Aug 11, 2014 at 05:16:21PM -0700, Mario Smarduch wrote:
>> On 08/11/2014 12:12 PM, Christoffer Dall wrote:
>>> Remove the parenthesis from the subject line.
>>
> 
> I just prefer not having the "(w/no huge PUD support)" part in the patch
> title.

Ah ok.

> 
>> Hmmm have to check this don't see it my patch file.
>>>
>>> On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
>>>> Patch adds  support for initial write protection VM memlsot. This patch series
>>>             ^^                                    ^
>>> stray whitespace                                 of
>>>
>> Need to watch out for these adds delays to review cycle.
> 
> yes, I care quite a lot about proper English, syntax, grammar and
> spelling.  Reading critically through your own patch files before
> mailing them out is a good exercise.  You can even consider putting them
> through a spell-checker and/or configure your editor to mark double
> white space, trailing white space etc.
> 
> [...]
> 
>>>> +	do {
>>>> +		next = kvm_pmd_addr_end(addr, end);
>>>> +		if (!pmd_none(*pmd)) {
>>>> +			if (kvm_pmd_huge(*pmd)) {
>>>> +				if (!kvm_s2pmd_readonly(pmd))
>>>> +					kvm_set_s2pmd_readonly(pmd);
>>>> +			} else
>>>> +				stage2_wp_pte_range(pmd, addr, next);
>>> please use a closing brace when the first part of the if-statement is a
>>> multi-line block with braces, as per the CodingStyle.
>> Will fix.
>>>> +
>>>
>>> stray blank line
>>
>> Not sure how it got by checkpatch, will fix.
> 
> Not sure checkpatch will complain, but I do ;)  No big deal anyway.
> 
>>>
>>>> +		}
>>>> +	} while (pmd++, addr = next, addr != end);
>>>> +}
>>>> +
>>>> +/**
>>>> +  * stage2_wp_pud_range - write protect PUD range
>>>> +  * @kvm:	pointer to kvm structure
>>>> +  * @pud:	pointer to pgd entry
>>>         pgd
>>>> +  * @addr:	range start address
>>>> +  * @end:	range end address
>>>> +  *
>>>> +  * While walking the PUD range huge PUD pages are ignored, in the future this
>>>                              range, huge PUDs are ignored.  In the future...
>>>> +  * may need to be revisited. Determine how to handle huge PUDs when logging
>>>> +  * of dirty pages is enabled.
>>>
>>> I don't understand the last sentence?
>>
>> Probably last two sentences should be combined.
>> ".... to determine how to handle huge PUT...". Would that be
>> clear enough?
>>
>> The overall theme is what to do about PUDs - mark all pages dirty
>> in the region, attempt to breakup such huge regions?
>>
> 
> I think you should just state that this is not supported and worry
> about how to deal with it when it's properly supported.  The TODO below
> is sufficient, so just drop all mentions of the future in the
> function description above - it's likely to be forgotten when PUDs are
> in fact supported anyhow.

Ok the simpler the better.
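
Something like this then, with the forward-looking part dropped (sketch of
the reworked kernel-doc):

/**
 * stage2_wp_pud_range - write protect PUD range
 * @kvm:	pointer to kvm structure
 * @pgd:	pointer to pgd entry
 * @addr:	range start address
 * @end:	range end address
 *
 * While walking the PUD range huge PUDs are ignored; they are not
 * supported at this time (see the TODO/BUG_ON in the code).
 */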
> 
> Thanks,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-08-12  9:50         ` Christoffer Dall
@ 2014-08-13  1:27           ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-13  1:27 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/12/2014 02:50 AM, Christoffer Dall wrote:
> On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
>> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
>>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
>>>> This patch adds support for handling 2nd stage page faults during migration,
>>>> it disables faulting in huge pages, and dissolves huge pages to page tables.
>>>> In case migration is canceled huge pages will be used again.
>>>>
>>>> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
>>>> ---
>>>>  arch/arm/kvm/mmu.c |   31 +++++++++++++++++++++++++------
>>>>  1 file changed, 25 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index ca84331..a17812a 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>>>  }
>>>>  
>>>>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>>> -			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
>>>> +			  phys_addr_t addr, const pte_t *new_pte, bool iomap,
>>>> +			  bool logging_active)
>>>>  {
>>>>  	pmd_t *pmd;
>>>>  	pte_t *pte, old_pte;
>>>> @@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>>>  		return 0;
>>>>  	}
>>>>  
>>>> +	/*
>>>> +	 * While dirty memory logging, clear PMD entry for huge page and split
>>>> +	 * into smaller pages, to track dirty memory at page granularity.
>>>> +	 */
>>>> +	if (logging_active && kvm_pmd_huge(*pmd)) {
>>>> +		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
>>>> +		clear_pmd_entry(kvm, pmd, ipa);
>>>
>>> clear_pmd_entry has a VM_BUG_ON(kvm_pmd_huge(*pmd)) so that is
>>> definitely not the right thing to call.
>>
>> I don't see that in 3.15rc1/rc4 -
>>
>> static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
>> {
>>         if (kvm_pmd_huge(*pmd)) {
>>                 pmd_clear(pmd);
>>                 kvm_tlb_flush_vmid_ipa(kvm, addr);
>>         } else {
>>                   [....]
>> }
>>
>> I thought the purpose of this function was to clear PMD entry. Also
>> ran hundreds of tests no problems. Hmmm confused.
>>
> 
> You need to rebase on kvm/next or linus/master, something that contains:
> 
> 4f853a7 arm/arm64: KVM: Fix and refactor unmap_range
> 
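
After rebasing I guess the huge PMD case here has to be dissolved
directly instead of going through clear_pmd_entry(), which now
VM_BUG_ON()s on huge PMDs. A rough sketch of what I have in mind (not
the rebased patch itself):

	if (logging_active && kvm_pmd_huge(*pmd)) {
		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;

		/*
		 * Dissolve the huge PMD so dirty memory can be tracked
		 * at page granularity; the mapping is rebuilt as ptes
		 * by the code below.
		 */
		pmd_clear(pmd);
		kvm_tlb_flush_vmid_ipa(kvm, ipa);
		put_page(virt_to_page(pmd));
	}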
>>>
>>>> +	}
>>>> +
>>>>  	/* Create stage-2 page mappings - Level 2 */
>>>>  	if (pmd_none(*pmd)) {
>>>>  		if (!cache)
>>>> @@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>>>>  		if (ret)
>>>>  			goto out;
>>>>  		spin_lock(&kvm->mmu_lock);
>>>> -		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
>>>> +		ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
>>>>  		spin_unlock(&kvm->mmu_lock);
>>>>  		if (ret)
>>>>  			goto out;
>>>> @@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>>>  	struct vm_area_struct *vma;
>>>>  	pfn_t pfn;
>>>> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
>>>> +	#ifdef CONFIG_ARM
>>>> +		bool logging_active = !!memslot->dirty_bitmap;
>>>> +	#else
>>>> +		bool logging_active = false;
>>>> +	#endif
>>>
>>> can you make this an inline in the header files for now please?
>>
>> Yes definitely.
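
Roughly what I have in mind for the arm variant of the header (the
helper name below is just a placeholder, the arm64 variant would
simply return false):

	static inline bool kvm_get_logging_state(struct kvm_memory_slot *memslot)
	{
		return !!memslot->dirty_bitmap;
	}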
>>
>>>
>>>>  
>>>>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>>>  	if (fault_status == FSC_PERM && !write_fault) {
>>>> @@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	/* Let's check if we will get back a huge page backed by hugetlbfs */
>>>>  	down_read(&current->mm->mmap_sem);
>>>>  	vma = find_vma_intersection(current->mm, hva, hva + 1);
>>>> -	if (is_vm_hugetlb_page(vma)) {
>>>> +	if (is_vm_hugetlb_page(vma) && !logging_active) {
>>>>  		hugetlb = true;
>>>>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>>>>  	} else {
>>>> @@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	spin_lock(&kvm->mmu_lock);
>>>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>>>  		goto out_unlock;
>>>> -	if (!hugetlb && !force_pte)
>>>> +	if (!hugetlb && !force_pte && !logging_active)
>>>>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>>>  
>>>>  	if (hugetlb) {
>>>> @@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  			kvm_set_pfn_dirty(pfn);
>>>>  		}
>>>>  		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
>>>> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>>>> +		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
>>>> +					logging_active);
>>>>  	}
>>>>  
>>>> +	if (write_fault)
>>>> +		mark_page_dirty(kvm, gfn);
>>>>  
>>>>  out_unlock:
>>>>  	spin_unlock(&kvm->mmu_lock);
>>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
>>>>  {
>>>>  	pte_t *pte = (pte_t *)data;
>>>>  
>>>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
>>>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
>>>
>>> why is logging never active if we are called from MMU notifiers?
>>
>> mmu notifiers update sptes, but I don't see how these updates
>> can result in guest dirty pages. Also guest pages are marked dirty
>> from 2nd stage page fault handlers (searching through the code).
>>
> Ok, then add:
> 
> /*
>  * We can always call stage2_set_pte with logging_active == false,
>  * because MMU notifiers will have unmapped a huge PMD before calling
>  * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
>  * stage2_set_pte() never needs to clear out a huge PMD through this
>  * calling path.
>  */

So here, on a permission change to a primary pte, the kernel first
invalidates the related s2ptes and then issues ->change_pte calls to
synchronize them. As a consequence of that invalidation we unmap huge
PMDs, if a page falls in that range.

Is the comment meant to point out how the logging flag is used in the
various scenarios?

Should I add comments on the flag's use in other places as well?
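
For reference, a sketch of the handler with your suggested comment
folded in:

	static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
	{
		pte_t *pte = (pte_t *)data;

		/*
		 * We can always call stage2_set_pte with logging_active == false,
		 * because MMU notifiers will have unmapped a huge PMD before
		 * calling ->change_pte() (which in turn calls kvm_set_spte_hva())
		 * and therefore stage2_set_pte() never needs to clear out a huge
		 * PMD through this calling path.
		 */
		stage2_set_pte(kvm, NULL, gpa, pte, false, false);
	}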

> 
> Thanks,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-08-13  1:27           ` Mario Smarduch
@ 2014-08-13  7:30             ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-13  7:30 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
> On 08/12/2014 02:50 AM, Christoffer Dall wrote:
> > On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
> >> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> >>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:

[...]

> >>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> >>>>  {
> >>>>  	pte_t *pte = (pte_t *)data;
> >>>>  
> >>>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
> >>>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
> >>>
> >>> why is logging never active if we are called from MMU notifiers?
> >>
> >> mmu notifiers update sptes, but I don't see how these updates
> >> can result in guest dirty pages. Also guest pages are marked dirty
> >> from 2nd stage page fault handlers (searching through the code).
> >>
> > Ok, then add:
> > 
> > /*
> >  * We can always call stage2_set_pte with logging_active == false,
> >  * because MMU notifiers will have unmapped a huge PMD before calling
> >  * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
> >  * stage2_set_pte() never needs to clear out a huge PMD through this
> >  * calling path.
> >  */
> 
> So here on permission change to primary pte's kernel first invalidates
> related s2ptes followed by ->change_pte calls to synchronize s2ptes. As
> consequence of invalidation we unmap huge PMDs, if a page falls in that
> range.
> 
> Is the comment to point out use of logging flag under various scenarios?

The comment is because when you look at this function it is not obvious
why we pass logging_active=false, even though logging may actually be
active.  This could suggest that the parameter to stage2_set_pte()
should be named differently (break_huge_pmds) or something like that,
but we can also be satisfied with the comment.
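
(If we went the renaming route it would just be the signature below -
purely illustrative:)

	static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
				  phys_addr_t addr, const pte_t *new_pte, bool iomap,
				  bool break_huge_pmds);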

> 
> Should I add comments on flag use in other places as well?
> 

It's always a judgement call.  I didn't find it necessary to put a
comment elsewhere because I think it's pretty obvious that we would
never care about logging writes to device regions.

However, this made me think, are we making sure that we are not marking
device mappings as read-only in the wp_range functions?  I think it's
quite bad if we mark the VCPU interface as read-only for example.

-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-08-13  7:30             ` Christoffer Dall
@ 2014-08-14  1:20               ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-14  1:20 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/13/2014 12:30 AM, Christoffer Dall wrote:
> On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
>> On 08/12/2014 02:50 AM, Christoffer Dall wrote:
>>> On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
>>>> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
>>>>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
> 
> [...]
> 
>>>>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
>>>>>>  {
>>>>>>  	pte_t *pte = (pte_t *)data;
>>>>>>  
>>>>>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
>>>>>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
>>>>>
>>>>> why is logging never active if we are called from MMU notifiers?
>>>>
>>>> mmu notifiers update sptes, but I don't see how these updates
>>>> can result in guest dirty pages. Also guest pages are marked dirty
>>>> from 2nd stage page fault handlers (searching through the code).
>>>>
>>> Ok, then add:
>>>
>>> /*
>>>  * We can always call stage2_set_pte with logging_active == false,
>>>  * because MMU notifiers will have unmapped a huge PMD before calling
>>>  * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
>>>  * stage2_set_pte() never needs to clear out a huge PMD through this
>>>  * calling path.
>>>  */
>>
>> So here on permission change to primary pte's kernel first invalidates
>> related s2ptes followed by ->change_pte calls to synchronize s2ptes. As
>> consequence of invalidation we unmap huge PMDs, if a page falls in that
>> range.
>>
>> Is the comment to point out use of logging flag under various scenarios?
> 
> The comment is because when you look at this function it is not obvious
> why we pass logging_active=false, despite logging may actually be
> active.  This could suggest that the parameter to stage2_set_pte()
> should be named differently (break_huge_pmds) or something like that,
> but we can also be satisfied with the comment.

Ok, I see, I was thinking you thought it was breaking something.
Yeah, I'll add the comment; in reality this is another use case
where a PMD may need to be converted to a page table, so it makes
sense to contrast the use cases.

> 
>>
>> Should I add comments on flag use in other places as well?
>>
> 
> It's always a judgement call.  I didn't find it necessarry to put a
> comment elsewhere because I think it's pretty obivous that we would
> never care about logging writes to device regions.
> 
> However, this made me think, are we making sure that we are not marking
> device mappings as read-only in the wp_range functions?  I think it's

The KVM_SET_USER_MEMORY_REGION ioctl doesn't check the type of the
region being installed/operated on (KVM_MEM_LOG_DIRTY_PAGES). In the
case of QEMU these regions wind up in KVMState->KVMSlot[]; when
memory_region_add_subregion() is called the KVM listener installs it.
For migration and dirty page logging QEMU walks the KVMSlot[] array.

For QEMU VFIO (PCI), mmap()ing a BAR of type IORESOURCE_MEM causes the
memory region to be added to KVMState->KVMSlot[]. In that case it's
possible to walk KVMState->KVMSlot[], issue the ioctl, and come across
a device mapping backed by normal memory and write protect its s2ptes
(VFIO sets unmigratable state though).

But I'm not sure what's there to stop someone from calling the ioctl
and installing a region with device memory type. Most likely, though,
if you installed that kind of region migration would be disabled.

But just for logging use, not checking the memory type could be an
issue.

> quite bad if we mark the VCPU interface as read-only for example.
> 
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-08-14  1:20               ` Mario Smarduch
@ 2014-08-15  0:01                 ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-15  0:01 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/13/2014 06:20 PM, Mario Smarduch wrote:
> On 08/13/2014 12:30 AM, Christoffer Dall wrote:
>> On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
>>> On 08/12/2014 02:50 AM, Christoffer Dall wrote:
>>>> On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
>>>>> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
>>>>>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
>>
>> [...]
>>
>>>>>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
>>>>>>>  {
>>>>>>>  	pte_t *pte = (pte_t *)data;
>>>>>>>  
>>>>>>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
>>>>>>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
>>>>>>
>>>>>> why is logging never active if we are called from MMU notifiers?
>>>>>
[...]

>> The comment is because when you look at this function it is not obvious
>> why we pass logging_active=false, despite logging may actually be
>> active.  This could suggest that the parameter to stage2_set_pte()
>> should be named differently (break_huge_pmds) or something like that,
>> but we can also be satisfied with the comment.
> 
> Ok I see, I was thinking you thought it was breaking something.
> Yeah I'll add the comment, in reality this is another use case
> where a PMD may need to be converted to page table so it makes sense
> to contrast use cases.
> 
>>
>>>
>>> Should I add comments on flag use in other places as well?
>>>
>>
>> It's always a judgement call.  I didn't find it necessarry to put a
>> comment elsewhere because I think it's pretty obivous that we would
>> never care about logging writes to device regions.
>>
>> However, this made me think, are we making sure that we are not marking
>> device mappings as read-only in the wp_range functions?  I think it's
> 
> KVM_SET_USER_MEMORY_REGION ioctl doesn't check type of region being
> installed/operated on (KVM_MEM_LOG_DIRTY_PAGES), in case of QEMU
> these regions wind up in KVMState->KVMSlot[], when
> memory_region_add_subregion() is called KVM listener installs it.
> For migration and dirty page logging QEMU walks the KVMSlot[] array.
> 
> For QEMU VFIO (PCI) mmap()ing BAR of type IORESOURCE_MEM,
> causes the memory region to be added to KVMState->KVMSlot[].
> In that case it's possible to walk KVMState->KVMSlot[] issue
> the ioctl and  come across  a device mapping with normal memory and
> WP it's s2ptes (VFIO sets unmigrateble state though).
> 
> But I'm not sure what's there to stop someone calling the ioctl and
> install a region with device memory type. Most likely though if you
> installed that kind of region migration would be disabled.
> 
> But just for logging use not checking memory type could be an issue.

Clarifying the above a bit: KVM structures like kvm_run or the vgic
don't go through the KVM_SET_USER_MEMORY_REGION interface (can't think
of any other KVM structures). VFIO does use KVM_SET_USER_MEMORY_REGION,
and user_mem_abort() should resolve the fault; I recall the VFIO patch
series adds that support.

It should be ok to write protect MMIO regions installed through
KVM_SET_USER_MEMORY_REGION. Although I don't know of a use case for
logging without migration at this time, so this may not be an issue
at all right now.

> 
>> quite bad if we mark the VCPU interface as read-only for example.
>>
>> -Christoffer
>>
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-08-14  1:20               ` Mario Smarduch
@ 2014-08-18 12:51                 ` Christoffer Dall
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoffer Dall @ 2014-08-18 12:51 UTC (permalink / raw)
  To: Mario Smarduch
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On Wed, Aug 13, 2014 at 06:20:19PM -0700, Mario Smarduch wrote:
> On 08/13/2014 12:30 AM, Christoffer Dall wrote:
> > On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
> >> On 08/12/2014 02:50 AM, Christoffer Dall wrote:
> >>> On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
> >>>> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> >>>>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
> > 
> > [...]
> > 
> >>>>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> >>>>>>  {
> >>>>>>  	pte_t *pte = (pte_t *)data;
> >>>>>>  
> >>>>>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
> >>>>>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
> >>>>>
> >>>>> why is logging never active if we are called from MMU notifiers?
> >>>>
> >>>> mmu notifiers update sptes, but I don't see how these updates
> >>>> can result in guest dirty pages. Also guest pages are marked dirty
> >>>> from 2nd stage page fault handlers (searching through the code).
> >>>>
> >>> Ok, then add:
> >>>
> >>> /*
> >>>  * We can always call stage2_set_pte with logging_active == false,
> >>>  * because MMU notifiers will have unmapped a huge PMD before calling
> >>>  * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
> >>>  * stage2_set_pte() never needs to clear out a huge PMD through this
> >>>  * calling path.
> >>>  */
> >>
> >> So here on permission change to primary pte's kernel first invalidates
> >> related s2ptes followed by ->change_pte calls to synchronize s2ptes. As
> >> consequence of invalidation we unmap huge PMDs, if a page falls in that
> >> range.
> >>
> >> Is the comment to point out use of logging flag under various scenarios?
> > 
> > The comment is because when you look at this function it is not obvious
> > why we pass logging_active=false, despite logging may actually be
> > active.  This could suggest that the parameter to stage2_set_pte()
> > should be named differently (break_huge_pmds) or something like that,
> > but we can also be satisfied with the comment.
> 
> Ok I see, I was thinking you thought it was breaking something.
> Yeah I'll add the comment, in reality this is another use case
> where a PMD may need to be converted to page table so it makes sense
> to contrast use cases.
> 

the hidden knowledge is that MMU notifiers will ensure a huge PMD gets
unmapped before trying to change the physical backing of the underlying
PTEs, so it's a gigantic kernel bug if this gets called on something
mapped with a huge PMD.


> > 
> >>
> >> Should I add comments on flag use in other places as well?
> >>
> > 
> > It's always a judgement call.  I didn't find it necessarry to put a
> > comment elsewhere because I think it's pretty obivous that we would
> > never care about logging writes to device regions.
> > 
> > However, this made me think, are we making sure that we are not marking
> > device mappings as read-only in the wp_range functions?  I think it's
> 
> KVM_SET_USER_MEMORY_REGION ioctl doesn't check type of region being
> installed/operated on (KVM_MEM_LOG_DIRTY_PAGES), in case of QEMU
> these regions wind up in KVMState->KVMSlot[], when
> memory_region_add_subregion() is called KVM listener installs it.
> For migration and dirty page logging QEMU walks the KVMSlot[] array.
> 
> For QEMU VFIO (PCI) mmap()ing BAR of type IORESOURCE_MEM,
> causes the memory region to be added to KVMState->KVMSlot[].
> In that case it's possible to walk KVMState->KVMSlot[] issue
> the ioctl and  come across  a device mapping with normal memory and
> WP it's s2ptes (VFIO sets unmigrateble state though).
> 
> But I'm not sure what's there to stop someone calling the ioctl and
> install a region with device memory type. Most likely though if you
> installed that kind of region migration would be disabled.
> 
> But just for logging use not checking memory type could be an issue.
> 
I forgot that the current write-protecting is limited to the memory
region boundaries, so everything should be fine.
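
That is, the write protection is always bounded by the slot, roughly
like this (a sketch assuming a stage2_wp_range()-style helper, not the
exact code from your series):

	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;

	/* Never crosses the memslot boundaries. */
	stage2_wp_range(kvm, start, end);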

If user-space write-protects device memory regions, the worst
consequence is that it breaks the guest, but that's its own
responsibility, so I don't think you need to change anything.

-Christoffer

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
  2014-08-18 12:51                 ` Christoffer Dall
@ 2014-08-18 17:42                   ` Mario Smarduch
  -1 siblings, 0 replies; 60+ messages in thread
From: Mario Smarduch @ 2014-08-18 17:42 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, marc.zyngier, pbonzini, gleb, agraf, xiantao.zhang,
	borntraeger, cornelia.huck, xiaoguangrong, steve.capper, kvm,
	linux-arm-kernel, jays.lee, sungjinn.chung

On 08/18/2014 05:51 AM, Christoffer Dall wrote:
> On Wed, Aug 13, 2014 at 06:20:19PM -0700, Mario Smarduch wrote:
>> On 08/13/2014 12:30 AM, Christoffer Dall wrote:
>>> On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
>>>> On 08/12/2014 02:50 AM, Christoffer Dall wrote:
>>>>> On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
>>>>>> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
>>>>>>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
>>>
>>> [...]
>>>
>>>>>>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)

>>
>> Ok I see, I was thinking you thought it was breaking something.
>> Yeah I'll add the comment, in reality this is another use case
>> where a PMD may need to be converted to page table so it makes sense
>> to contrast use cases.
>>
> 
> the hidden knowledge is that MMU notifiers will ensure a huge PMD gets
> unmapped before trying to change the physical backing of the underlying
> PTEs, so it's a gigantic kernel bug if this gets called on something
> mapped with a huge PMD.
> 

That's a good way of putting it; luckily I was already aware of that
from looking at some other features.

> 
>>>
>>>>
>>>> Should I add comments on flag use in other places as well?
>>>>
>>>
>>> It's always a judgement call.  I didn't find it necessarry to put a
>>> comment elsewhere because I think it's pretty obivous that we would
>>> never care about logging writes to device regions.
>>>
>>> However, this made me think, are we making sure that we are not marking
>>> device mappings as read-only in the wp_range functions?  I think it's
>>
>> KVM_SET_USER_MEMORY_REGION ioctl doesn't check type of region being
>> installed/operated on (KVM_MEM_LOG_DIRTY_PAGES), in case of QEMU
>> these regions wind up in KVMState->KVMSlot[], when
>> memory_region_add_subregion() is called KVM listener installs it.
>> For migration and dirty page logging QEMU walks the KVMSlot[] array.
>>
>> For QEMU VFIO (PCI) mmap()ing BAR of type IORESOURCE_MEM,
>> causes the memory region to be added to KVMState->KVMSlot[].
>> In that case it's possible to walk KVMState->KVMSlot[] issue
>> the ioctl and  come across  a device mapping with normal memory and
>> WP it's s2ptes (VFIO sets unmigrateble state though).
>>
>> But I'm not sure what's there to stop someone calling the ioctl and
>> install a region with device memory type. Most likely though if you
>> installed that kind of region migration would be disabled.
>>
>> But just for logging use not checking memory type could be an issue.
>>
> I forgot that the current write-protect'ing is limited to the memory
> region boundaries, so everything should be fine.

I looked through this a while back, but it was worth revisiting.
> 
> If user-space write-protects device memory regions, the worst
> consequence is that it breaks the guest, but that's its own
> responsibility, so I don't think you need to change anything.

Thanks for the detailed review. I'll go off now, rebase, and make
the needed changes.

> 
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2014-08-18 17:42 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-25  0:56 [PATCH v9 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
2014-07-25  0:56 ` Mario Smarduch
2014-07-25  0:56 ` [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush Mario Smarduch
2014-07-25  0:56   ` Mario Smarduch
2014-07-25  6:12   ` Alexander Graf
2014-07-25  6:12     ` Alexander Graf
2014-07-25 17:37     ` Mario Smarduch
2014-07-25 17:37       ` Mario Smarduch
2014-08-08 17:50       ` [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs ... - looking for comments Mario Smarduch
2014-08-08 17:50         ` Mario Smarduch
2014-08-11 19:12   ` [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush Christoffer Dall
2014-08-11 19:12     ` Christoffer Dall
2014-08-11 23:54     ` Mario Smarduch
2014-08-11 23:54       ` Mario Smarduch
2014-07-25  0:56 ` [PATCH v9 2/4] arm: ARMv7 dirty page logging inital mem region write protect (w/no huge PUD support) Mario Smarduch
2014-07-25  0:56   ` Mario Smarduch
2014-07-25  6:16   ` Alexander Graf
2014-07-25  6:16     ` Alexander Graf
2014-07-25 17:45     ` Mario Smarduch
2014-07-25 17:45       ` Mario Smarduch
2014-08-11 19:12   ` Christoffer Dall
2014-08-11 19:12     ` Christoffer Dall
2014-08-12  0:16     ` Mario Smarduch
2014-08-12  0:16       ` Mario Smarduch
2014-08-12  9:32       ` Christoffer Dall
2014-08-12  9:32         ` Christoffer Dall
2014-08-12 23:17         ` Mario Smarduch
2014-08-12 23:17           ` Mario Smarduch
2014-08-12  1:36     ` Mario Smarduch
2014-08-12  1:36       ` Mario Smarduch
2014-08-12  9:36       ` Christoffer Dall
2014-08-12  9:36         ` Christoffer Dall
2014-08-12 21:08         ` Mario Smarduch
2014-08-12 21:08           ` Mario Smarduch
2014-07-25  0:56 ` [PATCH v9 3/4] arm: dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8 ia64 mips powerpc s390 arch specific Mario Smarduch
2014-07-25  0:56   ` Mario Smarduch
2014-08-11 19:13   ` Christoffer Dall
2014-08-11 19:13     ` Christoffer Dall
2014-08-12  0:24     ` Mario Smarduch
2014-08-12  0:24       ` Mario Smarduch
2014-07-25  0:56 ` [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support Mario Smarduch
2014-07-25  0:56   ` Mario Smarduch
2014-08-11 19:13   ` Christoffer Dall
2014-08-11 19:13     ` Christoffer Dall
2014-08-12  1:25     ` Mario Smarduch
2014-08-12  1:25       ` Mario Smarduch
2014-08-12  9:50       ` Christoffer Dall
2014-08-12  9:50         ` Christoffer Dall
2014-08-13  1:27         ` Mario Smarduch
2014-08-13  1:27           ` Mario Smarduch
2014-08-13  7:30           ` Christoffer Dall
2014-08-13  7:30             ` Christoffer Dall
2014-08-14  1:20             ` Mario Smarduch
2014-08-14  1:20               ` Mario Smarduch
2014-08-15  0:01               ` Mario Smarduch
2014-08-15  0:01                 ` Mario Smarduch
2014-08-18 12:51               ` Christoffer Dall
2014-08-18 12:51                 ` Christoffer Dall
2014-08-18 17:42                 ` Mario Smarduch
2014-08-18 17:42                   ` Mario Smarduch
