linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/19] KVM: Dynamically size memslot arrays
@ 2020-01-21 22:31 Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot Sean Christopherson
                   ` (18 more replies)
  0 siblings, 19 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

I'd love to get this into 5.6 if possible.  The rebase to kvm/queue only
had a single superficial conflict with 668effb63de8 ("KVM: Fix some wrong
function names in comment"), and s390 (Christian) and arm64 (Marc) both
got smoke tested in v4.

Thanks!

v5:
  - Make the selftest x86-only. [Christian].
  - Collect tags. [Peter]
  - Rebase to kvm/queue, fb0c5f8fb698 ("KVM: x86: inline memslot_...").

v4:
  - Add patch 01 to fix an x86 rmap/lpage bug, and patches 10 and 11 to
    resolve hidden conflicts with the bug fix.
  - Collect tags [Christian, Marc, Philippe].
  - Rebase to kvm/queue, commit e41a90be9659 ("KVM: x86/mmu: WARN if
    root_hpa is invalid when handling a page fault").
v3:
  - Fix build errors on PPC and MIPS due to missed params during
    refactoring [kbuild test robot].
  - Rename the helpers for update_memslots() and add comments describing
    the new algorithm and how it interacts with searching [Paolo].
  - Remove the unnecessary and obnoxious warning regarding memslots being
    a flexible array [Paolo].
  - Fix typos in the changelog of patch 09/15 [Christoffer].
  - Collect tags [Christoffer].

v2:
  - Split "Drop kvm_arch_create_memslot()" into three patches to move
    minor functional changes to standalone patches [Janosch].
  - Rebase to latest kvm/queue (f0574a1cea5b, "KVM: x86: fix ...")
  - Collect an Acked-by and a Reviewed-by

*** v1 cover letter ***

The end goal of this series is to dynamically size the memslot array so
that KVM allocates memory based on the number of memslots in use, as
opposed to unconditionally allocating memory for the maximum number of
memslots.  On x86, each memslot consumes 88 bytes, and so with 2 address
spaces of 512 memslots, each VM consumes ~90k bytes for the memslots.
E.g. given a VM that uses a total of 30 memslots, dynamic sizing reduces
the memory footprint from 90k to ~2.6k bytes.
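
A quick back-of-the-envelope version of that math, using the 88 bytes per
memslot figure quoted above (exact sizes vary by kernel config):

  2 address spaces * 512 slots * 88 bytes = 90,112 bytes  (~90k, always allocated today)
  30 used slots * 88 bytes                =  2,640 bytes  (~2.6k, with dynamic sizing)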

The changes required to support dynamic sizing are relatively small and
are essentially contained in patches 17/19 and 18/19.

Patches 2-16 clean up the memslot code, which has gotten quite crusty,
especially __kvm_set_memory_region().  The cleanup is likely not strictly
necessary to switch to dynamic sizing, but I didn't have a remotely
reasonable level of confidence in the correctness of the dynamic sizing
without first doing the cleanup.

The only functional change in v4 is the addition of a bug fix for x86's
handling of KVM_MR_MOVE.  The bug fix is not directly related to
dynamically allocating memslots, but it has subtle and hidden conflicts
with the cleanup patches, and the fix is higher priority than anything
else in the series, i.e. it should be merged first.

On non-x86 architectures, v3 and v4 should be functionally equivalent;
the only non-x86 change in v4 is the dropping of a "const" in
kvm_arch_commit_memory_region().

Sean Christopherson (19):
  KVM: x86: Allocate new rmap and large page tracking when moving
    memslot
  KVM: Reinstall old memslots if arch preparation fails
  KVM: Don't free new memslot if allocation of said memslot fails
  KVM: PPC: Move memslot memory allocation into prepare_memory_region()
  KVM: x86: Allocate memslot resources during prepare_memory_region()
  KVM: Drop kvm_arch_create_memslot()
  KVM: Explicitly free allocated-but-unused dirty bitmap
  KVM: Refactor error handling for setting memory region
  KVM: Move setting of memslot into helper routine
  KVM: Drop "const" attribute from old memslot in commit_memory_region()
  KVM: x86: Free arrays for old memslot when moving memslot's base gfn
  KVM: Move memslot deletion to helper function
  KVM: Simplify kvm_free_memslot() and all its descendents
  KVM: Clean up local variable usage in __kvm_set_memory_region()
  KVM: Provide common implementation for generic dirty log functions
  KVM: Ensure validity of memslot with respect to kvm_get_dirty_log()
  KVM: Terminate memslot walks via used_slots
  KVM: Dynamically size memslot array based on number of used slots
  KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION

 arch/mips/include/asm/kvm_host.h              |   2 +-
 arch/mips/kvm/mips.c                          |  71 +-
 arch/powerpc/include/asm/kvm_ppc.h            |  17 +-
 arch/powerpc/kvm/book3s.c                     |  22 +-
 arch/powerpc/kvm/book3s_hv.c                  |  36 +-
 arch/powerpc/kvm/book3s_pr.c                  |  20 +-
 arch/powerpc/kvm/booke.c                      |  17 +-
 arch/powerpc/kvm/powerpc.c                    |  15 +-
 arch/s390/include/asm/kvm_host.h              |   2 +-
 arch/s390/kvm/kvm-s390.c                      |  23 +-
 arch/x86/include/asm/kvm_page_track.h         |   3 +-
 arch/x86/kvm/mmu/page_track.c                 |  15 +-
 arch/x86/kvm/x86.c                            | 114 +---
 include/linux/kvm_host.h                      |  48 +-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/include/kvm_util.h  |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  30 +
 .../kvm/x86_64/set_memory_region_test.c       | 142 ++++
 virt/kvm/arm/arm.c                            |  48 +-
 virt/kvm/arm/mmu.c                            |  20 +-
 virt/kvm/kvm_main.c                           | 621 ++++++++++++------
 22 files changed, 734 insertions(+), 535 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/set_memory_region_test.c

-- 
2.24.1



* [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-05 21:49   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 02/19] KVM: Reinstall old memslots if arch preparation fails Sean Christopherson
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Reallocate the rmap array and recalculate large page compatibility when
moving an existing memslot to correctly handle the alignment properties
of the new memslot.  The number of rmap entries required at each level
is dependent on the alignment of the memslot's base gfn with respect to
that level, e.g. moving a large-page aligned memslot so that it becomes
unaligned will increase the number of rmap entries needed at the now
unaligned level.
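
As a rough illustration, a minimal sketch of the per-level sizing; it
follows the same gfn-to-index arithmetic the x86 code uses, but is not
the literal kernel implementation:

  typedef unsigned long long gfn_t;

  /* Number of level-sized regions a slot overlaps == rmap entries needed. */
  static unsigned long rmap_entries(gfn_t base_gfn, unsigned long npages,
                                    unsigned int level_shift)
  {
          gfn_t last_gfn = base_gfn + npages - 1;

          return (unsigned long)((last_gfn >> level_shift) -
                                 (base_gfn >> level_shift)) + 1;
  }

  /*
   * E.g. a 512-page slot starting at a 2MiB-aligned gfn overlaps a single
   * 2MiB region, rmap_entries(0x200, 512, 9) == 1, whereas the same slot
   * moved to an unaligned gfn overlaps two, rmap_entries(0x201, 512, 9) == 2,
   * i.e. the old rmap array is one entry short after KVM_MR_MOVE.
   */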

Not updating the rmap array is the most obvious bug, as KVM accesses
garbage data beyond the end of the rmap.  KVM interprets the bad data as
pointers, leading to non-canonical #GPs, unexpected #PFs, etc...

  general protection fault: 0000 [#1] SMP
  CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
  Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
  RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246
  RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012
  RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570
  RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8
  R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000
  R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8
  FS:  00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0
  Call Trace:
   kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
   __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
   kvm_set_memory_region+0x45/0x60 [kvm]
   kvm_vm_ioctl+0x30f/0x920 [kvm]
   do_vfs_ioctl+0xa1/0x620
   ksys_ioctl+0x66/0x70
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x4c/0x170
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f7ed9911f47
  Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47
  RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004
  RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700
  R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000
  R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000
  Modules linked in: kvm_intel kvm irqbypass
  ---[ end trace 0c5f570b3358ca89 ]---

The disallow_lpage tracking is more subtle.  Failure to update it results
in KVM creating large pages when it shouldn't, either due to stale data or,
again, due to indexing beyond the end of the metadata arrays, which can
lead to memory corruption and/or leaking data to the guest or to userspace.
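
For context, a sketch of the disallow_lpage bookkeeping that has to be
redone for the new base gfn.  This is paraphrased, not the exact x86 code;
the helper name is made up and 'pages_per_lpage' stands in for
KVM_PAGES_PER_HPAGE(level):

  static void mark_disallowed_lpages(struct kvm_lpage_info *linfo,
                                     unsigned long lpages, gfn_t base_gfn,
                                     unsigned long npages, gfn_t ugfn,
                                     unsigned long pages_per_lpage)
  {
          unsigned long j;

          /* Head/tail of the slot that don't cover a full large page. */
          if (base_gfn & (pages_per_lpage - 1))
                  linfo[0].disallow_lpage = 1;
          if ((base_gfn + npages) & (pages_per_lpage - 1))
                  linfo[lpages - 1].disallow_lpage = 1;

          /* gfn and userspace address misaligned with respect to each
           * other: no large pages for this slot at this level at all.
           */
          if ((base_gfn ^ ugfn) & (pages_per_lpage - 1))
                  for (j = 0; j < lpages; ++j)
                          linfo[j].disallow_lpage = 1;
  }

Every one of those checks is a function of the slot's base gfn, so a moved
slot has to recompute them; reusing the old arrays silently allows (or
disallows) the wrong large pages.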

Note, the arrays for the old memslot are freed by the unconditional call
to kvm_free_memslot() in __kvm_set_memory_region().

Fixes: 05da45583de9b ("KVM: MMU: large page support")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/x86.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c30ebe74e5d..1953c71c52f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9793,6 +9793,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 {
 	int i;
 
+	/*
+	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
+	 * old arrays will be freed by __kvm_set_memory_region() if installing
+	 * the new memslot is successful.
+	 */
+	memset(&slot->arch, 0, sizeof(slot->arch));
+
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
 		struct kvm_lpage_info *linfo;
 		unsigned long ugfn;
@@ -9867,6 +9874,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
 				enum kvm_mr_change change)
 {
+	if (change == KVM_MR_MOVE)
+		return kvm_arch_create_memslot(kvm, memslot,
+					       mem->memory_size >> PAGE_SHIFT);
+
 	return 0;
 }
 
-- 
2.24.1



* [PATCH v5 02/19] KVM: Reinstall old memslots if arch preparation fails
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-05 22:08   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 03/19] KVM: Don't free new memslot if allocation of said memslot fails Sean Christopherson
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Reinstall the old memslots if preparing the new memory region fails
after invalidating a to-be-{re}moved memslot.

Remove the now-superfluous 'old_memslots' variable to make it clearer
that the error handling path needs to free the unused memslots, not
simply the 'old' memslots.

Fixes: bc6678a33d9b9 ("KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 virt/kvm/kvm_main.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index eb3709d55139..1429d8b726ea 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -998,7 +998,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	unsigned long npages;
 	struct kvm_memory_slot *slot;
 	struct kvm_memory_slot old, new;
-	struct kvm_memslots *slots = NULL, *old_memslots;
+	struct kvm_memslots *slots;
 	int as_id, id;
 	enum kvm_mr_change change;
 
@@ -1106,7 +1106,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		slot = id_to_memslot(slots, id);
 		slot->flags |= KVM_MEMSLOT_INVALID;
 
-		old_memslots = install_new_memslots(kvm, as_id, slots);
+		/*
+		 * We can re-use the old memslots, the only difference from the
+		 * newly installed memslots is the invalid flag, which will get
+		 * dropped by update_memslots anyway.  We'll also revert to the
+		 * old memslots if preparing the new memory region fails.
+		 */
+		slots = install_new_memslots(kvm, as_id, slots);
 
 		/* From this point no new shadow pages pointing to a deleted,
 		 * or moved, memslot will be created.
@@ -1116,13 +1122,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		 *	- kvm_is_visible_gfn (mmu_check_root)
 		 */
 		kvm_arch_flush_shadow_memslot(kvm, slot);
-
-		/*
-		 * We can re-use the old_memslots from above, the only difference
-		 * from the currently installed memslots is the invalid flag.  This
-		 * will get overwritten by update_memslots anyway.
-		 */
-		slots = old_memslots;
 	}
 
 	r = kvm_arch_prepare_memory_region(kvm, &new, mem, change);
@@ -1136,15 +1135,17 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	}
 
 	update_memslots(slots, &new, change);
-	old_memslots = install_new_memslots(kvm, as_id, slots);
+	slots = install_new_memslots(kvm, as_id, slots);
 
 	kvm_arch_commit_memory_region(kvm, mem, &old, &new, change);
 
 	kvm_free_memslot(kvm, &old, &new);
-	kvfree(old_memslots);
+	kvfree(slots);
 	return 0;
 
 out_slots:
+	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+		slots = install_new_memslots(kvm, as_id, slots);
 	kvfree(slots);
 out_free:
 	kvm_free_memslot(kvm, &new, &old);
-- 
2.24.1



* [PATCH v5 03/19] KVM: Don't free new memslot if allocation of said memslot fails
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 02/19] KVM: Reinstall old memslots if arch preparation fails Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-05 22:28   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 04/19] KVM: PPC: Move memslot memory allocation into prepare_memory_region() Sean Christopherson
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

The two implementations of kvm_arch_create_memslot() in x86 and PPC are
both good citizens and free up all local resources if creation fails.
Return immediately (via a superfluous goto) instead of calling
kvm_free_memslot().

Note, the call to kvm_free_memslot() is effectively an expensive nop in
this case as there are no resources to be freed.

No functional change intended.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1429d8b726ea..8c9211182680 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1088,7 +1088,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		new.userspace_addr = mem->userspace_addr;
 
 		if (kvm_arch_create_memslot(kvm, &new, npages))
-			goto out_free;
+			goto out;
 	}
 
 	/* Allocate page dirty bitmap if needed */
-- 
2.24.1



* [PATCH v5 04/19] KVM: PPC: Move memslot memory allocation into prepare_memory_region()
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (2 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 03/19] KVM: Don't free new memslot if allocation of said memslot fails Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-05 22:41   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 05/19] KVM: x86: Allocate memslot resources during prepare_memory_region() Sean Christopherson
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Allocate the rmap array during kvm_arch_prepare_memory_region() to pave
the way for removing kvm_arch_create_memslot() altogether.  Moving PPC's
memory allocation only changes the order of kernel memory allocations
between PPC and common KVM code.

No functional change intended.

Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/powerpc/include/asm/kvm_ppc.h | 11 ++++-------
 arch/powerpc/kvm/book3s.c          | 12 ++++--------
 arch/powerpc/kvm/book3s_hv.c       | 25 ++++++++++++-------------
 arch/powerpc/kvm/book3s_pr.c       | 11 ++---------
 arch/powerpc/kvm/booke.c           |  9 ++-------
 arch/powerpc/kvm/powerpc.c         |  4 ++--
 6 files changed, 26 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index bc2494e5710a..d162649430ba 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -202,12 +202,10 @@ extern void kvmppc_core_destroy_vm(struct kvm *kvm);
 extern void kvmppc_core_free_memslot(struct kvm *kvm,
 				     struct kvm_memory_slot *free,
 				     struct kvm_memory_slot *dont);
-extern int kvmppc_core_create_memslot(struct kvm *kvm,
-				      struct kvm_memory_slot *slot,
-				      unsigned long npages);
 extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *memslot,
-				const struct kvm_userspace_memory_region *mem);
+				const struct kvm_userspace_memory_region *mem,
+				enum kvm_mr_change change);
 extern void kvmppc_core_commit_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
 				const struct kvm_memory_slot *old,
@@ -280,7 +278,8 @@ struct kvmppc_ops {
 	void (*flush_memslot)(struct kvm *kvm, struct kvm_memory_slot *memslot);
 	int (*prepare_memory_region)(struct kvm *kvm,
 				     struct kvm_memory_slot *memslot,
-				     const struct kvm_userspace_memory_region *mem);
+				     const struct kvm_userspace_memory_region *mem,
+				     enum kvm_mr_change change);
 	void (*commit_memory_region)(struct kvm *kvm,
 				     const struct kvm_userspace_memory_region *mem,
 				     const struct kvm_memory_slot *old,
@@ -294,8 +293,6 @@ struct kvmppc_ops {
 	void (*mmu_destroy)(struct kvm_vcpu *vcpu);
 	void (*free_memslot)(struct kvm_memory_slot *free,
 			     struct kvm_memory_slot *dont);
-	int (*create_memslot)(struct kvm_memory_slot *slot,
-			      unsigned long npages);
 	int (*init_vm)(struct kvm *kvm);
 	void (*destroy_vm)(struct kvm *kvm);
 	int (*get_smmu_info)(struct kvm *kvm, struct kvm_ppc_smmu_info *info);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index d07a8e12fa15..e9149a815806 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -810,12 +810,6 @@ void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 	kvm->arch.kvm_ops->free_memslot(free, dont);
 }
 
-int kvmppc_core_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			       unsigned long npages)
-{
-	return kvm->arch.kvm_ops->create_memslot(slot, npages);
-}
-
 void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
 	kvm->arch.kvm_ops->flush_memslot(kvm, memslot);
@@ -823,9 +817,11 @@ void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
 
 int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *memslot,
-				const struct kvm_userspace_memory_region *mem)
+				const struct kvm_userspace_memory_region *mem,
+				enum kvm_mr_change change)
 {
-	return kvm->arch.kvm_ops->prepare_memory_region(kvm, memslot, mem);
+	return kvm->arch.kvm_ops->prepare_memory_region(kvm, memslot, mem,
+							change);
 }
 
 void kvmppc_core_commit_memory_region(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f4b72cef09d5..c3d4abae5a4d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4453,20 +4453,20 @@ static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free,
 	}
 }
 
-static int kvmppc_core_create_memslot_hv(struct kvm_memory_slot *slot,
-					 unsigned long npages)
-{
-	slot->arch.rmap = vzalloc(array_size(npages, sizeof(*slot->arch.rmap)));
-	if (!slot->arch.rmap)
-		return -ENOMEM;
-
-	return 0;
-}
-
 static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
-					struct kvm_memory_slot *memslot,
-					const struct kvm_userspace_memory_region *mem)
+					struct kvm_memory_slot *slot,
+					const struct kvm_userspace_memory_region *mem,
+					enum kvm_mr_change change)
 {
+	unsigned long npages = mem->memory_size >> PAGE_SHIFT;
+
+	if (change == KVM_MR_CREATE) {
+		slot->arch.rmap = vzalloc(array_size(npages,
+					  sizeof(*slot->arch.rmap)));
+		if (!slot->arch.rmap)
+			return -ENOMEM;
+	}
+
 	return 0;
 }
 
@@ -5525,7 +5525,6 @@ static struct kvmppc_ops kvm_ops_hv = {
 	.set_spte_hva = kvm_set_spte_hva_hv,
 	.mmu_destroy  = kvmppc_mmu_destroy_hv,
 	.free_memslot = kvmppc_core_free_memslot_hv,
-	.create_memslot = kvmppc_core_create_memslot_hv,
 	.init_vm =  kvmppc_core_init_vm_hv,
 	.destroy_vm = kvmppc_core_destroy_vm_hv,
 	.get_smmu_info = kvm_vm_ioctl_get_smmu_info_hv,
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d88f708d5be3..d507bbf7a685 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1927,7 +1927,8 @@ static void kvmppc_core_flush_memslot_pr(struct kvm *kvm,
 
 static int kvmppc_core_prepare_memory_region_pr(struct kvm *kvm,
 					struct kvm_memory_slot *memslot,
-					const struct kvm_userspace_memory_region *mem)
+					const struct kvm_userspace_memory_region *mem,
+					enum kvm_mr_change change)
 {
 	return 0;
 }
@@ -1947,13 +1948,6 @@ static void kvmppc_core_free_memslot_pr(struct kvm_memory_slot *free,
 	return;
 }
 
-static int kvmppc_core_create_memslot_pr(struct kvm_memory_slot *slot,
-					 unsigned long npages)
-{
-	return 0;
-}
-
-
 #ifdef CONFIG_PPC64
 static int kvm_vm_ioctl_get_smmu_info_pr(struct kvm *kvm,
 					 struct kvm_ppc_smmu_info *info)
@@ -2098,7 +2092,6 @@ static struct kvmppc_ops kvm_ops_pr = {
 	.set_spte_hva = kvm_set_spte_hva_pr,
 	.mmu_destroy  = kvmppc_mmu_destroy_pr,
 	.free_memslot = kvmppc_core_free_memslot_pr,
-	.create_memslot = kvmppc_core_create_memslot_pr,
 	.init_vm = kvmppc_core_init_vm_pr,
 	.destroy_vm = kvmppc_core_destroy_vm_pr,
 	.get_smmu_info = kvm_vm_ioctl_get_smmu_info_pr,
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 9cb8257b4118..05cf548f172c 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1776,15 +1776,10 @@ void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 {
 }
 
-int kvmppc_core_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			       unsigned long npages)
-{
-	return 0;
-}
-
 int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot,
-				      const struct kvm_userspace_memory_region *mem)
+				      const struct kvm_userspace_memory_region *mem,
+				      enum kvm_mr_change change)
 {
 	return 0;
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1af96fb5dc6f..4e0f7a77920d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -694,7 +694,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    unsigned long npages)
 {
-	return kvmppc_core_create_memslot(kvm, slot, npages);
+	return 0;
 }
 
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
@@ -702,7 +702,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   const struct kvm_userspace_memory_region *mem,
 				   enum kvm_mr_change change)
 {
-	return kvmppc_core_prepare_memory_region(kvm, memslot, mem);
+	return kvmppc_core_prepare_memory_region(kvm, memslot, mem, change);
 }
 
 void kvm_arch_commit_memory_region(struct kvm *kvm,
-- 
2.24.1



* [PATCH v5 05/19] KVM: x86: Allocate memslot resources during prepare_memory_region()
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (3 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 04/19] KVM: PPC: Move memslot memory allocation into prepare_memory_region() Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 06/19] KVM: Drop kvm_arch_create_memslot() Sean Christopherson
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Allocate the various metadata structures associated with a new memslot
during kvm_arch_prepare_memory_region(), which paves the way for
removing kvm_arch_create_memslot() altogether.  Moving x86's memory
allocation only changes the order of kernel memory allocations between
x86 and common KVM code.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/x86.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1953c71c52f2..35a7e3d9b99a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9790,6 +9790,12 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    unsigned long npages)
+{
+	return 0;
+}
+
+static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
+				      unsigned long npages)
 {
 	int i;
 
@@ -9874,10 +9880,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
 				enum kvm_mr_change change)
 {
-	if (change == KVM_MR_MOVE)
-		return kvm_arch_create_memslot(kvm, memslot,
-					       mem->memory_size >> PAGE_SHIFT);
-
+	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
+		return kvm_alloc_memslot_metadata(memslot,
+						  mem->memory_size >> PAGE_SHIFT);
 	return 0;
 }
 
-- 
2.24.1



* [PATCH v5 06/19] KVM: Drop kvm_arch_create_memslot()
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (4 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 05/19] KVM: x86: Allocate memslot resources during prepare_memory_region() Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-05 22:45   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 07/19] KVM: Explicitly free allocated-but-unused dirty bitmap Sean Christopherson
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Remove kvm_arch_create_memslot() now that all arch implementations are
effectively nops.  Removing kvm_arch_create_memslot() eliminates the
possibility for arch specific code to allocate memory prior to setting
a memslot, which sets the stage for simplifying kvm_free_memslot().

Cc: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/mips/kvm/mips.c       |  6 ------
 arch/powerpc/kvm/powerpc.c |  6 ------
 arch/s390/kvm/kvm-s390.c   |  6 ------
 arch/x86/kvm/x86.c         |  6 ------
 include/linux/kvm_host.h   |  2 --
 virt/kvm/arm/mmu.c         |  6 ------
 virt/kvm/kvm_main.c        | 21 +++++++--------------
 7 files changed, 7 insertions(+), 46 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 2606f3f02b54..6d54e18ebdc1 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -188,12 +188,6 @@ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl,
 	return -ENOIOCTLCMD;
 }
 
-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    unsigned long npages)
-{
-	return 0;
-}
-
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
 	/* Flush whole GPA */
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4e0f7a77920d..48abf1b9ad58 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -691,12 +691,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 	kvmppc_core_free_memslot(kvm, free, dont);
 }
 
-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    unsigned long npages)
-{
-	return 0;
-}
-
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *memslot,
 				   const struct kvm_userspace_memory_region *mem,
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c059b86aacd4..743e09dd38b5 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4483,12 +4483,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    unsigned long npages)
-{
-	return 0;
-}
-
 /* Section: memory related */
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *memslot,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 35a7e3d9b99a..b55bf2ecdd98 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9788,12 +9788,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 	kvm_page_track_free_memslot(free, dont);
 }
 
-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    unsigned long npages)
-{
-	return 0;
-}
-
 static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
 				      unsigned long npages)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d5331b0d937..aa5cb2ff7a2b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -671,8 +671,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			    const struct kvm_userspace_memory_region *mem);
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 			   struct kvm_memory_slot *dont);
-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    unsigned long npages);
 void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen);
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *memslot,
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index dc8254bf30ea..66f59c067bf6 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -2356,12 +2356,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 {
 }
 
-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    unsigned long npages)
-{
-	return 0;
-}
-
 void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 {
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8c9211182680..5e777ab02395 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1039,12 +1039,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	new.base_gfn = base_gfn;
 	new.npages = npages;
 	new.flags = mem->flags;
+	new.userspace_addr = mem->userspace_addr;
 
 	if (npages) {
 		if (!old.npages)
 			change = KVM_MR_CREATE;
 		else { /* Modify an existing slot. */
-			if ((mem->userspace_addr != old.userspace_addr) ||
+			if ((new.userspace_addr != old.userspace_addr) ||
 			    (npages != old.npages) ||
 			    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
 				goto out;
@@ -1079,22 +1080,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		}
 	}
 
-	/* Free page dirty bitmap if unneeded */
-	if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
-		new.dirty_bitmap = NULL;
-
 	r = -ENOMEM;
-	if (change == KVM_MR_CREATE) {
-		new.userspace_addr = mem->userspace_addr;
 
-		if (kvm_arch_create_memslot(kvm, &new, npages))
-			goto out;
-	}
-
-	/* Allocate page dirty bitmap if needed */
-	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
+	/* Allocate/free page dirty bitmap as needed */
+	if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
+		new.dirty_bitmap = NULL;
+	else if (!new.dirty_bitmap) {
 		if (kvm_create_dirty_bitmap(&new) < 0)
-			goto out_free;
+			goto out;
 	}
 
 	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
-- 
2.24.1



* [PATCH v5 07/19] KVM: Explicitly free allocated-but-unused dirty bitmap
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (5 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 06/19] KVM: Drop kvm_arch_create_memslot() Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 08/19] KVM: Refactor error handling for setting memory region Sean Christopherson
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Explicitly free an allocated-but-unused dirty bitmap instead of relying
on kvm_free_memslot() if an error occurs in __kvm_set_memory_region().
There is no longer a need to abuse kvm_free_memslot() to free arch
specific resources as arch specific code is now called only after the
common flow is guaranteed to succeed.  Arch code can still fail, but
it's responsible for its own cleanup in that case.

Eliminating the error path's abuse of kvm_free_memslot() paves the way
for simplifying kvm_free_memslot(), i.e. dropping its @dont param.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 virt/kvm/kvm_main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5e777ab02395..9bc51f062022 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1092,7 +1092,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
 	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
 	if (!slots)
-		goto out_free;
+		goto out_bitmap;
 	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
 
 	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
@@ -1140,8 +1140,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
 		slots = install_new_memslots(kvm, as_id, slots);
 	kvfree(slots);
-out_free:
-	kvm_free_memslot(kvm, &new, &old);
+out_bitmap:
+	if (new.dirty_bitmap && !old.dirty_bitmap)
+		kvm_destroy_dirty_bitmap(&new);
 out:
 	return r;
 }
-- 
2.24.1



* [PATCH v5 08/19] KVM: Refactor error handling for setting memory region
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (6 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 07/19] KVM: Explicitly free allocated-but-unused dirty bitmap Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-05 22:48   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 09/19] KVM: Move setting of memslot into helper routine Sean Christopherson
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Replace a big pile o' gotos with returns to make it more obvious what
error code is being returned, and to prepare for refactoring the
functional, i.e. post-checks, portion of __kvm_set_memory_region().

Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 virt/kvm/kvm_main.c | 40 ++++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9bc51f062022..fdf045a5d240 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1004,34 +1004,33 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
 	r = check_memory_region_flags(mem);
 	if (r)
-		goto out;
+		return r;
 
-	r = -EINVAL;
 	as_id = mem->slot >> 16;
 	id = (u16)mem->slot;
 
 	/* General sanity checks */
 	if (mem->memory_size & (PAGE_SIZE - 1))
-		goto out;
+		return -EINVAL;
 	if (mem->guest_phys_addr & (PAGE_SIZE - 1))
-		goto out;
+		return -EINVAL;
 	/* We can read the guest memory with __xxx_user() later on. */
 	if ((id < KVM_USER_MEM_SLOTS) &&
 	    ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
 	     !access_ok((void __user *)(unsigned long)mem->userspace_addr,
 			mem->memory_size)))
-		goto out;
+		return -EINVAL;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
-		goto out;
+		return -EINVAL;
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
-		goto out;
+		return -EINVAL;
 
 	slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
 	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
 	npages = mem->memory_size >> PAGE_SHIFT;
 
 	if (npages > KVM_MEM_MAX_NR_PAGES)
-		goto out;
+		return -EINVAL;
 
 	new = old = *slot;
 
@@ -1048,20 +1047,18 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			if ((new.userspace_addr != old.userspace_addr) ||
 			    (npages != old.npages) ||
 			    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
-				goto out;
+				return -EINVAL;
 
 			if (base_gfn != old.base_gfn)
 				change = KVM_MR_MOVE;
 			else if (new.flags != old.flags)
 				change = KVM_MR_FLAGS_ONLY;
-			else { /* Nothing to change. */
-				r = 0;
-				goto out;
-			}
+			else /* Nothing to change. */
+				return 0;
 		}
 	} else {
 		if (!old.npages)
-			goto out;
+			return -EINVAL;
 
 		change = KVM_MR_DELETE;
 		new.base_gfn = 0;
@@ -1070,29 +1067,29 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
 	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
 		/* Check for overlaps */
-		r = -EEXIST;
 		kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
 			if (slot->id == id)
 				continue;
 			if (!((base_gfn + npages <= slot->base_gfn) ||
 			      (base_gfn >= slot->base_gfn + slot->npages)))
-				goto out;
+				return -EEXIST;
 		}
 	}
 
-	r = -ENOMEM;
-
 	/* Allocate/free page dirty bitmap as needed */
 	if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
 		new.dirty_bitmap = NULL;
 	else if (!new.dirty_bitmap) {
-		if (kvm_create_dirty_bitmap(&new) < 0)
-			goto out;
+		r = kvm_create_dirty_bitmap(&new);
+		if (r)
+			return r;
 	}
 
 	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
-	if (!slots)
+	if (!slots) {
+		r = -ENOMEM;
 		goto out_bitmap;
+	}
 	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
 
 	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
@@ -1143,7 +1140,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
 out_bitmap:
 	if (new.dirty_bitmap && !old.dirty_bitmap)
 		kvm_destroy_dirty_bitmap(&new);
-out:
 	return r;
 }
 EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
-- 
2.24.1



* [PATCH v5 09/19] KVM: Move setting of memslot into helper routine
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (7 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 08/19] KVM: Refactor error handling for setting memory region Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 16:26   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 10/19] KVM: Drop "const" attribute from old memslot in commit_memory_region() Sean Christopherson
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Split out the core functionality of setting a memslot into a separate
helper in preparation for moving memslot deletion into its own routine.

Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 virt/kvm/kvm_main.c | 106 ++++++++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 43 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fdf045a5d240..64f6c5d35260 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -982,6 +982,66 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	return old_memslots;
 }
 
+static int kvm_set_memslot(struct kvm *kvm,
+			   const struct kvm_userspace_memory_region *mem,
+			   const struct kvm_memory_slot *old,
+			   struct kvm_memory_slot *new, int as_id,
+			   enum kvm_mr_change change)
+{
+	struct kvm_memory_slot *slot;
+	struct kvm_memslots *slots;
+	int r;
+
+	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
+	if (!slots)
+		return -ENOMEM;
+	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
+
+	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+		/*
+		 * Note, the INVALID flag needs to be in the appropriate entry
+		 * in the freshly allocated memslots, not in @old or @new.
+		 */
+		slot = id_to_memslot(slots, old->id);
+		slot->flags |= KVM_MEMSLOT_INVALID;
+
+		/*
+		 * We can re-use the old memslots, the only difference from the
+		 * newly installed memslots is the invalid flag, which will get
+		 * dropped by update_memslots anyway.  We'll also revert to the
+		 * old memslots if preparing the new memory region fails.
+		 */
+		slots = install_new_memslots(kvm, as_id, slots);
+
+		/* From this point no new shadow pages pointing to a deleted,
+		 * or moved, memslot will be created.
+		 *
+		 * validation of sp->gfn happens in:
+		 *	- gfn_to_hva (kvm_read_guest, gfn_to_pfn)
+		 *	- kvm_is_visible_gfn (mmu_check_root)
+		 */
+		kvm_arch_flush_shadow_memslot(kvm, slot);
+	}
+
+	r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
+	if (r)
+		goto out_slots;
+
+	update_memslots(slots, new, change);
+	slots = install_new_memslots(kvm, as_id, slots);
+
+	kvm_arch_commit_memory_region(kvm, mem, old, new, change);
+
+	kvfree(slots);
+	return 0;
+
+out_slots:
+	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+		slots = install_new_memslots(kvm, as_id, slots);
+	kvfree(slots);
+	return r;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -998,7 +1058,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	unsigned long npages;
 	struct kvm_memory_slot *slot;
 	struct kvm_memory_slot old, new;
-	struct kvm_memslots *slots;
 	int as_id, id;
 	enum kvm_mr_change change;
 
@@ -1085,58 +1144,19 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			return r;
 	}
 
-	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
-	if (!slots) {
-		r = -ENOMEM;
-		goto out_bitmap;
-	}
-	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
-
-	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
-		slot = id_to_memslot(slots, id);
-		slot->flags |= KVM_MEMSLOT_INVALID;
-
-		/*
-		 * We can re-use the old memslots, the only difference from the
-		 * newly installed memslots is the invalid flag, which will get
-		 * dropped by update_memslots anyway.  We'll also revert to the
-		 * old memslots if preparing the new memory region fails.
-		 */
-		slots = install_new_memslots(kvm, as_id, slots);
-
-		/* From this point no new shadow pages pointing to a deleted,
-		 * or moved, memslot will be created.
-		 *
-		 * validation of sp->gfn happens in:
-		 *	- gfn_to_hva (kvm_read_guest, gfn_to_pfn)
-		 *	- kvm_is_visible_gfn (mmu_check_root)
-		 */
-		kvm_arch_flush_shadow_memslot(kvm, slot);
-	}
-
-	r = kvm_arch_prepare_memory_region(kvm, &new, mem, change);
-	if (r)
-		goto out_slots;
-
 	/* actual memory is freed via old in kvm_free_memslot below */
 	if (change == KVM_MR_DELETE) {
 		new.dirty_bitmap = NULL;
 		memset(&new.arch, 0, sizeof(new.arch));
 	}
 
-	update_memslots(slots, &new, change);
-	slots = install_new_memslots(kvm, as_id, slots);
-
-	kvm_arch_commit_memory_region(kvm, mem, &old, &new, change);
+	r = kvm_set_memslot(kvm, mem, &old, &new, as_id, change);
+	if (r)
+		goto out_bitmap;
 
 	kvm_free_memslot(kvm, &old, &new);
-	kvfree(slots);
 	return 0;
 
-out_slots:
-	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
-		slots = install_new_memslots(kvm, as_id, slots);
-	kvfree(slots);
 out_bitmap:
 	if (new.dirty_bitmap && !old.dirty_bitmap)
 		kvm_destroy_dirty_bitmap(&new);
-- 
2.24.1



* [PATCH v5 10/19] KVM: Drop "const" attribute from old memslot in commit_memory_region()
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (8 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 09/19] KVM: Move setting of memslot into helper routine Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 16:26   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 11/19] KVM: x86: Free arrays for old memslot when moving memslot's base gfn Sean Christopherson
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Drop the "const" attribute from @old in kvm_arch_commit_memory_region()
to allow arch specific code to free arch specific resources in the old
memslot without having to cast away the attribute.  Freeing resources in
kvm_arch_commit_memory_region() paves the way for simplifying
kvm_free_memslot() by eliminating the last usage of its @dont param.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/mips/kvm/mips.c       | 2 +-
 arch/powerpc/kvm/powerpc.c | 2 +-
 arch/s390/kvm/kvm-s390.c   | 2 +-
 arch/x86/kvm/x86.c         | 2 +-
 include/linux/kvm_host.h   | 2 +-
 virt/kvm/arm/mmu.c         | 2 +-
 virt/kvm/kvm_main.c        | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 6d54e18ebdc1..908f7ec3e755 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -224,7 +224,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   const struct kvm_userspace_memory_region *mem,
-				   const struct kvm_memory_slot *old,
+				   struct kvm_memory_slot *old,
 				   const struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 48abf1b9ad58..768c4a9269be 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -701,7 +701,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   const struct kvm_userspace_memory_region *mem,
-				   const struct kvm_memory_slot *old,
+				   struct kvm_memory_slot *old,
 				   const struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 743e09dd38b5..1bfbeac13a3b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4508,7 +4508,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
-				const struct kvm_memory_slot *old,
+				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b55bf2ecdd98..a9d2d9decbc3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9932,7 +9932,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
-				const struct kvm_memory_slot *old,
+				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index aa5cb2ff7a2b..33b76106cd75 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -678,7 +678,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				enum kvm_mr_change change);
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
-				const struct kvm_memory_slot *old,
+				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change);
 bool kvm_largepages_enabled(void);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 66f59c067bf6..c9e0acefaba2 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -2253,7 +2253,7 @@ int kvm_mmu_init(void)
 
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   const struct kvm_userspace_memory_region *mem,
-				   const struct kvm_memory_slot *old,
+				   struct kvm_memory_slot *old,
 				   const struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 64f6c5d35260..69d6158cb405 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -984,7 +984,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 
 static int kvm_set_memslot(struct kvm *kvm,
 			   const struct kvm_userspace_memory_region *mem,
-			   const struct kvm_memory_slot *old,
+			   struct kvm_memory_slot *old,
 			   struct kvm_memory_slot *new, int as_id,
 			   enum kvm_mr_change change)
 {
-- 
2.24.1



* [PATCH v5 11/19] KVM: x86: Free arrays for old memslot when moving memslot's base gfn
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (9 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 10/19] KVM: Drop "const" attribute from old memslot in commit_memory_region() Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 12/19] KVM: Move memslot deletion to helper function Sean Christopherson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Explicitly free the metadata arrays (stored in slot->arch) in the old
memslot structure when moving the memslot's base gfn is committed.  This
eliminates x86's dependency on kvm_free_memslot() being called when a
memslot move is committed, and paves the way for removing the funky code
in kvm_free_memslot() that conditionally frees structures based on its
@dont param.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/x86.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9d2d9decbc3..cd7af962accf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9974,6 +9974,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	 */
 	if (change != KVM_MR_DELETE)
 		kvm_mmu_slot_apply_flags(kvm, (struct kvm_memory_slot *) new);
+
+	/* Free the arrays associated with the old memslot. */
+	if (change == KVM_MR_MOVE)
+		kvm_arch_free_memslot(kvm, old, NULL);
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
-- 
2.24.1



* [PATCH v5 12/19] KVM: Move memslot deletion to helper function
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (10 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 11/19] KVM: x86: Free arrays for old memslot when moving memslot's base gfn Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 16:14   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 13/19] KVM: Simplify kvm_free_memslot() and all its descendents Sean Christopherson
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Move memslot deletion into its own routine so that the success path for
other memslot updates does not need to use kvm_free_memslot(), i.e. can
explicitly destroy the dirty bitmap when necessary.  This paves the way
for dropping @dont from kvm_free_memslot(), i.e. all callers now pass
NULL for @dont.

Add a comment above the code that makes a copy of the existing memslot
prior to deletion; it is not at all obvious that the pointer will become
stale during sorting and/or installation of new memslots.
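
As an aside for reviewers, the staleness hazard the new comment warns
about can be reproduced with a tiny userspace model.  Everything below
(struct toy_slot and friends) is an illustrative stand-in, not KVM code:

  #include <stdio.h>
  #include <string.h>

  /* Toy stand-in for struct kvm_memory_slot; fields trimmed for brevity. */
  struct toy_slot { int id; unsigned long base_gfn; };

  int main(void)
  {
  	struct toy_slot slots[2] = { { .id = 0, .base_gfn = 0x100 },
  				     { .id = 1, .base_gfn = 0x200 } };
  	struct toy_slot *old_ptr = &slots[0];	/* pointer into the array */
  	struct toy_slot old = *old_ptr;		/* full copy, as in this patch */

  	/* Model update_memslots() re-sorting by shifting entries around. */
  	slots[0] = slots[1];
  	memset(&slots[1], 0, sizeof(slots[1]));

  	/* The pointer now sees slot 1's data; the copy still sees slot 0. */
  	printf("via pointer: id=%d gfn=%#lx\n", old_ptr->id, old_ptr->base_gfn);
  	printf("via copy:    id=%d gfn=%#lx\n", old.id, old.base_gfn);
  	return 0;
  }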

Note, kvm_arch_commit_memory_region() allows an architecture to free
resources when moving a memslot or changing its flags, e.g. x86 frees
its arch specific memslot metadata during commit_memory_region().

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 virt/kvm/kvm_main.c | 73 +++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 69d6158cb405..5ae2dd6d9993 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1042,6 +1042,27 @@ static int kvm_set_memslot(struct kvm *kvm,
 	return r;
 }
 
+static int kvm_delete_memslot(struct kvm *kvm,
+			      const struct kvm_userspace_memory_region *mem,
+			      struct kvm_memory_slot *old, int as_id)
+{
+	struct kvm_memory_slot new;
+	int r;
+
+	if (!old->npages)
+		return -EINVAL;
+
+	memset(&new, 0, sizeof(new));
+	new.id = old->id;
+
+	r = kvm_set_memslot(kvm, mem, old, &new, as_id, KVM_MR_DELETE);
+	if (r)
+		return r;
+
+	kvm_free_memslot(kvm, old, NULL);
+	return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -1091,7 +1112,15 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (npages > KVM_MEM_MAX_NR_PAGES)
 		return -EINVAL;
 
-	new = old = *slot;
+	/*
+	 * Make a full copy of the old memslot, the pointer will become stale
+	 * when the memslots are re-sorted by update_memslots().
+	 */
+	old = *slot;
+	if (!mem->memory_size)
+		return kvm_delete_memslot(kvm, mem, &old, as_id);
+
+	new = old;
 
 	new.id = id;
 	new.base_gfn = base_gfn;
@@ -1099,29 +1128,20 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	new.flags = mem->flags;
 	new.userspace_addr = mem->userspace_addr;
 
-	if (npages) {
-		if (!old.npages)
-			change = KVM_MR_CREATE;
-		else { /* Modify an existing slot. */
-			if ((new.userspace_addr != old.userspace_addr) ||
-			    (npages != old.npages) ||
-			    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
-				return -EINVAL;
-
-			if (base_gfn != old.base_gfn)
-				change = KVM_MR_MOVE;
-			else if (new.flags != old.flags)
-				change = KVM_MR_FLAGS_ONLY;
-			else /* Nothing to change. */
-				return 0;
-		}
-	} else {
-		if (!old.npages)
+	if (!old.npages) {
+		change = KVM_MR_CREATE;
+	} else { /* Modify an existing slot. */
+		if ((new.userspace_addr != old.userspace_addr) ||
+		    (npages != old.npages) ||
+		    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
 			return -EINVAL;
 
-		change = KVM_MR_DELETE;
-		new.base_gfn = 0;
-		new.flags = 0;
+		if (base_gfn != old.base_gfn)
+			change = KVM_MR_MOVE;
+		else if (new.flags != old.flags)
+			change = KVM_MR_FLAGS_ONLY;
+		else /* Nothing to change. */
+			return 0;
 	}
 
 	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
@@ -1144,17 +1164,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			return r;
 	}
 
-	/* actual memory is freed via old in kvm_free_memslot below */
-	if (change == KVM_MR_DELETE) {
-		new.dirty_bitmap = NULL;
-		memset(&new.arch, 0, sizeof(new.arch));
-	}
-
 	r = kvm_set_memslot(kvm, mem, &old, &new, as_id, change);
 	if (r)
 		goto out_bitmap;
 
-	kvm_free_memslot(kvm, &old, &new);
+	if (old.dirty_bitmap && !new.dirty_bitmap)
+		kvm_destroy_dirty_bitmap(&old);
 	return 0;
 
 out_bitmap:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v5 13/19] KVM: Simplify kvm_free_memslot() and all its descendents
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (11 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 12/19] KVM: Move memslot deletion to helper function Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 16:29   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region() Sean Christopherson
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Now that all callers of kvm_free_memslot() pass NULL for @dont, remove
the param from the top-level routine and from all arch implementations.

No functional change intended.

Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/mips/include/asm/kvm_host.h      |  2 +-
 arch/powerpc/include/asm/kvm_ppc.h    |  6 ++----
 arch/powerpc/kvm/book3s.c             |  5 ++---
 arch/powerpc/kvm/book3s_hv.c          |  9 +++------
 arch/powerpc/kvm/book3s_pr.c          |  3 +--
 arch/powerpc/kvm/booke.c              |  3 +--
 arch/powerpc/kvm/powerpc.c            |  5 ++---
 arch/s390/include/asm/kvm_host.h      |  2 +-
 arch/x86/include/asm/kvm_page_track.h |  3 +--
 arch/x86/kvm/mmu/page_track.c         | 15 ++++++---------
 arch/x86/kvm/x86.c                    | 21 ++++++++-------------
 include/linux/kvm_host.h              |  3 +--
 virt/kvm/arm/mmu.c                    |  3 +--
 virt/kvm/kvm_main.c                   | 18 +++++++-----------
 14 files changed, 37 insertions(+), 61 deletions(-)

diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 41204a49cf95..2c343c346b79 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -1133,7 +1133,7 @@ extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_free_memslot(struct kvm *kvm,
-		struct kvm_memory_slot *free, struct kvm_memory_slot *dont) {}
+					 struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index d162649430ba..406ec46304d5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -200,8 +200,7 @@ extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
 extern int kvmppc_core_init_vm(struct kvm *kvm);
 extern void kvmppc_core_destroy_vm(struct kvm *kvm);
 extern void kvmppc_core_free_memslot(struct kvm *kvm,
-				     struct kvm_memory_slot *free,
-				     struct kvm_memory_slot *dont);
+				     struct kvm_memory_slot *slot);
 extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *memslot,
 				const struct kvm_userspace_memory_region *mem,
@@ -291,8 +290,7 @@ struct kvmppc_ops {
 	int (*test_age_hva)(struct kvm *kvm, unsigned long hva);
 	void (*set_spte_hva)(struct kvm *kvm, unsigned long hva, pte_t pte);
 	void (*mmu_destroy)(struct kvm_vcpu *vcpu);
-	void (*free_memslot)(struct kvm_memory_slot *free,
-			     struct kvm_memory_slot *dont);
+	void (*free_memslot)(struct kvm_memory_slot *slot);
 	int (*init_vm)(struct kvm *kvm);
 	void (*destroy_vm)(struct kvm *kvm);
 	int (*get_smmu_info)(struct kvm *kvm, struct kvm_ppc_smmu_info *info);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index e9149a815806..97ce6c4f7b48 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -804,10 +804,9 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
 }
 
-void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-			      struct kvm_memory_slot *dont)
+void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
-	kvm->arch.kvm_ops->free_memslot(free, dont);
+	kvm->arch.kvm_ops->free_memslot(slot);
 }
 
 void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c3d4abae5a4d..4afabedcbace 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4444,13 +4444,10 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm,
 	return r;
 }
 
-static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free,
-					struct kvm_memory_slot *dont)
+static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *slot)
 {
-	if (!dont || free->arch.rmap != dont->arch.rmap) {
-		vfree(free->arch.rmap);
-		free->arch.rmap = NULL;
-	}
+	vfree(slot->arch.rmap);
+	slot->arch.rmap = NULL;
 }
 
 static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d507bbf7a685..c6d1fc3be91c 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1942,8 +1942,7 @@ static void kvmppc_core_commit_memory_region_pr(struct kvm *kvm,
 	return;
 }
 
-static void kvmppc_core_free_memslot_pr(struct kvm_memory_slot *free,
-					struct kvm_memory_slot *dont)
+static void kvmppc_core_free_memslot_pr(struct kvm_memory_slot *slot)
 {
 	return;
 }
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 05cf548f172c..24212b6ab03f 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1771,8 +1771,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	return -ENOTSUPP;
 }
 
-void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-			      struct kvm_memory_slot *dont)
+void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
 }
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 768c4a9269be..838cdcd2db12 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -685,10 +685,9 @@ long kvm_arch_dev_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-			   struct kvm_memory_slot *dont)
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
-	kvmppc_core_free_memslot(kvm, free, dont);
+	kvmppc_core_free_memslot(kvm, slot);
 }
 
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 11ecc4071a29..14e36bd7ab91 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -916,7 +916,7 @@ static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_free_memslot(struct kvm *kvm,
-		struct kvm_memory_slot *free, struct kvm_memory_slot *dont) {}
+					 struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
 static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 172f9749dbb2..87bd6025d91d 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -49,8 +49,7 @@ struct kvm_page_track_notifier_node {
 void kvm_page_track_init(struct kvm *kvm);
 void kvm_page_track_cleanup(struct kvm *kvm);
 
-void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
-				 struct kvm_memory_slot *dont);
+void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
 int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 3521e2d176f2..d125ec379c79 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -19,17 +19,14 @@
 
 #include "mmu.h"
 
-void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
-				 struct kvm_memory_slot *dont)
+void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
 	int i;
 
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
-		if (!dont || free->arch.gfn_track[i] !=
-		      dont->arch.gfn_track[i]) {
-			kvfree(free->arch.gfn_track[i]);
-			free->arch.gfn_track[i] = NULL;
-		}
+	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
+		kvfree(slot->arch.gfn_track[i]);
+		slot->arch.gfn_track[i] = NULL;
+	}
 }
 
 int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
@@ -48,7 +45,7 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
 	return 0;
 
 track_free:
-	kvm_page_track_free_memslot(slot, NULL);
+	kvm_page_track_free_memslot(slot);
 	return -ENOMEM;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd7af962accf..fe914655b9d0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9765,27 +9765,22 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_hv_destroy_vm(kvm);
 }
 
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-			   struct kvm_memory_slot *dont)
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
 	int i;
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
-		if (!dont || free->arch.rmap[i] != dont->arch.rmap[i]) {
-			kvfree(free->arch.rmap[i]);
-			free->arch.rmap[i] = NULL;
-		}
+		kvfree(slot->arch.rmap[i]);
+		slot->arch.rmap[i] = NULL;
+
 		if (i == 0)
 			continue;
 
-		if (!dont || free->arch.lpage_info[i - 1] !=
-			     dont->arch.lpage_info[i - 1]) {
-			kvfree(free->arch.lpage_info[i - 1]);
-			free->arch.lpage_info[i - 1] = NULL;
-		}
+		kvfree(slot->arch.lpage_info[i - 1]);
+		slot->arch.lpage_info[i - 1] = NULL;
 	}
 
-	kvm_page_track_free_memslot(free, dont);
+	kvm_page_track_free_memslot(slot);
 }
 
 static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
@@ -9977,7 +9972,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 
 	/* Free the arrays associated with the old memslot. */
 	if (change == KVM_MR_MOVE)
-		kvm_arch_free_memslot(kvm, old, NULL);
+		kvm_arch_free_memslot(kvm, old);
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 33b76106cd75..dac96f9c0a82 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -669,8 +669,7 @@ int kvm_set_memory_region(struct kvm *kvm,
 			  const struct kvm_userspace_memory_region *mem);
 int __kvm_set_memory_region(struct kvm *kvm,
 			    const struct kvm_userspace_memory_region *mem);
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-			   struct kvm_memory_slot *dont);
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen);
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *memslot,
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index c9e0acefaba2..23af65f8fd0f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -2351,8 +2351,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	return ret;
 }
 
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-			   struct kvm_memory_slot *dont)
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5ae2dd6d9993..3278ddf5a56a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -579,18 +579,14 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
 	memslot->dirty_bitmap = NULL;
 }
 
-/*
- * Free any memory in @free but not in @dont.
- */
-static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-			      struct kvm_memory_slot *dont)
+static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
-	if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-		kvm_destroy_dirty_bitmap(free);
+	kvm_destroy_dirty_bitmap(slot);
 
-	kvm_arch_free_memslot(kvm, free, dont);
+	kvm_arch_free_memslot(kvm, slot);
 
-	free->npages = 0;
+	slot->flags = 0;
+	slot->npages = 0;
 }
 
 static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
@@ -601,7 +597,7 @@ static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
 		return;
 
 	kvm_for_each_memslot(memslot, slots)
-		kvm_free_memslot(kvm, memslot, NULL);
+		kvm_free_memslot(kvm, memslot);
 
 	kvfree(slots);
 }
@@ -1059,7 +1055,7 @@ static int kvm_delete_memslot(struct kvm *kvm,
 	if (r)
 		return r;
 
-	kvm_free_memslot(kvm, old, NULL);
+	kvm_free_memslot(kvm, old);
 	return 0;
 }
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v5 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region()
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (12 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 13/19] KVM: Simplify kvm_free_memslot() and all its descendents Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 19:06   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions Sean Christopherson
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Clean up __kvm_set_memory_region() to achieve several goals:

  - Remove local variables that serve no real purpose
  - Improve the readability of the code
  - Better show the relationship between the 'old' and 'new' memslot
  - Prepare for dynamically sizing memslots.

Note, using 'tmp' to hold the initial memslot is not strictly necessary
at this juncture, e.g. 'old' could be directly copied from
id_to_memslot(), but keep the pointer usage as id_to_memslot() will be
able to return a NULL pointer once memslots are dynamically sized.
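
One detail the clean up leaves intact is the overlap check against the
other memslots.  For readers skimming the diff below, it is the standard
half-open interval test; a self-contained sketch with illustrative names
(not KVM code) follows:

  #include <stdbool.h>
  #include <stdio.h>

  /*
   * Two gfn ranges [base, base + npages) overlap unless one ends at or
   * before the point where the other begins.
   */
  static bool gfn_ranges_overlap(unsigned long base_a, unsigned long npages_a,
  				unsigned long base_b, unsigned long npages_b)
  {
  	return !(base_a + npages_a <= base_b || base_a >= base_b + npages_b);
  }

  int main(void)
  {
  	/* [0x100, 0x200) vs [0x200, 0x300): adjacent, no overlap -> 0 */
  	printf("%d\n", gfn_ranges_overlap(0x100, 0x100, 0x200, 0x100));
  	/* [0x100, 0x201) vs [0x200, 0x300): one page overlaps -> 1 */
  	printf("%d\n", gfn_ranges_overlap(0x100, 0x101, 0x200, 0x100));
  	return 0;
  }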

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 virt/kvm/kvm_main.c | 47 +++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3278ddf5a56a..6210738cf2f6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1070,13 +1070,11 @@ static int kvm_delete_memslot(struct kvm *kvm,
 int __kvm_set_memory_region(struct kvm *kvm,
 			    const struct kvm_userspace_memory_region *mem)
 {
-	int r;
-	gfn_t base_gfn;
-	unsigned long npages;
-	struct kvm_memory_slot *slot;
 	struct kvm_memory_slot old, new;
-	int as_id, id;
+	struct kvm_memory_slot *tmp;
 	enum kvm_mr_change change;
+	int as_id, id;
+	int r;
 
 	r = check_memory_region_flags(mem);
 	if (r)
@@ -1101,52 +1099,55 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
 		return -EINVAL;
 
-	slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
-	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
-	npages = mem->memory_size >> PAGE_SHIFT;
-
-	if (npages > KVM_MEM_MAX_NR_PAGES)
-		return -EINVAL;
-
 	/*
 	 * Make a full copy of the old memslot, the pointer will become stale
 	 * when the memslots are re-sorted by update_memslots().
 	 */
-	old = *slot;
+	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
+	old = *tmp;
+	tmp = NULL;
+
 	if (!mem->memory_size)
 		return kvm_delete_memslot(kvm, mem, &old, as_id);
 
-	new = old;
-
 	new.id = id;
-	new.base_gfn = base_gfn;
-	new.npages = npages;
+	new.base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
+	new.npages = mem->memory_size >> PAGE_SHIFT;
 	new.flags = mem->flags;
 	new.userspace_addr = mem->userspace_addr;
 
+	if (new.npages > KVM_MEM_MAX_NR_PAGES)
+		return -EINVAL;
+
 	if (!old.npages) {
 		change = KVM_MR_CREATE;
+		new.dirty_bitmap = NULL;
+		memset(&new.arch, 0, sizeof(new.arch));
 	} else { /* Modify an existing slot. */
 		if ((new.userspace_addr != old.userspace_addr) ||
-		    (npages != old.npages) ||
+		    (new.npages != old.npages) ||
 		    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
 			return -EINVAL;
 
-		if (base_gfn != old.base_gfn)
+		if (new.base_gfn != old.base_gfn)
 			change = KVM_MR_MOVE;
 		else if (new.flags != old.flags)
 			change = KVM_MR_FLAGS_ONLY;
 		else /* Nothing to change. */
 			return 0;
+
+		/* Copy dirty_bitmap and arch from the current memslot. */
+		new.dirty_bitmap = old.dirty_bitmap;
+		memcpy(&new.arch, &old.arch, sizeof(new.arch));
 	}
 
 	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
 		/* Check for overlaps */
-		kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
-			if (slot->id == id)
+		kvm_for_each_memslot(tmp, __kvm_memslots(kvm, as_id)) {
+			if (tmp->id == id)
 				continue;
-			if (!((base_gfn + npages <= slot->base_gfn) ||
-			      (base_gfn >= slot->base_gfn + slot->npages)))
+			if (!((new.base_gfn + new.npages <= tmp->base_gfn) ||
+			      (new.base_gfn >= tmp->base_gfn + tmp->npages)))
 				return -EEXIST;
 		}
 	}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (13 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region() Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 20:02   ` Peter Xu
  2020-02-06 21:24   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 16/19] KVM: Ensure validity of memslot with respect to kvm_get_dirty_log() Sean Christopherson
                   ` (3 subsequent siblings)
  18 siblings, 2 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
The arch specific implementations are extremely similar, differing
only in whether the dirty log needs to be sync'd from hardware (x86)
and how the TLBs are flushed.  Add new arch hooks to handle sync
and TLB flush; the sync will also be used for non-generic dirty log
support in a future patch (s390).
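
To make the shape of the split concrete, here is a minimal userspace
model of the common flow calling the two new hooks, assuming an arch
with no hardware dirty-bit caching.  All names below are illustrative
stand-ins, not the kernel implementation in the diff:

  #include <stdbool.h>
  #include <stdio.h>

  #define NPAGES 8
  static unsigned char dirty_bitmap[NPAGES];

  /* Hook 1: pull hardware-cached dirty state into the bitmap.  A no-op
   * here, as on arches without PML-style caching. */
  static void arch_sync_dirty_log(void)
  {
  }

  /* Hook 2: arch-specific TLB flush, only called when pages were dirty. */
  static void arch_dirty_log_tlb_flush(void)
  {
  	printf("flush TLBs\n");
  }

  /* Generic flow: sync, snapshot + clear the bitmap, flush if needed. */
  static void get_dirty_log(unsigned char *snapshot)
  {
  	bool flush = false;
  	int i;

  	arch_sync_dirty_log();

  	for (i = 0; i < NPAGES; i++) {
  		if (!dirty_bitmap[i])
  			continue;
  		flush = true;
  		snapshot[i] = dirty_bitmap[i];	/* copy to "userspace" */
  		dirty_bitmap[i] = 0;		/* re-enable tracking */
  	}

  	if (flush)
  		arch_dirty_log_tlb_flush();
  }

  int main(void)
  {
  	unsigned char snap[NPAGES] = {};

  	dirty_bitmap[3] = 1;
  	get_dirty_log(snap);
  	printf("page 3 dirty in snapshot: %d\n", snap[3]);
  	return 0;
  }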

The ulterior motive for providing a common implementation is to
eliminate the dependency between arch and common code with respect to
the memslot referenced by the dirty log, i.e. to make it obvious in the
code that the validity of the memslot is guaranteed, as a future patch
will rework memslot handling such that id_to_memslot() can return NULL.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/mips/kvm/mips.c      | 63 +++--------------------------
 arch/powerpc/kvm/book3s.c |  5 +++
 arch/powerpc/kvm/booke.c  |  5 +++
 arch/s390/kvm/kvm-s390.c  |  5 +--
 arch/x86/kvm/x86.c        | 61 ++--------------------------
 include/linux/kvm_host.h  | 21 +++++-----
 virt/kvm/arm/arm.c        | 48 ++--------------------
 virt/kvm/kvm_main.c       | 84 ++++++++++++++++++++++++++++++++-------
 8 files changed, 103 insertions(+), 189 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 908f7ec3e755..8d2da949476f 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -962,69 +962,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl,
 	return r;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed  and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Copy the snapshot to the userspace.
- *   4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *memslot;
-	bool flush = false;
-	int r;
 
-	mutex_lock(&kvm->slots_lock);
-
-	r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
-	if (flush) {
-		slots = kvm_memslots(kvm);
-		memslot = id_to_memslot(slots, log->slot);
-
-		/* Let implementation handle TLB/GVA invalidation */
-		kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
-	}
-
-	mutex_unlock(&kvm->slots_lock);
-	return r;
 }
 
-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
+				  struct kvm_memory_slot *memslot)
 {
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *memslot;
-	bool flush = false;
-	int r;
-
-	mutex_lock(&kvm->slots_lock);
-
-	r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
-	if (flush) {
-		slots = kvm_memslots(kvm);
-		memslot = id_to_memslot(slots, log->slot);
-
-		/* Let implementation handle TLB/GVA invalidation */
-		kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
-	}
-
-	mutex_unlock(&kvm->slots_lock);
-	return r;
+	/* Let implementation handle TLB/GVA invalidation */
+	kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
 }
 
 long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 97ce6c4f7b48..0adaf4791a6d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -799,6 +799,11 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 	return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
 }
 
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+
+}
+
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 24212b6ab03f..08b707a08357 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1766,6 +1766,11 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 	return r;
 }
 
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+
+}
+
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	return -ENOTSUPP;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1bfbeac13a3b..dacc13bb4465 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -569,8 +569,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	return r;
 }
 
-static void kvm_s390_sync_dirty_log(struct kvm *kvm,
-				    struct kvm_memory_slot *memslot)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
 	int i;
 	gfn_t cur_gfn, last_gfn;
@@ -630,7 +629,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (!memslot->dirty_bitmap)
 		goto out;
 
-	kvm_s390_sync_dirty_log(kvm, memslot);
+	kvm_arch_sync_dirty_log(kvm, memslot);
 	r = kvm_get_dirty_log(kvm, log, &is_dirty);
 	if (r)
 		goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe914655b9d0..07f7d6458b89 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4737,77 +4737,24 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed  and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Copy the snapshot to the userspace.
- *   4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
-	bool flush = false;
-	int r;
-
-	mutex_lock(&kvm->slots_lock);
-
 	/*
 	 * Flush potentially hardware-cached dirty pages to dirty_bitmap.
 	 */
 	if (kvm_x86_ops->flush_log_dirty)
 		kvm_x86_ops->flush_log_dirty(kvm);
-
-	r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
-	/*
-	 * All the TLBs can be flushed out of mmu lock, see the comments in
-	 * kvm_mmu_slot_remove_write_access().
-	 */
-	lockdep_assert_held(&kvm->slots_lock);
-	if (flush)
-		kvm_flush_remote_tlbs(kvm);
-
-	mutex_unlock(&kvm->slots_lock);
-	return r;
 }
 
-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
+				  struct kvm_memory_slot *memslot)
 {
-	bool flush = false;
-	int r;
-
-	mutex_lock(&kvm->slots_lock);
-
-	/*
-	 * Flush potentially hardware-cached dirty pages to dirty_bitmap.
-	 */
-	if (kvm_x86_ops->flush_log_dirty)
-		kvm_x86_ops->flush_log_dirty(kvm);
-
-	r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
 	/*
 	 * All the TLBs can be flushed out of mmu lock, see the comments in
 	 * kvm_mmu_slot_remove_write_access().
 	 */
 	lockdep_assert_held(&kvm->slots_lock);
-	if (flush)
-		kvm_flush_remote_tlbs(kvm);
-
-	mutex_unlock(&kvm->slots_lock);
-	return r;
+	kvm_flush_remote_tlbs(kvm);
 }
 
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dac96f9c0a82..a56ba1a3c8d0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -811,23 +811,20 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf);
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext);
 
-int kvm_get_dirty_log(struct kvm *kvm,
-			struct kvm_dirty_log *log, int *is_dirty);
-
-int kvm_get_dirty_log_protect(struct kvm *kvm,
-			      struct kvm_dirty_log *log, bool *flush);
-int kvm_clear_dirty_log_protect(struct kvm *kvm,
-				struct kvm_clear_dirty_log *log, bool *flush);
-
 void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 					struct kvm_memory_slot *slot,
 					gfn_t gfn_offset,
 					unsigned long mask);
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
-				struct kvm_dirty_log *log);
-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
-				  struct kvm_clear_dirty_log *log);
+#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
+				  struct kvm_memory_slot *memslot);
+#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
+int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
+		      int *is_dirty);
+#endif
 
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 			bool line_status);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 3ff510599af6..2c9e1c12e53e 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1185,55 +1185,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	return r;
 }
 
-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed  and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- *   1. Take a snapshot of the bit and clear it if needed.
- *   2. Write protect the corresponding page.
- *   3. Copy the snapshot to the userspace.
- *   4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
-	bool flush = false;
-	int r;
 
-	mutex_lock(&kvm->slots_lock);
-
-	r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
-	if (flush)
-		kvm_flush_remote_tlbs(kvm);
-
-	mutex_unlock(&kvm->slots_lock);
-	return r;
 }
 
-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
+				  struct kvm_memory_slot *memslot)
 {
-	bool flush = false;
-	int r;
-
-	mutex_lock(&kvm->slots_lock);
-
-	r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
-	if (flush)
-		kvm_flush_remote_tlbs(kvm);
-
-	mutex_unlock(&kvm->slots_lock);
-	return r;
+	kvm_flush_remote_tlbs(kvm);
 }
 
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6210738cf2f6..60692588f0fc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -855,7 +855,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 
 /*
  * Allocation size is twice as large as the actual dirty bitmap size.
- * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
+ * See kvm_vm_ioctl_get_dirty_log() why this is needed.
  */
 static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
@@ -1197,6 +1197,7 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 	return kvm_set_memory_region(kvm, mem);
 }
 
+#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
 int kvm_get_dirty_log(struct kvm *kvm,
 			struct kvm_dirty_log *log, int *is_dirty)
 {
@@ -1230,13 +1231,12 @@ int kvm_get_dirty_log(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
 
-#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+#else /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
 /**
  * kvm_get_dirty_log_protect - get a snapshot of dirty pages
  *	and reenable dirty page tracking for the corresponding pages.
  * @kvm:	pointer to kvm instance
  * @log:	slot id and address to which we copy the log
- * @flush:	true if TLB flush is needed by caller
  *
  * We need to keep it in mind that VCPU threads can write to the bitmap
  * concurrently. So, to avoid losing track of dirty pages we keep the
@@ -1253,8 +1253,7 @@ EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
  * exiting to userspace will be logged for the next call.
  *
  */
-int kvm_get_dirty_log_protect(struct kvm *kvm,
-			struct kvm_dirty_log *log, bool *flush)
+static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
@@ -1262,6 +1261,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
 	unsigned long n;
 	unsigned long *dirty_bitmap;
 	unsigned long *dirty_bitmap_buffer;
+	bool flush;
 
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
@@ -1275,8 +1275,10 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
 	if (!dirty_bitmap)
 		return -ENOENT;
 
+	kvm_arch_sync_dirty_log(kvm, memslot);
+
 	n = kvm_dirty_bitmap_bytes(memslot);
-	*flush = false;
+	flush = false;
 	if (kvm->manual_dirty_log_protect) {
 		/*
 		 * Unlike kvm_get_dirty_log, we always return false in *flush,
@@ -1299,7 +1301,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
 			if (!dirty_bitmap[i])
 				continue;
 
-			*flush = true;
+			flush = true;
 			mask = xchg(&dirty_bitmap[i], 0);
 			dirty_bitmap_buffer[i] = mask;
 
@@ -1310,21 +1312,55 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
 		spin_unlock(&kvm->mmu_lock);
 	}
 
+	if (flush)
+		kvm_arch_dirty_log_tlb_flush(kvm, memslot);
+
 	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
 		return -EFAULT;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_get_dirty_log_protect);
+
+
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * Steps 1-4 below provide general overview of dirty page logging. See
+ * kvm_get_dirty_log_protect() function description for additional details.
+ *
+ * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
+ * always flush the TLB (step 4) even if previous step failed  and the dirty
+ * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
+ * does not preclude user space subsequent dirty log read. Flushing TLB ensures
+ * writes will be marked dirty for next log read.
+ *
+ *   1. Take a snapshot of the bit and clear it if needed.
+ *   2. Write protect the corresponding page.
+ *   3. Copy the snapshot to the userspace.
+ *   4. Flush TLB's if needed.
+ */
+static int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+				      struct kvm_dirty_log *log)
+{
+	int r;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = kvm_get_dirty_log_protect(kvm, log);
+
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
 
 /**
  * kvm_clear_dirty_log_protect - clear dirty bits in the bitmap
  *	and reenable dirty page tracking for the corresponding pages.
  * @kvm:	pointer to kvm instance
  * @log:	slot id and address from which to fetch the bitmap of dirty pages
- * @flush:	true if TLB flush is needed by caller
  */
-int kvm_clear_dirty_log_protect(struct kvm *kvm,
-				struct kvm_clear_dirty_log *log, bool *flush)
+static int kvm_clear_dirty_log_protect(struct kvm *kvm,
+				       struct kvm_clear_dirty_log *log)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
@@ -1333,6 +1369,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long i, n;
 	unsigned long *dirty_bitmap;
 	unsigned long *dirty_bitmap_buffer;
+	bool flush;
 
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
@@ -1356,7 +1393,9 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	    (log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63)))
 	    return -EINVAL;
 
-	*flush = false;
+	kvm_arch_sync_dirty_log(kvm, memslot);
+
+	flush = false;
 	dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
 	if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
 		return -EFAULT;
@@ -1379,17 +1418,32 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
 		 * a problem if userspace sets them in log->dirty_bitmap.
 		*/
 		if (mask) {
-			*flush = true;
+			flush = true;
 			kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
 								offset, mask);
 		}
 	}
 	spin_unlock(&kvm->mmu_lock);
 
+	if (flush)
+		kvm_arch_dirty_log_tlb_flush(kvm, memslot);
+
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_clear_dirty_log_protect);
-#endif
+
+static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
+					struct kvm_clear_dirty_log *log)
+{
+	int r;
+
+	mutex_lock(&kvm->slots_lock);
+
+	r = kvm_clear_dirty_log_protect(kvm, log);
+
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+#endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
 
 bool kvm_largepages_enabled(void)
 {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v5 16/19] KVM: Ensure validity of memslot with respect to kvm_get_dirty_log()
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (14 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots Sean Christopherson
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Rework kvm_get_dirty_log() so that it "returns" the associated memslot
on success.  A future patch will rework memslot handling such that
id_to_memslot() can return NULL; returning the memslot makes it more
obvious that the validity of the memslot has been verified, i.e.
precludes the need to add validity checks in the arch code that are
technically unnecessary.
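
The caller-side pattern this enables can be sketched with a toy model;
the names below are illustrative only, not the KVM API.  The callee
validates the slot and hands it back, so the caller never dereferences
an unchecked pointer:

  #include <errno.h>
  #include <stdio.h>

  struct toy_slot { int id; unsigned long npages; };

  static struct toy_slot slots[2] = { { 0, 16 }, { 1, 0 } };

  /* "Returns" the validated slot via @memslot, NULL on failure. */
  static int toy_get_dirty_log(int id, struct toy_slot **memslot)
  {
  	*memslot = NULL;

  	if (id < 0 || id >= 2)
  		return -EINVAL;
  	if (!slots[id].npages)		/* unused slot, nothing to log */
  		return -ENOENT;

  	*memslot = &slots[id];		/* always valid on success */
  	return 0;
  }

  int main(void)
  {
  	struct toy_slot *slot;

  	if (!toy_get_dirty_log(0, &slot))
  		printf("slot %d has %lu pages\n", slot->id, slot->npages);
  	if (toy_get_dirty_log(1, &slot))
  		printf("slot 1 rejected, caller needs no extra checks\n");
  	return 0;
  }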

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/powerpc/kvm/book3s_pr.c |  6 +-----
 arch/s390/kvm/kvm-s390.c     | 12 ++----------
 include/linux/kvm_host.h     |  2 +-
 virt/kvm/kvm_main.c          | 27 +++++++++++++++++++--------
 4 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index c6d1fc3be91c..eb41e897e693 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1884,7 +1884,6 @@ static int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 static int kvm_vm_ioctl_get_dirty_log_pr(struct kvm *kvm,
 					 struct kvm_dirty_log *log)
 {
-	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
 	struct kvm_vcpu *vcpu;
 	ulong ga, ga_end;
@@ -1894,15 +1893,12 @@ static int kvm_vm_ioctl_get_dirty_log_pr(struct kvm *kvm,
 
 	mutex_lock(&kvm->slots_lock);
 
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	r = kvm_get_dirty_log(kvm, log, &is_dirty, &memslot);
 	if (r)
 		goto out;
 
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (is_dirty) {
-		slots = kvm_memslots(kvm);
-		memslot = id_to_memslot(slots, log->slot);
-
 		ga = memslot->base_gfn << PAGE_SHIFT;
 		ga_end = ga + (memslot->npages << PAGE_SHIFT);
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index dacc13bb4465..3ca1a784590c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -610,9 +610,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 {
 	int r;
 	unsigned long n;
-	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
-	int is_dirty = 0;
+	int is_dirty;
 
 	if (kvm_is_ucontrol(kvm))
 		return -EINVAL;
@@ -623,14 +622,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (log->slot >= KVM_USER_MEM_SLOTS)
 		goto out;
 
-	slots = kvm_memslots(kvm);
-	memslot = id_to_memslot(slots, log->slot);
-	r = -ENOENT;
-	if (!memslot->dirty_bitmap)
-		goto out;
-
-	kvm_arch_sync_dirty_log(kvm, memslot);
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	r = kvm_get_dirty_log(kvm, log, &is_dirty, &memslot);
 	if (r)
 		goto out;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a56ba1a3c8d0..f05be99dc44a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -823,7 +823,7 @@ void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
 #else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
 int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
-		      int *is_dirty);
+		      int *is_dirty, struct kvm_memory_slot **memslot);
 #endif
 
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 60692588f0fc..190c065da48d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1198,31 +1198,42 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 }
 
 #ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
-int kvm_get_dirty_log(struct kvm *kvm,
-			struct kvm_dirty_log *log, int *is_dirty)
+/**
+ * kvm_get_dirty_log - get a snapshot of dirty pages
+ * @kvm:	pointer to kvm instance
+ * @log:	slot id and address to which we copy the log
+ * @is_dirty:	set to '1' if any dirty pages were found
+ * @memslot:	set to the associated memslot, always valid on success
+ */
+int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
+		      int *is_dirty, struct kvm_memory_slot **memslot)
 {
 	struct kvm_memslots *slots;
-	struct kvm_memory_slot *memslot;
 	int i, as_id, id;
 	unsigned long n;
 	unsigned long any = 0;
 
+	*memslot = NULL;
+	*is_dirty = 0;
+
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
 		return -EINVAL;
 
 	slots = __kvm_memslots(kvm, as_id);
-	memslot = id_to_memslot(slots, id);
-	if (!memslot->dirty_bitmap)
+	*memslot = id_to_memslot(slots, id);
+	if (!(*memslot)->dirty_bitmap)
 		return -ENOENT;
 
-	n = kvm_dirty_bitmap_bytes(memslot);
+	kvm_arch_sync_dirty_log(kvm, *memslot);
+
+	n = kvm_dirty_bitmap_bytes(*memslot);
 
 	for (i = 0; !any && i < n/sizeof(long); ++i)
-		any = memslot->dirty_bitmap[i];
+		any = (*memslot)->dirty_bitmap[i];
 
-	if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+	if (copy_to_user(log->dirty_bitmap, (*memslot)->dirty_bitmap, n))
 		return -EFAULT;
 
 	if (any)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (15 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 16/19] KVM: Ensure validity of memslot with respect to kvm_get_dirty_log() Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 21:09   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots Sean Christopherson
  2020-01-21 22:31 ` [PATCH v5 19/19] KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION Sean Christopherson
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Refactor memslot handling to treat the number of used slots as the de
facto size of the memslot array, e.g. return NULL from id_to_memslot()
when an invalid index is provided instead of relying on npages==0 to
detect an invalid memslot.  Rework the sorting and walking of memslots
in advance of dynamically sizing memslots to aid bisection and debug,
e.g. with luck, a bug in the refactoring will bisect here and/or hit a
WARN instead of randomly corrupting memory.
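
A toy model of the two behavioural changes described above, i.e. walks
bounded by used_slots and NULL for unused slot ids; the structures are
illustrative stand-ins, not the kernel's:

  #include <stdio.h>

  #define MAX_SLOTS 4

  struct toy_slot { int id; unsigned long base_gfn; unsigned long npages; };

  struct toy_memslots {
  	struct toy_slot memslots[MAX_SLOTS];
  	int id_to_index[MAX_SLOTS];
  	int used_slots;
  };

  static struct toy_slot *toy_id_to_memslot(struct toy_memslots *s, int id)
  {
  	int index = s->id_to_index[id];

  	if (index < 0)
  		return NULL;	/* unused slot, caller must check */
  	return &s->memslots[index];
  }

  int main(void)
  {
  	struct toy_memslots s = {
  		.memslots = { { .id = 2, .base_gfn = 0x200, .npages = 4 } },
  		.id_to_index = { -1, -1, 0, -1 },
  		.used_slots = 1,
  	};
  	int i;

  	/* The walk stops at used_slots instead of scanning MAX_SLOTS. */
  	for (i = 0; i < s.used_slots; i++)
  		printf("slot id %d at gfn %#lx\n", s.memslots[i].id,
  		       s.memslots[i].base_gfn);

  	if (!toy_id_to_memslot(&s, 1))
  		printf("id 1 is unused -> NULL\n");
  	return 0;
  }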

Alternatively, a global null/invalid memslot could be returned, i.e. so
callers of id_to_memslot() don't have to explicitly check for a NULL
memslot, but that approach runs the risk of introducing difficult-to-
debug issues, e.g. if the global null slot is modified.  Constifying
the return from id_to_memslot() to combat such issues is possible, but
would require a massive refactoring of arch specific code and would
still be susceptible to casting shenanigans.

Add function comments to update_memslots() and search_memslots() to
explicitly (and loudly) state how memslots are sorted.

No functional change intended.

Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/powerpc/kvm/book3s_hv.c |   2 +-
 arch/x86/kvm/x86.c           |  14 +--
 include/linux/kvm_host.h     |  18 ++-
 virt/kvm/arm/mmu.c           |   9 +-
 virt/kvm/kvm_main.c          | 220 ++++++++++++++++++++++++++---------
 5 files changed, 189 insertions(+), 74 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4afabedcbace..ea03cb868151 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4397,7 +4397,7 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm,
 	slots = kvm_memslots(kvm);
 	memslot = id_to_memslot(slots, log->slot);
 	r = -ENOENT;
-	if (!memslot->dirty_bitmap)
+	if (!memslot || !memslot->dirty_bitmap)
 		goto out;
 
 	/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 07f7d6458b89..53d2a86cc91e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9630,9 +9630,9 @@ void kvm_arch_sync_events(struct kvm *kvm)
 int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
 {
 	int i, r;
-	unsigned long hva;
+	unsigned long hva, uninitialized_var(old_npages);
 	struct kvm_memslots *slots = kvm_memslots(kvm);
-	struct kvm_memory_slot *slot, old;
+	struct kvm_memory_slot *slot;
 
 	/* Called with kvm->slots_lock held.  */
 	if (WARN_ON(id >= KVM_MEM_SLOTS_NUM))
@@ -9640,7 +9640,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
 
 	slot = id_to_memslot(slots, id);
 	if (size) {
-		if (slot->npages)
+		if (slot && slot->npages)
 			return -EEXIST;
 
 		/*
@@ -9652,13 +9652,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
 		if (IS_ERR((void *)hva))
 			return PTR_ERR((void *)hva);
 	} else {
-		if (!slot->npages)
+		if (!slot || !slot->npages)
 			return 0;
 
-		hva = 0;
+		hva = slot->userspace_addr;
+		old_npages = slot->npages;
 	}
 
-	old = *slot;
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 		struct kvm_userspace_memory_region m;
 
@@ -9673,7 +9673,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
 	}
 
 	if (!size)
-		vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
+		vm_munmap(hva, old_npages * PAGE_SIZE);
 
 	return 0;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f05be99dc44a..60ddfdb69378 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -572,10 +572,11 @@ static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu)
 	return vcpu->vcpu_idx;
 }
 
-#define kvm_for_each_memslot(memslot, slots)	\
-	for (memslot = &slots->memslots[0];	\
-	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
-		memslot++)
+#define kvm_for_each_memslot(memslot, slots)				\
+	for (memslot = &slots->memslots[0];				\
+	     memslot < slots->memslots + slots->used_slots; memslot++)	\
+		if (WARN_ON_ONCE(!memslot->npages)) {			\
+		} else
 
 void kvm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
@@ -635,12 +636,15 @@ static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
 	return __kvm_memslots(vcpu->kvm, as_id);
 }
 
-static inline struct kvm_memory_slot *
-id_to_memslot(struct kvm_memslots *slots, int id)
+static inline
+struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
 {
 	int index = slots->id_to_index[id];
 	struct kvm_memory_slot *slot;
 
+	if (index < 0)
+		return NULL;
+
 	slot = &slots->memslots[index];
 
 	WARN_ON(slot->id != id);
@@ -1005,6 +1009,8 @@ bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);
  * used in non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c.
  * gfn_to_memslot() itself isn't here as an inline because that would
  * bloat other code too much.
+ *
+ * IMPORTANT: Slots are sorted from highest GFN to lowest GFN!
  */
 static inline struct kvm_memory_slot *
 search_memslots(struct kvm_memslots *slots, gfn_t gfn)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 23af65f8fd0f..a1d3813bad76 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1535,8 +1535,13 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
-	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
-	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+	phys_addr_t start, end;
+
+	if (WARN_ON_ONCE(!memslot))
+		return;
+
+	start = memslot->base_gfn << PAGE_SHIFT;
+	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
 	stage2_wp_range(kvm, start, end);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 190c065da48d..9b614cf2ca20 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -565,7 +565,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 		return NULL;
 
 	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
-		slots->id_to_index[i] = slots->memslots[i].id = i;
+		slots->id_to_index[i] = slots->memslots[i].id = -1;
 
 	return slots;
 }
@@ -869,63 +869,162 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 }
 
 /*
- * Insert memslot and re-sort memslots based on their GFN,
- * so binary search could be used to lookup GFN.
- * Sorting algorithm takes advantage of having initially
- * sorted array and known changed memslot position.
+ * Delete a memslot by decrementing the number of used slots and shifting all
+ * other entries in the array forward one spot.
+ */
+static inline void kvm_memslot_delete(struct kvm_memslots *slots,
+				      struct kvm_memory_slot *memslot)
+{
+	struct kvm_memory_slot *mslots = slots->memslots;
+	int i;
+
+	if (WARN_ON(slots->id_to_index[memslot->id] == -1))
+		return;
+
+	slots->used_slots--;
+
+	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots; i++) {
+		mslots[i] = mslots[i + 1];
+		slots->id_to_index[mslots[i].id] = i;
+	}
+	mslots[i] = *memslot;
+	slots->id_to_index[memslot->id] = -1;
+}
+
+/*
+ * "Insert" a new memslot by incrementing the number of used slots.  Returns
+ * the new slot's initial index into the memslots array.
+ */
+static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)
+{
+	return slots->used_slots++;
+}
+
+/*
+ * Move a changed memslot backwards in the array by shifting existing slots
+ * with a higher GFN toward the front of the array.  Note, the changed memslot
+ * itself is not preserved in the array, i.e. not swapped at this time, only
+ * its new index into the array is tracked.  Returns the changed memslot's
+ * current index into the memslots array.
+ */
+static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
+					    struct kvm_memory_slot *memslot)
+{
+	struct kvm_memory_slot *mslots = slots->memslots;
+	int i;
+
+	if (WARN_ON_ONCE(slots->id_to_index[memslot->id] == -1) ||
+	    WARN_ON_ONCE(!slots->used_slots))
+		return -1;
+
+	/*
+	 * Move the target memslot backward in the array by shifting existing
+	 * memslots with a higher GFN (than the target memslot) towards the
+	 * front of the array.
+	 */
+	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots - 1; i++) {
+		if (memslot->base_gfn > mslots[i + 1].base_gfn)
+			break;
+
+		WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);
+
+		/* Shift the next memslot forward one and update its index. */
+		mslots[i] = mslots[i + 1];
+		slots->id_to_index[mslots[i].id] = i;
+	}
+	return i;
+}
+
+/*
+ * Move a changed memslot forwards in the array by shifting existing slots with
+ * a lower GFN toward the back of the array.  Note, the changed memslot itself
+ * is not preserved in the array, i.e. not swapped at this time, only its new
+ * index into the array is tracked.  Returns the changed memslot's final index
+ * into the memslots array.
+ */
+static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
+					   struct kvm_memory_slot *memslot,
+					   int start)
+{
+	struct kvm_memory_slot *mslots = slots->memslots;
+	int i;
+
+	for (i = start; i > 0; i--) {
+		if (memslot->base_gfn < mslots[i - 1].base_gfn)
+			break;
+
+		WARN_ON_ONCE(memslot->base_gfn == mslots[i - 1].base_gfn);
+
+		/* Shift the next memslot back one and update its index. */
+		mslots[i] = mslots[i - 1];
+		slots->id_to_index[mslots[i].id] = i;
+	}
+	return i;
+}
+
+/*
+ * Re-sort memslots based on their GFN to account for an added, deleted, or
+ * moved memslot.  Sorting memslots by GFN allows using a binary search during
+ * memslot lookup.
+ *
+ * IMPORTANT: Slots are sorted from highest GFN to lowest GFN!  I.e. the entry
+ * at memslots[0] has the highest GFN.
+ *
+ * The sorting algorithm takes advantage of having initially sorted memslots
+ * and knowing the position of the changed memslot.  Sorting is also optimized
+ * by not swapping the updated memslot and instead only shifting other memslots
+ * and tracking the new index for the updated memslot.  Only once its final
+ * index is known is the updated memslot copied into its position in the array.
+ *
+ *  - When deleting a memslot, the deleted memslot simply needs to be moved to
+ *    the end of the array.
+ *
+ *  - When creating a memslot, the algorithm "inserts" the new memslot at the
+ *    end of the array and then moves it forward to its correct location.
+ *
+ *  - When moving a memslot, the algorithm first moves the updated memslot
+ *    backward to handle the scenario where the memslot's GFN was changed to a
+ *    lower value.  update_memslots() then falls through and runs the same flow
+ *    as creating a memslot to move the memslot forward to handle the scenario
+ *    where its GFN was changed to a higher value.
+ *
+ * Note, slots are sorted from highest->lowest instead of lowest->highest for
+ * historical reasons.  Originally, invalid memslots were denoted by having
+ * GFN=0, thus sorting from highest->lowest naturally sorted invalid memslots
+ * to the end of the array.  The current algorithm uses dedicated logic to
+ * delete a memslot and thus does not rely on invalid memslots having GFN=0.
+ *
+ * The other historical motivation for highest->lowest was to improve the
+ * performance of memslot lookup.  KVM originally used a linear search starting
+ * at memslots[0].  On x86, the largest memslot usually has one of the highest,
+ * if not *the* highest, GFN, as the bulk of the guest's RAM is located in a
+ * single memslot above the 4gb boundary.  As the largest memslot is also the
+ * most likely to be referenced, sorting it to the front of the array was
+ * advantageous.  The current binary search starts from the middle of the array
+ * and uses an LRU pointer to improve performance for all memslots and GFNs.
  */
 static void update_memslots(struct kvm_memslots *slots,
-			    struct kvm_memory_slot *new,
+			    struct kvm_memory_slot *memslot,
 			    enum kvm_mr_change change)
 {
-	int id = new->id;
-	int i = slots->id_to_index[id];
-	struct kvm_memory_slot *mslots = slots->memslots;
+	int i;
 
-	WARN_ON(mslots[i].id != id);
-	switch (change) {
-	case KVM_MR_CREATE:
-		slots->used_slots++;
-		WARN_ON(mslots[i].npages || !new->npages);
-		break;
-	case KVM_MR_DELETE:
-		slots->used_slots--;
-		WARN_ON(new->npages || !mslots[i].npages);
-		break;
-	default:
-		break;
-	}
+	if (change == KVM_MR_DELETE) {
+		kvm_memslot_delete(slots, memslot);
+	} else {
+		if (change == KVM_MR_CREATE)
+			i = kvm_memslot_insert_back(slots);
+		else
+			i = kvm_memslot_move_backward(slots, memslot);
+		i = kvm_memslot_move_forward(slots, memslot, i);
 
-	while (i < KVM_MEM_SLOTS_NUM - 1 &&
-	       new->base_gfn <= mslots[i + 1].base_gfn) {
-		if (!mslots[i + 1].npages)
-			break;
-		mslots[i] = mslots[i + 1];
-		slots->id_to_index[mslots[i].id] = i;
-		i++;
+		/*
+		 * Copy the memslot to its new position in memslots and update
+		 * its index accordingly.
+		 */
+		slots->memslots[i] = *memslot;
+		slots->id_to_index[memslot->id] = i;
 	}
-
-	/*
-	 * The ">=" is needed when creating a slot with base_gfn == 0,
-	 * so that it moves before all those with base_gfn == npages == 0.
-	 *
-	 * On the other hand, if new->npages is zero, the above loop has
-	 * already left i pointing to the beginning of the empty part of
-	 * mslots, and the ">=" would move the hole backwards in this
-	 * case---which is wrong.  So skip the loop when deleting a slot.
-	 */
-	if (new->npages) {
-		while (i > 0 &&
-		       new->base_gfn >= mslots[i - 1].base_gfn) {
-			mslots[i] = mslots[i - 1];
-			slots->id_to_index[mslots[i].id] = i;
-			i--;
-		}
-	} else
-		WARN_ON_ONCE(i != slots->used_slots);
-
-	mslots[i] = *new;
-	slots->id_to_index[mslots[i].id] = i;
 }
 
 static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem)
@@ -1104,8 +1203,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	 * when the memslots are re-sorted by update_memslots().
 	 */
 	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
-	old = *tmp;
-	tmp = NULL;
+	if (tmp) {
+		old = *tmp;
+		tmp = NULL;
+	} else {
+		memset(&old, 0, sizeof(old));
+		old.id = id;
+	}
 
 	if (!mem->memory_size)
 		return kvm_delete_memslot(kvm, mem, &old, as_id);
@@ -1223,7 +1327,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 
 	slots = __kvm_memslots(kvm, as_id);
 	*memslot = id_to_memslot(slots, id);
-	if (!(*memslot)->dirty_bitmap)
+	if (!(*memslot) || !(*memslot)->dirty_bitmap)
 		return -ENOENT;
 
 	kvm_arch_sync_dirty_log(kvm, *memslot);
@@ -1281,10 +1385,10 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 
 	slots = __kvm_memslots(kvm, as_id);
 	memslot = id_to_memslot(slots, id);
+	if (!memslot || !memslot->dirty_bitmap)
+		return -ENOENT;
 
 	dirty_bitmap = memslot->dirty_bitmap;
-	if (!dirty_bitmap)
-		return -ENOENT;
 
 	kvm_arch_sync_dirty_log(kvm, memslot);
 
@@ -1392,10 +1496,10 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 
 	slots = __kvm_memslots(kvm, as_id);
 	memslot = id_to_memslot(slots, id);
+	if (!memslot || !memslot->dirty_bitmap)
+		return -ENOENT;
 
 	dirty_bitmap = memslot->dirty_bitmap;
-	if (!dirty_bitmap)
-		return -ENOENT;
 
 	n = ALIGN(log->num_pages, BITS_PER_LONG) / 8;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread
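
The re-sort that update_memslots() performs above is, at heart, an
insertion-sort shift over an array kept in descending base_gfn order: other
entries are shifted one spot, and only once the final index is known is the
changed slot copied into place.  A minimal userspace sketch of that
shift-then-copy idea, using simplified stand-in types (struct slot and
slide_forward() are illustrative names, not KVM code):

#include <stdio.h>

struct slot {
	unsigned long base_gfn;
};

/*
 * Slide a slot from index 'start' toward the front of an array sorted from
 * highest to lowest base_gfn: entries with a lower base_gfn are shifted back
 * one spot, then the new slot is copied into the hole that opens up.
 */
static int slide_forward(struct slot *slots, int start, struct slot new)
{
	int i;

	for (i = start; i > 0; i--) {
		if (new.base_gfn < slots[i - 1].base_gfn)
			break;
		slots[i] = slots[i - 1];	/* shift lower-GFN slot back */
	}
	slots[i] = new;				/* copy into final position */
	return i;
}

int main(void)
{
	/* Three used slots, sorted from highest GFN to lowest GFN. */
	struct slot slots[4] = { { 0x300 }, { 0x200 }, { 0x100 } };
	struct slot new = { 0x250 };
	int i, idx;

	/* "Insert" at the back (index 3), then slide the slot forward. */
	idx = slide_forward(slots, 3, new);

	printf("new slot landed at index %d\n", idx);	/* prints 1 */
	for (i = 0; i < 4; i++)
		printf("slots[%d].base_gfn = 0x%lx\n", i, slots[i].base_gfn);
	return 0;
}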

* [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (16 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 22:12   ` Peter Xu
  2020-01-21 22:31 ` [PATCH v5 19/19] KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION Sean Christopherson
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Now that the memslot logic doesn't assume memslots are always non-NULL,
dynamically size the array of memslots instead of unconditionally
allocating memory for the maximum number of memslots.

Note, because a to-be-deleted memslot must first be invalidated, the
array size cannot be immediately reduced when deleting a memslot.
However, consecutive deletions will realize the memory savings, i.e.
a second deletion will trim the entry.

Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c      | 31 ++++++++++++++++++++++++++++---
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 60ddfdb69378..8bb6fb127387 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -431,11 +431,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
  */
 struct kvm_memslots {
 	u64 generation;
-	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
 	/* The mapping table from slot id to the index in memslots[]. */
 	short id_to_index[KVM_MEM_SLOTS_NUM];
 	atomic_t lru_slot;
 	int used_slots;
+	struct kvm_memory_slot memslots[];
 };
 
 struct kvm {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9b614cf2ca20..ed392ce64e59 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -565,7 +565,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 		return NULL;
 
 	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
-		slots->id_to_index[i] = slots->memslots[i].id = -1;
+		slots->id_to_index[i] = -1;
 
 	return slots;
 }
@@ -1077,6 +1077,32 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	return old_memslots;
 }
 
+/*
+ * Note, at a minimum, the current number of used slots must be allocated, even
+ * when deleting a memslot, as we need a complete duplicate of the memslots for
+ * use when invalidating a memslot prior to deleting/moving the memslot.
+ */
+static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
+					     enum kvm_mr_change change)
+{
+	struct kvm_memslots *slots;
+	size_t old_size, new_size;
+
+	old_size = sizeof(struct kvm_memslots) +
+		   (sizeof(struct kvm_memory_slot) * old->used_slots);
+
+	if (change == KVM_MR_CREATE)
+		new_size = old_size + sizeof(struct kvm_memory_slot);
+	else
+		new_size = old_size;
+
+	slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
+	if (likely(slots))
+		memcpy(slots, old, old_size);
+
+	return slots;
+}
+
 static int kvm_set_memslot(struct kvm *kvm,
 			   const struct kvm_userspace_memory_region *mem,
 			   struct kvm_memory_slot *old,
@@ -1087,10 +1113,9 @@ static int kvm_set_memslot(struct kvm *kvm,
 	struct kvm_memslots *slots;
 	int r;
 
-	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
+	slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
 	if (!slots)
 		return -ENOMEM;
-	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
 
 	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
 		/*
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread
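
The size arithmetic in kvm_dup_memslots() above is the usual flexible array
member pattern: the header plus the used_slots trailing entries are copied as
one block, and only a CREATE grows the trailing array by one entry.  A small
userspace sketch of the same calculation, with simplified stand-in structures
(struct memslots and dup_memslots() here are illustrative, not the kernel
definitions):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct memslot {
	unsigned long base_gfn;
	unsigned long npages;
};

struct memslots {
	int used_slots;
	struct memslot slots[];		/* flexible array member */
};

/* Duplicate 'old'; grow the trailing array by one entry only for a create. */
static struct memslots *dup_memslots(const struct memslots *old, int create)
{
	size_t old_size = sizeof(*old) +
			  old->used_slots * sizeof(struct memslot);
	size_t new_size = old_size + (create ? sizeof(struct memslot) : 0);
	struct memslots *new = calloc(1, new_size);

	if (new)
		memcpy(new, old, old_size);
	return new;
}

int main(void)
{
	struct memslots *old, *new;

	old = calloc(1, sizeof(*old) + 2 * sizeof(struct memslot));
	if (!old)
		return 1;
	old->used_slots = 2;

	new = dup_memslots(old, 1 /* create */);
	printf("old copy is %zu bytes, new allocation holds %d slots\n",
	       sizeof(*old) + 2 * sizeof(struct memslot), old->used_slots + 1);

	free(new);
	free(old);
	return 0;
}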

* [PATCH v5 19/19] KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION
  2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
                   ` (17 preceding siblings ...)
  2020-01-21 22:31 ` [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots Sean Christopherson
@ 2020-01-21 22:31 ` Sean Christopherson
  2020-02-06 22:30   ` Peter Xu
  18 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-01-21 22:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Paul Mackerras, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, Cornelia Huck, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Peter Xu, Philippe Mathieu-Daudé

Add a KVM selftest to test moving the base gfn of a userspace memory
region.  Although the basic concept of moving memory regions is not x86
specific, the assumptions regarding large pages and MMIO shenanigans
used to verify the correctness make this x86_64 only for the time being.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/include/kvm_util.h  |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  30 ++++
 .../kvm/x86_64/set_memory_region_test.c       | 142 ++++++++++++++++++
 5 files changed, 175 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/set_memory_region_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 30072c3f52fb..513d340bfda7 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -5,6 +5,7 @@
 /x86_64/hyperv_cpuid
 /x86_64/mmio_warning_test
 /x86_64/platform_info_test
+/x86_64/set_memory_region_test
 /x86_64/set_sregs_test
 /x86_64/smm_test
 /x86_64/state_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 3138a916574a..03b218b3b33e 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -17,6 +17,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
+TEST_GEN_PROGS_x86_64 += x86_64/set_memory_region_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
 TEST_GEN_PROGS_x86_64 += x86_64/smm_test
 TEST_GEN_PROGS_x86_64 += x86_64/state_test
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 29cccaf96baf..15d3b8690ffb 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -100,6 +100,7 @@ int _vcpu_ioctl(struct kvm_vm *vm, uint32_t vcpuid, unsigned long ioctl,
 		void *arg);
 void vm_ioctl(struct kvm_vm *vm, unsigned long ioctl, void *arg);
 void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags);
+void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa);
 void vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpuid);
 vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min,
 			  uint32_t data_memslot, uint32_t pgd_memslot);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 41cf45416060..464a75ce9843 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -756,6 +756,36 @@ void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags)
 		ret, errno, slot, flags);
 }
 
+/*
+ * VM Memory Region Move
+ *
+ * Input Args:
+ *   vm - Virtual Machine
+ *   slot - Slot of the memory region to move
+ *   new_gpa - Starting guest physical address
+ *
+ * Output Args: None
+ *
+ * Return: None
+ *
+ * Change the gpa of a memory region.
+ */
+void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa)
+{
+	struct userspace_mem_region *region;
+	int ret;
+
+	region = memslot2region(vm, slot);
+
+	region->region.guest_phys_addr = new_gpa;
+
+	ret = ioctl(vm->fd, KVM_SET_USER_MEMORY_REGION, &region->region);
+
+	TEST_ASSERT(!ret, "KVM_SET_USER_MEMORY_REGION failed\n"
+		    "ret: %i errno: %i slot: %u new_gpa: 0x%lx",
+		    ret, errno, slot, new_gpa);
+}
+
 /*
  * VCPU mmap Size
  *
diff --git a/tools/testing/selftests/kvm/x86_64/set_memory_region_test.c b/tools/testing/selftests/kvm/x86_64/set_memory_region_test.c
new file mode 100644
index 000000000000..125aeab59ab6
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/set_memory_region_test.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#include <linux/compiler.h>
+
+#include <test_util.h>
+#include <kvm_util.h>
+#include <processor.h>
+
+#define VCPU_ID 0
+
+/*
+ * Somewhat arbitrary location and slot, intended to not overlap anything.  The
+ * location and size are specifically 2mb sized/aligned so that the initial
+ * region corresponds to exactly one large page.
+ */
+#define MEM_REGION_GPA		0xc0000000
+#define MEM_REGION_SIZE		0x200000
+#define MEM_REGION_SLOT		10
+
+static void guest_code(void)
+{
+	uint64_t val;
+
+	do {
+		val = READ_ONCE(*((uint64_t *)MEM_REGION_GPA));
+	} while (!val);
+
+	if (val != 1)
+		ucall(UCALL_ABORT, 1, val);
+
+	GUEST_DONE();
+}
+
+static void *vcpu_worker(void *data)
+{
+	struct kvm_vm *vm = data;
+	struct kvm_run *run;
+	struct ucall uc;
+	uint64_t cmd;
+
+	/*
+	 * Loop until the guest is done.  Re-enter the guest on all MMIO exits,
+	 * which will occur if the guest attempts to access a memslot while it
+	 * is being moved.
+	 */
+	run = vcpu_state(vm, VCPU_ID);
+	do {
+		vcpu_run(vm, VCPU_ID);
+	} while (run->exit_reason == KVM_EXIT_MMIO);
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Unexpected exit reason = %d", run->exit_reason);
+
+	cmd = get_ucall(vm, VCPU_ID, &uc);
+	TEST_ASSERT(cmd == UCALL_DONE, "Unexpected val in guest = %llu",
+		    uc.args[0]);
+	return NULL;
+}
+
+static void test_move_memory_region(void)
+{
+	pthread_t vcpu_thread;
+	struct kvm_vm *vm;
+	uint64_t *hva;
+	uint64_t gpa;
+
+	vm = vm_create_default(VCPU_ID, 0, guest_code);
+
+	vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS_THP,
+				    MEM_REGION_GPA, MEM_REGION_SLOT,
+				    MEM_REGION_SIZE / getpagesize(), 0);
+
+	/*
+	 * Allocate and map two pages so that the GPA accessed by guest_code()
+	 * stays valid across the memslot move.
+	 */
+	gpa = vm_phy_pages_alloc(vm, 2, MEM_REGION_GPA, MEM_REGION_SLOT);
+	TEST_ASSERT(gpa == MEM_REGION_GPA, "Failed vm_phy_pages_alloc\n");
+
+	virt_map(vm, MEM_REGION_GPA, MEM_REGION_GPA, 2 * 4096, 0);
+
+	/* Ditto for the host mapping so that both pages can be zeroed. */
+	hva = addr_gpa2hva(vm, MEM_REGION_GPA);
+	memset(hva, 0, 2 * 4096);
+
+	pthread_create(&vcpu_thread, NULL, vcpu_worker, vm);
+
+	/* Ensure the guest thread is spun up. */
+	usleep(100000);
+
+	/*
+	 * Shift the region's base GPA.  The guest should not see "2" as the
+	 * hva->gpa translation is misaligned, i.e. the guest is accessing a
+	 * different host pfn.
+	 */
+	vm_mem_region_move(vm, MEM_REGION_SLOT, MEM_REGION_GPA - 4096);
+	WRITE_ONCE(*hva, 2);
+
+	usleep(100000);
+
+	/*
+	 * Note, value in memory needs to be changed *before* restoring the
+	 * memslot, else the guest could race the update and see "2".
+	 */
+	WRITE_ONCE(*hva, 1);
+
+	/* Restore the original base, the guest should see "1". */
+	vm_mem_region_move(vm, MEM_REGION_SLOT, MEM_REGION_GPA);
+
+	pthread_join(vcpu_thread, NULL);
+
+	kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+	int i, loops;
+
+	/* Tell stdout not to buffer its content */
+	setbuf(stdout, NULL);
+
+	if (argc > 1)
+		loops = atoi(argv[1]);
+	else
+		loops = 10;
+
+	for (i = 0; i < loops; i++)
+		test_move_memory_region();
+
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
  2020-01-21 22:31 ` [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot Sean Christopherson
@ 2020-02-05 21:49   ` Peter Xu
  2020-02-05 23:55     ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-05 21:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:39PM -0800, Sean Christopherson wrote:
> Reallocate a rmap array and recalculate large page compatibility when
> moving an existing memslot to correctly handle the alignment properties
> of the new memslot.  The number of rmap entries required at each level
> is dependent on the alignment of the memslot's base gfn with respect to
> that level, e.g. moving a large-page aligned memslot so that it becomes
> unaligned will increase the number of rmap entries needed at the now
> unaligned level.
> 
> Not updating the rmap array is the most obvious bug, as KVM accesses
> garbage data beyond the end of the rmap.  KVM interprets the bad data as
> pointers, leading to non-canonical #GPs, unexpected #PFs, etc...
> 
>   general protection fault: 0000 [#1] SMP
>   CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>   RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
>   Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
>   RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246
>   RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012
>   RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570
>   RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8
>   R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000
>   R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8
>   FS:  00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0
>   Call Trace:
>    kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
>    __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
>    kvm_set_memory_region+0x45/0x60 [kvm]
>    kvm_vm_ioctl+0x30f/0x920 [kvm]
>    do_vfs_ioctl+0xa1/0x620
>    ksys_ioctl+0x66/0x70
>    __x64_sys_ioctl+0x16/0x20
>    do_syscall_64+0x4c/0x170
>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   RIP: 0033:0x7f7ed9911f47
>   Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
>   RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>   RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47
>   RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004
>   RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700
>   R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000
>   R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000
>   Modules linked in: kvm_intel kvm irqbypass
>   ---[ end trace 0c5f570b3358ca89 ]---
> 
> The disallow_lpage tracking is more subtle.  Failure to update results
> in KVM creating large pages when it shouldn't, either due to stale data
> or again due to indexing beyond the end of the metadata arrays, which
> can lead to memory corruption and/or leaking data to guest/userspace.
> 
> Note, the arrays for the old memslot are freed by the unconditional call
> to kvm_free_memslot() in __kvm_set_memory_region().

If __kvm_set_memory_region() failed, I think the old memslot will be
kept and the new memslot will be freed instead?

> 
> Fixes: 05da45583de9b ("KVM: MMU: large page support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/x86.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4c30ebe74e5d..1953c71c52f2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9793,6 +9793,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>  {
>  	int i;
>  
> +	/*
> +	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
> +	 * old arrays will be freed by __kvm_set_memory_region() if installing
> +	 * the new memslot is successful.
> +	 */
> +	memset(&slot->arch, 0, sizeof(slot->arch));

I actually gave r-b on this patch but it was lost... And then when I
read it again I start to confuse on why we need to set these to zeros.
Even if they're not zeros, iiuc kvm_free_memslot() will compare each
of the array pointer and it will only free the changed pointers, then
it looks fine even without zeroing?

> +
>  	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
>  		struct kvm_lpage_info *linfo;
>  		unsigned long ugfn;
> @@ -9867,6 +9874,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  				const struct kvm_userspace_memory_region *mem,
>  				enum kvm_mr_change change)
>  {
> +	if (change == KVM_MR_MOVE)
> +		return kvm_arch_create_memslot(kvm, memslot,
> +					       mem->memory_size >> PAGE_SHIFT);
> +

Instead of calling kvm_arch_create_memslot() explicitly again here,
can it be replaced by below?

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 72b45f491692..85a7b02fd752 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1144,7 +1144,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
                new.dirty_bitmap = NULL;
 
        r = -ENOMEM;
-       if (change == KVM_MR_CREATE) {
+       if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
                new.userspace_addr = mem->userspace_addr;
 
                if (kvm_arch_create_memslot(kvm, &new, npages))

>  	return 0;
>  }
>  
> -- 
> 2.24.1
> 

-- 
Peter Xu


^ permalink raw reply related	[flat|nested] 70+ messages in thread
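
For the alignment argument in the changelog quoted above, the number of rmap
entries needed at a large-page level is simply a count of how many pages of
that level the slot touches, so moving a slot to an unaligned base gfn can
require one more entry per level.  A standalone sketch of that arithmetic
(lpage_entries() is an illustrative simplification of the gfn_to_index()
math, not the kernel helper):

#include <stdio.h>

/*
 * Number of entries a memslot needs at a large-page level: one per large
 * page the slot touches, so an unaligned base_gfn can add an extra entry.
 */
static unsigned long lpage_entries(unsigned long base_gfn,
				   unsigned long npages,
				   unsigned int level_shift)
{
	unsigned long first = base_gfn >> level_shift;
	unsigned long last = (base_gfn + npages - 1) >> level_shift;

	return last - first + 1;
}

int main(void)
{
	/* A 512-page (2MB) slot at a 2MB-aligned gfn: one 2M-level entry. */
	printf("aligned:   %lu\n", lpage_entries(0x100000, 512, 9));

	/* The same slot moved to an unaligned gfn straddles two 2M pages. */
	printf("unaligned: %lu\n", lpage_entries(0x100100, 512, 9));
	return 0;
}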

* Re: [PATCH v5 02/19] KVM: Reinstall old memslots if arch preparation fails
  2020-01-21 22:31 ` [PATCH v5 02/19] KVM: Reinstall old memslots if arch preparation fails Sean Christopherson
@ 2020-02-05 22:08   ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-05 22:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:40PM -0800, Sean Christopherson wrote:
> Reinstall the old memslots if preparing the new memory region fails
> after invalidating a to-be-{re}moved memslot.
> 
> Remove the superfluous 'old_memslots' variable so that it's somewhat
> clear that the error handling path needs to free the unused memslots,
> not simply the 'old' memslots.
> 
> Fixes: bc6678a33d9b9 ("KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update")
> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 03/19] KVM: Don't free new memslot if allocation of said memslot fails
  2020-01-21 22:31 ` [PATCH v5 03/19] KVM: Don't free new memslot if allocation of said memslot fails Sean Christopherson
@ 2020-02-05 22:28   ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-05 22:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:41PM -0800, Sean Christopherson wrote:
> The two implementations of kvm_arch_create_memslot() in x86 and PPC are
> both good citizens and free up all local resources if creation fails.
> Return immediately (via a superfluous goto) instead of calling
> kvm_free_memslot().
> 
> Note, the call to kvm_free_memslot() is effectively an expensive nop in
> this case as there are no resources to be freed.

(I failed to understand why that is expensive.. but the change looks OK)

> 
> No functional change intended.
> 
> Acked-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 04/19] KVM: PPC: Move memslot memory allocation into prepare_memory_region()
  2020-01-21 22:31 ` [PATCH v5 04/19] KVM: PPC: Move memslot memory allocation into prepare_memory_region() Sean Christopherson
@ 2020-02-05 22:41   ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-05 22:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:42PM -0800, Sean Christopherson wrote:
>  static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
> -					struct kvm_memory_slot *memslot,
> -					const struct kvm_userspace_memory_region *mem)
> +					struct kvm_memory_slot *slot,
> +					const struct kvm_userspace_memory_region *mem,
> +					enum kvm_mr_change change)
>  {
> +	unsigned long npages = mem->memory_size >> PAGE_SHIFT;

Only in case if this patch still needs a respin: IIUC we can directly
use slot->npages below.  No matter what:

Reviewed-by: Peter Xu <peterx@redhat.com>

> +
> +	if (change == KVM_MR_CREATE) {
> +		slot->arch.rmap = vzalloc(array_size(npages,
> +					  sizeof(*slot->arch.rmap)));
> +		if (!slot->arch.rmap)
> +			return -ENOMEM;
> +	}
> +
>  	return 0;
>  }

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 06/19] KVM: Drop kvm_arch_create_memslot()
  2020-01-21 22:31 ` [PATCH v5 06/19] KVM: Drop kvm_arch_create_memslot() Sean Christopherson
@ 2020-02-05 22:45   ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-05 22:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:44PM -0800, Sean Christopherson wrote:
> Remove kvm_arch_create_memslot() now that all arch implementations are
> effectively nops.  Removing kvm_arch_create_memslot() eliminates the
> possibility for arch specific code to allocate memory prior to setting
> a memslot, which sets the stage for simplifying kvm_free_memslot().
> 
> Cc: Janosch Frank <frankja@linux.ibm.com>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 08/19] KVM: Refactor error handling for setting memory region
  2020-01-21 22:31 ` [PATCH v5 08/19] KVM: Refactor error handling for setting memory region Sean Christopherson
@ 2020-02-05 22:48   ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-05 22:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:46PM -0800, Sean Christopherson wrote:
> Replace a big pile o' gotos with returns to make it more obvious what
> error code is being returned, and to prepare for refactoring the
> functional, i.e. post-checks, portion of __kvm_set_memory_region().
> 
> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
  2020-02-05 21:49   ` Peter Xu
@ 2020-02-05 23:55     ` Sean Christopherson
  2020-02-06  2:00       ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-05 23:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Wed, Feb 05, 2020 at 04:49:52PM -0500, Peter Xu wrote:
> On Tue, Jan 21, 2020 at 02:31:39PM -0800, Sean Christopherson wrote:
> > Reallocate a rmap array and recalculate large page compatibility when
> > moving an existing memslot to correctly handle the alignment properties
> > of the new memslot.  The number of rmap entries required at each level
> > is dependent on the alignment of the memslot's base gfn with respect to
> > that level, e.g. moving a large-page aligned memslot so that it becomes
> > unaligned will increase the number of rmap entries needed at the now
> > unaligned level.
> > 
> > Not updating the rmap array is the most obvious bug, as KVM accesses
> > garbage data beyond the end of the rmap.  KVM interprets the bad data as
> > pointers, leading to non-canonical #GPs, unexpected #PFs, etc...
> > 
> >   general protection fault: 0000 [#1] SMP
> >   CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
> >   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> >   RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
> >   Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
> >   RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246
> >   RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012
> >   RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570
> >   RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8
> >   R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000
> >   R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8
> >   FS:  00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
> >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0
> >   Call Trace:
> >    kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
> >    __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
> >    kvm_set_memory_region+0x45/0x60 [kvm]
> >    kvm_vm_ioctl+0x30f/0x920 [kvm]
> >    do_vfs_ioctl+0xa1/0x620
> >    ksys_ioctl+0x66/0x70
> >    __x64_sys_ioctl+0x16/0x20
> >    do_syscall_64+0x4c/0x170
> >    entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >   RIP: 0033:0x7f7ed9911f47
> >   Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
> >   RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> >   RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47
> >   RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004
> >   RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700
> >   R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000
> >   R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000
> >   Modules linked in: kvm_intel kvm irqbypass
> >   ---[ end trace 0c5f570b3358ca89 ]---
> > 
> > The disallow_lpage tracking is more subtle.  Failure to update results
> > in KVM creating large pages when it shouldn't, either due to stale data
> > or again due to indexing beyond the end of the metadata arrays, which
> > can lead to memory corruption and/or leaking data to guest/userspace.
> > 
> > Note, the arrays for the old memslot are freed by the unconditional call
> > to kvm_free_memslot() in __kvm_set_memory_region().
> 
> If __kvm_set_memory_region() failed, I think the old memslot will be
> kept and the new memslot will be freed instead?

This is referring to a successful MOVE operation to note that zeroing @arch
in kvm_arch_create_memslot() won't leak memory.

> > 
> > Fixes: 05da45583de9b ("KVM: MMU: large page support")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/x86.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 4c30ebe74e5d..1953c71c52f2 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9793,6 +9793,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> >  {
> >  	int i;
> >  
> > +	/*
> > +	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
> > +	 * old arrays will be freed by __kvm_set_memory_region() if installing
> > +	 * the new memslot is successful.
> > +	 */
> > +	memset(&slot->arch, 0, sizeof(slot->arch));
> 
> I actually gave r-b on this patch but it was lost... And then when I
> read it again I start to confuse on why we need to set these to zeros.
> Even if they're not zeros, iiuc kvm_free_memslot() will compare each
> of the array pointer and it will only free the changed pointers, then
> it looks fine even without zeroing?

It's for the failure path, the out_free label, which blindly calls kvfree()
and relies on un-allocated pointers being NULL.  If @arch isn't zeroed, the
failure path will free metadata from the previous memslot.

> > +
> >  	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> >  		struct kvm_lpage_info *linfo;
> >  		unsigned long ugfn;
> > @@ -9867,6 +9874,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >  				const struct kvm_userspace_memory_region *mem,
> >  				enum kvm_mr_change change)
> >  {
> > +	if (change == KVM_MR_MOVE)
> > +		return kvm_arch_create_memslot(kvm, memslot,
> > +					       mem->memory_size >> PAGE_SHIFT);
> > +
> 
> Instead of calling kvm_arch_create_memslot() explicitly again here,
> can it be replaced by below?
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 72b45f491692..85a7b02fd752 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1144,7 +1144,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>                 new.dirty_bitmap = NULL;
>  
>         r = -ENOMEM;
> -       if (change == KVM_MR_CREATE) {
> +       if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
>                 new.userspace_addr = mem->userspace_addr;
>  
>                 if (kvm_arch_create_memslot(kvm, &new, npages))

No, because other architectures don't need to re-allocate new metadata on
MOVE and rely on __kvm_set_memory_region() to copy @arch from old to new,
e.g. see kvmppc_core_create_memslot_hv().

That being said, that's effectively what the x86 code looks like once
kvm_arch_create_memslot() gets merged into kvm_arch_prepare_memory_region().

> 
> >  	return 0;
> >  }
> >  
> > -- 
> > 2.24.1
> > 
> 
> -- 
> Peter Xu
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
  2020-02-05 23:55     ` Sean Christopherson
@ 2020-02-06  2:00       ` Peter Xu
  2020-02-06  2:17         ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06  2:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Wed, Feb 05, 2020 at 03:55:33PM -0800, Sean Christopherson wrote:
> On Wed, Feb 05, 2020 at 04:49:52PM -0500, Peter Xu wrote:
> > On Tue, Jan 21, 2020 at 02:31:39PM -0800, Sean Christopherson wrote:
> > > Reallocate a rmap array and recalculate large page compatibility when
> > > moving an existing memslot to correctly handle the alignment properties
> > > of the new memslot.  The number of rmap entries required at each level
> > > is dependent on the alignment of the memslot's base gfn with respect to
> > > that level, e.g. moving a large-page aligned memslot so that it becomes
> > > unaligned will increase the number of rmap entries needed at the now
> > > unaligned level.
> > > 
> > > Not updating the rmap array is the most obvious bug, as KVM accesses
> > > garbage data beyond the end of the rmap.  KVM interprets the bad data as
> > > pointers, leading to non-canonical #GPs, unexpected #PFs, etc...
> > > 
> > >   general protection fault: 0000 [#1] SMP
> > >   CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
> > >   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > >   RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
> > >   Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
> > >   RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246
> > >   RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012
> > >   RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570
> > >   RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8
> > >   R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000
> > >   R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8
> > >   FS:  00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
> > >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >   CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0
> > >   Call Trace:
> > >    kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
> > >    __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
> > >    kvm_set_memory_region+0x45/0x60 [kvm]
> > >    kvm_vm_ioctl+0x30f/0x920 [kvm]
> > >    do_vfs_ioctl+0xa1/0x620
> > >    ksys_ioctl+0x66/0x70
> > >    __x64_sys_ioctl+0x16/0x20
> > >    do_syscall_64+0x4c/0x170
> > >    entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >   RIP: 0033:0x7f7ed9911f47
> > >   Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
> > >   RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > >   RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47
> > >   RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004
> > >   RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700
> > >   R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000
> > >   R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000
> > >   Modules linked in: kvm_intel kvm irqbypass
> > >   ---[ end trace 0c5f570b3358ca89 ]---
> > > 
> > > The disallow_lpage tracking is more subtle.  Failure to update results
> > > in KVM creating large pages when it shouldn't, either due to stale data
> > > or again due to indexing beyond the end of the metadata arrays, which
> > > can lead to memory corruption and/or leaking data to guest/userspace.
> > > 
> > > Note, the arrays for the old memslot are freed by the unconditional call
> > > to kvm_free_memslot() in __kvm_set_memory_region().
> > 
> > If __kvm_set_memory_region() failed, I think the old memslot will be
> > kept and the new memslot will be freed instead?
> 
> This is referring to a successful MOVE operation to note that zeroing @arch
> in kvm_arch_create_memslot() won't leak memory.
> 
> > > 
> > > Fixes: 05da45583de9b ("KVM: MMU: large page support")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > ---
> > >  arch/x86/kvm/x86.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 4c30ebe74e5d..1953c71c52f2 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -9793,6 +9793,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> > >  {
> > >  	int i;
> > >  
> > > +	/*
> > > +	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
> > > +	 * old arrays will be freed by __kvm_set_memory_region() if installing
> > > +	 * the new memslot is successful.
> > > +	 */
> > > +	memset(&slot->arch, 0, sizeof(slot->arch));
> > 
> > I actually gave r-b on this patch but it was lost... And then when I
> > read it again I start to confuse on why we need to set these to zeros.
> > Even if they're not zeros, iiuc kvm_free_memslot() will compare each
> > of the array pointer and it will only free the changed pointers, then
> > it looks fine even without zeroing?
> 
> It's for the failure path, the out_free label, which blindly calls kvfree()
> and relies on un-allocated pointers being NULL.  If @arch isn't zeroed, the
> failure path will free metadata from the previous memslot.

IMHO it won't, because kvm_free_memslot() will only free metadata if
the pointer changed.  So:

  - For succeeded kvcalloc(), the pointer will change in the new slot,
    so kvm_free_memslot() will free it,

  - For failed kvcalloc(), the pointer will be NULL, so
    kvm_free_memslot() will skip it,

  - For untouched pointer, it'll be the same as the old, so
    kvm_free_memslot() will skip it as well.

> 
> > > +
> > >  	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> > >  		struct kvm_lpage_info *linfo;
> > >  		unsigned long ugfn;
> > > @@ -9867,6 +9874,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> > >  				const struct kvm_userspace_memory_region *mem,
> > >  				enum kvm_mr_change change)
> > >  {
> > > +	if (change == KVM_MR_MOVE)
> > > +		return kvm_arch_create_memslot(kvm, memslot,
> > > +					       mem->memory_size >> PAGE_SHIFT);
> > > +
> > 
> > Instead of calling kvm_arch_create_memslot() explicitly again here,
> > can it be replaced by below?
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 72b45f491692..85a7b02fd752 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -1144,7 +1144,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> >                 new.dirty_bitmap = NULL;
> >  
> >         r = -ENOMEM;
> > -       if (change == KVM_MR_CREATE) {
> > +       if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
> >                 new.userspace_addr = mem->userspace_addr;
> >  
> >                 if (kvm_arch_create_memslot(kvm, &new, npages))
> 
> No, because other architectures don't need to re-allocate new metadata on
> MOVE and rely on __kvm_set_memory_region() to copy @arch from old to new,
> e.g. see kvmppc_core_create_memslot_hv().

Yes it's only required in x86, but iiuc it also will still work for
ppc?  Say, in that case ppc won't copy @arch from old to new, and
kvmppc_core_free_memslot_hv() will free the old, however it should
still work.

> 
> That being said, that's effectively what the x86 code looks like once
> kvm_arch_create_memslot() gets merged into kvm_arch_prepare_memory_region().

Right.  I don't have strong opinion on this, but if my above analysis
is correct, it's still slightly cleaner, imho, to have this patch as a
oneliner as I provided, then in the other patch move the whole
CREATE|MOVE into prepare_memory_region().  The final code should be
the same.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
  2020-02-06  2:00       ` Peter Xu
@ 2020-02-06  2:17         ` Sean Christopherson
  2020-02-06  2:58           ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-06  2:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Wed, Feb 05, 2020 at 09:00:31PM -0500, Peter Xu wrote:
> On Wed, Feb 05, 2020 at 03:55:33PM -0800, Sean Christopherson wrote:
> > On Wed, Feb 05, 2020 at 04:49:52PM -0500, Peter Xu wrote:
> > > On Tue, Jan 21, 2020 at 02:31:39PM -0800, Sean Christopherson wrote:
> > > > Reallocate a rmap array and recalculate large page compatibility when
> > > > moving an existing memslot to correctly handle the alignment properties
> > > > of the new memslot.  The number of rmap entries required at each level
> > > > is dependent on the alignment of the memslot's base gfn with respect to
> > > > that level, e.g. moving a large-page aligned memslot so that it becomes
> > > > unaligned will increase the number of rmap entries needed at the now
> > > > unaligned level.
> > > > 
> > > > Not updating the rmap array is the most obvious bug, as KVM accesses
> > > > garbage data beyond the end of the rmap.  KVM interprets the bad data as
> > > > pointers, leading to non-canonical #GPs, unexpected #PFs, etc...
> > > > 
> > > >   general protection fault: 0000 [#1] SMP
> > > >   CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
> > > >   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > > >   RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
> > > >   Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
> > > >   RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246
> > > >   RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012
> > > >   RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570
> > > >   RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8
> > > >   R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000
> > > >   R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8
> > > >   FS:  00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
> > > >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > >   CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0
> > > >   Call Trace:
> > > >    kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
> > > >    __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
> > > >    kvm_set_memory_region+0x45/0x60 [kvm]
> > > >    kvm_vm_ioctl+0x30f/0x920 [kvm]
> > > >    do_vfs_ioctl+0xa1/0x620
> > > >    ksys_ioctl+0x66/0x70
> > > >    __x64_sys_ioctl+0x16/0x20
> > > >    do_syscall_64+0x4c/0x170
> > > >    entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >   RIP: 0033:0x7f7ed9911f47
> > > >   Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
> > > >   RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > >   RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47
> > > >   RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004
> > > >   RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700
> > > >   R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000
> > > >   R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000
> > > >   Modules linked in: kvm_intel kvm irqbypass
> > > >   ---[ end trace 0c5f570b3358ca89 ]---
> > > > 
> > > > The disallow_lpage tracking is more subtle.  Failure to update results
> > > > in KVM creating large pages when it shouldn't, either due to stale data
> > > > or again due to indexing beyond the end of the metadata arrays, which
> > > > can lead to memory corruption and/or leaking data to guest/userspace.
> > > > 
> > > > Note, the arrays for the old memslot are freed by the unconditional call
> > > > to kvm_free_memslot() in __kvm_set_memory_region().
> > > 
> > > If __kvm_set_memory_region() failed, I think the old memslot will be
> > > kept and the new memslot will be freed instead?
> > 
> > This is referring to a successful MOVE operation to note that zeroing @arch
> > in kvm_arch_create_memslot() won't leak memory.
> > 
> > > > 
> > > > Fixes: 05da45583de9b ("KVM: MMU: large page support")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > ---
> > > >  arch/x86/kvm/x86.c | 11 +++++++++++
> > > >  1 file changed, 11 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 4c30ebe74e5d..1953c71c52f2 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -9793,6 +9793,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > >  {
> > > >  	int i;
> > > >  
> > > > +	/*
> > > > +	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
> > > > +	 * old arrays will be freed by __kvm_set_memory_region() if installing
> > > > +	 * the new memslot is successful.
> > > > +	 */
> > > > +	memset(&slot->arch, 0, sizeof(slot->arch));
> > > 
> > > I actually gave r-b on this patch but it was lost... And then when I
> > > read it again I start to confuse on why we need to set these to zeros.
> > > Even if they're not zeros, iiuc kvm_free_memslot() will compare each
> > > of the array pointer and it will only free the changed pointers, then
> > > it looks fine even without zeroing?
> > 
> > It's for the failure path, the out_free label, which blindly calls kvfree()
> > and relies on un-allocated pointers being NULL.  If @arch isn't zeroed, the
> > failure path will free metadata from the previous memslot.
> 
> IMHO it won't, because kvm_free_memslot() will only free metadata if
> the pointer changed.  So:
> 
>   - For succeeded kvcalloc(), the pointer will change in the new slot,
>     so kvm_free_memslot() will free it,
> 
>   - For failed kvcalloc(), the pointer will be NULL, so
>     kvm_free_memslot() will skip it,

No.  The out_free path iterates over all possible entries and would free
pointers from the old memslot.  It's still be wrong even if the very last
kcalloc() failed as that allocation is captured in a local variable and
only propagated to lpage_info on success.

out_free:
	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
		kvfree(slot->arch.rmap[i]);
		slot->arch.rmap[i] = NULL;
		if (i == 0)
			continue;

		kvfree(slot->arch.lpage_info[i - 1]);
		slot->arch.lpage_info[i - 1] = NULL;
	}
	return -ENOMEM;

>   - For untouched pointer, it'll be the same as the old, so
>     kvm_free_memslot() will skip it as well.
> 
> > 
> > > > +
> > > >  	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> > > >  		struct kvm_lpage_info *linfo;
> > > >  		unsigned long ugfn;
> > > > @@ -9867,6 +9874,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> > > >  				const struct kvm_userspace_memory_region *mem,
> > > >  				enum kvm_mr_change change)
> > > >  {
> > > > +	if (change == KVM_MR_MOVE)
> > > > +		return kvm_arch_create_memslot(kvm, memslot,
> > > > +					       mem->memory_size >> PAGE_SHIFT);
> > > > +
> > > 
> > > Instead of calling kvm_arch_create_memslot() explicitly again here,
> > > can it be replaced by below?
> > > 
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 72b45f491692..85a7b02fd752 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -1144,7 +1144,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > >                 new.dirty_bitmap = NULL;
> > >  
> > >         r = -ENOMEM;
> > > -       if (change == KVM_MR_CREATE) {
> > > +       if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
> > >                 new.userspace_addr = mem->userspace_addr;
> > >  
> > >                 if (kvm_arch_create_memslot(kvm, &new, npages))
> > 
> > No, because other architectures don't need to re-allocate new metadata on
> > MOVE and rely on __kvm_set_memory_region() to copy @arch from old to new,
> > e.g. see kvmppc_core_create_memslot_hv().
> 
> Yes it's only required in x86, but iiuc it also will still work for
> ppc?  Say, in that case ppc won't copy @arch from old to new, and
> kvmppc_core_free_memslot_hv() will free the old, however it should
> still work.

No, calling kvm_arch_create_memslot() for MOVE will result in PPC leaking
memory due to overwriting slot->arch.rmap with a new allocation.

> > 
> > That being said, that's effectively what the x86 code looks like once
> > kvm_arch_create_memslot() gets merged into kvm_arch_prepare_memory_region().
> 
> Right.  I don't have strong opinion on this, but if my above analysis
> is correct, it's still slightly cleaner, imho, to have this patch as a
> oneliner as I provided, then in the other patch move the whole
> CREATE|MOVE into prepare_memory_region().  The final code should be
> the same.
> 
> -- 
> Peter Xu
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
  2020-02-06  2:17         ` Sean Christopherson
@ 2020-02-06  2:58           ` Peter Xu
  2020-02-06  5:05             ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06  2:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Wed, Feb 05, 2020 at 06:17:15PM -0800, Sean Christopherson wrote:
> On Wed, Feb 05, 2020 at 09:00:31PM -0500, Peter Xu wrote:
> > On Wed, Feb 05, 2020 at 03:55:33PM -0800, Sean Christopherson wrote:
> > > On Wed, Feb 05, 2020 at 04:49:52PM -0500, Peter Xu wrote:
> > > > On Tue, Jan 21, 2020 at 02:31:39PM -0800, Sean Christopherson wrote:
> > > > > Reallocate a rmap array and recalculate large page compatibility when
> > > > > moving an existing memslot to correctly handle the alignment properties
> > > > > of the new memslot.  The number of rmap entries required at each level
> > > > > is dependent on the alignment of the memslot's base gfn with respect to
> > > > > that level, e.g. moving a large-page aligned memslot so that it becomes
> > > > > unaligned will increase the number of rmap entries needed at the now
> > > > > unaligned level.
> > > > > 
> > > > > Not updating the rmap array is the most obvious bug, as KVM accesses
> > > > > garbage data beyond the end of the rmap.  KVM interprets the bad data as
> > > > > pointers, leading to non-canonical #GPs, unexpected #PFs, etc...
> > > > > 
> > > > >   general protection fault: 0000 [#1] SMP
> > > > >   CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
> > > > >   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > > > >   RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
> > > > >   Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
> > > > >   RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246
> > > > >   RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012
> > > > >   RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570
> > > > >   RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8
> > > > >   R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000
> > > > >   R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8
> > > > >   FS:  00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
> > > > >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >   CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0
> > > > >   Call Trace:
> > > > >    kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
> > > > >    __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
> > > > >    kvm_set_memory_region+0x45/0x60 [kvm]
> > > > >    kvm_vm_ioctl+0x30f/0x920 [kvm]
> > > > >    do_vfs_ioctl+0xa1/0x620
> > > > >    ksys_ioctl+0x66/0x70
> > > > >    __x64_sys_ioctl+0x16/0x20
> > > > >    do_syscall_64+0x4c/0x170
> > > > >    entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > >   RIP: 0033:0x7f7ed9911f47
> > > > >   Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
> > > > >   RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > > >   RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47
> > > > >   RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004
> > > > >   RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700
> > > > >   R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000
> > > > >   R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000
> > > > >   Modules linked in: kvm_intel kvm irqbypass
> > > > >   ---[ end trace 0c5f570b3358ca89 ]---
> > > > > 
> > > > > The disallow_lpage tracking is more subtle.  Failure to update results
> > > > > in KVM creating large pages when it shouldn't, either due to stale data
> > > > > or again due to indexing beyond the end of the metadata arrays, which
> > > > > can lead to memory corruption and/or leaking data to guest/userspace.
> > > > > 
> > > > > Note, the arrays for the old memslot are freed by the unconditional call
> > > > > to kvm_free_memslot() in __kvm_set_memory_region().
> > > > 
> > > > If __kvm_set_memory_region() failed, I think the old memslot will be
> > > > kept and the new memslot will be freed instead?
> > > 
> > > This is referring to a successful MOVE operation to note that zeroing @arch
> > > in kvm_arch_create_memslot() won't leak memory.
> > > 
> > > > > 
> > > > > Fixes: 05da45583de9b ("KVM: MMU: large page support")
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > > > ---
> > > > >  arch/x86/kvm/x86.c | 11 +++++++++++
> > > > >  1 file changed, 11 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > index 4c30ebe74e5d..1953c71c52f2 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -9793,6 +9793,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> > > > >  {
> > > > >  	int i;
> > > > >  
> > > > > +	/*
> > > > > +	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
> > > > > +	 * old arrays will be freed by __kvm_set_memory_region() if installing
> > > > > +	 * the new memslot is successful.
> > > > > +	 */
> > > > > +	memset(&slot->arch, 0, sizeof(slot->arch));
> > > > 
> > > > I actually gave r-b on this patch but it was lost... And then when I
> > > > read it again I started to get confused about why we need to set these to zeros.
> > > > Even if they're not zeros, iiuc kvm_free_memslot() will compare each
> > > > of the array pointers and it will only free the changed pointers, so
> > > > it looks fine even without zeroing?
> > > 
> > > It's for the failure path, the out_free label, which blindly calls kvfree()
> > > and relies on un-allocated pointers being NULL.  If @arch isn't zeroed, the
> > > failure path will free metadata from the previous memslot.
> > 
> > IMHO it won't, because kvm_free_memslot() will only free metadata if
> > the pointer changed.  So:
> > 
> >   - For succeeded kvcalloc(), the pointer will change in the new slot,
> >     so kvm_free_memslot() will free it,
> > 
> >   - For failed kvcalloc(), the pointer will be NULL, so
> >     kvm_free_memslot() will skip it,
> 
> No.  The out_free path iterates over all possible entries and would free
> pointers from the old memslot.  It'd still be wrong even if the very last
> kvcalloc() failed, as that allocation is captured in a local variable and
> only propagated to lpage_info on success.
> 
> out_free:
> 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> 		kvfree(slot->arch.rmap[i]);
> 		slot->arch.rmap[i] = NULL;
> 		if (i == 0)
> 			continue;
> 
> 		kvfree(slot->arch.lpage_info[i - 1]);
> 		slot->arch.lpage_info[i - 1] = NULL;
> 	}
> 	return -ENOMEM;

Ah right.  This discussion also proves that simplifying the slot
free path is a good idea, because it's easy to get confused. :)

> 
> >   - For untouched pointer, it'll be the same as the old, so
> >     kvm_free_memslot() will skip it as well.
> > 
> > > 
> > > > > +
> > > > >  	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> > > > >  		struct kvm_lpage_info *linfo;
> > > > >  		unsigned long ugfn;
> > > > > @@ -9867,6 +9874,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> > > > >  				const struct kvm_userspace_memory_region *mem,
> > > > >  				enum kvm_mr_change change)
> > > > >  {
> > > > > +	if (change == KVM_MR_MOVE)
> > > > > +		return kvm_arch_create_memslot(kvm, memslot,
> > > > > +					       mem->memory_size >> PAGE_SHIFT);
> > > > > +
> > > > 
> > > > Instead of calling kvm_arch_create_memslot() explicitly again here,
> > > > can it be replaced by below?
> > > > 
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 72b45f491692..85a7b02fd752 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -1144,7 +1144,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > > >                 new.dirty_bitmap = NULL;
> > > >  
> > > >         r = -ENOMEM;
> > > > -       if (change == KVM_MR_CREATE) {
> > > > +       if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
> > > >                 new.userspace_addr = mem->userspace_addr;
> > > >  
> > > >                 if (kvm_arch_create_memslot(kvm, &new, npages))
> > > 
> > > No, because other architectures don't need to re-allocate new metadata on
> > > MOVE and rely on __kvm_set_memory_region() to copy @arch from old to new,
> > > e.g. see kvmppc_core_create_memslot_hv().
> > 
> > Yes it's only required in x86, but iiuc it also will still work for
> > ppc?  Say, in that case ppc won't copy @arch from old to new, and
> > kvmppc_core_free_memslot_hv() will free the old, however it should
> > still work.
> 
> No, calling kvm_arch_create_memslot() for MOVE will result in PPC leaking
> memory due to overwriting slot->arch.rmap with a new allocation.

Why?  For the MOVE case, kvm_arch_create_memslot() will create a new
rmap for the "new" memslot.  If the whole procedure succeeded,
kvm_free_memslot() will free the old rmap.  If it failed,
kvm_free_memslot() will free the new rmap if !NULL.  Looks fine?

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
  2020-02-06  2:58           ` Peter Xu
@ 2020-02-06  5:05             ` Sean Christopherson
  0 siblings, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-02-06  5:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Wed, Feb 05, 2020 at 09:58:58PM -0500, Peter Xu wrote:
> On Wed, Feb 05, 2020 at 06:17:15PM -0800, Sean Christopherson wrote:
> > On Wed, Feb 05, 2020 at 09:00:31PM -0500, Peter Xu wrote:
> > > On Wed, Feb 05, 2020 at 03:55:33PM -0800, Sean Christopherson wrote:
> > > > On Wed, Feb 05, 2020 at 04:49:52PM -0500, Peter Xu wrote:
> > > > > Instead of calling kvm_arch_create_memslot() explicitly again here,
> > > > > can it be replaced by below?
> > > > > 
> > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > > index 72b45f491692..85a7b02fd752 100644
> > > > > --- a/virt/kvm/kvm_main.c
> > > > > +++ b/virt/kvm/kvm_main.c
> > > > > @@ -1144,7 +1144,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > > > >                 new.dirty_bitmap = NULL;
> > > > >  
> > > > >         r = -ENOMEM;
> > > > > -       if (change == KVM_MR_CREATE) {
> > > > > +       if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
> > > > >                 new.userspace_addr = mem->userspace_addr;
> > > > >  
> > > > >                 if (kvm_arch_create_memslot(kvm, &new, npages))
> > > > 
> > > > No, because other architectures don't need to re-allocate new metadata on
> > > > MOVE and rely on __kvm_set_memory_region() to copy @arch from old to new,
> > > > e.g. see kvmppc_core_create_memslot_hv().
> > > 
> > > Yes it's only required in x86, but iiuc it also will still work for
> > > ppc?  Say, in that case ppc won't copy @arch from old to new, and
> > > kvmppc_core_free_memslot_hv() will free the old, however it should
> > > still work.
> > 
> > No, calling kvm_arch_create_memslot() for MOVE will result in PPC leaking
> > memory due to overwriting slot->arch.rmap with a new allocation.
> 
> Why?  For the MOVE case, kvm_arch_create_memslot() will create a new
> rmap for the "new" memslot.  If the whole procedure succeeded,
> kvm_free_memslot() will free the old rmap.  If it failed,
> kvm_free_memslot() will free the new rmap if !NULL.  Looks fine?

Oh, I see what you're suggesting.   Please god no.

This is a bug fix that needs to be backported to stable.  Arbitrarily
changing PPC behavior is a bad idea, especially since I don't know squat
about the PPC rmap behavior.

If it happens to fix a PPC rmap bug, then PPC should get an explicit fix.
If it's not a bug fix, then at best it is a minor performance hit due to an
extra allocation and the need to refill the rmap.  Worst case scenario it
breaks PPC.

And unless this were a temporary change, which would be silly, I would have
to carry forward the change into "KVM: PPC: Move memslot memory allocation
into prepare_memory_region()", and again, I don't know squat about PPC.

I also don't want to effectively introduce a misnamed function, even if
only temporarily, e.g. it's kvm_arch_create_memslot(), not
kvm_arch_create_or_move_memslot(), because the whole flow gets reworked a
few patches later.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 12/19] KVM: Move memslot deletion to helper function
  2020-01-21 22:31 ` [PATCH v5 12/19] KVM: Move memslot deletion to helper function Sean Christopherson
@ 2020-02-06 16:14   ` Peter Xu
  2020-02-06 16:28     ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06 16:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:50PM -0800, Sean Christopherson wrote:
> Move memslot deletion into its own routine so that the success path for
> other memslot updates does not need to use kvm_free_memslot(), i.e. can
> explicitly destroy the dirty bitmap when necessary.  This paves the way
> for dropping @dont from kvm_free_memslot(), i.e. all callers now pass
> NULL for @dont.
> 
> Add a comment above the code to make a copy of the existing memslot
> prior to deletion, it is not at all obvious that the pointer will become
> stale during sorting and/or installation of new memslots.

Could you help explain a bit about this explicit comment?  I can follow
the patch itself, which looks all correct to me, but I failed
to catch what this extra comment wants to emphasize...

Thanks,

> 
> Note, kvm_arch_commit_memory_region() allows an architecture to free
> resources when moving a memslot or changing its flags, e.g. x86 frees
> its arch specific memslot metadata during commit_memory_region().

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 09/19] KVM: Move setting of memslot into helper routine
  2020-01-21 22:31 ` [PATCH v5 09/19] KVM: Move setting of memslot into helper routine Sean Christopherson
@ 2020-02-06 16:26   ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-06 16:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:47PM -0800, Sean Christopherson wrote:
> Split out the core functionality of setting a memslot into a separate
> helper in preparation for moving memslot deletion into its own routine.
> 
> Tested-by: Christoffer Dall <christoffer.dall@arm.com>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  virt/kvm/kvm_main.c | 106 ++++++++++++++++++++++++++------------------
>  1 file changed, 63 insertions(+), 43 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fdf045a5d240..64f6c5d35260 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -982,6 +982,66 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>  	return old_memslots;
>  }
>  
> +static int kvm_set_memslot(struct kvm *kvm,
> +			   const struct kvm_userspace_memory_region *mem,
> +			   const struct kvm_memory_slot *old,
> +			   struct kvm_memory_slot *new, int as_id,
> +			   enum kvm_mr_change change)
> +{
> +	struct kvm_memory_slot *slot;
> +	struct kvm_memslots *slots;
> +	int r;
> +
> +	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> +	if (!slots)
> +		return -ENOMEM;
> +	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
> +
> +	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
> +		/*
> +		 * Note, the INVALID flag needs to be in the appropriate entry
> +		 * in the freshly allocated memslots, not in @old or @new.
> +		 */
> +		slot = id_to_memslot(slots, old->id);
> +		slot->flags |= KVM_MEMSLOT_INVALID;
> +
> +		/*
> +		 * We can re-use the old memslots, the only difference from the
> +		 * newly installed memslots is the invalid flag, which will get
> +		 * dropped by update_memslots anyway.  We'll also revert to the
> +		 * old memslots if preparing the new memory region fails.
> +		 */
> +		slots = install_new_memslots(kvm, as_id, slots);
> +
> +		/* From this point no new shadow pages pointing to a deleted,
> +		 * or moved, memslot will be created.
> +		 *
> +		 * validation of sp->gfn happens in:
> +		 *	- gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> +		 *	- kvm_is_visible_gfn (mmu_check_root)
> +		 */
> +		kvm_arch_flush_shadow_memslot(kvm, slot);
> +	}
> +
> +	r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
> +	if (r)
> +		goto out_slots;
> +
> +	update_memslots(slots, new, change);
> +	slots = install_new_memslots(kvm, as_id, slots);
> +
> +	kvm_arch_commit_memory_region(kvm, mem, old, new, change);
> +
> +	kvfree(slots);
> +	return 0;
> +
> +out_slots:
> +	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
> +		slots = install_new_memslots(kvm, as_id, slots);
> +	kvfree(slots);
> +	return r;
> +}
> +
>  /*
>   * Allocate some memory and give it an address in the guest physical address
>   * space.
> @@ -998,7 +1058,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	unsigned long npages;
>  	struct kvm_memory_slot *slot;
>  	struct kvm_memory_slot old, new;
> -	struct kvm_memslots *slots;
>  	int as_id, id;
>  	enum kvm_mr_change change;
>  
> @@ -1085,58 +1144,19 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  			return r;
>  	}
>  
> -	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> -	if (!slots) {
> -		r = -ENOMEM;
> -		goto out_bitmap;
> -	}
> -	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
> -
> -	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
> -		slot = id_to_memslot(slots, id);
> -		slot->flags |= KVM_MEMSLOT_INVALID;
> -
> -		/*
> -		 * We can re-use the old memslots, the only difference from the
> -		 * newly installed memslots is the invalid flag, which will get
> -		 * dropped by update_memslots anyway.  We'll also revert to the
> -		 * old memslots if preparing the new memory region fails.
> -		 */
> -		slots = install_new_memslots(kvm, as_id, slots);
> -
> -		/* From this point no new shadow pages pointing to a deleted,
> -		 * or moved, memslot will be created.
> -		 *
> -		 * validation of sp->gfn happens in:
> -		 *	- gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> -		 *	- kvm_is_visible_gfn (mmu_check_root)
> -		 */
> -		kvm_arch_flush_shadow_memslot(kvm, slot);
> -	}
> -
> -	r = kvm_arch_prepare_memory_region(kvm, &new, mem, change);
> -	if (r)
> -		goto out_slots;
> -
>  	/* actual memory is freed via old in kvm_free_memslot below */
>  	if (change == KVM_MR_DELETE) {
>  		new.dirty_bitmap = NULL;
>  		memset(&new.arch, 0, sizeof(new.arch));
>  	}

It's not entirely clear to me why this single block is left over
here instead of being moved into kvm_set_memslot().  I see that after all
it'll be removed in patch 12, so it seems OK:

Reviewed-by: Peter Xu <peterx@redhat.com>

>  
> -	update_memslots(slots, &new, change);
> -	slots = install_new_memslots(kvm, as_id, slots);
> -
> -	kvm_arch_commit_memory_region(kvm, mem, &old, &new, change);
> +	r = kvm_set_memslot(kvm, mem, &old, &new, as_id, change);
> +	if (r)
> +		goto out_bitmap;
>  
>  	kvm_free_memslot(kvm, &old, &new);
> -	kvfree(slots);
>  	return 0;
>  
> -out_slots:
> -	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
> -		slots = install_new_memslots(kvm, as_id, slots);
> -	kvfree(slots);
>  out_bitmap:
>  	if (new.dirty_bitmap && !old.dirty_bitmap)
>  		kvm_destroy_dirty_bitmap(&new);
> -- 
> 2.24.1
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 10/19] KVM: Drop "const" attribute from old memslot in commit_memory_region()
  2020-01-21 22:31 ` [PATCH v5 10/19] KVM: Drop "const" attribute from old memslot in commit_memory_region() Sean Christopherson
@ 2020-02-06 16:26   ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-06 16:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:48PM -0800, Sean Christopherson wrote:
> Drop the "const" attribute from @old in kvm_arch_commit_memory_region()
> to allow arch specific code to free arch specific resources in the old
> memslot without having to cast away the attribute.  Freeing resources in
> kvm_arch_commit_memory_region() paves the way for simplifying
> kvm_free_memslot() by eliminating the last usage of its @dont param.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 12/19] KVM: Move memslot deletion to helper function
  2020-02-06 16:14   ` Peter Xu
@ 2020-02-06 16:28     ` Sean Christopherson
  2020-02-06 16:51       ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-06 16:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 11:14:15AM -0500, Peter Xu wrote:
> On Tue, Jan 21, 2020 at 02:31:50PM -0800, Sean Christopherson wrote:
> > Move memslot deletion into its own routine so that the success path for
> > other memslot updates does not need to use kvm_free_memslot(), i.e. can
> > explicitly destroy the dirty bitmap when necessary.  This paves the way
> > for dropping @dont from kvm_free_memslot(), i.e. all callers now pass
> > NULL for @dont.
> > 
> > Add a comment above the code to make a copy of the existing memslot
> > prior to deletion, it is not at all obvious that the pointer will become
> > stale during sorting and/or installation of new memslots.
> 
> Could you help explain a bit about this explicit comment?  I can follow
> the patch itself, which looks all correct to me, but I failed
> to catch what this extra comment wants to emphasize...

It's tempting to write the code like this (I know, because I did it):

	if (!mem->memory_size)
		return kvm_delete_memslot(kvm, mem, slot, as_id);

	new = *slot;

Where @slot is a pointer to the memslot to be deleted.  At first, second,
and third glances, this seems perfectly sane.

The issue is that slot was pulled from struct kvm_memslots.memslots, e.g.

	slot = &slots->memslots[index];

Note that slots->memslots holds actual "struct kvm_memory_slot" objects,
not pointers to slots.  When update_memslots() sorts the slots, it swaps
the actual slot objects, not pointers.  I.e. after update_memslots(), even
though @slot points at the same address, it could be pointing at a
different slot.  As a result kvm_free_memslot() in kvm_delete_memslot()
will free the dirty page info and arch-specific pointers for some random
slot, not the intended slot, and will set npages=0 for that random slot.
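
To illustrate, a tiny stand-alone sketch (hypothetical types, not the kernel
structs): the array stores slot objects by value, so a pointer taken before
the sort can end up naming a different slot afterwards.

#include <stdio.h>

struct slot { int id; unsigned long base_gfn; };

int main(void)
{
	struct slot slots[2] = { { 0, 0x100 }, { 1, 0x200 } };
	struct slot *target = &slots[0];	/* "the slot to be deleted" */

	/* A re-sort that swaps the objects themselves, as update_memslots()
	 * does, rather than swapping pointers to them. */
	struct slot tmp = slots[0];
	slots[0] = slots[1];
	slots[1] = tmp;

	/* target still points at &slots[0], but that element now holds a
	 * different slot -> freeing via "target" would hit the wrong slot. */
	printf("expected id 0, target now sees id %d\n", target->id);
	return 0;
}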

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 13/19] KVM: Simplify kvm_free_memslot() and all its descendents
  2020-01-21 22:31 ` [PATCH v5 13/19] KVM: Simplify kvm_free_memslot() and all its descendents Sean Christopherson
@ 2020-02-06 16:29   ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-06 16:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:51PM -0800, Sean Christopherson wrote:
> Now that all callers of kvm_free_memslot() pass NULL for @dont, remove
> the param from the top-level routine and all arch's implementations.
> 
> No functional change intended.
> 
> Tested-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 12/19] KVM: Move memslot deletion to helper function
  2020-02-06 16:28     ` Sean Christopherson
@ 2020-02-06 16:51       ` Peter Xu
  2020-02-07 17:59         ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06 16:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 08:28:18AM -0800, Sean Christopherson wrote:
> On Thu, Feb 06, 2020 at 11:14:15AM -0500, Peter Xu wrote:
> > On Tue, Jan 21, 2020 at 02:31:50PM -0800, Sean Christopherson wrote:
> > > Move memslot deletion into its own routine so that the success path for
> > > other memslot updates does not need to use kvm_free_memslot(), i.e. can
> > > explicitly destroy the dirty bitmap when necessary.  This paves the way
> > > for dropping @dont from kvm_free_memslot(), i.e. all callers now pass
> > > NULL for @dont.
> > > 
> > > Add a comment above the code to make a copy of the existing memslot
> > > prior to deletion, it is not at all obvious that the pointer will become
> > > stale during sorting and/or installation of new memslots.
> > 
> > Could you help explain a bit about this explicit comment?  I can follow
> > the patch itself, which looks all correct to me, but I failed
> > to catch what this extra comment wants to emphasize...
> 
> It's tempting to write the code like this (I know, because I did it):
> 
> 	if (!mem->memory_size)
> 		return kvm_delete_memslot(kvm, mem, slot, as_id);
> 
> 	new = *slot;
> 
> Where @slot is a pointer to the memslot to be deleted.  At first, second,
> and third glances, this seems perfectly sane.
> 
> The issue is that slot was pulled from struct kvm_memslots.memslots, e.g.
> 
> 	slot = &slots->memslots[index];
> 
> Note that slots->memslots holds actual "struct kvm_memory_slot" objects,
> not pointers to slots.  When update_memslots() sorts the slots, it swaps
> the actual slot objects, not pointers.  I.e. after update_memslots(), even
> though @slot points at the same address, it could be pointing at a
> different slot.  As a result kvm_free_memslot() in kvm_delete_memslot()
> will free the dirty page info and arch-specific pointers for some random
> slot, not the intended slot, and will set npages=0 for that random slot.

Ah I see, thanks.  Another alternative is to move the "old = *slot"
copy into kvm_delete_memslot(), which could be even clearer imo.
However I'm not sure whether it's a good idea to drop the Tested-by for
this.  Considering that a comment change should not affect it, would you
mind enriching the comment into something like this (or anything better)?

/*
 * Make a full copy of the old memslot, the pointer will become stale
 * when the memslots are re-sorted by update_memslots() in
 * kvm_delete_memslot(), while to make the kvm_free_memslot() work as
 * expected later on, we still need the cached memory slot.
 */

In all cases:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region()
  2020-01-21 22:31 ` [PATCH v5 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region() Sean Christopherson
@ 2020-02-06 19:06   ` Peter Xu
  2020-02-06 19:22     ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06 19:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:52PM -0800, Sean Christopherson wrote:

[...]

> @@ -1101,52 +1099,55 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
>  		return -EINVAL;
>  
> -	slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> -	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
> -	npages = mem->memory_size >> PAGE_SHIFT;
> -
> -	if (npages > KVM_MEM_MAX_NR_PAGES)
> -		return -EINVAL;
> -
>  	/*
>  	 * Make a full copy of the old memslot, the pointer will become stale
>  	 * when the memslots are re-sorted by update_memslots().
>  	 */
> -	old = *slot;
> +	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> +	old = *tmp;
> +	tmp = NULL;

Shall we keep this chunk to the patch where it will be used?  Other
than that, it looks good to me.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region()
  2020-02-06 19:06   ` Peter Xu
@ 2020-02-06 19:22     ` Sean Christopherson
  2020-02-06 19:36       ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-06 19:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 02:06:41PM -0500, Peter Xu wrote:
> On Tue, Jan 21, 2020 at 02:31:52PM -0800, Sean Christopherson wrote:
> 
> [...]
> 
> > @@ -1101,52 +1099,55 @@ int __kvm_set_memory_region(struct kvm *kvm,
> >  	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
> >  		return -EINVAL;
> >  
> > -	slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> > -	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
> > -	npages = mem->memory_size >> PAGE_SHIFT;
> > -
> > -	if (npages > KVM_MEM_MAX_NR_PAGES)
> > -		return -EINVAL;
> > -
> >  	/*
> >  	 * Make a full copy of the old memslot, the pointer will become stale
> >  	 * when the memslots are re-sorted by update_memslots().
> >  	 */
> > -	old = *slot;
> > +	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> > +	old = *tmp;
> > +	tmp = NULL;
> 
> Shall we keep this chunk to the patch where it will be used?  Other
> than that, it looks good to me.

I assume you're talking about doing this instead of using @tmp?

	old = *id_to_memslot(__kvm_memslots(kvm, as_id), id);

It's obviously possible, but I really like the resulting diff for
__kvm_set_memory_region() in "KVM: Terminate memslot walks via used_slots"
when tmp is used.

@@ -1104,8 +1203,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
         * when the memslots are re-sorted by update_memslots().
         */
        tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
-       old = *tmp;
-       tmp = NULL;
+       if (tmp) {
+               old = *tmp;
+               tmp = NULL;
+       } else {
+               memset(&old, 0, sizeof(old));
+               old.id = id;
+       }


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region()
  2020-02-06 19:22     ` Sean Christopherson
@ 2020-02-06 19:36       ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-06 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 11:22:30AM -0800, Sean Christopherson wrote:
> On Thu, Feb 06, 2020 at 02:06:41PM -0500, Peter Xu wrote:
> > On Tue, Jan 21, 2020 at 02:31:52PM -0800, Sean Christopherson wrote:
> > 
> > [...]
> > 
> > > @@ -1101,52 +1099,55 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > >  	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
> > >  		return -EINVAL;
> > >  
> > > -	slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> > > -	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
> > > -	npages = mem->memory_size >> PAGE_SHIFT;
> > > -
> > > -	if (npages > KVM_MEM_MAX_NR_PAGES)
> > > -		return -EINVAL;
> > > -
> > >  	/*
> > >  	 * Make a full copy of the old memslot, the pointer will become stale
> > >  	 * when the memslots are re-sorted by update_memslots().
> > >  	 */
> > > -	old = *slot;
> > > +	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> > > +	old = *tmp;
> > > +	tmp = NULL;
> > 
> > Shall we keep this chunk to the patch where it will be used?  Other
> > than that, it looks good to me.
> 
> I assume you're talking about doing this instead of using @tmp?
> 
> 	old = *id_to_memslot(__kvm_memslots(kvm, as_id), id);

Yes.

> 
> It's obviously possible, but I really like the resulting diff for
> __kvm_set_memory_region() in "KVM: Terminate memslot walks via used_slots"
> when tmp is used.
> 
> @@ -1104,8 +1203,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
>          * when the memslots are re-sorted by update_memslots().
>          */
>         tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> -       old = *tmp;
> -       tmp = NULL;
> +       if (tmp) {
> +               old = *tmp;
> +               tmp = NULL;
> +       } else {
> +               memset(&old, 0, sizeof(old));
> +               old.id = id;
> +       }

I normally don't do that; for each patch I'll try to make it
consistent with itself, assuming that follow-up patches can be rejected.
I don't have a strong opinion either; please feel free to keep them if
no one disagrees.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-01-21 22:31 ` [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions Sean Christopherson
@ 2020-02-06 20:02   ` Peter Xu
  2020-02-06 21:21     ` Sean Christopherson
  2020-02-06 21:24   ` Peter Xu
  1 sibling, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06 20:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:53PM -0800, Sean Christopherson wrote:

[...]

> -int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
> +void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
> +				  struct kvm_memory_slot *memslot)

If it's to flush TLB for a memslot, shall we remove the "dirty_log" in
the name of the function, because it has nothing to do with dirty
logging any more?  And...

>  {
> -	struct kvm_memslots *slots;
> -	struct kvm_memory_slot *memslot;
> -	bool flush = false;
> -	int r;
> -
> -	mutex_lock(&kvm->slots_lock);
> -
> -	r = kvm_clear_dirty_log_protect(kvm, log, &flush);
> -
> -	if (flush) {
> -		slots = kvm_memslots(kvm);
> -		memslot = id_to_memslot(slots, log->slot);
> -
> -		/* Let implementation handle TLB/GVA invalidation */
> -		kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
> -	}
> -
> -	mutex_unlock(&kvm->slots_lock);
> -	return r;
> +	/* Let implementation handle TLB/GVA invalidation */
> +	kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);

... This may not be directly related to the current patch, but I'm
confused about why MIPS cannot use kvm_flush_remote_tlbs() to flush TLBs.
I know nothing about MIPS code, but IIUC here flush_shadow_memslot()
is a heavier operation that will also invalidate the shadow pages.
It seems to be overkill here when we only changed the write permission of
the PTEs?  I tried to check the first occurrence (2a31b9db15353) but I
haven't found any clue about it so far.

But that matters to this patch because if MIPS can use
kvm_flush_remote_tlbs(), then we probably don't need this
arch-specific hook any more and we can directly call
kvm_flush_remote_tlbs() after syncing the dirty log when flush==true.

>  }
>  
>  long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 97ce6c4f7b48..0adaf4791a6d 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -799,6 +799,11 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
>  	return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
>  }
>  
> +void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)

While at it, maybe we can start to use the __weak attribute for new hooks,
especially when it's empty for most archs?

E.g., define:

void __weak kvm_arch_sync_dirty_log(...) {}

In the common code, then only define it again in each arch that has a
non-empty implementation of this method?
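
For reference, a minimal userspace demo of the __weak mechanism (the same
GCC/Clang attribute the kernel uses; the function name here is made up):

#include <stdio.h>

/* Weak definition: acts as the default when no strong definition exists. */
__attribute__((weak)) void arch_sync_dirty_log(void)
{
	printf("weak default: nothing to do\n");
}

/* If another object file provided a regular (strong) definition of
 * arch_sync_dirty_log(), the linker would pick that one instead of this
 * stub -- which is the per-arch override behavior suggested above. */

int main(void)
{
	arch_sync_dirty_log();
	return 0;
}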

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots
  2020-01-21 22:31 ` [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots Sean Christopherson
@ 2020-02-06 21:09   ` Peter Xu
  2020-02-07 18:33     ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06 21:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:55PM -0800, Sean Christopherson wrote:
> Refactor memslot handling to treat the number of used slots as the de
> facto size of the memslot array, e.g. return NULL from id_to_memslot()
> when an invalid index is provided instead of relying on npages==0 to
> detect an invalid memslot.  Rework the sorting and walking of memslots
> in advance of dynamically sizing memslots to aid bisection and debug,
> e.g. with luck, a bug in the refactoring will bisect here and/or hit a
> WARN instead of randomly corrupting memory.
> 
> Alternatively, a global null/invalid memslot could be returned, i.e. so
> callers of id_to_memslot() don't have to explicitly check for a NULL
> memslot, but that approach runs the risk of introducing difficult-to-
> debug issues, e.g. if the global null slot is modified.  Constifying
> the return from id_to_memslot() to combat such issues is possible, but
> would require a massive refactoring of arch specific code and would
> still be susceptible to casting shenanigans.
> 
> Add function comments to update_memslots() and search_memslots() to
> explicitly (and loudly) state how memslots are sorted.
> 
> No functional change intended.
> 
> Tested-by: Christoffer Dall <christoffer.dall@arm.com>
> Tested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c |   2 +-
>  arch/x86/kvm/x86.c           |  14 +--
>  include/linux/kvm_host.h     |  18 ++-
>  virt/kvm/arm/mmu.c           |   9 +-
>  virt/kvm/kvm_main.c          | 220 ++++++++++++++++++++++++++---------
>  5 files changed, 189 insertions(+), 74 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 4afabedcbace..ea03cb868151 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4397,7 +4397,7 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm,
>  	slots = kvm_memslots(kvm);
>  	memslot = id_to_memslot(slots, log->slot);
>  	r = -ENOENT;
> -	if (!memslot->dirty_bitmap)
> +	if (!memslot || !memslot->dirty_bitmap)
>  		goto out;
>  
>  	/*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 07f7d6458b89..53d2a86cc91e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9630,9 +9630,9 @@ void kvm_arch_sync_events(struct kvm *kvm)
>  int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
>  {
>  	int i, r;
> -	unsigned long hva;
> +	unsigned long hva, uninitialized_var(old_npages);
>  	struct kvm_memslots *slots = kvm_memslots(kvm);
> -	struct kvm_memory_slot *slot, old;
> +	struct kvm_memory_slot *slot;
>  
>  	/* Called with kvm->slots_lock held.  */
>  	if (WARN_ON(id >= KVM_MEM_SLOTS_NUM))
> @@ -9640,7 +9640,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
>  
>  	slot = id_to_memslot(slots, id);
>  	if (size) {
> -		if (slot->npages)
> +		if (slot && slot->npages)
>  			return -EEXIST;
>  
>  		/*
> @@ -9652,13 +9652,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
>  		if (IS_ERR((void *)hva))
>  			return PTR_ERR((void *)hva);
>  	} else {
> -		if (!slot->npages)
> +		if (!slot || !slot->npages)
>  			return 0;
>  
> -		hva = 0;
> +		hva = slot->userspace_addr;

Is this intended?

> +		old_npages = slot->npages;
>  	}
>  
> -	old = *slot;
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>  		struct kvm_userspace_memory_region m;
>  
> @@ -9673,7 +9673,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
>  	}
>  
>  	if (!size)
> -		vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
> +		vm_munmap(hva, old_npages * PAGE_SIZE);
>  
>  	return 0;
>  }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f05be99dc44a..60ddfdb69378 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -572,10 +572,11 @@ static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu)
>  	return vcpu->vcpu_idx;
>  }
>  
> -#define kvm_for_each_memslot(memslot, slots)	\
> -	for (memslot = &slots->memslots[0];	\
> -	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
> -		memslot++)
> +#define kvm_for_each_memslot(memslot, slots)				\
> +	for (memslot = &slots->memslots[0];				\
> +	     memslot < slots->memslots + slots->used_slots; memslot++)	\
> +		if (WARN_ON_ONCE(!memslot->npages)) {			\
> +		} else
>  
>  void kvm_vcpu_destroy(struct kvm_vcpu *vcpu);
>  
> @@ -635,12 +636,15 @@ static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
>  	return __kvm_memslots(vcpu->kvm, as_id);
>  }
>  
> -static inline struct kvm_memory_slot *
> -id_to_memslot(struct kvm_memslots *slots, int id)
> +static inline
> +struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
>  {
>  	int index = slots->id_to_index[id];
>  	struct kvm_memory_slot *slot;
>  
> +	if (index < 0)
> +		return NULL;
> +
>  	slot = &slots->memslots[index];
>  
>  	WARN_ON(slot->id != id);
> @@ -1005,6 +1009,8 @@ bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);
>   * used in non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c.
>   * gfn_to_memslot() itself isn't here as an inline because that would
>   * bloat other code too much.
> + *
> + * IMPORTANT: Slots are sorted from highest GFN to lowest GFN!

(this confused me too..)

>   */
>  static inline struct kvm_memory_slot *
>  search_memslots(struct kvm_memslots *slots, gfn_t gfn)
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 23af65f8fd0f..a1d3813bad76 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1535,8 +1535,13 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  {
>  	struct kvm_memslots *slots = kvm_memslots(kvm);
>  	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
> -	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
> -	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
> +	phys_addr_t start, end;
> +
> +	if (WARN_ON_ONCE(!memslot))
> +		return;
> +
> +	start = memslot->base_gfn << PAGE_SHIFT;
> +	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
>  	stage2_wp_range(kvm, start, end);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 190c065da48d..9b614cf2ca20 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -565,7 +565,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
>  		return NULL;
>  
>  	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> -		slots->id_to_index[i] = slots->memslots[i].id = i;
> +		slots->id_to_index[i] = slots->memslots[i].id = -1;
>  
>  	return slots;
>  }
> @@ -869,63 +869,162 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
>  }
>  
>  /*
> - * Insert memslot and re-sort memslots based on their GFN,
> - * so binary search could be used to lookup GFN.
> - * Sorting algorithm takes advantage of having initially
> - * sorted array and known changed memslot position.
> + * Delete a memslot by decrementing the number of used slots and shifting all
> + * other entries in the array forward one spot.
> + */
> +static inline void kvm_memslot_delete(struct kvm_memslots *slots,
> +				      struct kvm_memory_slot *memslot)
> +{
> +	struct kvm_memory_slot *mslots = slots->memslots;
> +	int i;
> +
> +	if (WARN_ON(slots->id_to_index[memslot->id] == -1))
> +		return;
> +
> +	slots->used_slots--;
> +
> +	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots; i++) {
> +		mslots[i] = mslots[i + 1];
> +		slots->id_to_index[mslots[i].id] = i;
> +	}
> +	mslots[i] = *memslot;
> +	slots->id_to_index[memslot->id] = -1;
> +}
> +
> +/*
> + * "Insert" a new memslot by incrementing the number of used slots.  Returns
> + * the new slot's initial index into the memslots array.
> + */
> +static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)

The naming here didn't help me understand; it left me a bit more
confused...

How about "kvm_memslot_insert_end"?  Or even unwrap it.

> +{
> +	return slots->used_slots++;
> +}
> +
> +/*
> + * Move a changed memslot backwards in the array by shifting existing slots
> + * with a higher GFN toward the front of the array.  Note, the changed memslot
> + * itself is not preserved in the array, i.e. not swapped at this time, only
> + * its new index into the array is tracked.  Returns the changed memslot's
> + * current index into the memslots array.
> + */
> +static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
> +					    struct kvm_memory_slot *memslot)

"backward" makes me feel like it's moving towards smaller index,
when it's actually moving to a bigger index.  The same applies to "forward" below.
I'm not sure whether I'm the only one, though...

> +{
> +	struct kvm_memory_slot *mslots = slots->memslots;
> +	int i;
> +
> +	if (WARN_ON_ONCE(slots->id_to_index[memslot->id] == -1) ||
> +	    WARN_ON_ONCE(!slots->used_slots))
> +		return -1;
> +
> +	/*
> +	 * Move the target memslot backward in the array by shifting existing
> +	 * memslots with a higher GFN (than the target memslot) towards the
> +	 * front of the array.
> +	 */
> +	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots - 1; i++) {
> +		if (memslot->base_gfn > mslots[i + 1].base_gfn)
> +			break;
> +
> +		WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);

Will this trigger?  Note that in __kvm_set_memory_region() we have
already checked for overlapping memslots.

> +
> +		/* Shift the next memslot forward one and update its index. */
> +		mslots[i] = mslots[i + 1];
> +		slots->id_to_index[mslots[i].id] = i;
> +	}
> +	return i;
> +}
> +
> +/*
> + * Move a changed memslot forwards in the array by shifting existing slots with
> + * a lower GFN toward the back of the array.  Note, the changed memslot itself
> + * is not preserved in the array, i.e. not swapped at this time, only its new
> + * index into the array is tracked.  Returns the changed memslot's final index
> + * into the memslots array.
> + */
> +static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
> +					   struct kvm_memory_slot *memslot,
> +					   int start)

Same question on the naming.

> +{
> +	struct kvm_memory_slot *mslots = slots->memslots;
> +	int i;
> +
> +	for (i = start; i > 0; i--) {
> +		if (memslot->base_gfn < mslots[i - 1].base_gfn)
> +			break;

(The very careful ">=" converted to "<" here, looks correct, though
 after the refactoring it should not matter any more)

> +
> +		WARN_ON_ONCE(memslot->base_gfn == mslots[i - 1].base_gfn);

Same here.

> +
> +		/* Shift the next memslot back one and update its index. */
> +		mslots[i] = mslots[i - 1];
> +		slots->id_to_index[mslots[i].id] = i;
> +	}
> +	return i;
> +}
> +
> +/*
> + * Re-sort memslots based on their GFN to account for an added, deleted, or
> + * moved memslot.  Sorting memslots by GFN allows using a binary search during
> + * memslot lookup.
> + *
> + * IMPORTANT: Slots are sorted from highest GFN to lowest GFN!  I.e. the entry
> + * at memslots[0] has the highest GFN.
> + *
> + * The sorting algorithm takes advantage of having initially sorted memslots
> + * and knowing the position of the changed memslot.  Sorting is also optimized
> + * by not swapping the updated memslot and instead only shifting other memslots
> + * and tracking the new index for the updated memslot.  Only once its final
> + * index is known is the updated memslot copied into its position in the array.
> + *
> + *  - When deleting a memslot, the deleted memslot simply needs to be moved to
> + *    the end of the array.
> + *
> + *  - When creating a memslot, the algorithm "inserts" the new memslot at the
> + *    end of the array and then shifts it forward to its correct location.
> + *
> + *  - When moving a memslot, the algorithm first moves the updated memslot
> + *    backward to handle the scenario where the memslot's GFN was changed to a
> + *    lower value.  update_memslots() then falls through and runs the same flow
> + *    as creating a memslot to move the memslot forward to handle the scenario
> + *    where its GFN was changed to a higher value.
> + *
> + * Note, slots are sorted from highest->lowest instead of lowest->highest for
> + * historical reasons.  Originally, invalid memslots were denoted by having
> + * GFN=0, thus sorting from highest->lowest naturally sorted invalid memslots
> + * to the end of the array.  The current algorithm uses dedicated logic to
> + * delete a memslot and thus does not rely on invalid memslots having GFN=0.
> + *
> + * The other historical motivation for highest->lowest was to improve the
> + * performance of memslot lookup.  KVM originally used a linear search starting
> + * at memslots[0].  On x86, the largest memslot usually has one of the highest,
> + * if not *the* highest, GFN, as the bulk of the guest's RAM is located in a
> + * single memslot above the 4gb boundary.  As the largest memslot is also the
> + * most likely to be referenced, sorting it to the front of the array was
> + * advantageous.  The current binary search starts from the middle of the array
> + * and uses an LRU pointer to improve performance for all memslots and GFNs.
>   */
>  static void update_memslots(struct kvm_memslots *slots,
> -			    struct kvm_memory_slot *new,
> +			    struct kvm_memory_slot *memslot,
>  			    enum kvm_mr_change change)
>  {
> -	int id = new->id;
> -	int i = slots->id_to_index[id];
> -	struct kvm_memory_slot *mslots = slots->memslots;
> +	int i;
>  
> -	WARN_ON(mslots[i].id != id);
> -	switch (change) {
> -	case KVM_MR_CREATE:
> -		slots->used_slots++;
> -		WARN_ON(mslots[i].npages || !new->npages);
> -		break;
> -	case KVM_MR_DELETE:
> -		slots->used_slots--;
> -		WARN_ON(new->npages || !mslots[i].npages);
> -		break;
> -	default:
> -		break;
> -	}
> +	if (change == KVM_MR_DELETE) {
> +		kvm_memslot_delete(slots, memslot);
> +	} else {
> +		if (change == KVM_MR_CREATE)
> +			i = kvm_memslot_insert_back(slots);
> +		else
> +			i = kvm_memslot_move_backward(slots, memslot);
> +		i = kvm_memslot_move_forward(slots, memslot, i);
>  
> -	while (i < KVM_MEM_SLOTS_NUM - 1 &&
> -	       new->base_gfn <= mslots[i + 1].base_gfn) {
> -		if (!mslots[i + 1].npages)
> -			break;
> -		mslots[i] = mslots[i + 1];
> -		slots->id_to_index[mslots[i].id] = i;
> -		i++;
> +		/*
> +		 * Copy the memslot to its new position in memslots and update
> +		 * its index accordingly.
> +		 */
> +		slots->memslots[i] = *memslot;
> +		slots->id_to_index[memslot->id] = i;
>  	}
> -
> -	/*
> -	 * The ">=" is needed when creating a slot with base_gfn == 0,
> -	 * so that it moves before all those with base_gfn == npages == 0.
> -	 *
> -	 * On the other hand, if new->npages is zero, the above loop has
> -	 * already left i pointing to the beginning of the empty part of
> -	 * mslots, and the ">=" would move the hole backwards in this
> -	 * case---which is wrong.  So skip the loop when deleting a slot.
> -	 */
> -	if (new->npages) {
> -		while (i > 0 &&
> -		       new->base_gfn >= mslots[i - 1].base_gfn) {
> -			mslots[i] = mslots[i - 1];
> -			slots->id_to_index[mslots[i].id] = i;
> -			i--;
> -		}
> -	} else
> -		WARN_ON_ONCE(i != slots->used_slots);
> -
> -	mslots[i] = *new;
> -	slots->id_to_index[mslots[i].id] = i;
>  }
>  
>  static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem)
> @@ -1104,8 +1203,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	 * when the memslots are re-sorted by update_memslots().
>  	 */
>  	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> -	old = *tmp;
> -	tmp = NULL;

I was confused in that patch, then...

> +	if (tmp) {
> +		old = *tmp;
> +		tmp = NULL;

... now I still don't know why it needs to be set to NULL?

> +	} else {
> +		memset(&old, 0, sizeof(old));
> +		old.id = id;
> +	}
>  
>  	if (!mem->memory_size)
>  		return kvm_delete_memslot(kvm, mem, &old, as_id);
> @@ -1223,7 +1327,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>  
>  	slots = __kvm_memslots(kvm, as_id);
>  	*memslot = id_to_memslot(slots, id);
> -	if (!(*memslot)->dirty_bitmap)
> +	if (!(*memslot) || !(*memslot)->dirty_bitmap)
>  		return -ENOENT;
>  
>  	kvm_arch_sync_dirty_log(kvm, *memslot);
> @@ -1281,10 +1385,10 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
>  
>  	slots = __kvm_memslots(kvm, as_id);
>  	memslot = id_to_memslot(slots, id);
> +	if (!memslot || !memslot->dirty_bitmap)
> +		return -ENOENT;
>  
>  	dirty_bitmap = memslot->dirty_bitmap;
> -	if (!dirty_bitmap)
> -		return -ENOENT;
>  
>  	kvm_arch_sync_dirty_log(kvm, memslot);
>  
> @@ -1392,10 +1496,10 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
>  
>  	slots = __kvm_memslots(kvm, as_id);
>  	memslot = id_to_memslot(slots, id);
> +	if (!memslot || !memslot->dirty_bitmap)
> +		return -ENOENT;
>  
>  	dirty_bitmap = memslot->dirty_bitmap;
> -	if (!dirty_bitmap)
> -		return -ENOENT;
>  
>  	n = ALIGN(log->num_pages, BITS_PER_LONG) / 8;
>  
> -- 
> 2.24.1
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-06 20:02   ` Peter Xu
@ 2020-02-06 21:21     ` Sean Christopherson
  2020-02-06 21:41       ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-06 21:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
> On Tue, Jan 21, 2020 at 02:31:53PM -0800, Sean Christopherson wrote:
> 
> [...]
> 
> > -int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
> > +void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
> > +				  struct kvm_memory_slot *memslot)
> 
> If it's to flush TLB for a memslot, shall we remove the "dirty_log" in
> the name of the function, because it has nothing to do with dirty
> logging any more?  And...

I kept the "dirty_log" to allow arch code to implement logic specific to a
TLB flush during dirty logging, e.g. x86's lockdep assert on slots_lock.
And similar to the issue with MIPS below, to deter usage of the hook for
anything else, i.e. to nudge people toward using kvm_flush_remote_tlbs()
directly.

> >  {
> > -	struct kvm_memslots *slots;
> > -	struct kvm_memory_slot *memslot;
> > -	bool flush = false;
> > -	int r;
> > -
> > -	mutex_lock(&kvm->slots_lock);
> > -
> > -	r = kvm_clear_dirty_log_protect(kvm, log, &flush);
> > -
> > -	if (flush) {
> > -		slots = kvm_memslots(kvm);
> > -		memslot = id_to_memslot(slots, log->slot);
> > -
> > -		/* Let implementation handle TLB/GVA invalidation */
> > -		kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
> > -	}
> > -
> > -	mutex_unlock(&kvm->slots_lock);
> > -	return r;
> > +	/* Let implementation handle TLB/GVA invalidation */
> > +	kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
> 
> ... This may not be directly related to the current patch, but I'm
> confused about why MIPS cannot use kvm_flush_remote_tlbs() to flush TLBs.
> I know nothing about MIPS code, but IIUC here flush_shadow_memslot()
> is a heavier operation that will also invalidate the shadow pages.
> Seems to be overkill here when we only changed write permission of
> the PTEs?  I tried to check the first occurrence (2a31b9db15353) but I
> didn't find any clue so far.
> 
> But that matters to this patch because if MIPS can use
> kvm_flush_remote_tlbs(), then we probably don't need this
> arch-specific hook any more and we can directly call
> kvm_flush_remote_tlbs() after sync dirty log when flush==true.

Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
that prevents calling kvm_flush_remote_tlbs() directly, but I have no
clue as to the importance of that code.

> >  }
> >  
> >  long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> > diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> > index 97ce6c4f7b48..0adaf4791a6d 100644
> > --- a/arch/powerpc/kvm/book3s.c
> > +++ b/arch/powerpc/kvm/book3s.c
> > @@ -799,6 +799,11 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
> >  	return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
> >  }
> >  
> > +void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
> 
> Since at it, maybe we can start to use __weak attribute for new hooks
> especially when it's empty for most archs?
> 
> E.g., define:
> 
> void __weak kvm_arch_sync_dirty_log(...) {}
> 
> In the common code, then only define it again in arch that has
> non-empty implementation of this method?

I defer to Paolo, I'm indifferent at this stage.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-01-21 22:31 ` [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions Sean Christopherson
  2020-02-06 20:02   ` Peter Xu
@ 2020-02-06 21:24   ` Peter Xu
  1 sibling, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-06 21:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:53PM -0800, Sean Christopherson wrote:

[...]

> @@ -1333,6 +1369,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
>  	unsigned long i, n;
>  	unsigned long *dirty_bitmap;
>  	unsigned long *dirty_bitmap_buffer;
> +	bool flush;
>  
>  	as_id = log->slot >> 16;
>  	id = (u16)log->slot;
> @@ -1356,7 +1393,9 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
>  	    (log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63)))
>  	    return -EINVAL;
>  
> -	*flush = false;
> +	kvm_arch_sync_dirty_log(kvm, memslot);

Do we need this even for clear dirty log?

> +
> +	flush = false;
>  	dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
>  	if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
>  		return -EFAULT;

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-06 21:21     ` Sean Christopherson
@ 2020-02-06 21:41       ` Peter Xu
  2020-02-07 19:45         ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06 21:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 01:21:20PM -0800, Sean Christopherson wrote:
> On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
> > On Tue, Jan 21, 2020 at 02:31:53PM -0800, Sean Christopherson wrote:
> > 
> > [...]
> > 
> > > -int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
> > > +void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
> > > +				  struct kvm_memory_slot *memslot)
> > 
> > If it's to flush TLB for a memslot, shall we remove the "dirty_log" in
> > the name of the function, because it has nothing to do with dirty
> > logging any more?  And...
> 
> I kept the "dirty_log" to allow arch code to implement logic specific to a
> TLB flush during dirty logging, e.g. x86's lockdep assert on slots_lock.
> And similar to the issue with MIPS below, to deter usage of the hook for
> anything else, i.e. to nudge people toward using kvm_flush_remote_tlbs()
> directly.

x86's lockdep assert is not that important afaict, since the two
callers of the new tlb_flush() hook will be holding slots_lock for sure.

> 
> > >  {
> > > -	struct kvm_memslots *slots;
> > > -	struct kvm_memory_slot *memslot;
> > > -	bool flush = false;
> > > -	int r;
> > > -
> > > -	mutex_lock(&kvm->slots_lock);
> > > -
> > > -	r = kvm_clear_dirty_log_protect(kvm, log, &flush);
> > > -
> > > -	if (flush) {
> > > -		slots = kvm_memslots(kvm);
> > > -		memslot = id_to_memslot(slots, log->slot);
> > > -
> > > -		/* Let implementation handle TLB/GVA invalidation */
> > > -		kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
> > > -	}
> > > -
> > > -	mutex_unlock(&kvm->slots_lock);
> > > -	return r;
> > > +	/* Let implementation handle TLB/GVA invalidation */
> > > +	kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
> > 
> > ... This may not be directly related to the current patch, but I'm
> > confused about why MIPS cannot use kvm_flush_remote_tlbs() to flush TLBs.
> > I know nothing about MIPS code, but IIUC here flush_shadow_memslot()
> > is a heavier operation that will also invalidate the shadow pages.
> > Seems to be overkill here when we only changed write permission of
> > the PTEs?  I tried to check the first occurrence (2a31b9db15353) but I
> > didn't find any clue so far.
> > 
> > But that matters to this patch because if MIPS can use
> > kvm_flush_remote_tlbs(), then we probably don't need this
> > arch-specific hook any more and we can directly call
> > kvm_flush_remote_tlbs() after sync dirty log when flush==true.
> 
> Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
> that prevents calling kvm_flush_remote_tlbs() directly, but I have no
> clue as to the importance of that code.

As said above I think the x86 lockdep is really not necessary, then
considering MIPS could be the only one that will use the new hook
introduced in this patch...  Shall we figure that out first?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots
  2020-01-21 22:31 ` [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots Sean Christopherson
@ 2020-02-06 22:12   ` Peter Xu
  2020-02-07 15:38     ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06 22:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:56PM -0800, Sean Christopherson wrote:
> Now that the memslot logic doesn't assume memslots are always non-NULL,
> dynamically size the array of memslots instead of unconditionally
> allocating memory for the maximum number of memslots.
> 
> Note, because a to-be-deleted memslot must first be invalidated, the
> array size cannot be immediately reduced when deleting a memslot.
> However, consecutive deletions will realize the memory savings, i.e.
> a second deletion will trim the entry.
> 
> Tested-by: Christoffer Dall <christoffer.dall@arm.com>
> Tested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  include/linux/kvm_host.h |  2 +-
>  virt/kvm/kvm_main.c      | 31 ++++++++++++++++++++++++++++---
>  2 files changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 60ddfdb69378..8bb6fb127387 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -431,11 +431,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
>   */
>  struct kvm_memslots {
>  	u64 generation;
> -	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
>  	/* The mapping table from slot id to the index in memslots[]. */
>  	short id_to_index[KVM_MEM_SLOTS_NUM];
>  	atomic_t lru_slot;
>  	int used_slots;
> +	struct kvm_memory_slot memslots[];

This patch is tested so I believe this works, however normally I need
to do a similar thing with [0], otherwise gcc might complain.  Is there
any trick behind this to make it work?  Or is that because of different
gcc versions?

>  };
>  
>  struct kvm {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9b614cf2ca20..ed392ce64e59 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -565,7 +565,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
>  		return NULL;
>  
>  	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> -		slots->id_to_index[i] = slots->memslots[i].id = -1;
> +		slots->id_to_index[i] = -1;
>  
>  	return slots;
>  }
> @@ -1077,6 +1077,32 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>  	return old_memslots;
>  }
>  
> +/*
> + * Note, at a minimum, the current number of used slots must be allocated, even
> + * when deleting a memslot, as we need a complete duplicate of the memslots for
> + * use when invalidating a memslot prior to deleting/moving the memslot.
> + */
> +static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
> +					     enum kvm_mr_change change)
> +{
> +	struct kvm_memslots *slots;
> +	size_t old_size, new_size;
> +
> +	old_size = sizeof(struct kvm_memslots) +
> +		   (sizeof(struct kvm_memory_slot) * old->used_slots);
> +
> +	if (change == KVM_MR_CREATE)
> +		new_size = old_size + sizeof(struct kvm_memory_slot);
> +	else
> +		new_size = old_size;
> +
> +	slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
> +	if (likely(slots))
> +		memcpy(slots, old, old_size);

(Maybe directly copy into it?)

> +
> +	return slots;
> +}
> +
>  static int kvm_set_memslot(struct kvm *kvm,
>  			   const struct kvm_userspace_memory_region *mem,
>  			   struct kvm_memory_slot *old,
> @@ -1087,10 +1113,9 @@ static int kvm_set_memslot(struct kvm *kvm,
>  	struct kvm_memslots *slots;
>  	int r;
>  
> -	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> +	slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
>  	if (!slots)
>  		return -ENOMEM;
> -	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
>  
>  	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
>  		/*
> -- 
> 2.24.1
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 19/19] KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION
  2020-01-21 22:31 ` [PATCH v5 19/19] KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION Sean Christopherson
@ 2020-02-06 22:30   ` Peter Xu
  2020-02-06 23:11     ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-06 22:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Tue, Jan 21, 2020 at 02:31:57PM -0800, Sean Christopherson wrote:
> Add a KVM selftest to test moving the base gfn of a userspace memory
> region.  Although the basic concept of moving memory regions is not x86
> specific, the assumptions regarding large pages and MMIO shenanigans
> used to verify the correctness make this x86_64 only for the time being.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

(I'm a bit curious why write 2 first before 1...)

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 19/19] KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION
  2020-02-06 22:30   ` Peter Xu
@ 2020-02-06 23:11     ` Sean Christopherson
  0 siblings, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-02-06 23:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 05:30:01PM -0500, Peter Xu wrote:
> On Tue, Jan 21, 2020 at 02:31:57PM -0800, Sean Christopherson wrote:
> > Add a KVM selftest to test moving the base gfn of a userspace memory
> > region.  Although the basic concept of moving memory regions is not x86
> > specific, the assumptions regarding large pages and MMIO shenanigans
> > used to verify the correctness make this x86_64 only for the time being.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> (I'm a bit curious why write 2 first before 1...)

To verify KVM actually relocated the memslot and didn't leave anything in
the TLB.  If "2" isn't written, KVM could completely botch the MOVE but the
guest_code() would still signal pass because it would eventually see the
0-> transitions.

> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> -- 
> Peter Xu
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots
  2020-02-06 22:12   ` Peter Xu
@ 2020-02-07 15:38     ` Sean Christopherson
  2020-02-07 16:05       ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 15:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 05:12:08PM -0500, Peter Xu wrote:
> On Tue, Jan 21, 2020 at 02:31:56PM -0800, Sean Christopherson wrote:
> > Now that the memslot logic doesn't assume memslots are always non-NULL,
> > dynamically size the array of memslots instead of unconditionally
> > allocating memory for the maximum number of memslots.
> > 
> > Note, because a to-be-deleted memslot must first be invalidated, the
> > array size cannot be immediately reduced when deleting a memslot.
> > However, consecutive deletions will realize the memory savings, i.e.
> > a second deletion will trim the entry.
> > 
> > Tested-by: Christoffer Dall <christoffer.dall@arm.com>
> > Tested-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  include/linux/kvm_host.h |  2 +-
> >  virt/kvm/kvm_main.c      | 31 ++++++++++++++++++++++++++++---
> >  2 files changed, 29 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 60ddfdb69378..8bb6fb127387 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -431,11 +431,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
> >   */
> >  struct kvm_memslots {
> >  	u64 generation;
> > -	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
> >  	/* The mapping table from slot id to the index in memslots[]. */
> >  	short id_to_index[KVM_MEM_SLOTS_NUM];
> >  	atomic_t lru_slot;
> >  	int used_slots;
> > +	struct kvm_memory_slot memslots[];
> 
> This patch is tested so I believe this works, however normally I need
> to do a similar thing with [0], otherwise gcc might complain.  Is there
> any trick behind this to make it work?  Or is that because of different
> gcc versions?

array[] and array[0] have the same net effect, but array[] is given special
treatment by gcc to provide extra sanity checks, e.g. requires the field to
be the end of the struct.  Last I checked, gcc also doesn't allow array[]
in unions.  There are probably other restrictions.

But, it's precisely because of those restrictions that using array[] is
preferred, as it provides extra protections, e.g. if someone moved memslots
to the top of the struct it would fail to compile.
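
FWIW, here's a standalone sketch (not kernel code, names made up) of the
pattern, i.e. a trailing array whose size is fixed at allocation time:

	#include <stdlib.h>

	struct flex {
		int nr;		/* number of entries in the trailing array */
		int data[];	/* flexible array member, must be the last field */
	};

	int main(void)
	{
		int n = 8;
		struct flex *f = malloc(sizeof(*f) + n * sizeof(int));

		free(f);
		return 0;
	}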
 
> >  };
> >  
> >  struct kvm {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 9b614cf2ca20..ed392ce64e59 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -565,7 +565,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
> >  		return NULL;
> >  
> >  	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> > -		slots->id_to_index[i] = slots->memslots[i].id = -1;
> > +		slots->id_to_index[i] = -1;
> >  
> >  	return slots;
> >  }
> > @@ -1077,6 +1077,32 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
> >  	return old_memslots;
> >  }
> >  
> > +/*
> > + * Note, at a minimum, the current number of used slots must be allocated, even
> > + * when deleting a memslot, as we need a complete duplicate of the memslots for
> > + * use when invalidating a memslot prior to deleting/moving the memslot.
> > + */
> > +static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
> > +					     enum kvm_mr_change change)
> > +{
> > +	struct kvm_memslots *slots;
> > +	size_t old_size, new_size;
> > +
> > +	old_size = sizeof(struct kvm_memslots) +
> > +		   (sizeof(struct kvm_memory_slot) * old->used_slots);
> > +
> > +	if (change == KVM_MR_CREATE)
> > +		new_size = old_size + sizeof(struct kvm_memory_slot);
> > +	else
> > +		new_size = old_size;
> > +
> > +	slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
> > +	if (likely(slots))
> > +		memcpy(slots, old, old_size);
> 
> (Maybe directly copy into it?)

I don't follow, are you saying do "*slots = *old"?

@new_size and @old_size are not guaranteed to be the same.  More
specifically, slots->memslots and old->memslots are now flexible arrays with
potentially different sizes.  Doing "*slots = *old" would only copy the
standard members, a memcpy() would still be needed for @memslots.

A more efficient implementation would be:

	slots = kvmalloc(new_size, GFP_KERNEL_ACCOUNT);
	if (likely(slots)) {
		memcpy(slots, old, old_size);
		if (change == KVM_MR_CREATE)
			memset((void *)slots + old_size, 0, new_size - old_size);
	}

to avoid unnecessarily zeroing out the entire thing.  I opted for the
simpler implementation as this is not performance critical code, for most
cases @slots won't be all that large, and I wanted to be absolutely sure
any mixup would hit zeroed memory and not uninitialized memory.

> 
> > +
> > +	return slots;
> > +}
> > +
> >  static int kvm_set_memslot(struct kvm *kvm,
> >  			   const struct kvm_userspace_memory_region *mem,
> >  			   struct kvm_memory_slot *old,
> > @@ -1087,10 +1113,9 @@ static int kvm_set_memslot(struct kvm *kvm,
> >  	struct kvm_memslots *slots;
> >  	int r;
> >  
> > -	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> > +	slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
> >  	if (!slots)
> >  		return -ENOMEM;
> > -	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
> >  
> >  	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
> >  		/*
> > -- 
> > 2.24.1
> > 
> 
> -- 
> Peter Xu
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots
  2020-02-07 15:38     ` Sean Christopherson
@ 2020-02-07 16:05       ` Peter Xu
  2020-02-07 16:15         ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-07 16:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 07:38:29AM -0800, Sean Christopherson wrote:
> On Thu, Feb 06, 2020 at 05:12:08PM -0500, Peter Xu wrote:
> > On Tue, Jan 21, 2020 at 02:31:56PM -0800, Sean Christopherson wrote:
> > > Now that the memslot logic doesn't assume memslots are always non-NULL,
> > > dynamically size the array of memslots instead of unconditionally
> > > allocating memory for the maximum number of memslots.
> > > 
> > > Note, because a to-be-deleted memslot must first be invalidated, the
> > > array size cannot be immediately reduced when deleting a memslot.
> > > However, consecutive deletions will realize the memory savings, i.e.
> > > a second deletion will trim the entry.
> > > 
> > > Tested-by: Christoffer Dall <christoffer.dall@arm.com>
> > > Tested-by: Marc Zyngier <maz@kernel.org>
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > ---
> > >  include/linux/kvm_host.h |  2 +-
> > >  virt/kvm/kvm_main.c      | 31 ++++++++++++++++++++++++++++---
> > >  2 files changed, 29 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 60ddfdb69378..8bb6fb127387 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -431,11 +431,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
> > >   */
> > >  struct kvm_memslots {
> > >  	u64 generation;
> > > -	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
> > >  	/* The mapping table from slot id to the index in memslots[]. */
> > >  	short id_to_index[KVM_MEM_SLOTS_NUM];
> > >  	atomic_t lru_slot;
> > >  	int used_slots;
> > > +	struct kvm_memory_slot memslots[];
> > 
> > This patch is tested so I believe this works, however normally I need
> > to do a similar thing with [0], otherwise gcc might complain.  Is there
> > any trick behind this to make it work?  Or is that because of different
> > gcc versions?
> 
> array[] and array[0] have the same net effect, but array[] is given special
> treatment by gcc to provide extra sanity checks, e.g. requires the field to
> be the end of the struct.  Last I checked, gcc also doesn't allow array[]
> in unions.  There are probably other restrictions.
> 
> But, it's precisely because of those restrictions that using array[] is
> preferred, as it provides extra protections, e.g. if someone moved memslots
> to the top of the struct it would fail to compile.

However...

xz-x1:tmp $ cat a.c
struct a {
    int s[];
};

int main(void) { }
xz-x1:tmp $ make a
cc     a.c   -o a
a.c:2:9: error: flexible array member in a struct with no named members
    2 |     int s[];
      |         ^
make: *** [<builtin>: a] Error 1

My gcc version is 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC).

>  
> > >  };
> > >  
> > >  struct kvm {
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 9b614cf2ca20..ed392ce64e59 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -565,7 +565,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
> > >  		return NULL;
> > >  
> > >  	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
> > > -		slots->id_to_index[i] = slots->memslots[i].id = -1;
> > > +		slots->id_to_index[i] = -1;
> > >  
> > >  	return slots;
> > >  }
> > > @@ -1077,6 +1077,32 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
> > >  	return old_memslots;
> > >  }
> > >  
> > > +/*
> > > + * Note, at a minimum, the current number of used slots must be allocated, even
> > > + * when deleting a memslot, as we need a complete duplicate of the memslots for
> > > + * use when invalidating a memslot prior to deleting/moving the memslot.
> > > + */
> > > +static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
> > > +					     enum kvm_mr_change change)
> > > +{
> > > +	struct kvm_memslots *slots;
> > > +	size_t old_size, new_size;
> > > +
> > > +	old_size = sizeof(struct kvm_memslots) +
> > > +		   (sizeof(struct kvm_memory_slot) * old->used_slots);
> > > +
> > > +	if (change == KVM_MR_CREATE)
> > > +		new_size = old_size + sizeof(struct kvm_memory_slot);
> > > +	else
> > > +		new_size = old_size;
> > > +
> > > +	slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
> > > +	if (likely(slots))
> > > +		memcpy(slots, old, old_size);
> > 
> > (Maybe directly copy into it?)
> 
> I don't follow, are you saying do "*slots = *old"?
> 
> @new_size and @old_size are not guaranteed to be the same.  More
> specifically, slots->memslots and old->memslots are now flexible arrays with
> potentially different sizes.  Doing "*slots = *old" would only copy the
> standard members, a memcpy() would still be needed for @memslots.
> 
> A more efficient implementation would be:
> 
> 	slots = kvmalloc(new_size, GFP_KERNEL_ACCOUNT);
> 	if (likely(slots)) {
> 		memcpy(slots, old, old_size);
> 		if (change == KVM_MR_CREATE)
> 			memset((void *)slots + old_size, 0, new_size - old_size);
> 	}
> 
> to avoid unnecessarily zeroing out the entire thing.  I opted for the
> simpler implementation as this is not performance critical code, for most
> cases @slots won't be all that large, and I wanted to be absolutely sure
> any mixup would hit zeroed memory and not uninitialized memory.

I made a silly mistake and read "slots" as "old".  Ignore my
comment, sorry!  And please take my R-b for this patch too:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots
  2020-02-07 16:05       ` Peter Xu
@ 2020-02-07 16:15         ` Sean Christopherson
  2020-02-07 16:37           ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 16:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 11:05:46AM -0500, Peter Xu wrote:
> On Fri, Feb 07, 2020 at 07:38:29AM -0800, Sean Christopherson wrote:
> > On Thu, Feb 06, 2020 at 05:12:08PM -0500, Peter Xu wrote:
> > > This patch is tested so I believe this works, however normally I need
> > > to do a similar thing with [0], otherwise gcc might complain.  Is there
> > > any trick behind this to make it work?  Or is that because of different
> > > gcc versions?
> > 
> > array[] and array[0] have the same net effect, but array[] is given special
> > treatment by gcc to provide extra sanity checks, e.g. requires the field to
> > be the end of the struct.  Last I checked, gcc also doesn't allow array[]
> > in unions.  There are probably other restrictions.
> > 
> > But, it's precisely because of those restrictions that using array[] is
> > preferred, as it provides extra protections, e.g. if someone moved memslots
> > to the top of the struct it would fail to compile.
> 
> However...
> 
> xz-x1:tmp $ cat a.c
> struct a {
>     int s[];
> };
> 
> int main(void) { }
> xz-x1:tmp $ make a
> cc     a.c   -o a
> a.c:2:9: error: flexible array member in a struct with no named members
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

gcc is telling you quite explicitly why it's angry.  Copy+paste from the
internet[*]:

  Flexible Array Member(FAM) is a feature introduced in the C99 standard of the
  C programming language.

  For the structures in C programming language from C99 standard onwards, we
  can declare an array without a dimension and whose size is flexible in nature.

  Such an array inside the structure should preferably be declared as the last 
  member of structure and its size is variable(can be changed be at runtime).
  
  The structure must contain at least one more named member in addition to the
  flexible array member. 

[*] https://www.geeksforgeeks.org/flexible-array-members-structure-c/

>     2 |     int s[];
>       |         ^
> make: *** [<builtin>: a] Error 1
> 
> My gcc version is 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC).
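
I.e. your snippet compiles as soon as the struct has any other named
member before the flexible array, e.g. something like:

	struct a {
		int nr;		/* any named member */
		int s[];	/* flexible array member stays last */
	};

	int main(void) { }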

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots
  2020-02-07 16:15         ` Sean Christopherson
@ 2020-02-07 16:37           ` Peter Xu
  2020-02-07 16:47             ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-07 16:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 08:15:53AM -0800, Sean Christopherson wrote:
> On Fri, Feb 07, 2020 at 11:05:46AM -0500, Peter Xu wrote:
> > On Fri, Feb 07, 2020 at 07:38:29AM -0800, Sean Christopherson wrote:
> > > On Thu, Feb 06, 2020 at 05:12:08PM -0500, Peter Xu wrote:
> > > > This patch is tested so I believe this works, however normally I need
> > > > to do a similar thing with [0], otherwise gcc might complain.  Is there
> > > > any trick behind this to make it work?  Or is that because of different
> > > > gcc versions?
> > > 
> > > array[] and array[0] have the same net effect, but array[] is given special
> > > treatment by gcc to provide extra sanity checks, e.g. requires the field to
> > > be the end of the struct.  Last I checked, gcc also doesn't allow array[]
> > > in unions.  There are probably other restrictions.
> > > 
> > > But, it's precisely because of those restrictions that using array[] is
> > > preferred, as it provides extra protections, e.g. if someone moved memslots
> > > to the top of the struct it would fail to compile.
> > 
> > However...
> > 
> > xz-x1:tmp $ cat a.c
> > struct a {
> >     int s[];
> > };
> > 
> > int main(void) { }
> > xz-x1:tmp $ make a
> > cc     a.c   -o a
> > a.c:2:9: error: flexible array member in a struct with no named members
>                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> gcc is telling you quite explicitly why it's angry.  Copy+paste from the
> internet[*]:
> 
>   Flexible Array Member(FAM) is a feature introduced in the C99 standard of the
>   C programming language.
> 
>   For the structures in C programming language from C99 standard onwards, we
>   can declare an array without a dimension and whose size is flexible in nature.
> 
>   Such an array inside the structure should preferably be declared as the last 
>   member of structure and its size is variable(can be changed be at runtime).
>   
>   The structure must contain at least one more named member in addition to the
>   flexible array member. 
> 
> [*] https://www.geeksforgeeks.org/flexible-array-members-structure-c/

Sorry again for not being able to identify the meaning of that
sentence myself.  My English is probably even worse than I thought...

So I think my r-b still stands.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots
  2020-02-07 16:37           ` Peter Xu
@ 2020-02-07 16:47             ` Sean Christopherson
  0 siblings, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 16:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 11:37:40AM -0500, Peter Xu wrote:
> Sorry again for not being able to identify the meaning of that
> sentence myself.

No worries.  

> My English is probably even worse than I thought...

I'm pretty sure this is true for everyone :-)

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 12/19] KVM: Move memslot deletion to helper function
  2020-02-06 16:51       ` Peter Xu
@ 2020-02-07 17:59         ` Sean Christopherson
  2020-02-07 18:07           ` Sean Christopherson
  2020-02-07 18:17           ` Peter Xu
  0 siblings, 2 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 17:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 11:51:16AM -0500, Peter Xu wrote:
> On Thu, Feb 06, 2020 at 08:28:18AM -0800, Sean Christopherson wrote:
> > On Thu, Feb 06, 2020 at 11:14:15AM -0500, Peter Xu wrote:
> > > On Tue, Jan 21, 2020 at 02:31:50PM -0800, Sean Christopherson wrote:
> > > > Move memslot deletion into its own routine so that the success path for
> > > > other memslot updates does not need to use kvm_free_memslot(), i.e. can
> > > > explicitly destroy the dirty bitmap when necessary.  This paves the way
> > > > for dropping @dont from kvm_free_memslot(), i.e. all callers now pass
> > > > NULL for @dont.
> > > > 
> > > > Add a comment above the code to make a copy of the existing memslot
> > > > prior to deletion, it is not at all obvious that the pointer will become
> > > > stale during sorting and/or installation of new memslots.
> > > 
> > > Could you help explain a bit on this explicit comment?  I can follow
> > > up with the patch itself which looks all correct to me, but I failed
> > > to catch what this extra comment wants to emphasize...
> > 
> > It's tempting to write the code like this (I know, because I did it):
> > 
> > 	if (!mem->memory_size)
> > 		return kvm_delete_memslot(kvm, mem, slot, as_id);
> > 
> > 	new = *slot;
> > 
> > Where @slot is a pointer to the memslot to be deleted.  At first, second,
> > and third glances, this seems perfectly sane.
> > 
> > The issue is that slot was pulled from struct kvm_memslots.memslots, e.g.
> > 
> > 	slot = &slots->memslots[index];
> > 
> > Note that slots->memslots holds actual "struct kvm_memory_slot" objects,
> > not pointers to slots.  When update_memslots() sorts the slots, it swaps
> > the actual slot objects, not pointers.  I.e. after update_memslots(), even
> > though @slot points at the same address, it could be pointing at a
> > different slot.  As a result kvm_free_memslot() in kvm_delete_memslot()
> > will free the dirty page info and arch-specific pointers for some random
> > slot, not the intended slot, and will set npages=0 for that random slot.
> 
> Ah I see, thanks.  Another alternative is we move the "old = *slot"
> copy into kvm_delete_memslot(), which could be even clearer imo.

The copy is also needed in __kvm_set_memory_region() for the MOVE case.

> However I'm not sure whether it's a good idea to drop the tested-by for
> this.  Considering that the comment change should not affect it, would you
> mind enriching the comment into something like this (or anything better)?
> 
> /*
>  * Make a full copy of the old memslot, the pointer will become stale
>  * when the memslots are re-sorted by update_memslots() in
>  * kvm_delete_memslot(), while to make the kvm_free_memslot() work as
>  * expected later on, we still need the cached memory slot.
>  */

As above, it's more subtle than just the kvm_delete_memslot() case.

	/*
	 * Make a full copy of the old memslot, the pointer will become stale
	 * when the memslots are re-sorted by update_memslots() when deleting
	 * or moving a memslot, and additional modifications to the old memslot
	 * need to be made after calling update_memslots().
	 */
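
(Purely as illustration of the staleness, the problem boils down to this
toy, non-KVM snippet:

	struct ex_slot { int id; };

	int main(void)
	{
		struct ex_slot arr[2] = { { .id = 0 }, { .id = 1 } };
		struct ex_slot *slot = &arr[0];	/* "the slot with id 0" */
		struct ex_slot tmp = arr[0];

		/* sorting swaps the objects, not the pointers */
		arr[0] = arr[1];
		arr[1] = tmp;

		/* slot still points at &arr[0], which now holds id 1 */
		return slot->id;	/* returns 1, not 0 */
	}

i.e. the pointer is stable as an address, but not as a handle to a
logical slot.)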

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 12/19] KVM: Move memslot deletion to helper function
  2020-02-07 17:59         ` Sean Christopherson
@ 2020-02-07 18:07           ` Sean Christopherson
  2020-02-07 18:17           ` Peter Xu
  1 sibling, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 18:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 09:59:12AM -0800, Sean Christopherson wrote:
> On Thu, Feb 06, 2020 at 11:51:16AM -0500, Peter Xu wrote:
> > /*
> >  * Make a full copy of the old memslot, the pointer will become stale
> >  * when the memslots are re-sorted by update_memslots() in
> >  * kvm_delete_memslot(), while to make the kvm_free_memslot() work as
> >  * expected later on, we still need the cached memory slot.
> >  */
> 
> As above, it's more subtle than just the kvm_delete_memslot() case.
> 
> 	/*
> 	 * Make a full copy of the old memslot, the pointer will become stale
> 	 * when the memslots are re-sorted by update_memslots() when deleting
> 	 * or moving a memslot, and additional modifications to the old memslot
> 	 * need to be made after calling update_memslots().
> 	 */

Actually, that's not quite correct, as the same is true for all memslot
updates, and we still query @old after update_memslots() for CREATE and
FLAGS.  This is better.

        /*
         * Make a full copy of the old memslot, the pointer will become stale
         * when the memslots are re-sorted by update_memslots(), and the old
         * memslot needs to be referenced after calling update_memslots(), e.g.
         * to free its resources and for arch specific behavior.
         */


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 12/19] KVM: Move memslot deletion to helper function
  2020-02-07 17:59         ` Sean Christopherson
  2020-02-07 18:07           ` Sean Christopherson
@ 2020-02-07 18:17           ` Peter Xu
  1 sibling, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-07 18:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 09:59:12AM -0800, Sean Christopherson wrote:
> On Thu, Feb 06, 2020 at 11:51:16AM -0500, Peter Xu wrote:
> > On Thu, Feb 06, 2020 at 08:28:18AM -0800, Sean Christopherson wrote:
> > > On Thu, Feb 06, 2020 at 11:14:15AM -0500, Peter Xu wrote:
> > > > On Tue, Jan 21, 2020 at 02:31:50PM -0800, Sean Christopherson wrote:
> > > > > Move memslot deletion into its own routine so that the success path for
> > > > > other memslot updates does not need to use kvm_free_memslot(), i.e. can
> > > > > explicitly destroy the dirty bitmap when necessary.  This paves the way
> > > > > for dropping @dont from kvm_free_memslot(), i.e. all callers now pass
> > > > > NULL for @dont.
> > > > > 
> > > > > Add a comment above the code to make a copy of the existing memslot
> > > > > prior to deletion, it is not at all obvious that the pointer will become
> > > > > stale during sorting and/or installation of new memslots.
> > > > 
> > > > Could you help explain a bit on this explicit comment?  I can follow
> > > > up with the patch itself which looks all correct to me, but I failed
> > > > to catch what this extra comment wants to emphasize...
> > > 
> > > It's tempting to write the code like this (I know, because I did it):
> > > 
> > > 	if (!mem->memory_size)
> > > 		return kvm_delete_memslot(kvm, mem, slot, as_id);
> > > 
> > > 	new = *slot;
> > > 
> > > Where @slot is a pointer to the memslot to be deleted.  At first, second,
> > > and third glances, this seems perfectly sane.
> > > 
> > > The issue is that slot was pulled from struct kvm_memslots.memslots, e.g.
> > > 
> > > 	slot = &slots->memslots[index];
> > > 
> > > Note that slots->memslots holds actual "struct kvm_memory_slot" objects,
> > > not pointers to slots.  When update_memslots() sorts the slots, it swaps
> > > the actual slot objects, not pointers.  I.e. after update_memslots(), even
> > > though @slot points at the same address, it could be pointing at a
> > > different slot.  As a result kvm_free_memslot() in kvm_delete_memslot()
> > > will free the dirty page info and arch-specific pointers for some random
> > > slot, not the intended slot, and will set npages=0 for that random slot.
> > 
> > Ah I see, thanks.  Another alternative is we move the "old = *slot"
> > copy into kvm_delete_memslot(), which could be even clearer imo.
> 
> The copy is also needed in __kvm_set_memory_region() for the MOVE case.

Right.  I actually meant to do the "old = *slot" copy in any function
where we need to cache the data explicitly; with that we would also need
another one after calling kvm_delete_memslot() for the MOVE case.  But with
the comment as you suggested below it looks good to me too.

Thanks,

> 
> > However I'm not sure whether it's a good idea to drop the tested-by for
> > this.  Considering that the comment change should not affect it, would you
> > mind enriching the comment into something like this (or anything better)?
> > 
> > /*
> >  * Make a full copy of the old memslot, the pointer will become stale
> >  * when the memslots are re-sorted by update_memslots() in
> >  * kvm_delete_memslot(), while to make the kvm_free_memslot() work as
> >  * expected later on, we still need the cached memory slot.
> >  */
> 
> As above, it's more subtle than just the kvm_delete_memslot() case.
> 
> 	/*
> 	 * Make a full copy of the old memslot, the pointer will become stale
> 	 * when the memslots are re-sorted by update_memslots() when deleting
> 	 * or moving a memslot, and additional modifications to the old memslot
> 	 * need to be made after calling update_memslots().
> 	 */
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots
  2020-02-06 21:09   ` Peter Xu
@ 2020-02-07 18:33     ` Sean Christopherson
  2020-02-07 20:39       ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 18:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Thu, Feb 06, 2020 at 04:09:44PM -0500, Peter Xu wrote:
> On Tue, Jan 21, 2020 at 02:31:55PM -0800, Sean Christopherson wrote:
> > @@ -9652,13 +9652,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
> >  		if (IS_ERR((void *)hva))
> >  			return PTR_ERR((void *)hva);
> >  	} else {
> > -		if (!slot->npages)
> > +		if (!slot || !slot->npages)
> >  			return 0;
> >  
> > -		hva = 0;
> > +		hva = slot->userspace_addr;
> 
> Is this intended?

Yes.  It's possible to allow VA=0 for userspace mappings.  It's extremely
uncommon, but possible.  Therefore "hva == 0" shouldn't be used to
indicate an invalid slot.

> > +		old_npages = slot->npages;
> >  	}
> >  
> > -	old = *slot;
> >  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> >  		struct kvm_userspace_memory_region m;
> >  

...

> > @@ -869,63 +869,162 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> >  }
> >  
> >  /*
> > - * Insert memslot and re-sort memslots based on their GFN,
> > - * so binary search could be used to lookup GFN.
> > - * Sorting algorithm takes advantage of having initially
> > - * sorted array and known changed memslot position.
> > + * Delete a memslot by decrementing the number of used slots and shifting all
> > + * other entries in the array forward one spot.
> > + */
> > +static inline void kvm_memslot_delete(struct kvm_memslots *slots,
> > +				      struct kvm_memory_slot *memslot)
> > +{
> > +	struct kvm_memory_slot *mslots = slots->memslots;
> > +	int i;
> > +
> > +	if (WARN_ON(slots->id_to_index[memslot->id] == -1))
> > +		return;
> > +
> > +	slots->used_slots--;
> > +
> > +	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots; i++) {
> > +		mslots[i] = mslots[i + 1];
> > +		slots->id_to_index[mslots[i].id] = i;
> > +	}
> > +	mslots[i] = *memslot;
> > +	slots->id_to_index[memslot->id] = -1;
> > +}
> > +
> > +/*
> > + * "Insert" a new memslot by incrementing the number of used slots.  Returns
> > + * the new slot's initial index into the memslots array.
> > + */
> > +static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)
> 
> The naming here didn't help me understand; it actually left me a bit
> more confused...
> 
> How about "kvm_memslot_insert_end"?  Or even unwrap it.

It's not guaranteed to be the end, as there could be multiple unused
entries at the back of the array.  I agree the naming isn't perfect, but
IMO it's the least crappy option and will be familiar to anyone with C++
STL (and other languages?) experience.  Arguably it would be better to
follow kernel naming for lists, e.g. head/tail, but there are no
convenient adverbs for the move helpers, e.g. kvm_memslot_move_backward()
would be kvm_memslot_move_towards_tail().

I'm very strongly opposed to unwrapping it.

The code would look like this.  Without a beefy comment, the high level
semantics of the KVM_MR_CREATE case are not at all clear.  Adding a
comment gets messy because putting it above the entire if-else makes it
difficult to understand that its *only* for the CREATE case, and I hate
having multi-line comments in if-else statements without brackets.

                if (change == KVM_MR_CREATE)
                        i = slots->used_slots++;
                else
                        i = kvm_memslot_move_backward(slots, memslot);

> > +{
> > +	return slots->used_slots++;
> > +}
> > +
> > +/*
> > + * Move a changed memslot backwards in the array by shifting existing slots
> > + * with a higher GFN toward the front of the array.  Note, the changed memslot
> > + * itself is not preserved in the array, i.e. not swapped at this time, only
> > + * its new index into the array is tracked.  Returns the changed memslot's
> > + * current index into the memslots array.
> > + */
> > +static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
> > +					    struct kvm_memory_slot *memslot)
> 
> "backward" makes me feel like it's moving towards smaller index,
> instead it's moving to bigger index.  Same applies to "forward" below.
> I'm not sure whether I'm the only one, though...

Move forward towards the front, and backward towards the back.  In the
languages I am familiar with, e.g. C++ STL, JavaScript, Python, and Golang,
front==container[0] and back==container[len() - 1].

> > +{
> > +	struct kvm_memory_slot *mslots = slots->memslots;
> > +	int i;
> > +
> > +	if (WARN_ON_ONCE(slots->id_to_index[memslot->id] == -1) ||
> > +	    WARN_ON_ONCE(!slots->used_slots))
> > +		return -1;
> > +
> > +	/*
> > +	 * Move the target memslot backward in the array by shifting existing
> > +	 * memslots with a higher GFN (than the target memslot) towards the
> > +	 * front of the array.
> > +	 */
> > +	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots - 1; i++) {
> > +		if (memslot->base_gfn > mslots[i + 1].base_gfn)
> > +			break;
> > +
> > +		WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);
> 
> Will this trigger?  Note that in __kvm_set_memory_region() we have
> already checked overlap of memslots.

If you screw up the code it will :-)  In a perfect world, no WARN() will
*ever* trigger.  All of the added WARN_ON_ONCE() are to help the next poor
soul that wants to modify this code.
 
> > +
> > +		/* Shift the next memslot forward one and update its index. */
> > +		mslots[i] = mslots[i + 1];
> > +		slots->id_to_index[mslots[i].id] = i;
> > +	}
> > +	return i;
> > +}
> > @@ -1104,8 +1203,13 @@ int __kvm_set_memory_region(struct kvm *kvm,

...

> >  	 * when the memslots are re-sorted by update_memslots().
> >  	 */
> >  	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> > -	old = *tmp;
> > -	tmp = NULL;
> 
> I was confused in that patch, then...
> 
> > +	if (tmp) {
> > +		old = *tmp;
> > +		tmp = NULL;
> 
> ... now I still don't know why it needs to be set to NULL?

To make it abundantly clear that thou shalt not use @tmp, i.e. to force
using the copy and not the pointer.  Note, @tmp is also reused as an
iterator below.

> 
> > +	} else {
> > +		memset(&old, 0, sizeof(old));
> > +		old.id = id;
> > +	}
> >  
> >  	if (!mem->memory_size)
> >  		return kvm_delete_memslot(kvm, mem, &old, as_id);
> > @@ -1223,7 +1327,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
> >  
> >  	slots = __kvm_memslots(kvm, as_id);
> >  	*memslot = id_to_memslot(slots, id);
> > -	if (!(*memslot)->dirty_bitmap)
> > +	if (!(*memslot) || !(*memslot)->dirty_bitmap)
> >  		return -ENOENT;
> >  
> >  	kvm_arch_sync_dirty_log(kvm, *memslot);
> > @@ -1281,10 +1385,10 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
> >  
> >  	slots = __kvm_memslots(kvm, as_id);
> >  	memslot = id_to_memslot(slots, id);
> > +	if (!memslot || !memslot->dirty_bitmap)
> > +		return -ENOENT;
> >  
> >  	dirty_bitmap = memslot->dirty_bitmap;
> > -	if (!dirty_bitmap)
> > -		return -ENOENT;
> >  
> >  	kvm_arch_sync_dirty_log(kvm, memslot);
> >  
> > @@ -1392,10 +1496,10 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
> >  
> >  	slots = __kvm_memslots(kvm, as_id);
> >  	memslot = id_to_memslot(slots, id);
> > +	if (!memslot || !memslot->dirty_bitmap)
> > +		return -ENOENT;
> >  
> >  	dirty_bitmap = memslot->dirty_bitmap;
> > -	if (!dirty_bitmap)
> > -		return -ENOENT;
> >  
> >  	n = ALIGN(log->num_pages, BITS_PER_LONG) / 8;
> >  
> > -- 
> > 2.24.1
> > 
> 
> -- 
> Peter Xu
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-06 21:41       ` Peter Xu
@ 2020-02-07 19:45         ` Sean Christopherson
  2020-02-08  0:18           ` Peter Xu
  2020-02-17 15:35           ` Vitaly Kuznetsov
  0 siblings, 2 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 19:45 UTC (permalink / raw)
  To: Peter Xu, Vitaly Kuznetsov
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

+Vitaly for HyperV

On Thu, Feb 06, 2020 at 04:41:06PM -0500, Peter Xu wrote:
> On Thu, Feb 06, 2020 at 01:21:20PM -0800, Sean Christopherson wrote:
> > On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
> > > But that matters to this patch because if MIPS can use
> > > kvm_flush_remote_tlbs(), then we probably don't need this
> > > arch-specific hook any more and we can directly call
> > > kvm_flush_remote_tlbs() after sync dirty log when flush==true.
> > 
> > Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
> > that prevents calling kvm_flush_remote_tlbs() directly, but I have no
> > clue as to the importance of that code.
> 
> As said above I think the x86 lockdep is really not necessary, then
> considering MIPS could be the only one that will use the new hook
> introduced in this patch...  Shall we figure that out first?

So I prepped a follow-up patch to make kvm_arch_dirty_log_tlb_flush() a
MIPS-only hook and use kvm_flush_remote_tlbs() directly for arm and x86,
but then I realized x86 *has* a hook to do a precise remote TLB flush.
There's even an existing kvm_flush_remote_tlbs_with_address() call on a
memslot, i.e. this exact scenario.  So arguably, x86 should be using the
more precise flush and should keep kvm_arch_dirty_log_tlb_flush().

But, the hook is only used when KVM is running as an L1 on top of HyperV,
and I assume dirty logging isn't used much, if at all, for L1 KVM on
HyperV?

I see three options:

  1. Make kvm_arch_dirty_log_tlb_flush() MIPS-only and call
     kvm_flush_remote_tlbs() directly for arm and x86.  Add comments to
     explain when an arch should implement kvm_arch_dirty_log_tlb_flush().

  2. Change x86 to use kvm_flush_remote_tlbs_with_address() when flushing
     a memslot after the dirty log is grabbed by userspace.

  3. Keep the resulting code as is, but add a comment in x86's
     kvm_arch_dirty_log_tlb_flush() to explain why it uses
     kvm_flush_remote_tlbs() instead of the with_address() variant.

I strongly prefer to (2) or (3), but I'll defer to Vitaly as to which of
those is preferable.

I don't like (1) because (a) it requires more lines of code (well, comments),
to explain why kvm_flush_remote_tlbs() is the default, and (b) it would
require even more comments, which would be x86-specific in generic KVM,
to explain why x86 doesn't use its with_address() flush, or we'd lose that
info altogether.
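
For reference, a rough sketch of what (3) could look like on the x86 side
(illustrative only, not the actual patch; the hook's memslot parameter is
an assumption based on this series):

	/*
	 * Sketch of option (3): keep the global flush, but document why the
	 * more precise with_address() variant isn't used.
	 */
	void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
					  struct kvm_memory_slot *memslot)
	{
		/*
		 * kvm_flush_remote_tlbs_with_address() would be more precise,
		 * but the precise flush only matters when KVM runs as an L1
		 * on top of Hyper-V; flush everything for simplicity.
		 */
		kvm_flush_remote_tlbs(kvm);
	}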

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots
  2020-02-07 18:33     ` Sean Christopherson
@ 2020-02-07 20:39       ` Peter Xu
  2020-02-07 21:10         ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-07 20:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 10:33:25AM -0800, Sean Christopherson wrote:
> On Thu, Feb 06, 2020 at 04:09:44PM -0500, Peter Xu wrote:
> > On Tue, Jan 21, 2020 at 02:31:55PM -0800, Sean Christopherson wrote:
> > > @@ -9652,13 +9652,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
> > >  		if (IS_ERR((void *)hva))
> > >  			return PTR_ERR((void *)hva);
> > >  	} else {
> > > -		if (!slot->npages)
> > > +		if (!slot || !slot->npages)
> > >  			return 0;
> > >  
> > > -		hva = 0;
> > > +		hva = slot->userspace_addr;
> > 
> > Is this intended?
> 
> Yes.  It's possible to allow VA=0 for userspace mappings.  It's extremely
> uncommon, but possible.  Therefore "hva == 0" shouldn't be used to
> indicate an invalid slot.

Note that this is the deletion path in __x86_set_memory_region() not
allocation.  IIUC userspace_addr won't even be used in follow up code
path so it shouldn't really matter.  Or have I misunderstood something?

> 
> > > +		old_npages = slot->npages;
> > >  	}
> > >  
> > > -	old = *slot;
> > >  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > >  		struct kvm_userspace_memory_region m;
> > >  
> 
> ...
> 
> > > @@ -869,63 +869,162 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> > >  }
> > >  
> > >  /*
> > > - * Insert memslot and re-sort memslots based on their GFN,
> > > - * so binary search could be used to lookup GFN.
> > > - * Sorting algorithm takes advantage of having initially
> > > - * sorted array and known changed memslot position.
> > > + * Delete a memslot by decrementing the number of used slots and shifting all
> > > + * other entries in the array forward one spot.
> > > + */
> > > +static inline void kvm_memslot_delete(struct kvm_memslots *slots,
> > > +				      struct kvm_memory_slot *memslot)
> > > +{
> > > +	struct kvm_memory_slot *mslots = slots->memslots;
> > > +	int i;
> > > +
> > > +	if (WARN_ON(slots->id_to_index[memslot->id] == -1))
> > > +		return;
> > > +
> > > +	slots->used_slots--;
> > > +
> > > +	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots; i++) {
> > > +		mslots[i] = mslots[i + 1];
> > > +		slots->id_to_index[mslots[i].id] = i;
> > > +	}
> > > +	mslots[i] = *memslot;
> > > +	slots->id_to_index[memslot->id] = -1;
> > > +}
> > > +
> > > +/*
> > > + * "Insert" a new memslot by incrementing the number of used slots.  Returns
> > > + * the new slot's initial index into the memslots array.
> > > + */
> > > +static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)
> > 
> > The naming here didn't help me understand; it actually left me a bit
> > more confused...
> > 
> > How about "kvm_memslot_insert_end"?  Or even unwrap it.
> 
> It's not guaranteed to be the end, as there could be multiple unused
> entries at the back of the array.  I agree the naming isn't perfect, but
> IMO it's the least crappy option and will be familiar to anyone with C++
> STL (and other languages?) experience.  Arguably it would be better to
> follow kernel naming for lists, e.g. head/tail, but there are no
> convenient adverbs for the move helpers, e.g. kvm_memslot_move_backward()
> would be kvm_memslot_move_towards_tail().
> 
> I'm very strongly opposed to unwrapping it.
> 
> The code would look like this.  Without a beefy comment, the high level
> semantics of the KVM_MR_CREATE case are not at all clear.  Adding a
> comment gets messy because putting it above the entire if-else makes it
> difficult to understand that it's *only* for the CREATE case, and I hate
> having multi-line comments in if-else statements without brackets.
> 
>                 if (change == KVM_MR_CREATE)
>                         i = slots->used_slots++
>                 else
>                         i = kvm_memslot_move_backward(slots, memslot);

This is made too complicated, imho... A one-liner comment would be
clear enough to me.  :)

Please feel free to keep the original code as you wish.

> 
> > > +{
> > > +	return slots->used_slots++;
> > > +}
> > > +
> > > +/*
> > > + * Move a changed memslot backwards in the array by shifting existing slots
> > > + * with a higher GFN toward the front of the array.  Note, the changed memslot
> > > + * itself is not preserved in the array, i.e. not swapped at this time, only
> > > + * its new index into the array is tracked.  Returns the changed memslot's
> > > + * current index into the memslots array.
> > > + */
> > > +static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
> > > +					    struct kvm_memory_slot *memslot)
> > 
> > "backward" makes me feel like it's moving towards smaller index,
> > instead it's moving to bigger index.  Same applies to "forward" below.
> > I'm not sure whether I'm the only one, though...
> 
> Move forward towards the front, and backward towards the back.  In the
> languages I am familiar with, e.g. C++ STL, JavaScript, Python, and Golang,
> front==container[0] and back==container[len() - 1].

OK.

> 
> > > +{
> > > +	struct kvm_memory_slot *mslots = slots->memslots;
> > > +	int i;
> > > +
> > > +	if (WARN_ON_ONCE(slots->id_to_index[memslot->id] == -1) ||
> > > +	    WARN_ON_ONCE(!slots->used_slots))
> > > +		return -1;
> > > +
> > > +	/*
> > > +	 * Move the target memslot backward in the array by shifting existing
> > > +	 * memslots with a higher GFN (than the target memslot) towards the
> > > +	 * front of the array.
> > > +	 */
> > > +	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots - 1; i++) {
> > > +		if (memslot->base_gfn > mslots[i + 1].base_gfn)
> > > +			break;
> > > +
> > > +		WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);
> > 
> > Will this trigger?  Note that in __kvm_set_memory_region() we have
> > already checked overlap of memslots.
> 
> If you screw up the code it will :-)  In a perfect world, no WARN() will
> *ever* trigger.  All of the added WARN_ON_ONCE() are to help the next poor
> soul that wants to modify this code.

I normally won't keep WARN_ON if it is 100% not triggering (100% here
I mean when e.g. it is checked twice so the 1st one will definitely
trigger first).  My question is more like a pure question in case I
overlooked something.  Please also feel free to keep it if you want.

>  
> > > +
> > > +		/* Shift the next memslot forward one and update its index. */
> > > +		mslots[i] = mslots[i + 1];
> > > +		slots->id_to_index[mslots[i].id] = i;
> > > +	}
> > > +	return i;
> > > +}
> > > @@ -1104,8 +1203,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
> 
> ...
> 
> > >  	 * when the memslots are re-sorted by update_memslots().
> > >  	 */
> > >  	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> > > -	old = *tmp;
> > > -	tmp = NULL;
> > 
> > I was confused in that patch, then...
> > 
> > > +	if (tmp) {
> > > +		old = *tmp;
> > > +		tmp = NULL;
> > 
> > ... now I still don't know why it needs to be set to NULL?
> 
> To make it abundantly clear that thou shalt not use @tmp, i.e. to force
> using the copy and not the pointer.  Note, @tmp is also reused as an
> iterator below.

OK, it still feels a bit strange; say, we can comment on that if you
want to warn the others.  The difference is probably just whether a
useless instruction gets executed.  But this is also trivial, I'll leave
it to the others to judge.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots
  2020-02-07 20:39       ` Peter Xu
@ 2020-02-07 21:10         ` Sean Christopherson
  2020-02-07 21:46           ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 21:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 03:39:09PM -0500, Peter Xu wrote:
> On Fri, Feb 07, 2020 at 10:33:25AM -0800, Sean Christopherson wrote:
> > On Thu, Feb 06, 2020 at 04:09:44PM -0500, Peter Xu wrote:
> > > On Tue, Jan 21, 2020 at 02:31:55PM -0800, Sean Christopherson wrote:
> > > > @@ -9652,13 +9652,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
> > > >  		if (IS_ERR((void *)hva))
> > > >  			return PTR_ERR((void *)hva);
> > > >  	} else {
> > > > -		if (!slot->npages)
> > > > +		if (!slot || !slot->npages)
> > > >  			return 0;
> > > >  
> > > > -		hva = 0;
> > > > +		hva = slot->userspace_addr;
> > > 
> > > Is this intended?
> > 
> > Yes.  It's possible to allow VA=0 for userspace mappings.  It's extremely
> > uncommon, but possible.  Therefore "hva == 0" shouldn't be used to
> > indicate an invalid slot.
> 
> Note that this is the deletion path in __x86_set_memory_region() not
> allocation.  IIUC userspace_addr won't even be used in follow up code
> path so it shouldn't really matter.  Or have I misunderstood something?

No, but that's precisely why I don't want to zero out @hva, as doing so
implies that '0' indicates an invalid hva, which is wrong.

What if I change this to 

			hva = 0xdeadull << 48;

and add a blurb in the changelog about stuffing hva with a non-canonical value
to indicate it's being destroyed.
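
In context, the deletion path would then look something like this (a sketch
of the proposal, not the final diff):

	} else {
		if (!slot || !slot->npages)
			return 0;

		/*
		 * Stuff a non-canonical value so it's obvious the hva is
		 * bogus for the deletion case, rather than implying that
		 * '0' means "invalid".
		 */
		hva = 0xdeadull << 48;
		old_npages = slot->npages;
	}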

> > > > +		old_npages = slot->npages;
> > > >  	}
> > > >  
> > > > -	old = *slot;
> > > >  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > > >  		struct kvm_userspace_memory_region m;
> > > >  

...

> > > > +{
> > > > +	struct kvm_memory_slot *mslots = slots->memslots;
> > > > +	int i;
> > > > +
> > > > +	if (WARN_ON_ONCE(slots->id_to_index[memslot->id] == -1) ||
> > > > +	    WARN_ON_ONCE(!slots->used_slots))
> > > > +		return -1;
> > > > +
> > > > +	/*
> > > > +	 * Move the target memslot backward in the array by shifting existing
> > > > +	 * memslots with a higher GFN (than the target memslot) towards the
> > > > +	 * front of the array.
> > > > +	 */
> > > > +	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots - 1; i++) {
> > > > +		if (memslot->base_gfn > mslots[i + 1].base_gfn)
> > > > +			break;
> > > > +
> > > > +		WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);
> > > 
> > > Will this trigger?  Note that in __kvm_set_memory_region() we have
> > > already checked overlap of memslots.
> > 
> > If you screw up the code it will :-)  In a perfect world, no WARN() will
> > *ever* trigger.  All of the added WARN_ON_ONCE() are to help the next poor
> > soul that wants to modify this code.
> 
> I normally won't keep WARN_ON if it is 100% not triggering (100% here
> I mean when e.g. it is checked twice so the 1st one will definitely
> trigger first).  My question is more like a pure question in case I
> overlooked something.  Please also feel free to keep it if you want.

Ah.  The WARNs here are as much to concisely document the assumptions and
conditions of the code as they are there to enforce those conditions.

> > > > +
> > > > +		/* Shift the next memslot forward one and update its index. */
> > > > +		mslots[i] = mslots[i + 1];
> > > > +		slots->id_to_index[mslots[i].id] = i;
> > > > +	}
> > > > +	return i;
> > > > +}
> > > > @@ -1104,8 +1203,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > 
> > ...
> > 
> > > >  	 * when the memslots are re-sorted by update_memslots().
> > > >  	 */
> > > >  	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> > > > -	old = *tmp;
> > > > -	tmp = NULL;
> > > 
> > > I was confused in that patch, then...
> > > 
> > > > +	if (tmp) {
> > > > +		old = *tmp;
> > > > +		tmp = NULL;
> > > 
> > > ... now I still don't know why it needs to be set to NULL?
> > 
> > To make it abundantly clear that thou shalt not use @tmp, i.e. to force
> > using the copy and not the pointer.  Note, @tmp is also reused as an
> > iterator below.
> 
> OK, it still feels a bit strange; say, we can comment on that if you
> want to warn the others.  The difference is probably just whether a
> useless instruction gets executed.  But this is also trivial, I'll leave
> it to the others to judge.

After having suffered through deciphering this code and blundering into
nasty gotchas more than once, I'd really like to keep the nullification.
I'll add a comment to explain that the sole purpose is to kill @tmp so it
can't be used incorrectly and thus cause silent failure.
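
Roughly along these lines (a sketch of the intended comment, not the final
wording):

	tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
	if (tmp) {
		/*
		 * Copy the old memslot and immediately nullify @tmp so any
		 * leftover use of the pointer (instead of the copy) fails
		 * loudly; @tmp is also reused as an iterator below.
		 */
		old = *tmp;
		tmp = NULL;
	} else {
		memset(&old, 0, sizeof(old));
		old.id = id;
	}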

This is also another reason I'd like to keep the WARN_ONs.  When this code
goes awry, the result is usually silent corruption and delayed explosions,
i.e. failures that absolutely suck to debug.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots
  2020-02-07 21:10         ` Sean Christopherson
@ 2020-02-07 21:46           ` Peter Xu
  2020-02-07 22:03             ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-07 21:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 01:10:16PM -0800, Sean Christopherson wrote:
> On Fri, Feb 07, 2020 at 03:39:09PM -0500, Peter Xu wrote:
> > On Fri, Feb 07, 2020 at 10:33:25AM -0800, Sean Christopherson wrote:
> > > On Thu, Feb 06, 2020 at 04:09:44PM -0500, Peter Xu wrote:
> > > > On Tue, Jan 21, 2020 at 02:31:55PM -0800, Sean Christopherson wrote:
> > > > > @@ -9652,13 +9652,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
> > > > >  		if (IS_ERR((void *)hva))
> > > > >  			return PTR_ERR((void *)hva);
> > > > >  	} else {
> > > > > -		if (!slot->npages)
> > > > > +		if (!slot || !slot->npages)
> > > > >  			return 0;
> > > > >  
> > > > > -		hva = 0;
> > > > > +		hva = slot->userspace_addr;
> > > > 
> > > > Is this intended?
> > > 
> > > Yes.  It's possible to allow VA=0 for userspace mappings.  It's extremely
> > > uncommon, but possible.  Therefore "hva == 0" shouldn't be used to
> > > indicate an invalid slot.
> > 
> > Note that this is the deletion path in __x86_set_memory_region() not
> > allocation.  IIUC userspace_addr won't even be used in follow up code
> > path so it shouldn't really matter.  Or have I misunderstood something?
> 
> No, but that's precisely why I don't want to zero out @hva, as doing so
> implies that '0' indicates an invalid hva, which is wrong.
> 
> What if I change this to 
> 
> 			hva = 0xdeadull << 48;
> 
> and add a blurb in the changelog about stuffing hva with a non-canonical value
> to indicate it's being destroyed.

IMO it's fairly common to have the case where "when A is XXX then
parameter B is invalid" happens in C.  OK, feel free to keep any of
these as you prefer (how many times have I said this just today? :) as
long as the maintainers are fine with it.  And for sure an extra
comment would always be nice.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots
  2020-02-07 21:46           ` Peter Xu
@ 2020-02-07 22:03             ` Sean Christopherson
  2020-02-07 22:24               ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-07 22:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 04:46:23PM -0500, Peter Xu wrote:
> On Fri, Feb 07, 2020 at 01:10:16PM -0800, Sean Christopherson wrote:
> > On Fri, Feb 07, 2020 at 03:39:09PM -0500, Peter Xu wrote:
> > > On Fri, Feb 07, 2020 at 10:33:25AM -0800, Sean Christopherson wrote:
> > > > On Thu, Feb 06, 2020 at 04:09:44PM -0500, Peter Xu wrote:
> > > > > On Tue, Jan 21, 2020 at 02:31:55PM -0800, Sean Christopherson wrote:
> > > > > > @@ -9652,13 +9652,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
> > > > > >  		if (IS_ERR((void *)hva))
> > > > > >  			return PTR_ERR((void *)hva);
> > > > > >  	} else {
> > > > > > -		if (!slot->npages)
> > > > > > +		if (!slot || !slot->npages)
> > > > > >  			return 0;
> > > > > >  
> > > > > > -		hva = 0;
> > > > > > +		hva = slot->userspace_addr;
> > > > > 
> > > > > Is this intended?
> > > > 
> > > > Yes.  It's possible to allow VA=0 for userspace mappings.  It's extremely
> > > > uncommon, but possible.  Therefore "hva == 0" shouldn't be used to
> > > > indicate an invalid slot.
> > > 
> > > Note that this is the deletion path in __x86_set_memory_region() not
> > > allocation.  IIUC userspace_addr won't even be used in follow up code
> > > path so it shouldn't really matter.  Or have I misunderstood something?
> > 
> > No, but that's precisely why I don't want to zero out @hva, as doing so
> > implies that '0' indicates an invalid hva, which is wrong.
> > 
> > What if I change this to 
> > 
> > 			hva = 0xdeadull << 48;
> > 
> > and add a blurb in the changelog about stuffing hva with a non-canonical value
> > to indicate it's being destroyed.
> 
> IMO it's fairly common to have the case where "when A is XXX then
> parameter B is invalid" happens in C.

I'm not arguing that's not the case.  My point is that there's nothing
special about '0', so why use it?  E.g. "hva = 1" would also be ok from a
functional perspective, but more obviously "wrong".

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots
  2020-02-07 22:03             ` Sean Christopherson
@ 2020-02-07 22:24               ` Peter Xu
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Xu @ 2020-02-07 22:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 02:03:25PM -0800, Sean Christopherson wrote:
> On Fri, Feb 07, 2020 at 04:46:23PM -0500, Peter Xu wrote:
> > On Fri, Feb 07, 2020 at 01:10:16PM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 07, 2020 at 03:39:09PM -0500, Peter Xu wrote:
> > > > On Fri, Feb 07, 2020 at 10:33:25AM -0800, Sean Christopherson wrote:
> > > > > On Thu, Feb 06, 2020 at 04:09:44PM -0500, Peter Xu wrote:
> > > > > > On Tue, Jan 21, 2020 at 02:31:55PM -0800, Sean Christopherson wrote:
> > > > > > > @@ -9652,13 +9652,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
> > > > > > >  		if (IS_ERR((void *)hva))
> > > > > > >  			return PTR_ERR((void *)hva);
> > > > > > >  	} else {
> > > > > > > -		if (!slot->npages)
> > > > > > > +		if (!slot || !slot->npages)
> > > > > > >  			return 0;
> > > > > > >  
> > > > > > > -		hva = 0;
> > > > > > > +		hva = slot->userspace_addr;
> > > > > > 
> > > > > > Is this intended?
> > > > > 
> > > > > Yes.  It's possible to allow VA=0 for userspace mappings.  It's extremely
> > > > > uncommon, but possible.  Therefore "hva == 0" shouldn't be used to
> > > > > indicate an invalid slot.
> > > > 
> > > > Note that this is the deletion path in __x86_set_memory_region() not
> > > > allocation.  IIUC userspace_addr won't even be used in follow up code
> > > > path so it shouldn't really matter.  Or have I misunderstood something?
> > > 
> > > No, but that's precisely why I don't want to zero out @hva, as doing so
> > > implies that '0' indicates an invalid hva, which is wrong.
> > > 
> > > What if I change this to 
> > > 
> > > 			hva = 0xdeadull << 48;
> > > 
> > > and add a blurb in the changelog about stuffing hva with a non-canonical value
> > > to indicate it's being destroyed.
> > 
> > IMO it's fairly common to have the case where "when A is XXX then
> > parameter B is invalid" happens in C.
> 
> I'm not arguing that's not the case.  My point is that there's nothing
> special about '0', so why use it?  E.g. "hva = 1" would also be ok from a
> functional perspective, but more obviously "wrong".

I think the answer is as simple as the original author thought 0 was
better than an arbitrary number on the stack, and I agree. :-)

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-07 19:45         ` Sean Christopherson
@ 2020-02-08  0:18           ` Peter Xu
  2020-02-08  0:42             ` Sean Christopherson
  2020-02-17 15:35           ` Vitaly Kuznetsov
  1 sibling, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-08  0:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 11:45:32AM -0800, Sean Christopherson wrote:
> +Vitaly for HyperV
> 
> On Thu, Feb 06, 2020 at 04:41:06PM -0500, Peter Xu wrote:
> > On Thu, Feb 06, 2020 at 01:21:20PM -0800, Sean Christopherson wrote:
> > > On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
> > > > But that matters to this patch because if MIPS can use
> > > > kvm_flush_remote_tlbs(), then we probably don't need this
> > > > arch-specific hook any more and we can directly call
> > > > kvm_flush_remote_tlbs() after sync dirty log when flush==true.
> > > 
> > > Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
> > > that prevents calling kvm_flush_remote_tlbs() directly, but I have no
> > > clue as to the importance of that code.
> > 
> > As said above I think the x86 lockdep is really not necessary, then
> > considering MIPS could be the only one that will use the new hook
> > introduced in this patch...  Shall we figure that out first?
> 
> So I prepped a follow-up patch to make kvm_arch_dirty_log_tlb_flush() a
> MIPS-only hook and use kvm_flush_remote_tlbs() directly for arm and x86,
> but then I realized x86 *has* a hook to do a precise remote TLB flush.
> There's even an existing kvm_flush_remote_tlbs_with_address() call on a
> memslot, i.e. this exact scenario.  So arguably, x86 should be using the
> more precise flush and should keep kvm_arch_dirty_log_tlb_flush().
> 
> But, the hook is only used when KVM is running as an L1 on top of HyperV,
> and I assume dirty logging isn't used much, if at all, for L1 KVM on
> HyperV?
> 
> I see three options:
> 
>   1. Make kvm_arch_dirty_log_tlb_flush() MIPS-only and call
>      kvm_flush_remote_tlbs() directly for arm and x86.  Add comments to
>      explain when an arch should implement kvm_arch_dirty_log_tlb_flush().
> 
>   2. Change x86 to use kvm_flush_remote_tlbs_with_address() when flushing
>      a memslot after the dirty log is grabbed by userspace.
> 
>   3. Keep the resulting code as is, but add a comment in x86's
>      kvm_arch_dirty_log_tlb_flush() to explain why it uses
>      kvm_flush_remote_tlbs() instead of the with_address() variant.
> 
> I strongly prefer to (2) or (3), but I'll defer to Vitaly as to which of
> those is preferable.
> 
> I don't like (1) because (a) it requires more lines of code (well, comments),
> to explain why kvm_flush_remote_tlbs() is the default, and (b) it would
> require even more comments, which would be x86-specific in generic KVM,
> to explain why x86 doesn't use its with_address() flush, or we'd lose that
> info altogether.
> 

I proposed the 4th solution here:

https://lore.kernel.org/kvm/20200207223520.735523-1-peterx@redhat.com/

I'm not sure whether that's acceptable, but if it can, then we can
drop the kvm_arch_dirty_log_tlb_flush() hook, or even move on to
per-slot tlb flushing.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-08  0:18           ` Peter Xu
@ 2020-02-08  0:42             ` Sean Christopherson
  2020-02-08  0:53               ` Peter Xu
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-08  0:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 07:18:32PM -0500, Peter Xu wrote:
> On Fri, Feb 07, 2020 at 11:45:32AM -0800, Sean Christopherson wrote:
> > +Vitaly for HyperV
> > 
> > On Thu, Feb 06, 2020 at 04:41:06PM -0500, Peter Xu wrote:
> > > On Thu, Feb 06, 2020 at 01:21:20PM -0800, Sean Christopherson wrote:
> > > > On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
> > > > > But that matters to this patch because if MIPS can use
> > > > > kvm_flush_remote_tlbs(), then we probably don't need this
> > > > > arch-specific hook any more and we can directly call
> > > > > kvm_flush_remote_tlbs() after sync dirty log when flush==true.
> > > > 
> > > > Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
> > > > that prevents calling kvm_flush_remote_tlbs() directly, but I have no
> > > > clue as to the importance of that code.
> > > 
> > > As said above I think the x86 lockdep is really not necessary, then
> > > considering MIPS could be the only one that will use the new hook
> > > introduced in this patch...  Shall we figure that out first?
> > 
> > So I prepped a follow-up patch to make kvm_arch_dirty_log_tlb_flush() a
> > MIPS-only hook and use kvm_flush_remote_tlbs() directly for arm and x86,
> > but then I realized x86 *has* a hook to do a precise remote TLB flush.
> > There's even an existing kvm_flush_remote_tlbs_with_address() call on a
> > memslot, i.e. this exact scenario.  So arguably, x86 should be using the
> > more precise flush and should keep kvm_arch_dirty_log_tlb_flush().
> > 
> > But, the hook is only used when KVM is running as an L1 on top of HyperV,
> > and I assume dirty logging isn't used much, if at all, for L1 KVM on
> > HyperV?
> > 
> > I see three options:
> > 
> >   1. Make kvm_arch_dirty_log_tlb_flush() MIPS-only and call
> >      kvm_flush_remote_tlbs() directly for arm and x86.  Add comments to
> >      explain when an arch should implement kvm_arch_dirty_log_tlb_flush().
> > 
> >   2. Change x86 to use kvm_flush_remote_tlbs_with_address() when flushing
> >      a memslot after the dirty log is grabbed by userspace.
> > 
> >   3. Keep the resulting code as is, but add a comment in x86's
> >      kvm_arch_dirty_log_tlb_flush() to explain why it uses
> >      kvm_flush_remote_tlbs() instead of the with_address() variant.
> > 
> > I strongly prefer to (2) or (3), but I'll defer to Vitaly as to which of
> > those is preferable.
> > 
> > I don't like (1) because (a) it requires more lines of code (well, comments),
> > to explain why kvm_flush_remote_tlbs() is the default, and (b) it would
> > require even more comments, which would be x86-specific in generic KVM,
> > to explain why x86 doesn't use its with_address() flush, or we'd lose that
> > info altogether.
> > 
> 
> I proposed the 4th solution here:
> 
> https://lore.kernel.org/kvm/20200207223520.735523-1-peterx@redhat.com/
> 
> I'm not sure whether that's acceptable, but if it can, then we can
> drop the kvm_arch_dirty_log_tlb_flush() hook, or even move on to
> per-slot tlb flushing.

This effectively is per-slot TLB flushing, it just has a different name.
I.e. s/kvm_arch_dirty_log_tlb_flush/kvm_arch_flush_remote_tlbs_memslot.
I'm not opposed to that name change.  And on second and third glance, I
probably prefer it.  That would more or less follow the naming of
kvm_arch_flush_shadow_all() and kvm_arch_flush_shadow_memslot().

I don't want to go straight to kvm_arch_flush_remote_tlb_with_address()
because that loses the important distinction (on x86) that slots_lock is
expected to be held.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-08  0:42             ` Sean Christopherson
@ 2020-02-08  0:53               ` Peter Xu
  2020-02-08  1:29                 ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Peter Xu @ 2020-02-08  0:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 04:42:33PM -0800, Sean Christopherson wrote:
> On Fri, Feb 07, 2020 at 07:18:32PM -0500, Peter Xu wrote:
> > On Fri, Feb 07, 2020 at 11:45:32AM -0800, Sean Christopherson wrote:
> > > +Vitaly for HyperV
> > > 
> > > On Thu, Feb 06, 2020 at 04:41:06PM -0500, Peter Xu wrote:
> > > > On Thu, Feb 06, 2020 at 01:21:20PM -0800, Sean Christopherson wrote:
> > > > > On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
> > > > > > But that matters to this patch because if MIPS can use
> > > > > > kvm_flush_remote_tlbs(), then we probably don't need this
> > > > > > arch-specific hook any more and we can directly call
> > > > > > kvm_flush_remote_tlbs() after sync dirty log when flush==true.
> > > > > 
> > > > > Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
> > > > > that prevents calling kvm_flush_remote_tlbs() directly, but I have no
> > > > > clue as to the importance of that code.
> > > > 
> > > > As said above I think the x86 lockdep is really not necessary, then
> > > > considering MIPS could be the only one that will use the new hook
> > > > introduced in this patch...  Shall we figure that out first?
> > > 
> > > So I prepped a follow-up patch to make kvm_arch_dirty_log_tlb_flush() a
> > > MIPS-only hook and use kvm_flush_remote_tlbs() directly for arm and x86,
> > > but then I realized x86 *has* a hook to do a precise remote TLB flush.
> > > There's even an existing kvm_flush_remote_tlbs_with_address() call on a
> > > memslot, i.e. this exact scenario.  So arguably, x86 should be using the
> > > more precise flush and should keep kvm_arch_dirty_log_tlb_flush().
> > > 
> > > But, the hook is only used when KVM is running as an L1 on top of HyperV,
> > > and I assume dirty logging isn't used much, if at all, for L1 KVM on
> > > HyperV?
> > > 
> > > I see three options:
> > > 
> > >   1. Make kvm_arch_dirty_log_tlb_flush() MIPS-only and call
> > >      kvm_flush_remote_tlbs() directly for arm and x86.  Add comments to
> > >      explain when an arch should implement kvm_arch_dirty_log_tlb_flush().
> > > 
> > >   2. Change x86 to use kvm_flush_remote_tlbs_with_address() when flushing
> > >      a memslot after the dirty log is grabbed by userspace.
> > > 
> > >   3. Keep the resulting code as is, but add a comment in x86's
> > >      kvm_arch_dirty_log_tlb_flush() to explain why it uses
> > >      kvm_flush_remote_tlbs() instead of the with_address() variant.
> > > 
> > > I strongly prefer to (2) or (3), but I'll defer to Vitaly as to which of
> > > those is preferable.
> > > 
> > > I don't like (1) because (a) it requires more lines of code (well, comments),
> > > to explain why kvm_flush_remote_tlbs() is the default, and (b) it would
> > > require even more comments, which would be x86-specific in generic KVM,
> > > to explain why x86 doesn't use its with_address() flush, or we'd lose that
> > > info altogether.
> > > 
> > 
> > I proposed the 4th solution here:
> > 
> > https://lore.kernel.org/kvm/20200207223520.735523-1-peterx@redhat.com/
> > 
> > I'm not sure whether that's acceptable, but if it can, then we can
> > drop the kvm_arch_dirty_log_tlb_flush() hook, or even move on to
> > per-slot tlb flushing.
> 
> This effectively is per-slot TLB flushing, it just has a different name.
> I.e. s/kvm_arch_dirty_log_tlb_flush/kvm_arch_flush_remote_tlbs_memslot.
> I'm not opposed to that name change.  And on second and third glance, I
> probably prefer it.  That would more or less follow the naming of
> kvm_arch_flush_shadow_all() and kvm_arch_flush_shadow_memslot().

Note that the major point of the above patchset is not about doing tlb
flush per-memslot or globally.  It's more about whether we can provide
a common entrance for TLB flushing.  Say, after that series, we should
be able to flush TLB on all archs (majorly, including MIPS) as:

  kvm_flush_remote_tlbs(kvm);

And with the same idea we can also introduce the ranged version.

> 
> I don't want to go straight to kvm_arch_flush_remote_tlb_with_address()
> because that loses the important distinction (on x86) that slots_lock is
> expected to be held.

Sorry I'm still puzzled on why that lockdep is so important and
special for x86...  For example, what if we move that lockdep to the
callers of the kvm_arch_dirty_log_tlb_flush() calls so it protects
even more arch (where we do get/clear dirty log)?  IMHO the callers
must be with the slots_lock held anyways no matter for x86 or not.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-08  0:53               ` Peter Xu
@ 2020-02-08  1:29                 ` Sean Christopherson
  2020-02-17 15:39                   ` Vitaly Kuznetsov
  0 siblings, 1 reply; 70+ messages in thread
From: Sean Christopherson @ 2020-02-08  1:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose,
	linux-mips, kvm, kvm-ppc, linux-arm-kernel, kvmarm, linux-kernel,
	Christoffer Dall, Philippe Mathieu-Daudé

On Fri, Feb 07, 2020 at 07:53:34PM -0500, Peter Xu wrote:
> On Fri, Feb 07, 2020 at 04:42:33PM -0800, Sean Christopherson wrote:
> > On Fri, Feb 07, 2020 at 07:18:32PM -0500, Peter Xu wrote:
> > > On Fri, Feb 07, 2020 at 11:45:32AM -0800, Sean Christopherson wrote:
> > > > +Vitaly for HyperV
> > > > 
> > > > On Thu, Feb 06, 2020 at 04:41:06PM -0500, Peter Xu wrote:
> > > > > On Thu, Feb 06, 2020 at 01:21:20PM -0800, Sean Christopherson wrote:
> > > > > > On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
> > > > > > > But that matters to this patch because if MIPS can use
> > > > > > > kvm_flush_remote_tlbs(), then we probably don't need this
> > > > > > > arch-specific hook any more and we can directly call
> > > > > > > kvm_flush_remote_tlbs() after sync dirty log when flush==true.
> > > > > > 
> > > > > > Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
> > > > > > that prevents calling kvm_flush_remote_tlbs() directly, but I have no
> > > > > > clue as to the importance of that code.
> > > > > 
> > > > > As said above I think the x86 lockdep is really not necessary, then
> > > > > considering MIPS could be the only one that will use the new hook
> > > > > introduced in this patch...  Shall we figure that out first?
> > > > 
> > > > So I prepped a follow-up patch to make kvm_arch_dirty_log_tlb_flush() a
> > > > MIPS-only hook and use kvm_flush_remote_tlbs() directly for arm and x86,
> > > > but then I realized x86 *has* a hook to do a precise remote TLB flush.
> > > > There's even an existing kvm_flush_remote_tlbs_with_address() call on a
> > > > memslot, i.e. this exact scenario.  So arguably, x86 should be using the
> > > > more precise flush and should keep kvm_arch_dirty_log_tlb_flush().
> > > > 
> > > > But, the hook is only used when KVM is running as an L1 on top of HyperV,
> > > > and I assume dirty logging isn't used much, if at all, for L1 KVM on
> > > > HyperV?
> > > > 
> > > > I see three options:
> > > > 
> > > >   1. Make kvm_arch_dirty_log_tlb_flush() MIPS-only and call
> > > >      kvm_flush_remote_tlbs() directly for arm and x86.  Add comments to
> > > >      explain when an arch should implement kvm_arch_dirty_log_tlb_flush().
> > > > 
> > > >   2. Change x86 to use kvm_flush_remote_tlbs_with_address() when flushing
> > > >      a memslot after the dirty log is grabbed by userspace.
> > > > 
> > > >   3. Keep the resulting code as is, but add a comment in x86's
> > > >      kvm_arch_dirty_log_tlb_flush() to explain why it uses
> > > >      kvm_flush_remote_tlbs() instead of the with_address() variant.
> > > > 
> > > > I strongly prefer to (2) or (3), but I'll defer to Vitaly as to which of
> > > > those is preferable.
> > > > 
> > > > I don't like (1) because (a) it requires more lines of code (well, comments),
> > > > to explain why kvm_flush_remote_tlbs() is the default, and (b) it would
> > > > require even more comments, which would be x86-specific in generic KVM,
> > > > to explain why x86 doesn't use its with_address() flush, or we'd lose that
> > > > info altogether.
> > > > 
> > > 
> > > I proposed the 4th solution here:
> > > 
> > > https://lore.kernel.org/kvm/20200207223520.735523-1-peterx@redhat.com/
> > > 
> > > I'm not sure whether that's acceptable, but if it can, then we can
> > > drop the kvm_arch_dirty_log_tlb_flush() hook, or even move on to
> > > per-slot tlb flushing.
> > 
> > This effectively is per-slot TLB flushing, it just has a different name.
> > I.e. s/kvm_arch_dirty_log_tlb_flush/kvm_arch_flush_remote_tlbs_memslot.
> > I'm not opposed to that name change.  And on second and third glance, I
> > probably prefer it.  That would more or less follow the naming of
> > kvm_arch_flush_shadow_all() and kvm_arch_flush_shadow_memslot().
> 
> Note that the major point of the above patchset is not about doing tlb
> flush per-memslot or globally.  It's more about whether we can provide
> a common entrance for TLB flushing.  Say, after that series, we should
> be able to flush TLB on all archs (majorly, including MIPS) as:
> 
>   kvm_flush_remote_tlbs(kvm);
> 
> And with the same idea we can also introduce the ranged version.
> 
> > 
> > I don't want to go straight to kvm_arch_flush_remote_tlb_with_address()
> > because that loses the important distinction (on x86) that slots_lock is
> > expected to be held.
> 
> Sorry I'm still puzzled on why that lockdep is so important and
> special for x86...  For example, what if we move that lockdep to the
> callers of the kvm_arch_dirty_log_tlb_flush() calls so it protects
> even more arch (where we do get/clear dirty log)?  IMHO the callers
> must be with the slots_lock held anyways no matter for x86 or not.


Following the breadcrumbs leads to the comment in
kvm_mmu_slot_remove_write_access(), which says:

        /*
         * kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
         * which do tlb flush out of mmu-lock should be serialized by
         * kvm->slots_lock otherwise tlb flush would be missed.
         */

I.e. write-protecting a memslot and grabbing the dirty log for the memslot
need to be serialized.  It's quite obvious *now* that get_dirty_log() holds
slots_lock, but the purpose of lockdep assertions isn't just to verify the
current functionality, it's to help ensure the correctness for future code
and to document assumptions in the code.

Digging deeper, there are four functions, all related to dirty logging, in
the x86 mmu that basically open code what x86's
kvm_arch_flush_remote_tlbs_memslot() would look like if it uses the range
based flushing.

Unless it's functionally incorrect (Vitaly?), going with option (2) and
naming the hook kvm_arch_flush_remote_tlbs_memslot() seems like the obvious
choice, e.g. the final cleanup gives this diff stat:

 arch/x86/kvm/mmu/mmu.c | 34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)
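
Roughly, the hook on x86 could then collapse to something like this (an
illustrative sketch only, assuming kvm_flush_remote_tlbs_with_address()
takes a base gfn and a page count; not the final patch):

	void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
						struct kvm_memory_slot *memslot)
	{
		/*
		 * Dirty log flushes happen outside of mmu_lock, so they must
		 * be serialized against write-protection updates by holding
		 * slots_lock; assert it to document and enforce that.
		 */
		lockdep_assert_held(&kvm->slots_lock);

		kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
						   memslot->npages);
	}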

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-07 19:45         ` Sean Christopherson
  2020-02-08  0:18           ` Peter Xu
@ 2020-02-17 15:35           ` Vitaly Kuznetsov
  1 sibling, 0 replies; 70+ messages in thread
From: Vitaly Kuznetsov @ 2020-02-17 15:35 UTC (permalink / raw)
  To: Sean Christopherson, Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, linux-mips, kvm, kvm-ppc,
	linux-arm-kernel, kvmarm, linux-kernel, Christoffer Dall,
	Philippe Mathieu-Daudé

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> +Vitaly for HyperV
>
> On Thu, Feb 06, 2020 at 04:41:06PM -0500, Peter Xu wrote:
>> On Thu, Feb 06, 2020 at 01:21:20PM -0800, Sean Christopherson wrote:
>> > On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
>> > > But that matters to this patch because if MIPS can use
>> > > kvm_flush_remote_tlbs(), then we probably don't need this
>> > > arch-specific hook any more and we can directly call
>> > > kvm_flush_remote_tlbs() after sync dirty log when flush==true.
>> > 
>> > Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
>> > that prevents calling kvm_flush_remote_tlbs() directly, but I have no
>> > clue as to the importance of that code.
>> 
>> As said above I think the x86 lockdep is really not necessary, then
>> considering MIPS could be the only one that will use the new hook
>> introduced in this patch...  Shall we figure that out first?
>
> So I prepped a follow-up patch to make kvm_arch_dirty_log_tlb_flush() a
> MIPS-only hook and use kvm_flush_remote_tlbs() directly for arm and x86,
> but then I realized x86 *has* a hook to do a precise remote TLB flush.
> There's even an existing kvm_flush_remote_tlbs_with_address() call on a
> memslot, i.e. this exact scenario.  So arguably, x86 should be using the
> more precise flush and should keep kvm_arch_dirty_log_tlb_flush().
>
> But, the hook is only used when KVM is running as an L1 on top of HyperV,
> and I assume dirty logging isn't used much, if at all, for L1 KVM on
> HyperV?

(Sorry for the delayed reply, was traveling last week)

When KVM runs as an L1 on top of Hyper-V it uses eVMCS by default and
eVMCSv1 doesn't support PML. I've also just checked Hyper-V 2019 and it
hides SECONDARY_EXEC_ENABLE_PML from guests (this was expected).

>
> I see three options:
>
>   1. Make kvm_arch_dirty_log_tlb_flush() MIPS-only and call
>      kvm_flush_remote_tlbs() directly for arm and x86.  Add comments to
>      explain when an arch should implement kvm_arch_dirty_log_tlb_flush().
>
>   2. Change x86 to use kvm_flush_remote_tlbs_with_address() when flushing
>      a memslot after the dirty log is grabbed by userspace.
>
>   3. Keep the resulting code as is, but add a comment in x86's
>      kvm_arch_dirty_log_tlb_flush() to explain why it uses
>      kvm_flush_remote_tlbs() instead of the with_address() variant.
>
> I strongly prefer to (2) or (3), but I'll defer to Vitaly as to which of
> those is preferable.

I'd vote for (2): while this will effectively be kvm_flush_remote_tlbs()
for now, we may think of something smarter in the future (e.g. PV
interface for KVM-on-KVM).

>
> I don't like (1) because (a) it requires more lines of code (well, comments),
> to explain why kvm_flush_remote_tlbs() is the default, and (b) it would
> require even more comments, which would be x86-specific in generic KVM,
> to explain why x86 doesn't use its with_address() flush, or we'd lose that
> info altogether.
>

-- 
Vitaly


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-08  1:29                 ` Sean Christopherson
@ 2020-02-17 15:39                   ` Vitaly Kuznetsov
  2020-02-18 17:10                     ` Sean Christopherson
  0 siblings, 1 reply; 70+ messages in thread
From: Vitaly Kuznetsov @ 2020-02-17 15:39 UTC (permalink / raw)
  To: Sean Christopherson, Peter Xu
  Cc: Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, linux-mips, kvm, kvm-ppc,
	linux-arm-kernel, kvmarm, linux-kernel, Christoffer Dall,
	Philippe Mathieu-Daudé

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Fri, Feb 07, 2020 at 07:53:34PM -0500, Peter Xu wrote:
>> On Fri, Feb 07, 2020 at 04:42:33PM -0800, Sean Christopherson wrote:
>> > On Fri, Feb 07, 2020 at 07:18:32PM -0500, Peter Xu wrote:
>> > > On Fri, Feb 07, 2020 at 11:45:32AM -0800, Sean Christopherson wrote:
>> > > > +Vitaly for HyperV
>> > > > 
>> > > > On Thu, Feb 06, 2020 at 04:41:06PM -0500, Peter Xu wrote:
>> > > > > On Thu, Feb 06, 2020 at 01:21:20PM -0800, Sean Christopherson wrote:
>> > > > > > On Thu, Feb 06, 2020 at 03:02:00PM -0500, Peter Xu wrote:
>> > > > > > > But that matters to this patch because if MIPS can use
>> > > > > > > kvm_flush_remote_tlbs(), then we probably don't need this
>> > > > > > > arch-specific hook any more and we can directly call
>> > > > > > > kvm_flush_remote_tlbs() after sync dirty log when flush==true.
>> > > > > > 
>> > > > > > Ya, the asid_flush_mask in kvm_vz_flush_shadow_all() is the only thing
>> > > > > > that prevents calling kvm_flush_remote_tlbs() directly, but I have no
>> > > > > > clue as to the importance of that code.
>> > > > > 
>> > > > > As said above I think the x86 lockdep is really not necessary, then
>> > > > > considering MIPS could be the only one that will use the new hook
>> > > > > introduced in this patch...  Shall we figure that out first?
>> > > > 
>> > > > So I prepped a follow-up patch to make kvm_arch_dirty_log_tlb_flush() a
>> > > > MIPS-only hook and use kvm_flush_remote_tlbs() directly for arm and x86,
>> > > > but then I realized x86 *has* a hook to do a precise remote TLB flush.
>> > > > There's even an existing kvm_flush_remote_tlbs_with_address() call on a
>> > > > memslot, i.e. this exact scenario.  So arguably, x86 should be using the
>> > > > more precise flush and should keep kvm_arch_dirty_log_tlb_flush().
>> > > > 
>> > > > But, the hook is only used when KVM is running as an L1 on top of HyperV,
>> > > > and I assume dirty logging isn't used much, if at all, for L1 KVM on
>> > > > HyperV?
>> > > > 
>> > > > I see three options:
>> > > > 
>> > > >   1. Make kvm_arch_dirty_log_tlb_flush() MIPS-only and call
>> > > >      kvm_flush_remote_tlbs() directly for arm and x86.  Add comments to
>> > > >      explain when an arch should implement kvm_arch_dirty_log_tlb_flush().
>> > > > 
>> > > >   2. Change x86 to use kvm_flush_remote_tlbs_with_address() when flushing
>> > > >      a memslot after the dirty log is grabbed by userspace.
>> > > > 
>> > > >   3. Keep the resulting code as is, but add a comment in x86's
>> > > >      kvm_arch_dirty_log_tlb_flush() to explain why it uses
>> > > >      kvm_flush_remote_tlbs() instead of the with_address() variant.
>> > > > 
>> > > > I strongly prefer to (2) or (3), but I'll defer to Vitaly as to which of
>> > > > those is preferable.
>> > > > 
>> > > > I don't like (1) because (a) it requires more lines of code (well, comments),
>> > > > to explain why kvm_flush_remote_tlbs() is the default, and (b) it would
>> > > > require even more comments, which would be x86-specific in generic KVM,
>> > > > to explain why x86 doesn't use its with_address() flush, or we'd lose that
>> > > > info altogether.
>> > > > 
>> > > 
>> > > I proposed the 4th solution here:
>> > > 
>> > > https://lore.kernel.org/kvm/20200207223520.735523-1-peterx@redhat.com/
>> > > 
>> > > I'm not sure whether that's acceptable, but if it can, then we can
>> > > drop the kvm_arch_dirty_log_tlb_flush() hook, or even move on to
>> > > per-slot tlb flushing.
>> > 
>> > This effectively is per-slot TLB flushing, it just has a different name.
>> > I.e. s/kvm_arch_dirty_log_tlb_flush/kvm_arch_flush_remote_tlbs_memslot.
>> > I'm not opposed to that name change.  And on second and third glance, I
>> > probably prefer it.  That would more or less follow the naming of
>> > kvm_arch_flush_shadow_all() and kvm_arch_flush_shadow_memslot().
>> 
>> Note that the major point of the above patchset is not about doing tlb
>> flush per-memslot or globally.  It's more about whether we can provide
>> a common entrance for TLB flushing.  Say, after that series, we should
>> be able to flush TLB on all archs (majorly, including MIPS) as:
>> 
>>   kvm_flush_remote_tlbs(kvm);
>> 
>> And with the same idea we can also introduce the ranged version.
>> 
>> > 
>> > I don't want to go straight to kvm_arch_flush_remote_tlb_with_address()
>> > because that loses the important distinction (on x86) that slots_lock is
>> > expected to be held.
>> 
>> Sorry I'm still puzzled on why that lockdep is so important and
>> special for x86...  For example, what if we move that lockdep to the
>> callers of the kvm_arch_dirty_log_tlb_flush() calls so it protects
>> even more arch (where we do get/clear dirty log)?  IMHO the callers
>> must be with the slots_lock held anyways no matter for x86 or not.
>
>
> Following the breadcrumbs leads to the comment in
> kvm_mmu_slot_remove_write_access(), which says:
>
>         /*
>          * kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
>          * which do tlb flush out of mmu-lock should be serialized by
>          * kvm->slots_lock otherwise tlb flush would be missed.
>          */
>
> I.e. write-protecting a memslot and grabbing the dirty log for the memslot
> need to be serialized.  It's quite obvious *now* that get_dirty_log() holds
> slots_lock, but the purpose of lockdep assertions isn't just to verify the
> current functionality, it's to help ensure the correctness for future code
> and to document assumptions in the code.
>
> Digging deeper, there are four functions, all related to dirty logging, in
> the x86 mmu that basically open code what x86's
> kvm_arch_flush_remote_tlbs_memslot() would look like if it uses the range
> based flushing.
>
> Unless it's functionally incorrect (Vitaly?), going with option (2) and
> naming the hook kvm_arch_flush_remote_tlbs_memslot() seems like the obvious
> choice, e.g. the final cleanup gives this diff stat:

(I apologize again for not replying in time)

I think this is a valid approach and your option (2) would also be my
choice. I also don't think there's going to be a problem when (if)
Hyper-V adds support for PML (eVMCSv2?).

>
>  arch/x86/kvm/mmu/mmu.c | 34 +++++++++-------------------------
>  1 file changed, 9 insertions(+), 25 deletions(-)
>

Looks nice :-)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions
  2020-02-17 15:39                   ` Vitaly Kuznetsov
@ 2020-02-18 17:10                     ` Sean Christopherson
  0 siblings, 0 replies; 70+ messages in thread
From: Sean Christopherson @ 2020-02-18 17:10 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Peter Xu, Paolo Bonzini, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, linux-mips, kvm, kvm-ppc,
	linux-arm-kernel, kvmarm, linux-kernel, Christoffer Dall,
	Philippe Mathieu-Daudé

On Mon, Feb 17, 2020 at 04:39:39PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> > Unless it's functionally incorrect (Vitaly?), going with option (2) and
> > naming the hook kvm_arch_flush_remote_tlbs_memslot() seems like the obvious
> > choice, e.g. the final cleanup gives this diff stat:
> 
> (I apologize again for not replying in time)

No worries, didn't hinder me in the slightest as I was buried in other
stuff last week anyways.

> I think this is a valid approach and your option (2) would also be my
> choice. I also don't think there's going to be a problem when (if)
> Hyper-V adds support for PML (eVMCSv2?).

Cool, thanks!

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2020-02-18 17:10 UTC | newest]

Thread overview: 70+ messages
2020-01-21 22:31 [PATCH v5 00/19] KVM: Dynamically size memslot arrays Sean Christopherson
2020-01-21 22:31 ` [PATCH v5 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot Sean Christopherson
2020-02-05 21:49   ` Peter Xu
2020-02-05 23:55     ` Sean Christopherson
2020-02-06  2:00       ` Peter Xu
2020-02-06  2:17         ` Sean Christopherson
2020-02-06  2:58           ` Peter Xu
2020-02-06  5:05             ` Sean Christopherson
2020-01-21 22:31 ` [PATCH v5 02/19] KVM: Reinstall old memslots if arch preparation fails Sean Christopherson
2020-02-05 22:08   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 03/19] KVM: Don't free new memslot if allocation of said memslot fails Sean Christopherson
2020-02-05 22:28   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 04/19] KVM: PPC: Move memslot memory allocation into prepare_memory_region() Sean Christopherson
2020-02-05 22:41   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 05/19] KVM: x86: Allocate memslot resources during prepare_memory_region() Sean Christopherson
2020-01-21 22:31 ` [PATCH v5 06/19] KVM: Drop kvm_arch_create_memslot() Sean Christopherson
2020-02-05 22:45   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 07/19] KVM: Explicitly free allocated-but-unused dirty bitmap Sean Christopherson
2020-01-21 22:31 ` [PATCH v5 08/19] KVM: Refactor error handling for setting memory region Sean Christopherson
2020-02-05 22:48   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 09/19] KVM: Move setting of memslot into helper routine Sean Christopherson
2020-02-06 16:26   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 10/19] KVM: Drop "const" attribute from old memslot in commit_memory_region() Sean Christopherson
2020-02-06 16:26   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 11/19] KVM: x86: Free arrays for old memslot when moving memslot's base gfn Sean Christopherson
2020-01-21 22:31 ` [PATCH v5 12/19] KVM: Move memslot deletion to helper function Sean Christopherson
2020-02-06 16:14   ` Peter Xu
2020-02-06 16:28     ` Sean Christopherson
2020-02-06 16:51       ` Peter Xu
2020-02-07 17:59         ` Sean Christopherson
2020-02-07 18:07           ` Sean Christopherson
2020-02-07 18:17           ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 13/19] KVM: Simplify kvm_free_memslot() and all its descendents Sean Christopherson
2020-02-06 16:29   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region() Sean Christopherson
2020-02-06 19:06   ` Peter Xu
2020-02-06 19:22     ` Sean Christopherson
2020-02-06 19:36       ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 15/19] KVM: Provide common implementation for generic dirty log functions Sean Christopherson
2020-02-06 20:02   ` Peter Xu
2020-02-06 21:21     ` Sean Christopherson
2020-02-06 21:41       ` Peter Xu
2020-02-07 19:45         ` Sean Christopherson
2020-02-08  0:18           ` Peter Xu
2020-02-08  0:42             ` Sean Christopherson
2020-02-08  0:53               ` Peter Xu
2020-02-08  1:29                 ` Sean Christopherson
2020-02-17 15:39                   ` Vitaly Kuznetsov
2020-02-18 17:10                     ` Sean Christopherson
2020-02-17 15:35           ` Vitaly Kuznetsov
2020-02-06 21:24   ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 16/19] KVM: Ensure validity of memslot with respect to kvm_get_dirty_log() Sean Christopherson
2020-01-21 22:31 ` [PATCH v5 17/19] KVM: Terminate memslot walks via used_slots Sean Christopherson
2020-02-06 21:09   ` Peter Xu
2020-02-07 18:33     ` Sean Christopherson
2020-02-07 20:39       ` Peter Xu
2020-02-07 21:10         ` Sean Christopherson
2020-02-07 21:46           ` Peter Xu
2020-02-07 22:03             ` Sean Christopherson
2020-02-07 22:24               ` Peter Xu
2020-01-21 22:31 ` [PATCH v5 18/19] KVM: Dynamically size memslot array based on number of used slots Sean Christopherson
2020-02-06 22:12   ` Peter Xu
2020-02-07 15:38     ` Sean Christopherson
2020-02-07 16:05       ` Peter Xu
2020-02-07 16:15         ` Sean Christopherson
2020-02-07 16:37           ` Peter Xu
2020-02-07 16:47             ` Sean Christopherson
2020-01-21 22:31 ` [PATCH v5 19/19] KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION Sean Christopherson
2020-02-06 22:30   ` Peter Xu
2020-02-06 23:11     ` Sean Christopherson
